Saturday, 15 February 2014

performance - How to do efficient k-nearest neighbor calculation in Matlab -



performance - How to do efficient k-nearest neighbor calculation in Matlab -

i'm doing info analysis using k-nearest neighbour algorithm in matlab. info consists of 11795 x 88 info matrix, rows observations , columns variables.

my task find k-nearest neighbors n selected test points. i'm doing next logic:

for test points

loop info , find k-closest neighbors (by euclidean distance)

in other words, loop n test points. each test point search info (which excludes test point itself) k-nearest neighbors euclidean distance. each test point takes approximately k x 11794 iterations. whole process takes n x k x 11794 iterations. if n = 10000 , k = 7, approximately 825,6 1000000 iterations.

is there more efficient way calculate k-nearest neighbors? of computation going waste now, because algorithm simply:

calculates euclidean distance other points, picks closest , excludes closest point farther consideration --> calculates euclidean distance other points , picks closest --> etc. --> etc.

is there smart way rid of 'waste calculation'?

currently process takes 7 hours in computer (3.2 ghz, 8 gb ram, 64-bit win 7)... :(

here of logic illustrated explicitly (this not code, part eats performance):

for = 1:size(testpoints, 1) % loop test points neighborcandidates = all_data_excluding_testpoints; % utilize rest of info excluding test points in search of k-nearest neighbors testpoint = testpoints(i, :); % test point find k-nearest neighbors kneighbors = []; % store k-nearest neighbors here. j = 1:k % find k-nearest neighbors bdist = inf; % distance of closest neighbour bind = 0; % index of closest neighbour n = 1:size(neighborcandidates, 1) % loop candidates if pdist([testpoint; neighborcandidates(n, :)]) < bdist % check euclidean distance bdist = pdist([testpoint; neighborcandidates(n, :)]); % update best distance far bind = n; % save best found index far end end kneighbors = [kneighbors; neighborcandidates(bind, :)]; % save found neighbour neighborcandidates(bind, :) = []; % remove neighbour farther consideration end end

using pdist2:

a = rand(20,5); %// 11795 x 88 b = a([1, 12, 4, 8], :); %// n-by-88 subset, i.e. n=4 in case n = size(b,1); d = pdist2(a,b); [~, ind] = sort(d); kneighbours = ind(2:2+k, :);

now can utilize kneighbours index row in a. note columns of kneighbours correspond rows of b

but since you're dipping stats toolbox pdist why not utilize matlab's knnsearch?

kneighbours_matlab = knnsearch(a,b,'k',k+1);

note kneighbours same kneighbours_matlab(:,2:end)'

performance algorithm matlab nearest-neighbor

No comments:

Post a Comment