performance - How to do efficient k-nearest neighbor calculation in Matlab -
i'm doing info analysis using k-nearest neighbour algorithm in matlab. info consists of 11795 x 88 info matrix, rows observations , columns variables.
my task find k-nearest neighbors n selected test points. i'm doing next logic:
for test points
loop info , find k-closest neighbors (by euclidean distance)
in other words, loop n test points. each test point search info (which excludes test point itself) k-nearest neighbors euclidean distance. each test point takes approximately k x 11794 iterations. whole process takes n x k x 11794 iterations. if n = 10000 , k = 7, approximately 825,6 1000000 iterations.
is there more efficient way calculate k-nearest neighbors? of computation going waste now, because algorithm simply:
calculates euclidean distance other points, picks closest , excludes closest point farther consideration --> calculates euclidean distance other points , picks closest --> etc. --> etc.
is there smart way rid of 'waste calculation'?
currently process takes 7 hours in computer (3.2 ghz, 8 gb ram, 64-bit win 7)... :(
here of logic illustrated explicitly (this not code, part eats performance):
for = 1:size(testpoints, 1) % loop test points neighborcandidates = all_data_excluding_testpoints; % utilize rest of info excluding test points in search of k-nearest neighbors testpoint = testpoints(i, :); % test point find k-nearest neighbors kneighbors = []; % store k-nearest neighbors here. j = 1:k % find k-nearest neighbors bdist = inf; % distance of closest neighbour bind = 0; % index of closest neighbour n = 1:size(neighborcandidates, 1) % loop candidates if pdist([testpoint; neighborcandidates(n, :)]) < bdist % check euclidean distance bdist = pdist([testpoint; neighborcandidates(n, :)]); % update best distance far bind = n; % save best found index far end end kneighbors = [kneighbors; neighborcandidates(bind, :)]; % save found neighbour neighborcandidates(bind, :) = []; % remove neighbour farther consideration end end
using pdist2
:
a = rand(20,5); %// 11795 x 88 b = a([1, 12, 4, 8], :); %// n-by-88 subset, i.e. n=4 in case n = size(b,1); d = pdist2(a,b); [~, ind] = sort(d); kneighbours = ind(2:2+k, :);
now can utilize kneighbours
index row in a
. note columns of kneighbours
correspond rows of b
but since you're dipping stats toolbox pdist
why not utilize matlab's knnsearch
?
kneighbours_matlab = knnsearch(a,b,'k',k+1);
note kneighbours
same kneighbours_matlab(:,2:end)'
performance algorithm matlab nearest-neighbor
No comments:
Post a Comment