Breeding: performance - How to do efficient k-nearest neighbor calculation in Matlab -

Saturday, 15 February 2014

performance - How to do efficient k-nearest neighbor calculation in Matlab -

i'm doing info analysis using k-nearest neighbour algorithm in matlab. info consists of 11795 x 88 info matrix, rows observations , columns variables.

my task find k-nearest neighbors n selected test points. i'm doing next logic:

for test points

loop info , find k-closest neighbors (by euclidean distance)

in other words, loop n test points. each test point search info (which excludes test point itself) k-nearest neighbors euclidean distance. each test point takes approximately k x 11794 iterations. whole process takes n x k x 11794 iterations. if n = 10000 , k = 7, approximately 825,6 1000000 iterations.

is there more efficient way calculate k-nearest neighbors? of computation going waste now, because algorithm simply:

calculates euclidean distance other points, picks closest , excludes closest point farther consideration --> calculates euclidean distance other points , picks closest --> etc. --> etc.

is there smart way rid of 'waste calculation'?

currently process takes 7 hours in computer (3.2 ghz, 8 gb ram, 64-bit win 7)... :(

here of logic illustrated explicitly (this not code, part eats performance):

for = 1:size(testpoints, 1) % loop test points      neighborcandidates = all_data_excluding_testpoints; %  utilize rest of   info excluding test points in search of k-nearest neighbors      testpoint = testpoints(i, :); % test point find k-nearest neighbors     kneighbors = []; % store k-nearest neighbors here.     j = 1:k % find k-nearest neighbors         bdist = inf; % distance of closest  neighbour         bind = 0; % index of closest  neighbour         n = 1:size(neighborcandidates, 1) % loop candidates             if pdist([testpoint; neighborcandidates(n, :)]) < bdist % check euclidean distance                 bdist = pdist([testpoint; neighborcandidates(n, :)]); % update best distance far                 bind = n; % save best found index far             end         end         kneighbors = [kneighbors; neighborcandidates(bind, :)]; % save found neighbour         neighborcandidates(bind, :) = []; % remove  neighbour  farther consideration      end end

using pdist2:

a = rand(20,5);             %// 11795 x 88 b = a([1, 12, 4, 8], :);    %// n-by-88 subset, i.e. n=4 in case n = size(b,1);  d = pdist2(a,b); [~, ind] = sort(d); kneighbours = ind(2:2+k, :);

now can utilize kneighbours index row in a. note columns of kneighbours correspond rows of b

but since you're dipping stats toolbox pdist why not utilize matlab's knnsearch?

kneighbours_matlab = knnsearch(a,b,'k',k+1);

note kneighbours same kneighbours_matlab(:,2:end)'

performance algorithm matlab nearest-neighbor

Breeding

Saturday, 15 February 2014

performance - How to do efficient k-nearest neighbor calculation in Matlab -

No comments:

Post a Comment