python - How to create a colormap of confidence estimates for k-Nearest Neighbor Classification
What I want:
I would like to display the results of a simple classification algorithm (see below) as a colormap in Python (the data is in 2D), where each class is assigned a color, and the confidence of a prediction anywhere on the 2D map is proportional to the saturation of the color associated with the predicted class. The image below sort of illustrates what I want for a binary (two-class) problem, in which the reddish parts might suggest strong confidence in class 1, whereas bluish parts speak for class 2. Intermediate colors suggest uncertainty about either. I want the color scheme to generalize to multiple classes, so I would need many colors, and the scale would go from white (uncertainty) to a very colorful color associated with a class.
Some sample code: My sample code uses a simple kNN algorithm, where the nearest k data points are allowed to 'vote' on the class of a new point on the map. The confidence of the prediction is given by the relative frequency of the winning class out of the k that voted. I haven't dealt with ties, and I know there are better probabilistic versions of this method, but all I want is to visualize my data to show a viewer the chances of a class being in a particular part of the 2D plane.
import numpy as np
import matplotlib.pyplot as plt

# Generate training data from 3 classes
n = 100  # number of covariates (sample points) for each class in the training set
mean1, mean2, mean3 = [-1.5, 0], [1.5, 0], [0, 1.5]
cov1, cov2, cov3 = [[1, 0], [0, 1]], [[1, 0], [0, 1]], [[1, 0], [0, 1]]
x1 = np.asarray(np.random.multivariate_normal(mean1, cov1, n))
x2 = np.asarray(np.random.multivariate_normal(mean2, cov2, n))
x3 = np.asarray(np.random.multivariate_normal(mean3, cov3, n))

plt.plot(x1[:, 0], x1[:, 1], 'ro', x2[:, 0], x2[:, 1], 'bo', x3[:, 0], x3[:, 1], 'go')
plt.axis('equal')
plt.show()  # display the training data

# Prepare the data set as a 3n*3 array where each row is a data point and its associated class
d = np.zeros((3 * n, 3))
d[0:n, 0:2] = x1; d[0:n, 2] = 1
d[n:2*n, 0:2] = x2; d[n:2*n, 2] = 2
d[2*n:3*n, 0:2] = x3; d[2*n:3*n, 2] = 3

def knn(x, d, k=3):
    x = np.asarray(x)
    dist = np.linalg.norm(x - d[:, 0:2], axis=1)
    i = dist.argsort()[:k]  # indices of the k smallest distances
    counts = np.bincount(d[i, 2].astype(int))
    predicted_class = np.argmax(counts)
    confidence = float(np.max(counts)) / k
    return predicted_class, confidence

print(knn([-2, 0], d, 20))
So, I can calculate two numbers for each point in the 2D plane: the confidence (0..1) and the class (an integer).

One possibility is to calculate your own RGB map and show it with imshow. Like this:
import numpy as np
import matplotlib.pyplot as plt

# color vector with N x 3 colors, where N is the maximum number of classes; colors in RGB
mycolors = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, .5, 1]])

# negate the colors
mycolors = 1 - mycolors

# extents of the area
x0 = -2
x1 = 2
y0 = -2
y1 = 2

# create a grid over the area
x, y = np.meshgrid(np.linspace(x0, x1, 1000), np.linspace(y0, y1, 1000))

# calculate the classification and probabilities
classes = classify_func(x, y)
probabilities = prob_func(x, y)

# create the basic color map by class
img = mycolors[classes]

# fade the color by the probability (black for zero probability)
img *= probabilities[:, :, None]

# reverse the negative image back
img = 1 - img

# draw it
plt.imshow(img, extent=[x0, x1, y0, y1], origin='lower')
plt.axis('equal')

# save it
plt.savefig("mymap.png")

The trick of making negative colors is there to make the maths a bit easier to understand. The code can of course be written much denser.
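The effect of the negate/multiply/negate pipeline on a single color can be checked by hand: fading a color c with probability p gives 1 - (1 - c) * p, which is white at p = 0 and the pure class color at p = 1. A minimal sketch:

```python
import numpy as np

c = np.array([0.0, 1.0, 0.0])   # pure green class color
for p in (0.0, 0.5, 1.0):
    faded = 1 - (1 - c) * p     # negate, scale by probability, negate back
    print(p, faded)
# p=0.0 -> [1. 1. 1.]   (white: total uncertainty)
# p=0.5 -> [0.5 1.  0.5]
# p=1.0 -> [0. 1. 0.]   (the full class color)
```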
I created two simple functions to mimic the classification and probabilities:
def classify_func(x, y):
    return np.round(abs(x + y)).astype('int')

def prob_func(x, y):
    return 1 - 2 * abs(abs(x + y) - classify_func(x, y))

The former gives integer values from 0 to 4 over the given area, and the latter gives smoothly changing probabilities.
The result:
If you do not like the way the colors fade towards zero probability, you may apply some non-linearity when multiplying with the probabilities.
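One way to do this (a sketch; the power-law form and the exponent are my own choice, not from the code above) is to run the probabilities through a gamma curve before the multiplication:

```python
import numpy as np

def fade(probabilities, gamma=2.0):
    """Power-law non-linearity: gamma > 1 pushes low-confidence areas
    towards white faster, gamma < 1 keeps colors saturated longer."""
    return probabilities ** gamma

# used in place of the plain multiplication:
#   img *= fade(probabilities)[:, :, None]
print(fade(np.array([0.0, 0.5, 1.0])))  # [0.   0.25 1.  ]
```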
Here the functions classify_func and prob_func are given two arrays as arguments, the first one being the X coordinates where the values are to be calculated, and the second one the Y coordinates. This works well if the underlying calculations are fully vectorized. With the code in the question this is not the case, as it calculates single values.
In that case the code changes slightly:
x = np.linspace(x0, x1, 1000)
y = np.linspace(y0, y1, 1000)

classes = np.empty((len(y), len(x)), dtype='int')
probabilities = np.empty((len(y), len(x)))

for yi, yv in enumerate(y):
    for xi, xv in enumerate(x):
        classes[yi, xi], probabilities[yi, xi] = knn((xv, yv), d)

Also, as the confidence estimates are not 0..1, they need to be scaled:
probabilities -= np.amin(probabilities)
probabilities /= np.amax(probabilities)

After this is done, your map should have extents of -4,-4..4,4 (as per the color map: green=1, magenta=2, yellow=3):
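An alternative to this data-driven min/max scaling: with m classes the winning class always gets at least 1/m of the k votes, so the raw confidence lives roughly in [1/m, 1], and a fixed rescale can be used instead (a sketch; `rescale_confidence` is my own helper, not part of the code above):

```python
import numpy as np

def rescale_confidence(conf, n_classes):
    """Map kNN vote fractions from [1/n_classes, 1] onto [0, 1]."""
    lo = 1.0 / n_classes
    return np.clip((conf - lo) / (1.0 - lo), 0.0, 1.0)

# with 3 classes, winning with 1/3 of the votes means total uncertainty
print(rescale_confidence(np.array([1/3, 2/3, 1.0]), 3))  # [0.  0.5 1. ]
```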
To vectorize or not to vectorize - that is the question
This question pops up from time to time. There is a lot of information about vectorization on the web, but as a quick search did not reveal any short summaries, I'll give some thoughts here. This is quite a subjective matter, so everything just represents my humble opinions. Other people may have different opinions.
There are three factors to consider:
- performance
- legibility
- memory use

Usually (but not always) vectorization makes the code faster but more difficult to understand, and makes it consume more memory. Memory use is not usually a big problem, but with very large arrays it is something to think of (hundreds of megabytes are usually ok, gigabytes are troublesome).
Trivial cases aside (element-wise simple operations, simple matrix operations), my approach is:
1. write the code without vectorizations and check that it works
2. profile the code
3. vectorize the inner loops if needed and possible (1D vectorization)
4. create a 2D vectorization if it is simple

For example, a pixel-by-pixel image processing operation may lead to a situation where I end up with one-dimensional vectorizations (for each row). Then the inner loop (for each pixel) is fast, and the outer loop (for each row) does not really matter. The code may look much simpler if it does not try to be usable with all possible input dimensions.
I am such a lousy algorithmist that in more complex cases I like to verify my vectorized code against the non-vectorized versions. Hence I almost invariably first create the non-vectorized code before optimizing at all.
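The verification step can be as simple as asserting that both versions agree on the same input (a toy element-wise example, unrelated to the kNN code above):

```python
import numpy as np

xs = np.random.rand(1000)

# non-vectorized: plain Python loop
out_loop = np.array([x * x + 1.0 for x in xs])

# vectorized: one NumPy expression over the whole array
out_vec = xs * xs + 1.0

# the two versions must agree (up to floating-point tolerance)
assert np.allclose(out_loop, out_vec)
print("vectorized version verified against the loop")
```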
Sometimes vectorization does not offer any performance benefit. For example, the handy function numpy.vectorize can be used to vectorize practically any function, but its documentation states:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
(This function could have been used in the code above, as well. I chose the loop version for legibility for people not familiar with numpy.)
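For completeness, a sketch of how numpy.vectorize could have been used here. The toy training array below stands in for the `d` built in the question, so the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for the question's d array: columns are [x, y, class]
d = np.column_stack([rng.normal(size=(60, 2)), np.repeat([1, 2, 3], 20)])

def knn(x, d, k=3):
    # scalar kNN vote, as in the question's code
    dist = np.linalg.norm(np.asarray(x) - d[:, 0:2], axis=1)
    i = dist.argsort()[:k]
    counts = np.bincount(d[i, 2].astype(int))
    return np.argmax(counts), float(np.max(counts)) / k

# wrap the scalar function; convenient, but still a Python loop underneath
knn_vec = np.vectorize(lambda xv, yv: knn((xv, yv), d), otypes=[int, float])

x, y = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
classes, probabilities = knn_vec(x, y)
print(classes.shape, probabilities.shape)  # (50, 50) (50, 50)
```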
Vectorization gives more performance only if the underlying vectorized functions are faster. Sometimes they are, sometimes they aren't; only profiling and experience will tell. Also, it is not always necessary to vectorize everything. You may have an image processing algorithm which has both vectorized and pixel-by-pixel operations. There numpy.vectorize is very useful.
I would try to vectorize the kNN search algorithm above at least in one dimension. There is no conditional code (it wouldn't be a show-stopper, but it does complicate things), and the algorithm is rather straightforward. The memory consumption will go up, but with one-dimensional vectorization it does not matter.
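A sketch of that one-dimensional vectorization (my own code, not from the answer above): classify a whole grid row at once by computing all distances with one broadcasted expression, while the vote counting stays in a small Python loop over the row. The toy `d` array again stands in for the one built in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for the question's d array: columns are [x, y, class]
d = np.column_stack([rng.normal(size=(60, 2)), np.repeat([1, 2, 3], 20)])

def knn_row(xs, yv, d, k=3):
    """kNN classification for one grid row: xs is a 1-D array of x
    coordinates, yv a single y coordinate."""
    pts = np.column_stack([xs, np.full(len(xs), yv)])       # (m, 2)
    # (m, n) distances: every row point against every training point
    dist = np.linalg.norm(pts[:, None, :] - d[None, :, 0:2], axis=2)
    idx = np.argsort(dist, axis=1)[:, :k]                   # k nearest per point
    votes = d[idx, 2].astype(int)                           # (m, k) class labels
    classes = np.empty(len(xs), dtype=int)
    conf = np.empty(len(xs))
    for j in range(len(xs)):                                # count votes per row point
        counts = np.bincount(votes[j])
        classes[j] = np.argmax(counts)
        conf[j] = counts.max() / k
    return classes, conf

classes, conf = knn_row(np.linspace(-2, 2, 5), 0.0, d, k=5)
print(classes, conf)  # 5 predicted labels and their vote fractions
```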
And it may happen that along the way you notice that an n-dimensional generalization is not much more complicated. Then do that, if memory allows.
python algorithm numpy matplotlib