Monday, 15 August 2011

matlab - Backpropagation for rectified linear unit activation with cross entropy error




I'm trying to implement gradient calculation for neural networks using backpropagation. I cannot get it to work with cross entropy error and rectified linear unit (ReLU) activation.

I managed to get my implementation working for squared error with sigmoid, tanh and ReLU activation functions. With cross entropy (CE) error and sigmoid activation the gradient is computed correctly. However, when I change the activation to ReLU, it fails. (I'm skipping tanh for CE, as it returns values in the (-1,1) range.)

Is this because of the behaviour of the log function at values close to 0 (which ReLUs return approximately 50% of the time for normalized inputs)? I tried to mitigate the problem by clipping the network output (h in the code below) with:

log(max(h,eps))

but that only helped bring the error and gradients back to real numbers; they are still different from the numerical gradient.
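To illustrate, a standalone toy snippet (made-up values) showing why the raw logistic cost becomes non-finite with ReLU outputs, which can be exactly 0 or even greater than 1:

h = max(0, [-1.2 0 0.7 2.3]);   % ReLU outputs: exact zeros and values > 1 occur
y = [0 1 1 0];                  % hypothetical binary targets
bad  = -( y.*log(h)          + (1-y).*log(1-h) );          % NaN, Inf and complex terms
safe = -( y.*log(max(h,eps)) + (1-y).*log(max(1-h,eps)) ); % finite, but a different cost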

I verify the results using a numerical gradient:

num_grad = (f(w+epsilon) - f(w-epsilon)) / (2*epsilon)
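In case it helps, this is the kind of check I mean, sketched out per weight (numgrad is just an illustrative name; fun is any function returning the cost as its first output, such as backprop below):

function num_grad = numgrad(fun, w, epsilon)
% central-difference gradient check, perturbing one weight at a time
num_grad = zeros(size(w));
for i = 1:numel(w)
    e = zeros(size(w));
    e(i) = epsilon;
    num_grad(i) = (fun(w + e) - fun(w - e)) / (2*epsilon);
end
end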

The following MATLAB code presents a simplified and condensed version of the backpropagation implementation used in my experiments:

function [f, df] = backprop(w, x, y)
% w - weights
% x - input values
% y - target values
act_type = 'relu';   % possible values: sigmoid / tanh / relu
error_type = 'ce';   % possible values: se / ce

n = size(x,1); n_inp = size(x,2); n_hid = 100; n_out = size(y,2);
w1 = reshape(w(1:n_hid*(n_inp+1)), n_hid, n_inp+1);
w2 = reshape(w(n_hid*(n_inp+1)+1:end), n_out, n_hid+1);

% feedforward
x = [x ones(n,1)];
z2 = x*w1'; a2 = act(z2, act_type);
a2 = [a2 ones(n,1)];
z3 = a2*w2'; h = act(z3, act_type);   % h - network output

if strcmp(error_type, 'ce')
    % cross entropy error - logistic cost function
    f = -sum(sum( y.*log(max(h,eps)) + (1-y).*log(max(1-h,eps)) ));
else
    % squared error
    f = 0.5*sum(sum((h-y).^2));
end

% backprop
if strcmp(error_type, 'ce')
    % cross entropy error
    d3 = h - y;
else
    % squared error
    d3 = (h - y).*dact(z3, act_type);
end

df2 = d3'*a2;
d2 = d3*w2(:,1:end-1).*dact(z2, act_type);
df1 = d2'*x;
df = [df1(:); df2(:)];
end

function f = act(z, type)   % activation function
switch type
    case 'sigmoid'
        f = 1./(1+exp(-z));
    case 'tanh'
        f = tanh(z);
    case 'relu'
        f = max(0, z);
end
end

function df = dact(z, type)   % derivative of activation function
switch type
    case 'sigmoid'
        df = act(z,type).*(1-act(z,type));
    case 'tanh'
        df = 1 - act(z,type).^2;
    case 'relu'
        df = double(z > 0);
end
end
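For reference, a usage sketch with random data (sizes and seed are made up; n_hid must stay 100 because it is hard-coded in backprop) that compares the analytic gradient with the numerical one from the numgrad sketch above:

rng(0);
n = 50; n_inp = 10; n_hid = 100; n_out = 3;
x = randn(n, n_inp);
y = double(rand(n, n_out) > 0.5);                     % random binary targets
w = 0.1*randn(n_hid*(n_inp+1) + n_out*(n_hid+1), 1);  % packed weights, matching backprop's reshape
[f, df]  = backprop(w, x, y);
num_grad = numgrad(@(v) backprop(v, x, y), w, 1e-6);
fprintf('cost: %g, max abs gradient difference: %g\n', f, max(abs(df - num_grad)));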

edit

After another round of experiments, I found out that using a softmax last layer:

h=bsxfun(@rdivide, exp(z3), sum(exp(z3),2));

and the softmax cost function:

f=-sum(sum(y.*log(h)));

makes the implementation work for all activation functions, including ReLU.
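Concretely, the only output-layer lines that change relative to the code above are these (a sketch; the hidden layer keeps its ReLU, and d3 = h - y remains valid because the softmax Jacobian cancels against the cross entropy derivative):

z3 = a2*w2';
z3 = bsxfun(@minus, z3, max(z3, [], 2));          % optional shift for numerical stability
h  = bsxfun(@rdivide, exp(z3), sum(exp(z3), 2));  % row-wise softmax output
f  = -sum(sum(y.*log(max(h, eps))));              % softmax cross entropy cost
d3 = h - y;                                       % output delta simplifies to h - y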

This leads me to the conclusion that it is the logistic cost function (binary classifier) that does not work with ReLU:

f=-sum(sum( y.*log(max(h,eps))+(1-y).*log(max(1-h,eps)) ));

However, I still cannot figure out where the problem lies.

Each squashing function in the output layer (sigmoid, tanh, softmax) implies a different cost function, so it makes sense that a ReLU in the output layer does not match the cross entropy cost function (a short sketch of why follows below). Try a simple squared error cost function to test a ReLU output layer.
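One way to see the mismatch, sketched for a single output unit (h = network output, y = target, z = output-layer pre-activation):

E = -y \log h - (1 - y)\log(1 - h), \qquad \frac{\partial E}{\partial h} = \frac{h - y}{h(1 - h)}

\text{sigmoid output: } h = \sigma(z), \quad \frac{dh}{dz} = h(1 - h), \quad \frac{\partial E}{\partial z} = h - y

\text{ReLU output: } h = \max(0, z), \quad \frac{dh}{dz} = \mathbf{1}[z > 0], \quad \frac{\partial E}{\partial z} = \frac{h - y}{h(1 - h)} \cdot \mathbf{1}[z > 0]

So the shortcut d3 = h - y in the CE branch of the question's code is only valid when the output activation is a sigmoid (or softmax); with a ReLU output the factor 1/(h(1-h)) does not cancel, blows up as h approaches 0, and is undefined for h >= 1, both of which a ReLU output readily produces.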

The true power of ReLU is in the hidden layers of a deep net, since it does not suffer from the vanishing gradient problem.
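Following the suggestion above, a minimal sketch of the squared error test for a ReLU output layer (reusing the variable names from the question's code):

h  = max(0, z3);                       % ReLU output
f  = 0.5*sum(sum((h - y).^2));         % squared error cost
d3 = (h - y) .* double(z3 > 0);        % output delta includes the ReLU derivative, no log involved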

matlab machine-learning neural-network backpropagation
