backpropagation == reverse-mode AD

May 27, 2014 at 5:23 AM
I've been working to understand how reverse-mode automatic differentiation is the same thing as backpropagation. I find the book here helpful: http://page.mi.fu-berlin.de/rojas/neural/ , especially chapter 7, but my math skills are not great. Can someone help me with an example or additional explanation? Do we need to expose the adjoint values in the library in order to build a neural network with this AutoDiff package?
Coordinator
May 27, 2014 at 6:09 AM
Edited May 27, 2014 at 6:16 AM
The method for computing the gradient of the error function in neural networks, known as "backpropagation", is indeed a special case of reverse-mode AD.

To build a neural network you do not need to expose anything beyond what the library already exposes. You build a function that represents the approximation error of your neural network. Use AutoDiff variables to represent both the inputs and the weights, then use the 'parametric differentiation' feature to differentiate with respect to the weights only, treating the inputs and the expected output as parameters. The resulting gradient gives you your learning direction.

For example, let's say you have the following neural network, with 1 hidden layer:
      1
     / \
    /   \
   2     3
  / \   / \
 4   5 6   7
Nodes 4, 5, 6, 7 are input nodes. Nodes 2, 3 are hidden nodes, and node 1 is the output node.
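
To make the connection to reverse-mode AD concrete, here is a sketch of the reverse sweep written out by hand for this network. The notation is my own assumption and not something the library exposes: z_k is the weighted sum entering node k, o_k is its output, sigma is the activation function, o is the expected output, and a bar over a quantity denotes its adjoint, i.e. the derivative of the error E with respect to that quantity.

```latex
% Forward sweep: evaluate the network from the inputs up to the error.
\begin{align*}
z_2 &= w_{24} i_4 + w_{25} i_5 + b_2, & o_2 &= \sigma(z_2) \\
z_3 &= w_{36} i_6 + w_{37} i_7 + b_3, & o_3 &= \sigma(z_3) \\
z_1 &= w_{12} o_2 + w_{13} o_3 + b_1, & o_1 &= \sigma(z_1) \\
E   &= (o_1 - o)^2
\end{align*}

% Reverse sweep: propagate adjoints \bar{v} = \partial E / \partial v
% from the error back down to the weights.
\begin{align*}
\bar{o}_1 &= 2 (o_1 - o), & \bar{z}_1 &= \bar{o}_1 \, \sigma'(z_1) \\
\bar{w}_{12} &= \bar{z}_1 \, o_2, & \bar{w}_{13} &= \bar{z}_1 \, o_3, \qquad \bar{b}_1 = \bar{z}_1 \\
\bar{o}_2 &= \bar{z}_1 \, w_{12}, & \bar{z}_2 &= \bar{o}_2 \, \sigma'(z_2) \\
\bar{o}_3 &= \bar{z}_1 \, w_{13}, & \bar{z}_3 &= \bar{o}_3 \, \sigma'(z_3) \\
\bar{w}_{24} &= \bar{z}_2 \, i_4, & \bar{w}_{25} &= \bar{z}_2 \, i_5, \qquad \bar{b}_2 = \bar{z}_2 \\
\bar{w}_{36} &= \bar{z}_3 \, i_6, & \bar{w}_{37} &= \bar{z}_3 \, i_7, \qquad \bar{b}_3 = \bar{z}_3
\end{align*}
```

The adjoints of z_1, z_2 and z_3 are what neural network texts usually call the "deltas" of backpropagation, and the adjoints of the weights are the components of the gradient. For the logistic function, sigma'(z) = sigma(z) (1 - sigma(z)), so every derivative can be computed from values already stored during the forward sweep.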

The following code demonstrates how to use the library to build a neural network:

Term Activation(Term input)
{
    // logistic sigmoid: (1 + exp(-v))^-1, which already lies between 0 and 1
    return TermBuilder.Power(1 + TermBuilder.Exp(-1 * input), -1);
}

Variable w24 = new Variable(), w25 = new Variable(), 
         b2 = new Variable(); // weights + bias of node 2
Variable w36 = new Variable(), w37 = new Variable(), 
         b3 = new Variable(); // weights + bias of node 3
Variable w12 = new Variable(), w13 = new Variable(), 
         b1 = new Variable(); // weights + bias of node 1

Variable i4 = new Variable(), i5 = new Variable(),
         i6 = new Variable(), i7 = new Variable(); // inputs

Variable o = new Variable(); // expected output


var o2 = Activation(w24 * i4 + w25 * i5 + b2); // the output of node 2
var o3 = Activation(w36 * i6 + w37 * i7 + b3); // the output of node 3
var o1 = Activation(w12 * o2 + w13 * o3 + b1); // the output of node 1 (and the network itself)


var trainingError = TermBuilder.Power(o1 - o, 2); // squared error of the actual and expected output


var weights = new Variable[] { w12, w13, b1, w24, w25, b2, w36, w37, b3};
var data = new Variable[] { i4, i5, i6, i7, o }; 
var compiledTrainingError = trainingError.Compile(weights, data);

// here is an example of a simple learning step
double[] learningDirection = compiledTrainingError.Differentiate(
    currentWeights, // an array of the current weights,
                    // in exactly the same order as "weights" above
    currentData     // an array of doubles: the first 4 elements contain the inputs
                    // and the last element contains the expected output,
                    // in exactly the same order as "data" above
);


// step against the gradient, since we want to decrease the error
for (int i = 0; i < learningDirection.Length; ++i)
    newWeights[i] = currentWeights[i] - learningRate * learningDirection[i];
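
If it helps to see what reverse-mode AD does for this particular network, below is a minimal, library-free sketch that performs the forward and reverse sweeps by hand and then takes one step against the gradient. It does not use or mirror the AutoDiff API: the class name BackpropSketch, its methods, the logistic activation, and the sample numbers are all made up for illustration. The weight order matches the "weights" array above.

```csharp
using System;

// Hypothetical, hand-written reverse-mode sweep for the 4-2-1 network in this thread.
public static class BackpropSketch
{
    // Logistic activation (an assumption made for this sketch).
    static double Sigmoid(double v) => 1.0 / (1.0 + Math.Exp(-v));

    // w = { w12, w13, b1, w24, w25, b2, w36, w37, b3 },  x = { i4, i5, i6, i7 }
    public static double Error(double[] w, double[] x, double target)
    {
        double o2 = Sigmoid(w[3] * x[0] + w[4] * x[1] + w[5]);
        double o3 = Sigmoid(w[6] * x[2] + w[7] * x[3] + w[8]);
        double o1 = Sigmoid(w[0] * o2 + w[1] * o3 + w[2]);
        return (o1 - target) * (o1 - target);
    }

    public static double[] Gradient(double[] w, double[] x, double target)
    {
        // Forward sweep: compute and keep every intermediate output.
        double o2 = Sigmoid(w[3] * x[0] + w[4] * x[1] + w[5]);
        double o3 = Sigmoid(w[6] * x[2] + w[7] * x[3] + w[8]);
        double o1 = Sigmoid(w[0] * o2 + w[1] * o3 + w[2]);

        // Reverse sweep: adjoints of the weighted sums (the backprop "deltas").
        double d1 = 2 * (o1 - target) * o1 * (1 - o1); // output node 1
        double d2 = d1 * w[0] * o2 * (1 - o2);         // hidden node 2
        double d3 = d1 * w[1] * o3 * (1 - o3);         // hidden node 3

        // Adjoints of the weights and biases, in the same order as w.
        return new[]
        {
            d1 * o2,   d1 * o3,   d1,
            d2 * x[0], d2 * x[1], d2,
            d3 * x[2], d3 * x[3], d3
        };
    }

    public static void Main()
    {
        double[] w = { 0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, 0.4, -0.3 };
        double[] x = { 1.0, 0.5, -1.0, 2.0 };
        double target = 1.0, learningRate = 0.5;

        double before = Error(w, x, target);
        double[] gradient = Gradient(w, x, target);

        // One gradient descent step: move against the gradient.
        for (int i = 0; i < w.Length; ++i)
            w[i] -= learningRate * gradient[i];

        Console.WriteLine($"error before: {before}, error after: {Error(w, x, target)}");
    }
}
```

Gradient visits each node once on the way up and once on the way down, which is exactly the bookkeeping that reverse-mode AD automates once you hand it the error term.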
May 27, 2014 at 4:56 PM
Great answer. Thanks.