Table of Contents
Lecture 3
3.1 - Learning the weights of a linear neuron
Deriving the delta rule
3.2 - The error surface for a linear neuron
3.3 - Learning the weights of a logistic output neuron
Derivatives of a logistic neuron
3.4 - The backpropagation algorithm
3.5 - Using the derivatives computed by backpropagation
Lecture 3
3.1 - Learning the weights of a linear neuron
For multi-layer neural networks, the average of two good solutions may be a bad solution, so we cannot use the perceptron learning procedure.
Here instead of making the weights get closer to a set of good weights, we just try to make the output get closer to the expected output.
For linear neurons, the output is \(y = w^T x\)
To measure the error we use the squared difference between the output \(y\) and the target \(t\).
We then use the “delta rule” for learning: \(\Delta w_i = \epsilon x_i (t-y)\), where \(\epsilon\) is the learning rate.
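As a minimal sketch of the rule above, the following applies the delta rule one training case at a time. The function names, the toy data and the learning rate of 0.1 are assumptions chosen for illustration; they do not come from the lecture.

```python
def predict(weights, x):
    """Output of a linear neuron: y = w^T x."""
    return sum(w_i * x_i for w_i, x_i in zip(weights, x))


def delta_rule_step(weights, x, target, learning_rate):
    """One online delta-rule update: w_i <- w_i + epsilon * x_i * (t - y)."""
    residual = target - predict(weights, x)
    return [w_i + learning_rate * x_i * residual for w_i, x_i in zip(weights, x)]


# Hypothetical toy data, generated by the weights (2.0, -1.0).
cases = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

weights = [0.0, 0.0]
for _ in range(200):  # sweep repeatedly over the training cases
    for x, target in cases:
        weights = delta_rule_step(weights, x, target, learning_rate=0.1)
```

On this consistent toy data the weights should approach (2.0, -1.0). With noisy or inconsistent targets the online rule would only hover near the best weights unless the learning rate is reduced.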
Deriving the delta rule
We start with \(E = \frac 12 \sum\limits_n (t^n - y^n)^2\)
Then we differentiate with the chain rule to get the error derivatives for the weights, writing \(E^n = \frac 12 (t^n - y^n)^2\) for the error on training case \(n\): \(\frac{\partial E}{\partial w_i} = \sum\limits_n \frac {\partial y^n}{\partial w_i} \frac{d E^n}{d y^n} = - \sum\limits_n x_i^n (t^n - y^n)\)
Then we use \(\Delta w_i = - \epsilon \frac{\partial E}{\partial w_i} = \sum\limits_n \epsilon x_i^n (t^n - y^n)\)
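The batch form of the update can be sketched as below: the gradient is summed over all training cases before the weights change. The toy data (generated by the weights (2.0, -1.0)), the learning rate and all names are assumptions for illustration.

```python
def linear_output(weights, x):
    """Output of a linear neuron: y = w^T x."""
    return sum(w_i * x_i for w_i, x_i in zip(weights, x))


def total_error(weights, inputs, targets):
    """E = 1/2 * sum_n (t^n - y^n)^2."""
    return 0.5 * sum(
        (t - linear_output(weights, x)) ** 2 for x, t in zip(inputs, targets)
    )


def error_gradient(weights, inputs, targets):
    """dE/dw_i = -sum_n x_i^n (t^n - y^n)."""
    grad = [0.0] * len(weights)
    for x, t in zip(inputs, targets):
        residual = t - linear_output(weights, x)
        for i, x_i in enumerate(x):
            grad[i] -= x_i * residual
    return grad


# Hypothetical toy data, generated by the weights (2.0, -1.0).
inputs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.5]]
targets = [0.0, 2.0, -3.5]
learning_rate = 0.1  # assumed value, small enough for this data

weights = [0.0, 0.0]
errors = [total_error(weights, inputs, targets)]
for _ in range(100):
    grad = error_gradient(weights, inputs, targets)
    # Delta w_i = -epsilon * dE/dw_i
    weights = [w_i - learning_rate * g_i for w_i, g_i in zip(weights, grad)]
    errors.append(total_error(weights, inputs, targets))
```

The list `errors` records the error after each batch step, which makes it easy to check that the chosen learning rate is not too large for the data.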
3.2 - The error surface for a linear neuron
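For a linear neuron with a squared error, the error surface (the error plotted against the weights) is a quadratic bowl: vertical cross-sections are parabolas and horizontal cross-sections are ellipses.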
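One way to see the quadratic shape is to expand the squared error of section 3.1 in terms of \(w\). The symbols \(A\), \(r\) and \(c\) below are shorthand introduced here for illustration, not notation from the lecture.

\[
E(w) = \frac 12 \sum\limits_n \left(t^n - w^T x^n\right)^2 = \frac 12 w^T A w - r^T w + c,
\qquad
A = \sum\limits_n x^n (x^n)^T, \quad r = \sum\limits_n t^n x^n, \quad c = \frac 12 \sum\limits_n (t^n)^2
\]

Since \(A\) is a sum of outer products, \(w^T A w = \sum\limits_n (w^T x^n)^2 \ge 0\), so \(E\) is a convex quadratic function of the weights.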
3.3 - Learning the weights of a logistic output neuron
Derivatives of a logistic neuron
\(z = b + \sum\limits_i x_i w_i\)
\(\frac{\partial z}{\partial w_i} = x_i\)
\(\frac{\partial z}{\partial x_i} = w_i\)
Then \(y=\frac{1}{1+e^{-z}}\), so \(\frac{dy}{dz} = y(1-y)\)
So we get \(\frac{\partial y}{\partial w_i} = \frac {\partial z}{\partial w_i} \frac{d y}{d z} = x_i y (1-y)\)
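And, applying the chain rule to the squared error, \(\frac{\partial E}{\partial w_i} = \sum\limits_n \frac {\partial y^n}{\partial w_i} \frac{d E^n}{d y^n} = - \sum\limits_n x_i^n y^n (1-y^n) (t^n - y^n)\). This is the delta rule with an extra factor \(y^n (1-y^n)\), the slope of the logistic.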
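The derivative above can be checked numerically. The sketch below computes it for a single logistic neuron and compares it with a central finite-difference estimate; the toy data, the starting weights and the step size are assumptions for illustration.

```python
import math


def logistic(z):
    """y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, x):
    """z = b + sum_i x_i w_i, followed by the logistic."""
    z = bias + sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return logistic(z)


def total_error(weights, bias, inputs, targets):
    """E = 1/2 * sum_n (t^n - y^n)^2."""
    return 0.5 * sum(
        (t - forward(weights, bias, x)) ** 2 for x, t in zip(inputs, targets)
    )


def weight_gradient(weights, bias, inputs, targets):
    """dE/dw_i = -sum_n x_i^n y^n (1 - y^n) (t^n - y^n)."""
    grad = [0.0] * len(weights)
    for x, t in zip(inputs, targets):
        y = forward(weights, bias, x)
        for i, x_i in enumerate(x):
            grad[i] -= x_i * y * (1.0 - y) * (t - y)
    return grad


# Hypothetical toy data and starting point.
inputs = [[0.5, -1.0], [1.5, 2.0], [-1.0, 0.5]]
targets = [0.0, 1.0, 1.0]
weights, bias = [0.2, -0.3], 0.1

analytic = weight_gradient(weights, bias, inputs, targets)

# Central finite differences: nudge each weight up and down by a small step.
step = 1e-5
numeric = []
for i in range(len(weights)):
    up = list(weights)
    down = list(weights)
    up[i] += step
    down[i] -= step
    numeric.append(
        (total_error(up, bias, inputs, targets) - total_error(down, bias, inputs, targets))
        / (2.0 * step)
    )
```

If the derivation is right, `analytic` and `numeric` should agree to several decimal places.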
3.4 - The backpropagation algorithm
Randomly perturbing one weight and then checking whether it improves performance may work, but it is very inefficient compared to backpropagation.
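To make the cost of the perturbation approach concrete, here is a minimal sketch of it on a single logistic neuron. Each trial needs a full forward pass over the training cases and changes at most one weight. The toy data, the perturbation scale, the fixed seed and all names are assumptions for illustration; the bias is handled as a weight on a constant input of 1.

```python
import math
import random


def error(weights, inputs, targets):
    """Squared error of a logistic neuron over all training cases."""
    total = 0.0
    for x, t in zip(inputs, targets):
        z = sum(w_i * x_i for w_i, x_i in zip(weights, x))
        y = 1.0 / (1.0 + math.exp(-z))
        total += 0.5 * (t - y) ** 2
    return total


def perturbation_trial(weights, current_error, inputs, targets, rng, scale=0.1):
    """Randomly change one weight; keep the change only if the error drops."""
    i = rng.randrange(len(weights))
    candidate = list(weights)
    candidate[i] += rng.gauss(0.0, scale)
    candidate_error = error(candidate, inputs, targets)  # one full forward pass
    if candidate_error < current_error:
        return candidate, candidate_error
    return weights, current_error


# Hypothetical toy data; the last input is a constant 1 acting as the bias.
inputs = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
targets = [1.0, 1.0, 1.0, 0.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [0.0, 0.0, 0.0]
initial_error = error(weights, inputs, targets)
current_error = initial_error
forward_passes = 1
for _ in range(500):
    weights, current_error = perturbation_trial(
        weights, current_error, inputs, targets, rng
    )
    forward_passes += 1
```

Here 500 forward passes adjust three weights one at a time, and many trials are rejected. A gradient computed analytically gives the derivative for every weight from a single pass, which is the efficiency argument made above.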
TODO: add the formulas for backpropagation here (after understanding them!)
3.5 - Using the derivatives computed by backpropagation