VIII - Neural Networks: Representation
8.1 - Non-linear Hypothesis
- If we train a logistic regression algorithm with \(n\) features and include all the quadratic features \(x_ix_j\), we get approximately \(\frac{n^2}{2}\) features in total (there are \(\frac{n(n+1)}{2}\) products with \(i \le j\), which grows quadratically with \(n\)).
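As a quick sanity check on that count, here is a minimal sketch; the feature values themselves don't matter, only how many products \(x_ix_j\) with \(i \le j\) exist:

```python
from itertools import combinations_with_replacement

def num_quadratic_features(n):
    """Count the products x_i * x_j with i <= j for n raw features."""
    return sum(1 for _ in combinations_with_replacement(range(n), 2))

for n in (100, 1000):
    print(n, num_quadratic_features(n), n**2 // 2)
# n=100  -> 5050 exact vs ~5000;  n=1000 -> 500500 exact vs ~500000
```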
8.2 - Neurons and the Brain
- Origin of neural networks: try to mimic the brain.
- It was widely used in the 80s and early 90s.
- Today it is the state of the art for many applications.
- If we rewire the visual signal to the auditory cortex or the somatosensory cortex, that cortex learns to see (these are called neuro-rewiring experiments).
- We can learn to see with our tongue.
8.3 - Model Representation 1
- Neuron inputs: the dendrites.
- Neuron output: the axon.
- Neurons communicate with pulses of electricity.
⇒ Add single neuron drawing here
- Usually when drawing the neuron inputs we only draw \(x_1, x_2, x_3\), etc., not \(x_0\). \(x_0\) is called the bias unit (\(x_0 = 1\)).
- In neural networks, the parameters \(\theta\) are sometimes called “weights”.
⇒ Add neural network drawing here
- Layer 1 is called the input layer and the final layer is called the output layer.
- The layers in between are called hidden layers.
- \(a_i^{(j)}\) = “activation” of unit \(i\) in layer \(j\).
- \(\Theta^{(j)}\) = matrix of weights controlling the function mapping from layer \(j\) to layer \(j+1\).
- So, on the previous drawing we have:
\[a_1^{(2)} = g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)\]
\[a_2^{(2)} = g(\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3)\]
\[a_3^{(2)} = g(\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3)\]
\[h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})\]
- If a network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\), then \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\) (the \(+1\) accounts for the bias unit; in the drawing above, \(\Theta^{(1)}\) is \(3 \times 4\)). The sketch below walks through these equations numerically.
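A minimal NumPy sketch of the layer-2 and layer-3 computations for the drawn 3-3-1 network, with the sums written out as in the equations above. The weight values are arbitrary placeholders; \(g\) is the sigmoid.

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Drawn network: 3 inputs, 3 hidden units, 1 output.
# Theta1 has shape s_2 x (s_1 + 1) = 3 x 4; Theta2 has shape 1 x 4.
Theta1 = 0.1 * np.ones((3, 4))           # placeholder weights
Theta2 = 0.1 * np.ones((1, 4))

x0, x1, x2, x3 = 1.0, 0.5, -1.2, 2.0     # x0 = 1 is the bias unit

# a_i^(2) = g(Theta_i0*x0 + Theta_i1*x1 + Theta_i2*x2 + Theta_i3*x3),
# one hidden activation per equation above:
a2 = [g(Theta1[i, 0]*x0 + Theta1[i, 1]*x1 + Theta1[i, 2]*x2 + Theta1[i, 3]*x3)
      for i in range(3)]

a2 = [1.0] + a2                          # prepend the bias unit a_0^(2) = 1
h = g(sum(Theta2[0, j] * a2[j] for j in range(4)))   # h_Theta(x) = a_1^(3)
print(h)
```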
8.4 - Model Representation 2
- We define \(z_1^{(2)} = \Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3\), we define \(z_2^{(2)}\) and \(z_3^{(2)}\) similarly.
- So we have the vectors: \(x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}\) and we define \(z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}\), we also define \(a^{(2)}\) similarly. Then we can use the vectorized computation: \(z^{(2)} = \Theta^{(1)}x\) and \(a^{(2)} = g(z^{(2)})\).
- Now to make things a bit easier, we can just define \(a^{(1)} = x\), so that we get \(z^{(2)} = \Theta^{(1)}a^{(1)}\).
- Also note that to compute the next layer we must also add the bias component \(a_0^{(2)} = 1\).
- Then we compute \(z^{(3)} = \Theta^{(2)}a^{(2)}\) and finally \(h_\Theta(x) = a^{(3)} = g(z^{(3)})\).
- The process of computing \(h_\Theta(x)\) is called forward propagation (see the vectorized sketch after this list).
⇒ Neural networks are learning their own features.
- The way the units are connected in a neural network is called the architecture.
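A compact vectorized sketch of forward propagation under the same assumptions (sigmoid activation, placeholder weights; `forward` is just an illustrative helper name):

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """a^(1) = x; then z^(j+1) = Theta^(j) a^(j) and a^(j+1) = g(z^(j+1))."""
    a = np.asarray(x, dtype=float)
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))   # add the bias unit a_0 = 1
        a = g(Theta @ a)
    return a                             # h_Theta(x)

# Same 3-3-1 architecture: Theta^(1) is 3x4, Theta^(2) is 1x4.
rng = np.random.default_rng(0)
Thetas = [rng.standard_normal((3, 4)), rng.standard_normal((1, 4))]
print(forward([0.5, -1.2, 2.0], Thetas))
```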
8.5 - Examples and Intuitions 1
- We consider here y = x1 XNOR x2 (i.e. y = NOT (x1 XOR x2)).
- We can compute the AND function and the OR function each with a single neuron (weights -30, 20, 20 for AND and -10, 20, 20 for OR), as the sketch below verifies.
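A minimal check of those two neurons on all four inputs; since \(g\) is the sigmoid, \(g(-10) \approx 0\) and \(g(10) \approx 1\):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(w0, w1, w2, x1, x2):
    """A single sigmoid unit: g(w0 + w1*x1 + w2*x2)."""
    return g(w0 + w1*x1 + w2*x2)

print("x1 x2 AND OR")
for x1 in (0, 1):
    for x2 in (0, 1):
        a = neuron(-30, 20, 20, x1, x2)  # AND weights
        o = neuron(-10, 20, 20, x1, x2)  # OR weights
        print(x1, x2, round(a), round(o))
# -> 0 0 | 0 0 ; 0 1 | 0 1 ; 1 0 | 0 1 ; 1 1 | 1 1
```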
8.6 - Examples and Intuitions 2
- To compute negation (NOT x1), we can also use a single neuron with the weights (10, -20); combining such units gives a two-layer network that computes XNOR (sketch below).
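Putting the pieces together, here is a sketch of the XNOR network from the lecture: the hidden units compute x1 AND x2 and (NOT x1) AND (NOT x2) (weights 10, -20, -20), and the output unit ORs them:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(w0, w1, w2, x1, x2):
    return g(w0 + w1*x1 + w2*x2)

def xnor(x1, x2):
    a1 = neuron(-30, 20, 20, x1, x2)     # x1 AND x2
    a2 = neuron(10, -20, -20, x1, x2)    # (NOT x1) AND (NOT x2)
    return neuron(-10, 20, 20, a1, a2)   # a1 OR a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
# -> 1 exactly when x1 == x2
```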
8.7 - Multiclass Classification
- We just use multiple output units, one per class, where output unit k should be “1” exactly when an example of class k is found.
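A minimal sketch of prediction with such an output layer; the four class names follow the lecture's pedestrian/car/motorcycle/truck example, and the output values here are made up:

```python
import numpy as np

# Hypothetical outputs of a 4-unit output layer for one example
# (classes: pedestrian, car, motorcycle, truck).
h = np.array([0.05, 0.91, 0.03, 0.10])

# Training targets are the matching "one-hot" vectors,
# e.g. y = [0, 1, 0, 0] for a car.
print(np.argmax(h))   # -> 1, i.e. the "car" unit is closest to 1
```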