VIII - Neural Networks: Representation

8.1 - Non-linear Hypothesis

  • If we train a logistic regression algorithm with n features and include all the quadratic features \(x_ix_j\), we get approximately \(\frac{n^2}{2}\) features in total.
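As a quick sanity check on that count, here is a small sketch (the helper name is illustrative) that enumerates the quadratic terms \(x_ix_j\) with \(i \le j\); the exact count is \(\frac{n(n+1)}{2}\), which grows like \(\frac{n^2}{2}\):

```python
from itertools import combinations_with_replacement

def num_quadratic_features(n):
    # One feature for each unordered pair (i, j) with i <= j, i.e. x_i * x_j.
    return sum(1 for _ in combinations_with_replacement(range(n), 2))

# Exact count is n*(n+1)/2 ~ n^2/2 for large n.
print(num_quadratic_features(100))   # 5050
```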

8.2 - Neurons and the Brain

  • Origin of neural networks: an attempt to mimic the brain.
  • They were widely used in the 80s and early 90s.
  • Right now they are the state of the art for many applications.
  • If we rewire the visual signal to the auditory cortex or the somatosensory cortex, that cortex learns to see! (These are called neural rewiring experiments.)
  • We can learn to see with our tongue.

8.3 - Model Representation 1

  • Neuron inputs: the dendrites.
  • Neuron output: the axon.
  • Neurons communicate with pulses of electricity.

Add single neuron drawing here

  • Usually when drawing the neuron inputs we only draw x1, x2, x3, etc., not x0. x0 is called the bias unit (x0 = 1).
  • In neural networks, we sometimes say weights instead of parameters (\(\theta\)).

Add neural network drawing here

  • Layer 1 is called the input layer and the final layer is called the output layer.
  • The layers in between are called hidden layers.
  • \(a_i^{(j)}\) = “activation” of unit i in layer j
  • \(\Theta^{(j)}\) = matrix of weights controlling function mapping from layer j to layer j+1.
  • So, on the previous drawing we have:

\[a_1^{(2)} = g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)\] \[a_2^{(2)} = g(\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3)\] \[a_3^{(2)} = g(\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3)\] \[h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})\]

  • If a network has \(s_j\) units in layer j and \(s_{j+1}\) units in layer j+1, then \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\).
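A minimal sketch of that dimension rule, assuming the 3-3-1 layer sizes of the network drawn above (the sizes are an illustrative assumption):

```python
import numpy as np

# Hypothetical layer sizes: 3 input units, 3 hidden units, 1 output unit.
s = [3, 3, 1]

# Theta^(j) maps layer j (plus its bias unit) to layer j+1,
# so its shape is s_{j+1} x (s_j + 1).
thetas = [np.zeros((s[j + 1], s[j] + 1)) for j in range(len(s) - 1)]
for j, theta in enumerate(thetas, start=1):
    print(f"Theta^({j}) has shape {theta.shape}")
# Theta^(1): (3, 4), Theta^(2): (1, 4)
```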

8.4 - Model Representation 2

  • We define \(z_1^{(2)} = \Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3\), we define \(z_2^{(2)}\) and \(z_3^{(2)}\) similarly.
  • So we have the vectors: \(x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}\) and we define \(z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}\), we also define \(a^{(2)}\) similarly. Then we can use the vectorized computation: \(z^{(2)} = \Theta^{(1)}x\) and \(a^{(2)} = g(z^{(2)})\).
  • Now to make things a bit easier, we can just define \(a^{(1)} = x\), so that we get \(z^{(2)} = \Theta^{(1)}a^{(1)}\).
  • Also note that to compute the next layer we must also add the bias component \(a_0^{(2)} = 1\).
  • Then we compute \(z^{(3)} = \Theta^{(2)}a^{(2)}\) and \(h_\Theta(x) = a^{(3)} = g(z^{(3)})\).
  • The process of computing \(h_\Theta(x)\) is called forward propagation.
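The forward propagation steps above can be sketched in a few lines; the `forward_propagate` helper and the random weights are illustrative assumptions, not learned values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Compute h_Theta(x), where thetas[j] has shape (s_{j+2}, s_{j+1} + 1)."""
    a = x                              # a^(1) = x
    for theta in thetas:
        a = np.insert(a, 0, 1.0)       # add the bias unit a_0 = 1
        a = sigmoid(theta @ a)         # a^(j+1) = g(Theta^(j) a^(j))
    return a

# Illustrative random weights for a 3-3-1 network (not trained values).
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 4)), rng.normal(size=(1, 4))]
print(forward_propagate(np.array([1.0, 0.0, 2.0]), thetas))
```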

⇒ Neural networks learn their own features.

  • The way the units are connected in a neural network is called the architecture.

8.5 - Examples and Intuitions 1

  • We consider here y = x1 XNOR x2 (i.e., y = NOT (x1 XOR x2)).
  • We can compute AND function and OR function with a single neuron (weights -30,20,20 for AND and -10,20,20 for OR).
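A quick sketch verifying those weights against the AND and OR truth tables (the `neuron` helper name is an illustrative assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(weights, x1, x2):
    # Single sigmoid unit with bias input x0 = 1.
    return sigmoid(np.dot(weights, [1, x1, x2]))

AND_WEIGHTS = [-30, 20, 20]
OR_WEIGHTS = [-10, 20, 20]

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(neuron(AND_WEIGHTS, x1, x2)),
          round(neuron(OR_WEIGHTS, x1, x2)))
```

The large weight magnitudes push the sigmoid close to 0 or 1, so rounding its output recovers the Boolean truth tables exactly.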

8.6 - Examples and Intuitions 2

  • To compute negation (NOT x1), we can also use a single neuron with the weights (10, -20).
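Putting the pieces from 8.5 and 8.6 together, the XNOR network can be sketched as follows. The hidden-unit weights (10, -20, -20) for (NOT x1) AND (NOT x2) are an assumption following the usual construction; the other weights are those given above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(weights, inputs):
    # Sigmoid unit with bias input 1 prepended.
    return sigmoid(np.dot(weights, [1, *inputs]))

def xnor(x1, x2):
    a1 = unit([-30, 20, 20], [x1, x2])     # x1 AND x2
    a2 = unit([10, -20, -20], [x1, x2])    # (NOT x1) AND (NOT x2)
    return unit([-10, 20, 20], [a1, a2])   # a1 OR a2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor(x1, x2)))
```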

8.7 - Multiclass Classification

  • We just use multiple output units, where each output unit should be “1” when its specific class is detected.
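A tiny sketch of reading the prediction off such an output layer; the four-class activations and their ordering are made-up values for illustration:

```python
import numpy as np

# Hypothetical activations of a 4-unit output layer,
# e.g. classes ordered (pedestrian, car, motorcycle, truck).
h = np.array([0.1, 0.9, 0.2, 0.05])

# The predicted class is the output unit with the largest activation.
predicted_class = int(np.argmax(h))
print(predicted_class)  # 1, i.e. the second class in this made-up ordering
```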