## VIII - Neural Networks: Representation

### 8.1 - Non-linear Hypothesis

• If we train a logistic regression classifier with n features and include all the quadratic terms $x_ix_j$, we end up with approximately $\frac{n^2}{2}$ features in total, which quickly becomes expensive as n grows.
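A quick sketch of that count: the quadratic terms are all products $x_ix_j$ with $i \le j$ (including the squares $x_i^2$), which gives exactly $\frac{n(n+1)}{2} \approx \frac{n^2}{2}$ terms. The variable names below are illustrative, not from the notes:

```python
from itertools import combinations_with_replacement

n = 100  # number of original features
# all quadratic terms x_i * x_j with i <= j (this includes the squares x_i^2)
quadratic_terms = list(combinations_with_replacement(range(n), 2))
print(len(quadratic_terms))  # n*(n+1)/2 = 5050, roughly n^2/2
```

For a 100x100 grayscale image (n = 10,000 pixel features), this would already be about 50 million quadratic features.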

### 8.2 - Neurons and the Brain

• Origin of neural networks: try to mimic the brain.
• They were widely used in the 80s and early 90s.
• Today they are the state of the art for many applications.
• If we rewire the visual signal to the auditory cortex or the somatosensory cortex, that cortex learns to see! (These are called neuro-rewiring experiments.)
• We can learn to see with our tongue.

### 8.3 - Model Representation 1

• Neuron inputs: the dendrites.
• Neuron output: the axon.
• Neurons communicate with pulses of electricity.

Add single neuron drawing here

• Usually when drawing the neuron inputs we only draw $x_1$, $x_2$, $x_3$, etc., not $x_0$. $x_0$ is called the bias unit ($x_0 = 1$).
• In neural networks, the parameters $\theta$ are sometimes called weights.

Add neural network drawing here

• Layer 1 is called the input layer and the final layer is called the output layer.
• The layers in between are called hidden layers.
• $a_i^{(j)}$ = “activation” of unit i in layer j
• $\Theta^{(j)}$ = matrix of weights controlling function mapping from layer j to layer j+1.
• So, on the previous drawing we have:

$a_1^{(2)} = g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)$

$a_2^{(2)} = g(\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3)$

$a_3^{(2)} = g(\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3)$

$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})$

• If a network has $s_j$ units in layer j and $s_{j+1}$ units in layer j+1, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j+1)$.
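A minimal shape check of that dimension rule, using the network in the drawing above ($s_1 = 3$ input units, $s_2 = 3$ hidden units):

```python
import numpy as np

s_j, s_next = 3, 3                    # units in layer j and layer j+1
Theta = np.zeros((s_next, s_j + 1))   # the +1 column multiplies the bias unit
print(Theta.shape)                    # (3, 4), i.e. s_{j+1} x (s_j + 1)
```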

### 8.4 - Model Representation 2

• We define $z_1^{(2)} = \Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3$, we define $z_2^{(2)}$ and $z_3^{(2)}$ similarly.
• So we have the vectors: $x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$ and we define $z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}$, we also define $a^{(2)}$ similarly. Then we can use the vectorized computation: $z^{(2)} = \Theta^{(1)}x$ and $a^{(2)} = g(z^{(2)})$.
• Now to make things a bit easier, we can just define $a^{(1)} = x$, so that we get $z^{(2)} = \Theta^{(1)}a^{(1)}$.
• Also note that to compute the next layer we must first add the bias component $a_0^{(2)} = 1$.
• Then we compute $z^{(3)} = \Theta^{(2)}a^{(2)}$ and finally $h_\Theta(x) = a^{(3)} = g(z^{(3)})$.
• The process of computing $h_\Theta(x)$ is called forward propagation.
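The forward propagation steps above can be sketched directly in NumPy. This assumes the 3-input, 3-hidden-unit, 1-output network from the drawing and a sigmoid activation $g$; the function names are mine, not from the notes:

```python
import numpy as np

def sigmoid(z):
    """The activation function g."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Compute h_Theta(x) for a 3-layer network (one hidden layer)."""
    a1 = np.concatenate(([1.0], x))             # a^(1) = x, with bias x0 = 1
    z2 = Theta1 @ a1                            # z^(2) = Theta^(1) a^(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # a^(2) = g(z^(2)), with a0^(2) = 1
    z3 = Theta2 @ a2                            # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                          # h_Theta(x) = a^(3) = g(z^(3))

# example shapes: Theta1 is 3x4 and Theta2 is 1x4, per the dimension rule above
Theta1 = np.zeros((3, 4))
Theta2 = np.zeros((1, 4))
h = forward_propagate(np.array([1.0, 0.0, 1.0]), Theta1, Theta2)
```

With all-zero weights every $z$ is 0, so the output is $g(0) = 0.5$; with trained weights this same code produces the hypothesis.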

⇒ Neural networks are learning their own features.

• The way the units are connected in a neural network is called the architecture.

### 8.5 - Examples and Intuitions 1

• We consider here y = x1 XNOR x2 (i.e. y = NOT (x1 XOR x2)).
• We can compute the AND function and the OR function with a single neuron (weights -30, 20, 20 for AND and -10, 20, 20 for OR).
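Checking those weights with a single sigmoid unit (the bias input $x_0 = 1$ multiplies the first weight; the helper name is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(weights, x1, x2):
    # one sigmoid unit over the inputs [x0=1, x1, x2]
    return sigmoid(weights @ np.array([1.0, x1, x2]))

AND = np.array([-30.0, 20.0, 20.0])
OR  = np.array([-10.0, 20.0, 20.0])
```

For AND, the weighted sum is -30, -10, -10, or +10 for the four input pairs, so the sigmoid output is ~0 except when both inputs are 1; OR works the same way with its sum positive whenever either input is 1.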

### 8.6 - Examples and Intuitions 2

• To compute negation (NOT), we can also use a single neuron with the weights (10, -20).
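The XNOR target from 8.5 can then be built by stacking these units: a hidden layer computing x1 AND x2 and (NOT x1) AND (NOT x2), feeding an OR output unit. The weights (10, -20, -20) for the second hidden unit follow the same pattern as the negation weights above; the function name is mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xnor(x1, x2):
    x = np.array([1.0, x1, x2])                       # input with bias x0 = 1
    a1 = sigmoid(np.array([-30.0, 20.0, 20.0]) @ x)   # x1 AND x2
    a2 = sigmoid(np.array([10.0, -20.0, -20.0]) @ x)  # (NOT x1) AND (NOT x2)
    # output unit: a1 OR a2
    return sigmoid(np.array([-10.0, 20.0, 20.0]) @ np.array([1.0, a1, a2]))
```

This is the intuition behind "neural networks learn their own features": the hidden layer computes intermediate features (here, the two AND terms) that make the final classification linearly separable.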

### 8.7 - Multiclass Classification

• We just use multiple output units, where output unit i should be "1" when an example of class i is found (so the training labels y are one-hot vectors rather than plain class numbers).
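A small illustration with hypothetical output activations for a 4-class problem (the activation values and class count are made up for the example):

```python
import numpy as np

# hypothetical activations of the 4 output units for one example
h = np.array([0.05, 0.92, 0.10, 0.02])
predicted_class = np.argmax(h)   # unit 1 fired strongest, so predict class 1
# the matching training target would be the one-hot vector [0, 1, 0, 0]
```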