===== VIII - Neural Networks: Representation =====

==== 8.1 - Non-linear Hypothesis ====

  * If we train a logistic regression classifier with //n// features and include all the quadratic features \(x_ix_j\), we get approximately \(\frac{n^2}{2}\) features in total (e.g. for n = 100 that is already about 5000 quadratic features; a quick check is given in the first sketch at the end of these notes).

==== 8.2 - Neurons and the Brain ====

  * Origin of neural networks: an attempt to mimic the brain.
  * They were widely used in the 80s and early 90s.
  * Today they are the state of the art for many applications.
  * If we rewire the visual signal to the auditory cortex or the somatosensory cortex, that cortex learns to see! (These are called **neuro-rewiring experiments**.)
  * We can even learn to see with our tongue.

==== 8.3 - Model Representation 1 ====

  * Neuron inputs: the **dendrites**.
  * Neuron output: the **axon**.
  * Neurons communicate with pulses of electricity.

=> **Add single neuron drawing here**

  * Usually when drawing the neuron inputs we only draw x1, x2, x3, etc., not x0. **x0** is called the **bias unit** (x0 = 1).
  * In neural networks, the **parameters** \(\Theta\) are sometimes called **weights**.

=> **Add neural network drawing here**

  * **Layer 1** is called the **input layer** and the final layer is called the **output layer**.
  * The layers in between are called **hidden layers**.
  * \(a_i^{(j)}\) = "activation" of unit //i// in layer //j//.
  * \(\Theta^{(j)}\) = **matrix** of weights controlling the function mapping from layer //j// to layer //j+1//.
  * So, for the previous drawing we have:

\[a_1^{(2)} = g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)\]
\[a_2^{(2)} = g(\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3)\]
\[a_3^{(2)} = g(\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3)\]
\[h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})\]

  * If a network has \(s_j\) units in layer //j// and \(s_{j+1}\) units in layer //j+1//, then \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\). For the drawing above, \(\Theta^{(1)}\) is therefore \(3 \times 4\) and \(\Theta^{(2)}\) is \(1 \times 4\).

==== 8.4 - Model Representation 2 ====

  * We define \(z_1^{(2)} = \Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3\), and we define \(z_2^{(2)}\) and \(z_3^{(2)}\) similarly.
  * So we have the vectors \(x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}\) and \(z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}\); we define \(a^{(2)}\) similarly. Then we can use the vectorized computation \(z^{(2)} = \Theta^{(1)}x\) and \(a^{(2)} = g(z^{(2)})\).
  * To make things a bit easier, we can define \(a^{(1)} = x\), so that we get \(z^{(2)} = \Theta^{(1)}a^{(1)}\).
  * Also note that before computing the next layer we must **add** the bias component \(a_0^{(2)} = 1\).
  * Then we compute \(z^{(3)} = \Theta^{(2)}a^{(2)}\) and \(h_\Theta(x) = a^{(3)} = g(z^{(3)})\).
  * The process of computing \(h_\Theta(x)\) layer by layer is called **forward propagation** (a NumPy sketch is given at the end of these notes).

=> Neural networks learn their own features.

  * The way the units are connected in a neural network is called the **architecture**.

==== 8.5 - Examples and Intuitions 1 ====

  * We consider here y = x1 XNOR x2 (i.e. y = NOT (x1 XOR x2)).
  * We can compute the AND function and the OR function with a single neuron each (weights (-30, 20, 20) for AND and (-10, 20, 20) for OR); see the gates sketch at the end of these notes.

==== 8.6 - Examples and Intuitions 2 ====

  * To compute negation (NOT), we can also use a single neuron, with the weights (10, -20).

==== 8.7 - Multiclass Classification ====

  * We simply use multiple output units, where each output unit should be "1" when its specific class is found (see the one-hot sketch at the end of these notes).
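
To make the feature count from section 8.1 concrete: the number of quadratic terms \(x_ix_j\) with \(i \le j\) is \(n(n+1)/2 \approx n^2/2\). A quick check (the variable names are just for illustration):

<code python>
from itertools import combinations_with_replacement

n = 100  # number of original features
# All quadratic terms x_i * x_j with i <= j (includes the squares x_i^2).
quadratic_terms = sum(1 for _ in combinations_with_replacement(range(n), 2))
print(quadratic_terms)  # 5050, i.e. roughly n**2 / 2 = 5000
</code>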
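
As a concrete illustration of the forward propagation in sections 8.3-8.4, here is a minimal NumPy sketch for the 3-input / 3-hidden-unit / 1-output network drawn above. Only the shapes (\(\Theta^{(1)}\) is 3×4, \(\Theta^{(2)}\) is 1×4) and the order of the computations come from the notes; the weight values themselves are random placeholders.

<code python>
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(x, Theta1, Theta2):
    """Compute h_Theta(x) for a network with one hidden layer."""
    a1 = np.concatenate(([1.0], x))       # a^(1) = x with bias unit x0 = 1, shape (4,)
    z2 = Theta1 @ a1                      # z^(2) = Theta^(1) a^(1), shape (3,)
    a2 = np.concatenate(([1.0], g(z2)))   # a^(2) = g(z^(2)) plus bias a0^(2) = 1, shape (4,)
    z3 = Theta2 @ a2                      # z^(3) = Theta^(2) a^(2), shape (1,)
    return g(z3)                          # h_Theta(x) = a^(3)

# Placeholder weights with the correct dimensions: 3x4 and 1x4.
Theta1 = np.random.randn(3, 4)
Theta2 = np.random.randn(1, 4)
x = np.array([0.5, -1.2, 3.0])            # x1, x2, x3 (x0 is added inside)
print(forward_propagation(x, Theta1, Theta2))
</code>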
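
The single-neuron gates from sections 8.5-8.6 can be checked directly with the weights given in the notes. The XNOR network below also uses a (NOT x1) AND (NOT x2) unit with weights (10, -20, -20); that unit is the usual construction from the course but is not spelled out in the notes above, so treat its weights as an assumption.

<code python>
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(weights, inputs):
    """Single sigmoid unit; weights[0] multiplies the bias x0 = 1."""
    return g(weights @ np.concatenate(([1.0], inputs)))

AND_w = np.array([-30.0, 20.0, 20.0])    # from the notes
OR_w  = np.array([-10.0, 20.0, 20.0])    # from the notes
NOT_w = np.array([ 10.0, -20.0])         # from the notes
NOR_w = np.array([ 10.0, -20.0, -20.0])  # (NOT x1) AND (NOT x2): assumed weights

print("NOT:", round(float(neuron(NOT_w, np.array([0.0])))),
              round(float(neuron(NOT_w, np.array([1.0])))))  # 1 0

def xnor(x1, x2):
    """XNOR as a two-layer network: hidden units AND and NOR, output unit OR."""
    x = np.array([x1, x2], dtype=float)
    a1 = neuron(AND_w, x)                 # x1 AND x2
    a2 = neuron(NOR_w, x)                 # (NOT x1) AND (NOT x2)
    return neuron(OR_w, np.array([a1, a2]))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(float(xnor(x1, x2))))  # outputs 1 only when x1 == x2
</code>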
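
For the multiclass setup of section 8.7, the output layer has one unit per class, the targets are encoded as one-hot vectors, and the predicted class is the unit whose activation is closest to 1. A small sketch, assuming four classes and an example output vector:

<code python>
import numpy as np

num_classes = 4

def one_hot(k, num_classes):
    """Target vector for class k: all zeros except a 1 in position k."""
    y = np.zeros(num_classes)
    y[k] = 1.0
    return y

print(one_hot(2, num_classes))        # [0. 0. 1. 0.]

# Example output-layer activations h_Theta(x), one value per class.
h = np.array([0.1, 0.05, 0.8, 0.2])
predicted_class = int(np.argmax(h))   # pick the unit with the largest activation
print(predicted_class)                # 2
</code>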