IX - Neural Networks: Learning

9.1 - Cost Function

For a network with \(L\) layers, where \(s_l\) is the number of units (not counting the bias unit) in layer \(l\), \(K\) the number of output units, and \(m\) the number of training examples, the regularized cost function generalizes the logistic regression cost:

\[J(\Theta) = - \frac{1}{m} \left[ \sum\limits_{i=1}^m \sum\limits_{k=1}^K y_k^{(i)} \log\left((h_\Theta(x^{(i)}))_k\right) + (1 - y_k^{(i)}) \log\left(1 - (h_\Theta(x^{(i)}))_k\right) \right] + \frac{\lambda}{2m} \sum\limits_{l=1}^{L-1} \sum\limits_{i=1}^{s_l} \sum\limits_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2 \]
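
As a concrete illustration, here is a minimal numpy sketch of this cost, assuming sigmoid activations, a one-hot label matrix `Y`, and a list of weight matrices `Theta_list`; all names are illustrative, not part of the original notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta_list, X, Y, lam):
    """Regularized neural-network cost J(Theta).

    Theta_list -- list of weight matrices; Theta_list[l] maps layer l+1 to l+2
    X          -- (m, n) inputs; Y -- (m, K) one-hot labels; lam -- lambda
    """
    m = X.shape[0]
    A = X
    for Theta in Theta_list:                     # forward propagation
        A = np.hstack([np.ones((m, 1)), A])      # prepend the bias column
        A = sigmoid(A @ Theta.T)
    H = A                                        # (m, K) hypothesis values

    # Cross-entropy term: double sum over examples i and output units k.
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization over all weights except the bias columns (i = 0).
    J += lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Theta_list)
    return J
```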

9.2 - Backpropagation Algorithm

For a four-layer network, the output-layer error is \(\delta^{(4)} = a^{(4)} - y\); the hidden-layer errors are then obtained by propagating backwards:

\[\delta^{(3)} = (\Theta^{(3)})^T\delta^{(4)} .* g'(z^{(3)}) \text{ where } g'(z^{(3)}) = a^{(3)} .* (1-a^{(3)})\]
\[\delta^{(2)} = (\Theta^{(2)})^T\delta^{(3)} .* g'(z^{(2)}) \text{ where } g'(z^{(2)}) = a^{(2)} .* (1-a^{(2)})\]

There is no \(\delta^{(1)}\), since layer 1 is the input and carries no error.

Ignoring regularization (\(\lambda = 0\)), each training example contributes

\[\frac{\partial}{\partial\Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)}\delta_i^{(l+1)}\]

⇒ Finally, accumulating \(\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}\) over all \(m\) training examples and setting \(D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \lambda\Theta_{ij}^{(l)}\) for \(j \neq 0\) (and \(D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}\) for \(j = 0\)), it can be proved that \(\frac{\partial}{\partial\Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}\).
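
As a concrete illustration, a minimal numpy sketch of one forward/backward pass for such a four-layer network, assuming sigmoid activations; the function name, `Theta1`–`Theta3`, and the shapes are illustrative, not part of the original notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one_example(x, y, Theta1, Theta2, Theta3):
    """Per-example gradient contributions a_j^(l) * delta_i^(l+1)."""
    # Forward pass, prepending the bias unit a_0 = 1 at each non-output layer.
    a1 = np.concatenate(([1.0], x))
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
    a3 = np.concatenate(([1.0], sigmoid(Theta2 @ a2)))
    a4 = sigmoid(Theta3 @ a3)                    # output layer h_Theta(x)

    # Backward pass: delta4 = a4 - y, then the two equations above.
    delta4 = a4 - y
    delta3 = (Theta3.T @ delta4)[1:] * (a3[1:] * (1 - a3[1:]))  # drop bias row
    delta2 = (Theta2.T @ delta3)[1:] * (a2[1:] * (1 - a2[1:]))

    # Outer products delta^(l+1) a^(l)T, to be summed into Delta^(l).
    return np.outer(delta2, a1), np.outer(delta3, a2), np.outer(delta4, a3)
```

Summing these per-example contributions into the \(\Delta^{(l)}\) accumulators and dividing by \(m\) (plus the regularization term for \(j \neq 0\)) gives the \(D^{(l)}\) matrices above.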

9.3 - Backpropagation Intuition

9.4 - Implementation Note: Unrolling Parameters
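
Advanced optimizers typically expect the parameters and the gradient as single vectors, so the \(\Theta^{(l)}\) matrices are unrolled into one long vector and reshaped back inside the cost function. A minimal numpy sketch, with illustrative layer sizes:

```python
import numpy as np

# Illustrative shapes: Theta1 is (10, 11), Theta2 is (1, 11).
Theta1 = np.zeros((10, 11))
Theta2 = np.zeros((1, 11))

# Unroll into a single parameter vector for the optimizer.
theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Inside the cost function, reshape back into the weight matrices.
T1 = theta_vec[:10 * 11].reshape(10, 11)
T2 = theta_vec[10 * 11:].reshape(1, 11)
```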

9.5 - Gradient Checking

\[\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta_1,\dots,\theta_i+\epsilon,\dots,\theta_n) - J(\theta_1,\dots,\theta_i-\epsilon,\dots,\theta_n)}{2\epsilon}\]
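
A minimal numpy sketch of this two-sided difference; the cost callable `J` and the tolerance are illustrative assumptions:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate each partial derivative of J at theta (a 1-D vector)
    by a two-sided finite difference with step eps."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        grad[i] = (J(theta + bump) - J(theta - bump)) / (2 * eps)
    return grad

# Compare against the backpropagation gradient, then disable the check
# for training: it costs two full cost evaluations per parameter.
# assert np.allclose(numerical_gradient(J, theta), backprop_grad, atol=1e-7)
```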

9.6 - Random Initialization
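
Initializing all weights to zero makes every unit in a hidden layer compute the same function, so symmetry is broken by drawing each weight uniformly from a small interval \([-\epsilon, \epsilon]\). A minimal sketch; the layer sizes and the value of \(\epsilon\) are illustrative:

```python
import numpy as np

def random_init(L_in, L_out, eps=0.12):
    """Weights for a layer with L_in inputs (+1 bias), uniform in [-eps, eps]."""
    return np.random.rand(L_out, L_in + 1) * 2 * eps - eps

Theta1 = random_init(10, 5)   # maps a 10-unit layer to a 5-unit layer
```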

9.7 - Putting It Together

9.8 - Autonomous Driving