VII - Regularization

7.1 - The Problem of Overfitting

7.2 - Cost Function

\[J(\theta) = \frac{1}{2m} \left[ \sum\limits_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2 + \lambda \sum\limits_{j=1}^n \theta_j^2 \right]\]
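As a quick illustration, here is a minimal NumPy sketch of this regularized cost (the function and variable names are my own; `X` is assumed to be the design matrix with a leading column of ones, and \(\theta_0\) is excluded from the penalty, matching the sum over \(j = 1, \dots, n\)):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) (illustrative sketch)."""
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x^(i)) - y^(i) for all i
    penalty = lam * np.sum(theta[1:] ** 2)    # lambda * sum_{j=1..n} theta_j^2, theta_0 excluded
    return (residuals @ residuals + penalty) / (2 * m)
```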

7.3 - Regularized Linear Regression

\[ \frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m} \left( \sum\limits_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \right) + \frac{\lambda}{m} \theta_j \]

⇒ Note that the previous derivative does not apply to \(\theta_0\): the bias term is not regularized, so \(\frac{\partial}{\partial\theta_0}J(\theta) = \frac{1}{m} \sum\limits_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)}\), with no \(\frac{\lambda}{m}\theta_0\) term.

\[ \theta_j := \theta_j \left(1 - \alpha \frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum\limits_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \]
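Below is a minimal NumPy sketch of one such gradient-descent step, under the same assumptions as before (illustrative names; `X` carries a leading column of ones). Note that \(\theta_0\) skips the shrinkage factor \((1 - \alpha \frac{\lambda}{m})\) because it is not regularized:

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression (sketch)."""
    m = len(y)
    error = X @ theta - y                       # shape (m,)
    grad = (X.T @ error) / m                    # (1/m) * sum_i (h - y) * x_j, shape (n+1,)
    new_theta = theta * (1 - alpha * lam / m) - alpha * grad
    new_theta[0] = theta[0] - alpha * grad[0]   # theta_0: plain update, no regularization
    return new_theta
```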

\[\theta = (X^TX + \lambda ~ P)^{-1}X^Ty\]

where \(P \in \mathbb{R}^{(n+1) \times (n+1)}\) is the identity matrix except that \(P_{11} = 0\), so that the bias term \(\theta_0\) is not regularized.
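A minimal sketch of this closed-form solution, assuming the same conventions as above (`lam` for \(\lambda\), a design matrix `X` of shape \(m \times (n+1)\)); `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability:

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * P)^(-1) X^T y, with P = identity but its top-left entry zeroed (sketch)."""
    P = np.eye(X.shape[1])
    P[0, 0] = 0.0                 # do not regularize the bias term theta_0
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)
```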

7.4 - Regularized Logistic Regression