Table of Contents

XII. Support Vector Machines

12.1 - Optimization Objective

\[J(\theta) = \frac{1}{m} \sum\limits_{i=1}^m \left[y^{(i)} cost_1(\theta^Tx^{(i)}) + (1-y^{(i)})cost_0(\theta^Tx^{(i)})\right] + \frac{\lambda}{2m} \sum\limits_{j=1}^n \theta_j^2\]

\[J(\theta) = C \sum\limits_{i=1}^m \left[y^{(i)} cost_1(\theta^Tx^{(i)}) + (1-y^{(i)})cost_0(\theta^Tx^{(i)}) \right] + \frac{1}{2} \sum\limits_{j=1}^n \theta_j^2\]
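The second form drops the constant \(\frac{1}{m}\) factor and replaces \(\lambda\) on the regularization term with \(C\) on the cost term, so \(C\) plays the role of \(\frac{1}{\lambda}\). A minimal NumPy sketch of this objective, using the standard hinge-style surrogates for \(cost_1\) and \(cost_0\) (the exact piecewise-linear costs from the lecture; the function names here are my own):

```python
import numpy as np

def cost1(z):
    # surrogate cost when y = 1: zero once z >= 1, linear otherwise
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # surrogate cost when y = 0: zero once z <= -1, linear otherwise
    return np.maximum(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    """SVM objective: C * sum of per-example costs + 0.5 * ||theta[1:]||^2.

    X is (m, n+1) with a leading column of ones; theta[0] is the
    intercept and, as in the lecture, is not regularized.
    """
    z = X @ theta
    per_example = y * cost1(z) + (1 - y) * cost0(z)
    reg = 0.5 * np.sum(theta[1:] ** 2)
    return C * np.sum(per_example) + reg
```

With this form, increasing \(C\) penalizes margin violations more heavily (less regularization), mirroring a small \(\lambda\) in the first form.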

12.2 - Large Margin Intuition

12.3 - Mathematics Behind Large Margin Classification

12.4 - Kernels I

12.5 - Kernels II

\[C \sum\limits_{i=1}^m \left[y^{(i)} cost_1(\theta^Tf^{(i)}) + (1-y^{(i)})cost_0(\theta^Tf^{(i)}) \right] + \frac{1}{2} \sum\limits_{j=1}^n \theta_j^2 \]

⇒ Here n = m: each training example serves as a landmark, so every example is mapped to a feature vector \(f^{(i)}\) with one similarity entry per landmark, and the number of features equals the number of training examples (plus the intercept term \(f_0 = 1\)).
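A small sketch of the Gaussian-kernel feature mapping described above (function names are my own): each example is replaced by its vector of similarities to the m landmarks, which is why the transformed feature count equals m.

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma):
    # similarity f = exp(-||x - l||^2 / (2 sigma^2)):
    # 1 when x coincides with the landmark, -> 0 far away
    return np.exp(-np.sum((x - landmark) ** 2) / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma):
    # landmarks are the m training examples, so len(f) == m
    return np.array([gaussian_kernel(x, l, sigma) for l in landmarks])
```

Small \(\sigma\) makes the similarity fall off quickly (lower bias, higher variance); large \(\sigma\) gives smoother features (higher bias, lower variance).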

12.6 - Using an SVM

⇒ We need to perform feature scaling before using the Gaussian kernel; otherwise features with large ranges dominate the distance \(\|x - l\|^2\) inside the kernel.
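A minimal standardization sketch (mean normalization plus division by the standard deviation, as in the course; the function name is my own) to apply before computing Gaussian-kernel features:

```python
import numpy as np

def standardize(X):
    # zero-mean, unit-variance per column so no single feature
    # dominates the ||x - l||^2 term inside the Gaussian kernel
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma, mu, sigma
```

Note that `mu` and `sigma` are computed on the training set and must be reused to scale cross-validation and test examples.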

⇒ Multi-class classification: many SVM packages have built-in multi-class support; otherwise use one-vs-all — train K SVMs, one per class, and pick the class \(i\) with the largest \((\theta^{(i)})^Tx\).
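A sketch of the one-vs-all prediction rule (the function name is my own), assuming the K per-class parameter vectors have already been trained:

```python
import numpy as np

def one_vs_all_predict(thetas, x):
    # thetas: list of K parameter vectors, one binary SVM per class
    # (class k vs. the rest); predict the class whose classifier
    # is most confident, i.e. the largest theta_k^T x
    scores = np.array([theta @ x for theta in thetas])
    return int(np.argmax(scores))
```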