\[J(\theta) = \frac{1}{m} \sum\limits_{i=1}^m \left[ y^{(i)} cost_1(\theta^Tx^{(i)}) + (1-y^{(i)})cost_0(\theta^Tx^{(i)}) \right] + \frac{\lambda}{2m} \sum\limits_{j=1}^n \theta_j^2\]
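⇒ The step from this regularized form to the SVM form is a rescaling that does not move the minimizer: multiply by m, divide by λ, and set C = 1/λ. A sketch of that step, assuming λ > 0 and writing \(cost^{(i)}\) as shorthand for the per-example term inside the sum:
\[\frac{m}{\lambda} \left( \frac{1}{m} \sum\limits_{i=1}^m cost^{(i)} + \frac{\lambda}{2m} \sum\limits_{j=1}^n \theta_j^2 \right) = \frac{1}{\lambda} \sum\limits_{i=1}^m cost^{(i)} + \frac{1}{2} \sum\limits_{j=1}^n \theta_j^2\]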
\[J(\theta) = C \sum\limits_{i=1}^m \left[y^{(i)} cost_1(\theta^Tx^{(i)}) + (1-y^{(i)})cost_0(\theta^Tx^{(i)}) \right] + \frac{1}{2} \sum\limits_{j=1}^n \theta_j^2\]
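A minimal plain-Python sketch of evaluating this objective. It assumes the usual hinge-style definitions cost_1(z) = max(0, 1 - z) and cost_0(z) = max(0, 1 + z), which these notes do not spell out, and it assumes θ_0 is stored first and left out of the regularization sum (j starts at 1). The function names are illustrative only.

```python
def cost1(z):
    """Assumed cost for y = 1: zero once z >= 1."""
    return max(0.0, 1.0 - z)


def cost0(z):
    """Assumed cost for y = 0: zero once z <= -1."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """J(theta) = C * sum of per-example costs + 0.5 * sum_{j>=1} theta_j^2.

    theta: list of n + 1 weights; theta[0] is the intercept (not regularized).
    X: list of m rows, each starting with a 1 for the intercept.
    y: list of m labels in {0, 1}.
    """
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * x for t, x in zip(theta, x_i))  # theta^T x
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + reg_term
```

A large C puts more weight on the data term, which plays the same role as a small λ in the regularized form.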
\[C \sum\limits_{i=1}^m \left[y^{(i)} cost_1(\theta^Tf^{(i)}) + (1-y^{(i)})cost_0(\theta^Tf^{(i)}) \right] + \frac{1}{2} \sum\limits_{j=1}^n \theta_j^2 \]
⇒ Here n = m: each training example serves as a landmark, so every example is described by m similarity features.
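To make n = m concrete, here is a small plain-Python sketch that builds the feature vector f for each example by comparing it with every training example used as a landmark. It assumes a Gaussian similarity with bandwidth sigma and omits the constant intercept feature f_0 = 1; the names `gaussian_similarity` and `kernel_features` are illustrative, not taken from the notes.

```python
import math


def gaussian_similarity(a, b, sigma):
    """exp(-||a - b||^2 / (2 * sigma^2)) for two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_features(X, sigma):
    """Map each example to its similarities with all m landmarks (the rows of X)."""
    return [[gaussian_similarity(x, l, sigma) for l in X] for x in X]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # m = 3 examples with 2 raw features
F = kernel_features(X, sigma=1.0)
print(len(F), len(F[0]))  # m rows, m features per row
```

Each example has similarity 1 with itself, so the diagonal of F is all ones, and the number of features depends on m rather than on the original input dimension.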
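function f = kernel(x1, x2, sigma)
  % Gaussian kernel: exp(-||x1 - x2||^2 / (2 * sigma^2)); x1, x2 are column vectors.
  % sigma is passed explicitly here so that the function is self-contained.
  f = exp(-((x1 - x2)' * (x1 - x2)) / (2 * sigma^2));
end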
⇒ We need to perform feature scaling before using the Gaussian kernel; otherwise features with large ranges dominate the squared distance between x1 and x2.
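A short plain-Python sketch of why scaling matters, using made-up numbers (area in square feet, number of bedrooms). It standardizes each column to zero mean and unit variance, which is one common choice rather than the only one, and then compares how much each feature contributes to the squared distance that the Gaussian kernel exponentiates. The helper names are hypothetical.

```python
import math


def standardize(X):
    """Scale each column to zero mean and unit (population) standard deviation.

    Assumes no column is constant, otherwise the division below fails.
    """
    m = len(X)
    cols = list(zip(*X))
    means = [sum(c) / m for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / m)
            for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)]
            for row in X]


def per_feature_sq_diff(a, b):
    """Contribution of each feature to the squared distance ||a - b||^2."""
    return [(ai - bi) ** 2 for ai, bi in zip(a, b)]


# Hypothetical data: [area in square feet, number of bedrooms]
X = [[1000.0, 1.0], [1010.0, 5.0], [2000.0, 1.0]]

raw = per_feature_sq_diff(X[0], X[1])
Xs = standardize(X)
scaled = per_feature_sq_diff(Xs[0], Xs[1])

print("raw contributions:   ", raw)     # area term outweighs bedrooms term
print("scaled contributions:", scaled)  # bedrooms term now outweighs area term
```

In the raw data, a 10 square foot difference outweighs a 4 bedroom difference; after standardizing, the bedroom difference is the larger term, so the kernel no longer depends on the arbitrary choice of units.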