\[Cost(h_\theta(x),y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y=1 \\ -\log(1 - h_\theta(x)) & \text{if } y=0 \end{cases}\]
\[Cost(h_\theta(x),y) = -y~\log(h_\theta(x)) - (1-y)~\log(1 - h_\theta(x))\]
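Substituting the two possible labels shows that this single expression reproduces both cases:

\[ y=1: \quad Cost = -1 \cdot \log(h_\theta(x)) - 0 \cdot \log(1 - h_\theta(x)) = -\log(h_\theta(x)) \]
\[ y=0: \quad Cost = -0 \cdot \log(h_\theta(x)) - 1 \cdot \log(1 - h_\theta(x)) = -\log(1 - h_\theta(x)) \]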
\[ J(\theta) = -\frac{1}{m} \sum\limits_{i=1}^m \left[ y^{(i)}~\log(h_\theta(x^{(i)})) + (1-y^{(i)})~\log(1 - h_\theta(x^{(i)})) \right] \]
\[ \frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m} \sum\limits_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \]
⇒ This is exactly the same partial derivative as for linear regression! (except that the hypothesis h is different: here it is the sigmoid of the linear combination rather than the linear combination itself).
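A minimal vectorized sketch of this cost and gradient in Octave, assuming a design matrix X whose rows are the training examples, a 0/1 label vector y, and the illustrative name logisticCost:

```matlab
% Sketch of a vectorized cost/gradient for logistic regression (names illustrative).
% Assumes: X is m-by-n with one training example per row, y is m-by-1 with 0/1 labels,
% theta is n-by-1.
function [jVal, gradient] = logisticCost(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                          % sigmoid hypothesis for every example
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));   % J(theta) from the formula above
  gradient = (1/m) * X' * (h - y);                         % partial derivatives for every theta_j
end
```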
```matlab
function [jVal, gradient] = costFunction(theta)
  % Toy cost (theta1 - 5)^2 + (theta2 - 5)^2, minimized at theta = (5, 5)
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  gradient = zeros(2, 1);
  gradient(1) = 2 * (theta(1) - 5);   % d jVal / d theta1
  gradient(2) = 2 * (theta(2) - 5);   % d jVal / d theta2
end
```
```matlab
options = optimset('GradObj', 'on', 'MaxIter', 100);   % use the supplied gradient, cap the iterations
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
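For this toy cost the minimum lies at θ = (5, 5), so optTheta should come back close to [5; 5], with functionVal near 0 and a positive exitFlag indicating convergence.

The same call pattern works for the logistic regression cost. A sketch, assuming the illustrative logisticCost function above and a design matrix X and label vector y already in the workspace: wrap the extra arguments in an anonymous function so that fminunc sees a function of theta alone.

```matlab
% Sketch: minimizing the logistic regression cost with fminunc.
% Assumes X (m-by-n) and y (m-by-1) exist and logisticCost is the illustrative
% function from the earlier sketch.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);                 % one parameter per feature column
[optTheta, functionVal, exitFlag] = fminunc(@(t) logisticCost(t, X, y), initialTheta, options);
```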