Gradient Descent is a general algorithm that can be used to minimize functions such as \(J(\theta_0, \theta_1, \ldots, \theta_n)\).
The idea is to start somewhere on the function "surface", then iteratively find the local direction of steepest descent and take a step in that direction, so as to get "down" as quickly as possible.
We use the symbol \(:=\) to denote assignment.
So gradient descent consists of repeating until convergence: \(\{\, \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \,\}\) for \(j = 0\) and \(j = 1\).
\(\alpha\) is called the learning rate; it controls the size of each step.
Note that the parameters \(\theta_0\) and \(\theta_1\) are updated simultaneously: both right-hand sides are evaluated with the current parameter values before either parameter is assigned its new value.
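As a minimal sketch of the update loop, here is gradient descent on a hypothetical convex cost \(J(\theta_0, \theta_1) = (\theta_0 - 1)^2 + (\theta_1 + 2)^2\), chosen only for illustration (its minimum is at \(\theta_0 = 1\), \(\theta_1 = -2\)); the function names, learning rate, and convergence tolerance are all assumptions, not part of the original notes:

```python
# Hypothetical cost, used only for illustration:
# J(theta0, theta1) = (theta0 - 1)^2 + (theta1 + 2)^2, minimized at (1, -2).
def dJ_dtheta0(theta0, theta1):
    return 2 * (theta0 - 1)

def dJ_dtheta1(theta0, theta1):
    return 2 * (theta1 + 2)

alpha = 0.1                # learning rate: controls the step size
theta0, theta1 = 0.0, 0.0  # arbitrary starting point on the surface

for _ in range(1000):
    # Simultaneous update: compute both new values from the *current*
    # parameters in temporaries before assigning either of them.
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    theta0, theta1 = temp0, temp1
    # Stop once the gradient is (near) zero, i.e. we have converged.
    if abs(dJ_dtheta0(theta0, theta1)) < 1e-9 and abs(dJ_dtheta1(theta0, theta1)) < 1e-9:
        break

print(theta0, theta1)  # approximately 1.0, -2.0
```

Updating \(\theta_0\) in place and then using the new value to update \(\theta_1\) would be a subtly different algorithm; the temporaries make the simultaneous update explicit.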