XIII. Clustering
13.1 - Unsupervised Learning: Introduction
- A clustering algorithm finds groups (clusters) in a dataset with no labels.
13.2 - K-Means Algorithm
- We randomly initialize K points (two, in the simplest example), called the cluster centroids.
- Then the K-means iterative algorithm alternates two steps:
- Cluster assignment step: assign each example in the dataset to the cluster of the closest centroid.
- Move centroid step: move each centroid to the mean location of the points assigned to its cluster.
- Repeat until the centroids stop moving (convergence).
Algorithm
- Randomly initialize K cluster centroids \(\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n\).
- Repeat:
- for i = 1 to m: \(c^{(i)}\) := index (from 1 to K) of the cluster centroid closest to \(x^{(i)}\) (using the distance \(\|x^{(i)}-\mu_k\|\)).
- for k = 1 to K: \(\mu_k\) := average (mean) of the points assigned to cluster k.
- Finally, if a cluster ends up with no points assigned to it, we can either:
- eliminate that cluster (ending up with K-1 clusters), or
- randomly re-initialize the position of that cluster centroid.
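The algorithm above can be sketched in NumPy. This is a minimal illustration under my own choices (the function name `k_means`, the convergence test, and re-initializing empty clusters rather than dropping them are assumptions, not the lecture's reference code):

```python
import numpy as np

def k_means(X, K, max_iters=100, seed=0):
    """Minimal K-means sketch. X: (m, n) data matrix; K: number of clusters.
    Returns (c, mu): assignments c[i] in {0..K-1} and centroids mu of shape (K, n)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Random initialization: pick K distinct training examples as centroids.
    mu = X[rng.choice(m, size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        # Cluster assignment step: c[i] := index of the centroid closest to x^(i).
        c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
        new_mu = mu.copy()
        for k in range(K):
            if np.any(c == k):
                # Move centroid step: mu_k := mean of the points assigned to cluster k.
                new_mu[k] = X[c == k].mean(axis=0)
            else:
                # Empty cluster: re-initialize its centroid at a random training example.
                new_mu[k] = X[rng.integers(m)]
        if np.allclose(new_mu, mu):
            break  # centroids stopped moving
        mu = new_mu
    return c, mu
```

For example, on four points forming two well-separated pairs, `k_means(X, 2)` assigns each pair to its own cluster (the label indices 0/1 themselves are arbitrary).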
K-means for non-separated clusters
- K-means is also useful when the data is not separated into obvious clusters (e.g. market segmentation).
13.3 - Optimization Objective
- \(c^{(i)}\) : index of the cluster (from 1 to K) to which example \(x^{(i)}\) is currently assigned.
- \(\mu_k\) : cluster centroid k (\(\mu_k \in \mathbb{R}^n\)).
- \(\mu_{c^{(i)}}\): cluster centroid of cluster to which example i has been assigned.
- The cost function for the K-means can then be written as:
\[J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K) = \frac{1}{m} \sum\limits_{i=1}^m \|x^{(i)} - \mu_{c^{(i)}}\|^2 \]
- This cost function is sometimes called the distortion cost function.
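As a sketch, the distortion can be computed in one line of NumPy (the helper name `distortion` is mine):

```python
import numpy as np

def distortion(X, c, mu):
    """Distortion J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2.
    X: (m, n) data; c: (m,) cluster indices; mu: (K, n) centroids."""
    # mu[c] picks, for each example, the centroid of its assigned cluster.
    return float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))

J = distortion(np.array([[0.0, 0.0], [2.0, 0.0]]),
               np.array([0, 0]),
               np.array([[1.0, 0.0]]))  # both squared distances are 1, so J = 1.0
```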
13.4 - Random Initialization
- We should have K < m (fewer clusters than training examples).
- Randomly pick K training examples and assign them as cluster centroids.
- To avoid local optima, we can run K-means many times (e.g. 50-1000) with different random initializations, then pick the solution with the lowest value of the cost function J. This is most useful for small K (roughly 2-10); for large K, multiple initializations tend to make little difference.
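The multiple-initialization strategy can be sketched as follows; this is a self-contained toy version under my own naming (`k_means_run`, `k_means_best`), not the course's code:

```python
import numpy as np

def k_means_run(X, K, rng, max_iters=100):
    """One K-means run from one random initialization; returns (J, c, mu)."""
    m = X.shape[0]
    # Initialize centroids at K randomly picked training examples.
    mu = X[rng.choice(m, size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
    J = float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))  # distortion
    return J, c, mu

def k_means_best(X, K, n_init=100, seed=0):
    """Run K-means n_init times and keep the lowest-distortion solution."""
    rng = np.random.default_rng(seed)
    return min((k_means_run(X, K, rng) for _ in range(n_init)),
               key=lambda run: run[0])  # compare runs by their distortion J
```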
13.5 - Choosing the number of clusters
- Elbow method: plot the cost (distortion) J as a function of the number of clusters K. The distortion drops rapidly at first, then slowly after the "elbow"; choose K at the elbow. Often, though, the curve has no clear elbow, and K is better chosen according to the downstream purpose of the clustering.
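A sketch of how the elbow curve might be computed on synthetic data with three well-separated clusters (the helper `run_k_means` and the toy data are my own, for illustration only):

```python
import numpy as np

def run_k_means(X, K, rng, max_iters=100):
    """Single K-means run from a random initialization; returns the final distortion J."""
    mu = X[rng.choice(X.shape[0], size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        # Keep a centroid in place if its cluster is empty.
        mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                       for k in range(K)])
    c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
    return float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))

# Toy data: three Gaussian blobs centered at (0,0), (5,5), (10,10).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(30, 2)) for loc in (0, 5, 10)])

# Distortion J versus K: the curve should flatten (the "elbow") near K = 3.
curve = {K: run_k_means(X, K, rng) for K in range(1, 7)}
```

Plotting `curve` (K on the x-axis, J on the y-axis) shows the rapid-then-slow drop described above.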