Michael Bradford Williams

K-means Clustering

Below is a visual representation of the K-means clustering algorithm. This is an unsupervised machine learning algorithm that attempts to group a set of (unlabeled) data points into clusters according to proximity.

The algorithm first initializes a set of K "centroids", which serve as the geometric centers of the data clusters. It then repeats a loop with two main steps:

  1. Cluster assignments: decide to which cluster each data point belongs. This is done by finding the centroid that is closest to the data point in question.
  2. Centroid updating: try to improve the position of the centroids. This is done by moving a given centroid to the average (mean) position of all points in that centroid's cluster.
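
The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code behind the demonstration: the function names `assign_clusters` and `update_centroids` are hypothetical, distance is assumed to be Euclidean, and leaving an empty cluster's centroid where it is is one possible choice among several.

```python
import math


def assign_clusters(points, centroids):
    """Step 1: for each point, return the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: a cluster with no points keeps its previous centroid.
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return new_centroids


# Tiny hand-made example: two obvious groups, one on the left, one on the right.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(1.0, 1.0), (9.0, 1.0)]

labels = assign_clusters(points, centroids)
centroids = update_centroids(points, labels, centroids)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 1.0), (10.0, 1.0)]
```

In this example the first two points are closest to the first centroid and the last two to the second, so each centroid moves to the midpoint of its pair.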

Instructions for this demonstration: the data consist of 75 randomly selected points in the plane, and the algorithm tries to group those points into 5 clusters. The centroids are initialized randomly. Press "Iterate" to step through one iteration of the algorithm. You can see how the centroids move, and how data points may move from one cluster to another (shown by a change of color). Press "Reset" to reset the data and clusters.
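
For readers without the interactive view, the same setup can be approximated in a short script. This is a sketch under several assumptions: the points and initial centroids are drawn uniformly from the unit square (the page only says they are random), the functions `reset` and `iterate` are hypothetical stand-ins for the two buttons, and the cap of 100 iterations is an arbitrary safety limit.

```python
import math
import random

NUM_POINTS = 75    # number of data points in the demonstration
NUM_CLUSTERS = 5   # number of clusters in the demonstration
MAX_STEPS = 100    # assumption: arbitrary cap so the loop always ends


def reset(rng):
    """Draw new random data and centroids (assumed uniform in the unit square)."""
    points = [(rng.random(), rng.random()) for _ in range(NUM_POINTS)]
    centroids = [(rng.random(), rng.random()) for _ in range(NUM_CLUSTERS)]
    return points, centroids


def iterate(points, centroids):
    """One iteration: assign each point to a cluster, then move the centroids."""
    labels = [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            n = len(members)
            moved.append((sum(x for x, _ in members) / n,
                          sum(y for _, y in members) / n))
        else:
            moved.append(old)  # assumption: an empty cluster stays where it is
    return labels, moved


rng = random.Random(0)  # fixed seed only so that the run is repeatable
points, centroids = reset(rng)

for step in range(1, MAX_STEPS + 1):
    labels, new_centroids = iterate(points, centroids)
    if new_centroids == centroids:
        # No centroid moved, so further iterations would change nothing.
        break
    centroids = new_centroids

print(f"stopped after {step} iterations")
for k, (cx, cy) in enumerate(centroids):
    print(f"cluster {k}: {labels.count(k)} points, centroid ({cx:.3f}, {cy:.3f})")
```

Each pass through the loop corresponds to one press of "Iterate"; calling `reset` again corresponds to "Reset". The label of each point plays the role of its color in the visual version.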
