k-Means Clustering Simulator

Run the classic unsupervised-learning algorithm k-means (Lloyd's algorithm) and watch it work. Change the cluster count k or the initialisation, and the point assignments, centroid movements, the convergence of the within-cluster sum of squares (WCSS) and the elbow method for choosing k all come to life in animations and charts.

Parameters

Cluster count k

Number of clusters k-means splits the data into

Number of data points

True clusters (for data generation)

Number of clusters the synthetic data really has

Cluster spread

Standard deviation of the Gaussian noise per cluster

Maximum iterations

Upper limit on Lloyd's iterations

Initialisation method

How the initial centroids are chosen

Results

—

Iterations

—

WCSS (within-cluster sum of squares)

—

Cluster count k

—

Mean cluster radius

—

Points in largest cluster

—

Convergence

—

Clustering visualisation — iteration animation

Data points on the unit square [0,1]². Colour shows the assigned cluster and the large ✕ marks each centroid. Assignment and centroid moves advance per iteration, then replay from the start.

WCSS convergence

Elbow method — final WCSS vs cluster count k

Theory & Key Formulas

$$\text{WCSS}=\sum_{j=1}^{k}\sum_{x\in C_j}\lVert x-\mu_j\rVert^2$$

The within-cluster sum of squares (WCSS). The objective function: for every data point x, the squared distance to the centroid μ_j of its cluster C_j, summed over all points. Smaller means tighter clusters.

$$\mu_j=\frac{1}{|C_j|}\sum_{x\in C_j}x$$

The centroid update. The new centroid μ_j is the mean position of the points in cluster C_j; |C_j| is the number of points in the cluster.

Each iteration first assigns points to the nearest centroid, then moves each centroid to the cluster mean. These two steps drive WCSS down monotonically until it converges in a finite number of steps.

What is the k-Means Clustering Simulator?

🙋

"Clustering" means grouping data, right? But with no correct labels, how does it decide where one group ends and the next begins?

🎓

Good question — that is exactly the appeal of "unsupervised learning". k-means builds groups using only one idea: things that are close together belong together. The procedure has just two steps. First, assign each point to its nearest "centroid". Then move that centroid to the mean position of the points that gathered around it. Repeat the two steps. Leave the slider at k=3 and watch the animation — you will see the ✕ centroids get pulled toward the centre of the points.

🙋

Just alternating two steps — does it really stop? Couldn't it keep moving forever?

🎓

It does stop, and that is mathematically guaranteed. The key is a number called WCSS, the within-cluster sum of squares. It adds up the squared distance from every point to its centroid and measures how tight the clusters are. Neither the assignment step nor the update step can ever increase WCSS — it always goes down or stays the same. A value that only decreases and has a lower bound must eventually halt. Look at the "WCSS convergence" chart: it drops in steps and flattens out. With the default settings it converges in about seven iterations.

🙋

I see! So what should I set k to? At the start you never know how many groups the data has.

🎓

That is where the "elbow method" comes in. Run k-means for k = 1, 2, 3, … and chart each final WCSS. WCSS always falls as k grows, but once you pass the true number of clusters the fall suddenly slows. The point where the curve bends like an elbow is a good estimate of k. On the "Elbow method" chart you should see the bend near the true cluster count of 3. Move the true-clusters slider to 5 and the bend shifts with it.

🙋

There are two initialisation methods — "k-means++" and "random". Does that change the result?

🎓

It does, quite a lot. k-means settles in a different place depending on where the initial centroids start. Random initialisation can put the centroids close together and get stuck in a poor local minimum. k-means++ is smarter: it picks the next centroid with higher probability the farther a point is from the existing centroids. The initial centroids spread across the data, so it reaches a good solution more often. That is why the default in scikit-learn is k-means++. Switch the initialisation selector and compare the converged WCSS.

Frequently Asked Questions

k-means (Lloyd's algorithm) alternates two steps. (1) Assignment step: assign each data point to the cluster of its nearest centroid. (2) Update step: move each cluster's centroid to the mean position of the points assigned to it. The two steps repeat until no point changes its assignment. The within-cluster sum of squares (WCSS) decreases on every iteration, so the algorithm is guaranteed to converge in a finite number of steps.

WCSS (within-cluster sum of squares) is the sum, over all points, of the squared distance from each point to the centroid of its cluster. It is written WCSS = Σ_j Σ_{x∈C_j} ‖x−μ_j‖² and is the objective function that measures how tight the clustering is. A smaller WCSS means the clusters are more compact. k-means decreases this WCSS monotonically on each iteration.

The elbow method runs k-means for k = 1, 2, 3, … and plots the final WCSS against k. WCSS always drops as k grows, but once you pass the true number of clusters in the data the drop becomes much flatter. The point where the curve bends like an elbow is a good estimate of k. The chart on the right of this tool lets you see that bend directly.

Random initialisation picks the first centroids arbitrarily, so they can end up clustered together and the algorithm can fall into a poor local minimum. k-means++ picks one centroid at random, then chooses each next centroid with a probability proportional to the squared distance D² to the nearest existing centroid. This spreads the initial centroids across the data, so the converged WCSS is usually smaller and more stable. This tool lets you switch the initialisation and compare results.

Real-World Applications

Customer segmentation: In marketing, customers are grouped by metrics such as spend, visit frequency and recency, and k-means sorts them into segments like "high value", "at risk of churn" and "new". It is a standard way to organise customers so each segment can get a tailored campaign; the segment count is chosen with the elbow method or the silhouette score. When mixing metrics with different scales, standardise them first.

Image colour quantisation: A photo can contain tens of thousands of colours, but if each pixel is treated as a point in RGB space, k-means reduces them to k representative colours. It is used for building GIF palettes, image compression and posterisation-style effects. With k set to 16 or 256, file size shrinks with almost no visible loss of quality.

Anomaly detection and pre-processing: You can learn clusters from normal data and flag points far from every centroid as "anomalous". k-means is also used to summarise a large dataset first, then run heavier downstream analysis on only the representative point of each cluster — a vector-quantisation (VQ) style of pre-processing. Classifying the states of sensor data is a typical example.

Organising engineering and CAE data: Clustering the results of many analysis cases (stress, displacement, natural frequencies and so on) with k-means reveals groups of cases that behave similarly. It is applied to condensing the huge result set of a design-of-experiments (DOE) study into representative cases, and to classifying and visualising mesh elements by a physical quantity.

Common Misconceptions and Pitfalls

The biggest misconception is that k-means always finds the best possible clustering. All k-means guarantees is that WCSS decreases on each iteration and the algorithm converges to a local optimum — there is no guarantee it is the globally best (global optimum) solution. The converged result depends on where the initial centroids were placed. In practice you therefore run it several times with different initialisations and keep the result with the smallest WCSS. k-means++ gives a good initial solution, but trying multiple runs is still the rule. Switching the initialisation in this tool shows that the WCSS can differ even for the same k.

Next, the assumption that k-means can find clusters of any shape. Because it assigns each point to the nearest centroid, the decision boundaries are always straight lines (hyperplanes in higher dimensions), and the clusters it produces implicitly assume a spherical (isotropic) shape of roughly equal size. Non-convex clusters such as crescents, rings or long thin bands, and clusters that differ greatly in size or density, will not be split correctly. For such data, methods like DBSCAN, spectral clustering or Gaussian mixture models are more suitable.

Finally, the carelessness of skipping pre-processing. Because k-means is based on Euclidean distance, a feature with a large scale dominates the distance. Mix "annual income (millions)" with "age (tens)" as-is and clusters get decided by income alone. Standardising each feature (mean 0, variance 1) is the basic step. Note too that k-means is sensitive to outliers, because the centroid is computed as a mean and an outlier pulls it strongly. When outliers are common, k-medoids (PAM), which uses medians, is an alternative.

k-Means Clustering Simulator

What is the k-Means Clustering Simulator?

Frequently Asked Questions

Real-World Applications

Common Misconceptions and Pitfalls

How to Use

Worked Example

Practical Notes