k-Means Clustering Simulator Back
Machine Learning

k-Means Clustering Simulator

Run the classic unsupervised-learning algorithm k-means (Lloyd's algorithm) and watch it work. Change the cluster count k or the initialisation, and the point assignments, centroid movements, the convergence of the within-cluster sum of squares (WCSS) and the elbow method for choosing k all come to life in animations and charts.

Parameters
Cluster count k
Number of clusters k-means splits the data into
Number of data points
True clusters (for data generation)
Number of clusters the synthetic data really has
Cluster spread
Standard deviation of the Gaussian noise per cluster
Maximum iterations
Upper limit on Lloyd's iterations
Initialisation method
How the initial centroids are chosen
Results
Iterations
WCSS (within-cluster sum of squares)
Cluster count k
Mean cluster radius
Points in largest cluster
Convergence
Clustering visualisation — iteration animation

Data points on the unit square [0,1]². Colour shows the assigned cluster and the large ✕ marks each centroid. Assignment and centroid moves advance per iteration, then replay from the start.

WCSS convergence
Elbow method — final WCSS vs cluster count k
Theory & Key Formulas

$$\text{WCSS}=\sum_{j=1}^{k}\sum_{x\in C_j}\lVert x-\mu_j\rVert^2$$

The within-cluster sum of squares (WCSS). The objective function: for every data point x, the squared distance to the centroid μ_j of its cluster C_j, summed over all points. Smaller means tighter clusters.

$$\mu_j=\frac{1}{|C_j|}\sum_{x\in C_j}x$$

The centroid update. The new centroid μ_j is the mean position of the points in cluster C_j; |C_j| is the number of points in the cluster.

Each iteration first assigns points to the nearest centroid, then moves each centroid to the cluster mean. These two steps drive WCSS down monotonically until it converges in a finite number of steps.

What is the k-Means Clustering Simulator?

🙋
"Clustering" means grouping data, right? But with no correct labels, how does it decide where one group ends and the next begins?
🎓
Good question — that is exactly the appeal of "unsupervised learning". k-means builds groups using only one idea: things that are close together belong together. The procedure has just two steps. First, assign each point to its nearest "centroid". Then move that centroid to the mean position of the points that gathered around it. Repeat the two steps. Leave the slider at k=3 and watch the animation — you will see the ✕ centroids get pulled toward the centre of the points.
🙋
Just alternating two steps — does it really stop? Couldn't it keep moving forever?
🎓
It does stop, and that is mathematically guaranteed. The key is a number called WCSS, the within-cluster sum of squares. It adds up the squared distance from every point to its centroid and measures how tight the clusters are. Neither the assignment step nor the update step can ever increase WCSS — it always goes down or stays the same. A value that only decreases and has a lower bound must eventually halt. Look at the "WCSS convergence" chart: it drops in steps and flattens out. With the default settings it converges in about seven iterations.
🙋
I see! So what should I set k to? At the start you never know how many groups the data has.
🎓
That is where the "elbow method" comes in. Run k-means for k = 1, 2, 3, … and chart each final WCSS. WCSS always falls as k grows, but once you pass the true number of clusters the fall suddenly slows. The point where the curve bends like an elbow is a good estimate of k. On the "Elbow method" chart you should see the bend near the true cluster count of 3. Move the true-clusters slider to 5 and the bend shifts with it.
🙋
There are two initialisation methods — "k-means++" and "random". Does that change the result?
🎓
It does, quite a lot. k-means settles in a different place depending on where the initial centroids start. Random initialisation can put the centroids close together and get stuck in a poor local minimum. k-means++ is smarter: it picks the next centroid with higher probability the farther a point is from the existing centroids. The initial centroids spread across the data, so it reaches a good solution more often. That is why the default in scikit-learn is k-means++. Switch the initialisation selector and compare the converged WCSS.

Frequently Asked Questions

k-means (Lloyd's algorithm) alternates two steps. (1) Assignment step: assign each data point to the cluster of its nearest centroid. (2) Update step: move each cluster's centroid to the mean position of the points assigned to it. The two steps repeat until no point changes its assignment. The within-cluster sum of squares (WCSS) decreases on every iteration, so the algorithm is guaranteed to converge in a finite number of steps.
WCSS (within-cluster sum of squares) is the sum, over all points, of the squared distance from each point to the centroid of its cluster. It is written WCSS = Σ_j Σ_{x∈C_j} ‖x−μ_j‖² and is the objective function that measures how tight the clustering is. A smaller WCSS means the clusters are more compact. k-means decreases this WCSS monotonically on each iteration.
The elbow method runs k-means for k = 1, 2, 3, … and plots the final WCSS against k. WCSS always drops as k grows, but once you pass the true number of clusters in the data the drop becomes much flatter. The point where the curve bends like an elbow is a good estimate of k. The chart on the right of this tool lets you see that bend directly.
Random initialisation picks the first centroids arbitrarily, so they can end up clustered together and the algorithm can fall into a poor local minimum. k-means++ picks one centroid at random, then chooses each next centroid with a probability proportional to the squared distance D² to the nearest existing centroid. This spreads the initial centroids across the data, so the converged WCSS is usually smaller and more stable. This tool lets you switch the initialisation and compare results.

Real-World Applications

Customer segmentation: In marketing, customers are grouped by metrics such as spend, visit frequency and recency, and k-means sorts them into segments like "high value", "at risk of churn" and "new". It is a standard way to organise customers so each segment can get a tailored campaign; the segment count is chosen with the elbow method or the silhouette score. When mixing metrics with different scales, standardise them first.

Image colour quantisation: A photo can contain tens of thousands of colours, but if each pixel is treated as a point in RGB space, k-means reduces them to k representative colours. It is used for building GIF palettes, image compression and posterisation-style effects. With k set to 16 or 256, file size shrinks with almost no visible loss of quality.

Anomaly detection and pre-processing: You can learn clusters from normal data and flag points far from every centroid as "anomalous". k-means is also used to summarise a large dataset first, then run heavier downstream analysis on only the representative point of each cluster — a vector-quantisation (VQ) style of pre-processing. Classifying the states of sensor data is a typical example.

Organising engineering and CAE data: Clustering the results of many analysis cases (stress, displacement, natural frequencies and so on) with k-means reveals groups of cases that behave similarly. It is applied to condensing the huge result set of a design-of-experiments (DOE) study into representative cases, and to classifying and visualising mesh elements by a physical quantity.

Common Misconceptions and Pitfalls

The biggest misconception is that k-means always finds the best possible clustering. All k-means guarantees is that WCSS decreases on each iteration and the algorithm converges to a local optimum — there is no guarantee it is the globally best (global optimum) solution. The converged result depends on where the initial centroids were placed. In practice you therefore run it several times with different initialisations and keep the result with the smallest WCSS. k-means++ gives a good initial solution, but trying multiple runs is still the rule. Switching the initialisation in this tool shows that the WCSS can differ even for the same k.

Next, the assumption that k-means can find clusters of any shape. Because it assigns each point to the nearest centroid, the decision boundaries are always straight lines (hyperplanes in higher dimensions), and the clusters it produces implicitly assume a spherical (isotropic) shape of roughly equal size. Non-convex clusters such as crescents, rings or long thin bands, and clusters that differ greatly in size or density, will not be split correctly. For such data, methods like DBSCAN, spectral clustering or Gaussian mixture models are more suitable.

Finally, the carelessness of skipping pre-processing. Because k-means is based on Euclidean distance, a feature with a large scale dominates the distance. Mix "annual income (millions)" with "age (tens)" as-is and clusters get decided by income alone. Standardising each feature (mean 0, variance 1) is the basic step. Note too that k-means is sensitive to outliers, because the centroid is computed as a mean and an outlier pulls it strongly. When outliers are common, k-medoids (PAM), which uses medians, is an alternative.

How to Use

  1. Set cluster count k (2-10) using the kNum slider to define how many groups the algorithm will form from your dataset.
  2. Specify total data points (50-500) via nNum to control dataset size; larger datasets require more iterations to converge.
  3. Choose initialization method (sNum): random seed, k-means++ (deterministic), or forgy method to observe impact on iteration count and final WCSS.
  4. Set iteration limit (tNum: 10-200) as stopping criterion; convergence typically occurs between 15-60 iterations for 200-point datasets.
  5. Click Run to execute Lloyd's algorithm and monitor WCSS decrease, cluster assignments, and centroid movements in real-time.

Worked Example

Configure k=4 clusters with n=200 normally-distributed points (mean spacing ~0.5 units). Using k-means++ initialization: Algorithm converges in 12 iterations, achieving WCSS=850 (sum of squared distances from points to nearest centroid). With random initialization on identical data: Same k and n require 28 iterations, WCSS=862. Mean cluster radius stabilizes at 1.2 units. Largest cluster contains 68 points; smallest contains 31 points. Forgy initialization produces intermediate result: 18 iterations, WCSS=851. Compare centroid trajectories across methods—k-means++ reduces jitter, speeding convergence by ~60%.

Practical Notes

  1. Elbow method: Run simulator with k=2 through k=8 on fixed dataset; plot WCSS vs k to identify "elbow" point where marginal improvement drops below 5%—typically optimal k for real data.
  2. Initialization matters: k-means++ reduces bad local minima risk by 40-70% versus random seeding; critical for imbalanced clusters in industrial quality control datasets (e.g., defect detection with k=3: normal/minor/critical).
  3. Iteration scaling: With n=500 points, expect 25-45 iterations; exceeding 100 iterations signals poor initialization or k value mismatch—reduce k or switch initialization strategy.
  4. WCSS plateau detection: When WCSS change drops below 0.1% between iterations, algorithm converged; stopping early saves computation in production clustering pipelines.