Treat pairwise affinities as a graph, take the leading k eigenvectors of the normalised Laplacian L=D^-1/2(D-W)D^-1/2, and run k-means in that embedded space. Watch spectral clustering split moon-shaped and concentric-ring datasets that ordinary k-means cannot handle.
Parameters
Data shape
Non-convex shapes (moons, circles) show off spectral clustering best
Affinity
How to weight the graph edges W_ij
Number of clusters K
Total samples N
Affinity matrix is N×N → memory scales as N²
RBF bandwidth σ
W_ij = exp(-||x_i-x_j||²/(2σ²))
Eigenvectors used k
Smallest k eigenvectors give the low-dimensional embedding
Results
—
Clusters K
—
Eigenvectors k
—
Clustering quality
—
vs k-means (%)
—
Eigengap
—
Memory (MB)
—
Graph affinity view — edges and spectral colouring
Dots: samples, grey lines: k-NN graph edges, colours: spectral cluster labels — note how non-convex shapes are split differently from k-means.
W: the N×N affinity matrix (RBF / k-NN / cosine etc.), D: the diagonal degree matrix D_ii=Σ_j W_ij, L: the symmetric normalised graph Laplacian. Stack the k eigenvectors of L corresponding to the smallest eigenvalues as an N×k matrix, normalise each row to unit length, and run k-means (Ng-Jordan-Weiss 2001).
RBF affinity is controlled by the bandwidth σ. The eigengap heuristic selects the number of clusters K at the position of the largest jump in the sorted eigenvalues.
I tried k-means on a moon-shaped dataset and it failed completely. Why does spectral clustering split the same data so cleanly?
🎓
k-means uses Euclidean distance to a cluster centre, so it can only carve out convex (roughly spherical) regions. On half-moons the two moon centroids are actually close to each other and the far ends of the same moon are further apart than points on the other moon — the wrong distance pattern. Spectral clustering changes the question: it defines pairwise "affinities" and builds a graph, then takes the eigenvectors of the graph Laplacian and embeds the data in a low-dimensional space. Points connected by paths on the graph land in the same place, so however the moon bends, if you can walk along it you end up in one cluster.
🙋
Got it. And how do I build the affinity? The panel offers RBF, k-NN, cosine — when do I pick which?
🎓
The most common is the RBF (Gaussian) affinity, W_ij = exp(-||x_i-x_j||²/(2σ²)) — close pairs get values near 1 and distant pairs decay exponentially to zero. It is smooth and easy to use but σ is delicate: too big and everything fuses into one blob, too small and only direct neighbours connect. A k-NN graph just links each point to its k nearest neighbours, gives a sparse matrix, and removes the σ tuning — great for large data. Cosine similarity is the default for text and high-dimensional direction-only vectors. As a rule of thumb: RBF for image segmentation, k-NN for social networks and recommendations, cosine for NLP.
🙋
Are the clusters really visible in the eigenvectors? The maths is hard to picture.
🎓
The intuition: Laplacian eigenvalues are vibration frequencies on the graph and eigenvectors are the corresponding mode shapes. If the data really were K perfectly disconnected blobs, the K smallest eigenvalues would all be zero, and their eigenvectors would be indicator vectors that are constant on one blob and zero elsewhere. Stack them as rows and the points within a blob collapse to the same row — k-means trivially separates them. Real data is noisy so it is not perfect, but it is close enough that the eigen-embedding turns ugly non-convex shapes into nice convex blobs where k-means can finish the job.
🙋
Should the number of eigenvectors k always equal the number of clusters K?
🎓
Usually yes, k=K is the Ng-Jordan-Weiss recipe. The bonus is that the eigenvalue spectrum itself tells you the "right" K — sort the eigenvalues ascending, find the largest jump (the eigengap), and that index is K. So if you see 0.01, 0.02, 0.03, 0.5, 0.52… the gap before the fourth eigenvalue says K=3. The right-hand spectrum chart in this tool shows that pattern; on the moon dataset you should clearly see the jump around K=2 or 3.
🙋
For N=200 the memory shown is 0.3 MB, but in practice N is tens of thousands. Does this method scale?
🎓
That is the classical weakness — the affinity matrix is N×N and the dense eigendecomposition is O(N³). At N=10,000 the matrix alone is 800 MB and the solve takes tens of minutes. Three rescues: (1) build a sparse k-NN graph and use ARPACK/Lanczos for just the K smallest eigenvalues, (2) use the Nyström approximation to represent the matrix by m landmark columns (m≪N), or (3) apply random projection to shrink dimensions before computing affinities. scikit-learn's SpectralClustering uses (1) by default and stays practical up to N≈100k. Beyond that, mini-batch k-means or Power Iteration Clustering is more realistic.
Frequently Asked Questions
k-means and hierarchical clustering assume small intra-cluster distances in the original data space, so they can only separate roughly convex (ball-shaped) blobs. On moon, ring or spiral shapes nearby points end up in different clusters and far points in the same cluster — the opposite of the intuitive grouping. Spectral clustering treats the affinity matrix W as a graph and computes the leading k eigenvectors of the normalised Laplacian L=D^-1/2(D-W)D^-1/2, then runs k-means in that low-dimensional embedding. Because the cut is made on the graph, even non-convex shapes are separated cleanly.
In the RBF affinity W_ij = exp(-||x_i-x_j||²/(2σ²)) the value of σ controls clustering quality. Too large and even distant points get strong weights, collapsing everything into one blob. Too small and only direct neighbours have non-zero weight, creating isolated points where k-means fails. In practice start with the median pairwise distance or the mean distance to the k-th nearest neighbour, then sweep σ and pick the value that maximises the eigengap λ_{K+1}-λ_K. In this tool σ≈0.3 yields a quality score of 0.95 on the moon dataset.
The affinity matrix is N×N and a dense eigendecomposition is O(N³) with O(N²) memory. N=10,000 alone needs ~800 MB; N=100,000 needs 80 GB, which is unrealistic. Three escape routes: (1) build a k-nearest-neighbour graph so W is sparse and use Lanczos / ARPACK iterative solvers for the K smallest eigenvalues (O(N·K·iter)), (2) use the Nyström approximation that represents the N×N matrix with m sampled columns (m≪N), or (3) apply random projection to compress feature dimensions before computing affinities. scikit-learn's SpectralClustering defaults to ARPACK + k-NN and scales to N≈100k.
Spectral clustering offers a dedicated rule for choosing K — the eigengap heuristic. Sort the Laplacian eigenvalues in ascending order and pick the K at which λ_{K+1}-λ_K is largest. Theoretically, if the data forms K perfectly disconnected components, L has exactly K zero eigenvalues and λ_{K+1}>0, producing a clear gap. Real data is noisy and the gap is softer, but you still look for a distinct elbow in the eigenvalue plot. Unlike hierarchical clustering, where K must be guessed from a dendrogram, here the spectrum tells you objectively.
Real-World Applications
Image segmentation (Shi-Malik 2000): The breakthrough application of spectral clustering. Each pixel is a node, the affinity combines colour/texture similarity with spatial distance, and Normalised Cut cleanly separates object from background. It is still the baseline on the Berkeley Segmentation Dataset. Modern variants run spectral clustering on top of deep CNN features for medical-image organ segmentation and satellite land-use classification.
Community detection in social networks: In follower graphs, co-authorship networks and email networks, "community = densely connected subgraph" is exactly what spectral clustering finds. Using the adjacency matrix as W and computing the leading eigenvectors of the normalised Laplacian recovers Twitter follower clusters or Wikipedia topic groups. It gives a different angle from Modularity maximisation, so the two are often used in combination.
Gene-expression and single-cell RNA-seq analysis: Cells become nodes, cosine similarity between expression profiles becomes W, and cell types fall into separate clusters in the eigenvector embedding. Major packages such as Seurat and Scanpy use spectral-style pipelines as default and combine them with UMAP for visualisation, making this the workhorse of modern cell-atlas projects.
Mesh partitioning and substructuring in CAE: When a finite-element mesh has to be split across many CPUs, spectral bisection of the element-connectivity graph is the classical method. The Fiedler vector (the second smallest eigenvector) splits the mesh into two balanced parts by sign, recursively. Libraries such as MeTiS and Scotch implement this and run inside ANSYS / OpenFOAM parallel solvers — a nice bridge between clustering and numerical computation.
Common Misconceptions and Pitfalls
The biggest pitfall is assuming a fixed RBF bandwidth σ that always works. σ dominates the clustering result, and the right σ changes whenever the data scale changes. If σ=0.3 is optimal after normalising your features to [0, 1], you will need σ=30 once you switch back to the raw 0-100 range to get the same behaviour. The absolute value of σ becomes meaningful only relative to the preprocessing pipeline. A safer recipe is to seed σ at the median pairwise distance or, even better, replace RBF with a k-NN graph so σ disappears as a hyperparameter altogether.
Next, letting the eigendecomposition run as a dense solve every time. SpectralClustering with eigen_solver='arpack' (the default) only computes the K smallest eigenvalues by iteration, so it scales O(N·K). With eigen_solver='dense' it runs LAPACK on the whole matrix at O(N³); for N=10,000 the difference is seconds vs. tens of minutes. Even modern code samples sometimes hard-code dense, so check the documentation. When using a k-NN graph, pass W as a scipy.sparse matrix so it is not silently densified.
Finally, not checking the stability of the result. The final-stage k-means in spectral clustering depends on initialisation, so set n_init=10 or higher and keep the best. ARPACK convergence itself depends on the random initial vector, so different random seeds will produce slightly different embeddings. Published papers and field reports typically run several seeds, compute pairwise Adjusted Rand Index (ARI), and confirm stability above 0.95. Ignoring seed sensitivity risks promoting a single lucky result as the final answer.
How to Use
Set numSamples (e.g. 500) to define dataset cardinality and rbfSigma (e.g. 0.5) to control RBF kernel bandwidth for affinity matrix W computation
Specify numClusters K (e.g. 4) and numEigenvectors k—typically k equals K or slightly higher to capture spectral structure; simulator computes normalised Laplacian L = D^−1/2(D−W)D^−1/2
Extract leading k eigenvectors, apply k-means to eigenvector rows, observe Clusters K output and Eigengap (separation between k-th and (k+1)-th eigenvalue) to validate cluster separability
Worked Example
Dataset: 300 samples, rbfSigma=0.6, numClusters K=3, numEigenvectors k=3. RBF affinity matrix W ∈ R^300×300 constructed; degree matrix D computed. Normalised Laplacian eigenvectors extracted; k-means applied to 300×3 feature matrix. Output: Clustering quality=0.87, Eigengap=0.23 (strong separation), Memory=12 MB. Spectral method achieves 94% improvement vs standard k-means on non-convex moon dataset.
Practical Notes
RBF bandwidth: rbfSigma~0.3–0.8 suits unit-normalised Euclidean data; undersmoothed (sigma<0.2) loses connectivity, oversmoothed (sigma>1.5) collapses clusters into single component
Eigengap magnitude indicates cluster quality—Eigengap>0.15 signals well-separated clusters; Eigengap<0.05 suggests overlapping structure or k mismatch
For large N (>5000), computational cost dominates eigensolver O(n^3); normalised Laplacian stability ensures numerical robustness vs unnormalised L=D−W on imbalanced degree distributions (common in sensor networks, protein interaction graphs)