What kind of algorithm is k-nearest neighbours (k-NN)?

k-NN is a supervised learning algorithm, used here for classification, that does not fit a model in the usual sense: it simply stores the training data. At prediction time, it selects the k training points closest to the query and takes a majority vote on their classes. The usual distance is Euclidean. Despite its simplicity, it is often a competitive baseline and is a staple of introductory machine-learning courses.
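The prediction step can be sketched in a few lines of Python. This is a minimal illustration rather than the simulator's own code: the function name `knn_predict` and the six sample points are made up for the example, and a tied vote simply falls to the class seen first among the neighbours.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of ((x1, x2), label) pairs; query: (x1, x2)."""
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the class labels of the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) returns the label with the highest count; on equal
    # counts it returns the label that was encountered first (the nearest).
    return votes.most_common(1)[0][0]

# Hypothetical sample data: two small clusters.
train = [((0.0, 0.0), 0), ((0.5, 0.2), 0), ((0.1, 0.6), 0),
         ((3.0, 3.0), 1), ((3.2, 2.7), 1), ((2.8, 3.3), 1)]

print(knn_predict(train, (0.3, 0.3), k=3))
print(knn_predict(train, (2.9, 2.9), k=3))
```

Sorting every training point is the simplest correct approach; it costs O(N log N) per query, which is fine for a few dozen points.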

How do you choose the value of k?

Very small k (such as k=1) is sensitive to outliers and noise and produces a jagged, over-fitted decision boundary. Very large k smooths the boundary too much and lets the majority class swallow minority classes. In practice, you pick the k that maximizes cross-validated accuracy. Choosing an odd k rules out ties only when there are two classes; with three or more classes, as in this simulator, a tie is still possible (for example a 2-2-1 split at k=5), so a tie-breaking rule is still needed.

What is a decision boundary, and what are the background colours in the simulator?

A decision boundary is the locus of points where the predicted class changes from one to another. The simulator splits the plane into a 30 by 30 grid, runs k-NN at the centre of each cell and tints the cell with a light colour matching the predicted class. Changing k makes it intuitively clear how the boundary moves and smooths out.
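The grid evaluation can be sketched as follows. The page does not state the plotted extent, so the range 0 to 10 on both axes is an assumption, as are the helper names and the sample points; only the 30 by 30 grid and the cell-centre evaluation come from the description above.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def decision_grid(train, k, x_range, y_range, cells=30):
    """Return a cells x cells list of predicted classes, one per cell centre."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    dx = (x_max - x_min) / cells
    dy = (y_max - y_min) / cells
    grid = []
    for row in range(cells):
        cy = y_min + (row + 0.5) * dy  # y coordinate of this row's cell centres
        grid.append([
            knn_predict(train, (x_min + (col + 0.5) * dx, cy), k)
            for col in range(cells)
        ])
    return grid

# Hypothetical sample data and an assumed 0..10 plotting range.
train = [((2.0, 2.0), 0), ((2.5, 1.5), 0), ((1.5, 2.5), 0),
         ((8.0, 8.0), 1), ((7.5, 8.5), 1), ((8.5, 7.5), 1)]
grid = decision_grid(train, k=3, x_range=(0, 10), y_range=(0, 10))

# Count horizontal class changes: cells where the boundary passes between columns.
changes = sum(row[c] != row[c + 1] for row in grid for c in range(len(row) - 1))
print(len(grid), len(grid[0]), changes)
```

The boundary itself is never computed explicitly; it only appears as the seam between differently tinted cells, which is why a finer grid gives a smoother picture at a higher cost.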

What is leave-one-out (LOO) cross-validation accuracy?

Leave-one-out cross-validation removes one training point at a time, fits a k-NN classifier on the remaining points, predicts the held-out point and checks whether the prediction matches the true label. Repeating this for every point and dividing the number of correct predictions by the number of points gives an estimate of generalization accuracy. Because every fit uses all but one point, the method suits small datasets where a separate test set would waste data.
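A minimal sketch of that loop, assuming a plain majority-vote classifier (the names `knn_predict` and `loo_accuracy` and the sample data are illustrative, not taken from the simulator):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(train, k):
    """Fraction of training points predicted correctly when each is held out."""
    correct = 0
    for i, (point, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]  # every point except the i-th
        if knn_predict(rest, point, k) == label:
            correct += 1
    return correct / len(train)

# Hypothetical data: two well-separated clusters of three points each.
train = [((0.0, 0.0), 0), ((0.5, 0.2), 0), ((0.1, 0.6), 0),
         ((3.0, 3.0), 1), ((3.2, 2.7), 1), ((2.8, 3.3), 1)]

for k in (1, 3, 5):
    print(k, loo_accuracy(train, k))
```

On this toy set k=5 fails on every point: once a point is held out, only two members of its own class remain against three of the other class, which is a small-scale version of the "k too large" problem.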

k-NN 2D Classifier Simulator — Free Online Calculator

Parameters

k (number of neighbours)

Query point x

Query point y

Training data spread σ

Data is generated deterministically with an LCG of seed 42 (15 points per class, 45 in total). The Euclidean distance is used.
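The page does not say which LCG constants, class centres or noise distribution the simulator uses, so the following is only a sketch of how such deterministic data could be produced. Everything except seed 42 and 15 points per class is an assumption: the commonly used constants a = 1664525, c = 1013904223, m = 2^32, Gaussian noise via the Box-Muller transform, and three hypothetical class centres.

```python
import math

def make_lcg(seed):
    """Linear congruential generator; the constants are an assumption."""
    state = seed

    def next_uniform():
        nonlocal state
        state = (1664525 * state + 1013904223) % 2**32
        # Map to the open interval (0, 1) so that log() below is always defined.
        return (state + 1) / (2**32 + 1)

    return next_uniform

def generate(seed=42, per_class=15, sigma=1.0,
             centres=((3.0, 3.0), (7.0, 3.0), (5.0, 7.0))):  # hypothetical centres
    rand = make_lcg(seed)
    data = []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(per_class):
            # Box-Muller: two uniforms give a radius and an angle of a 2D Gaussian.
            u1, u2 = rand(), rand()
            r = math.sqrt(-2.0 * math.log(u1))
            angle = 2.0 * math.pi * u2
            data.append(((cx + sigma * r * math.cos(angle),
                          cy + sigma * r * math.sin(angle)), label))
    return data

data = generate()
print(len(data), data[0])
```

Using a hand-written generator instead of a library random source is what makes the dataset identical on every page load, so accuracy figures can be compared between sessions.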

Results

k (number of neighbours)

Predicted class of the query

Mean distance to k nearest neighbours

LOO cross-validation accuracy

Training Data and Decision Boundary

Red = class 0 / blue = class 1 / green = class 2 / black × = query / dashed circle = distance to the k-th nearest neighbour

Theory & Key Formulas

k-NN is a lazy supervised learner that simply memorizes the training data and, at query time, selects the k closest training points and predicts the majority class among them.

Two-dimensional Euclidean distance. The training point is $\mathbf{x}_i = (x_{i1}, x_{i2})$ and the query is $\mathbf{x} = (x_1, x_2)$:

$$d(\mathbf{x},\mathbf{x}_i) = \sqrt{(x_1 - x_{i1})^2 + (x_2 - x_{i2})^2}$$

Predicted class. $\mathcal{N}_k(\mathbf{x})$ is the set of k nearest neighbours and $y_i$ is the class of training point $\mathbf{x}_i$:

$$\hat{y}(\mathbf{x}) = \arg\max_{c}\;\sum_{\mathbf{x}_i \in \mathcal{N}_k(\mathbf{x})}\mathbb{1}[y_i = c]$$

Leave-one-out cross-validation accuracy. $\hat{y}^{(-i)}$ is the prediction made by the classifier trained without $\mathbf{x}_i$:

$$\text{Acc}_\text{LOO} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[\hat{y}^{(-i)}(\mathbf{x}_i) = y_i]$$

When the majority vote ties, the simulator compares the mean distance to the k neighbours for each candidate class and picks the smaller one.
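One way to read that rule is sketched below: among the classes that share the highest vote count, pick the one whose voting neighbours are closest to the query on average. This interpretation, the function name and the sample points are assumptions; the simulator's exact implementation is not shown on the page.

```python
import math
from collections import defaultdict

def knn_predict_tiebreak(train, query, k):
    """Majority vote; ties go to the class with the smaller mean neighbour distance."""
    nearest = sorted((math.dist(p, query), label) for p, label in train)[:k]
    dists = defaultdict(list)  # class label -> distances of its voting neighbours
    for d, label in nearest:
        dists[label].append(d)
    top = max(len(v) for v in dists.values())
    tied = [label for label, v in dists.items() if len(v) == top]
    return min(tied, key=lambda label: sum(dists[label]) / len(dists[label]))

# Hypothetical three-class data giving a 2-2-1 split at k=5 around the origin.
train = [((1.0, 0.0), 0), ((3.0, 0.0), 0),   # class 0: distances 1.0 and 3.0
         ((0.0, 1.5), 1), ((0.0, 1.6), 1),   # class 1: distances 1.5 and 1.6
         ((2.0, 0.0), 2), ((6.0, 6.0), 2)]   # class 2: one vote, one point too far

print(knn_predict_tiebreak(train, (0.0, 0.0), k=5))
```

In this example the single nearest point belongs to class 0, yet class 1 wins the tie because its two neighbours are closer on average (1.55 against 2.0). It also shows that an odd k does not prevent ties once there are three classes.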

What is the k-NN 2D Classifier Simulator

🙋

I just started learning machine learning, and people say k-NN is the simplest one but still useful. Is that really true?

🎓

Really. Roughly speaking, k-NN does not learn a model at all. It memorizes the training data, and when a new point comes in, it finds the "k closest training points" and takes a majority vote on their classes. That is it. Look at the red, blue and green dots in the simulator above. The black × is the query point, and with k=5 it looks at the five nearest points and takes the vote.

🙋

When I move the k slider, the background colours change completely. What are those?

🎓

That is the "decision boundary". The simulator lays a grid over the plane, asks for each cell "if this were the query, what class would k-NN predict?" and tints the cell lightly with that class colour. At k=1 the boundary is jagged, because a single outlier carves out its own little region. Crank k up to 11 or 21 and the boundary smooths out, but now you start to miss finer structure.

🙋

So is bigger k better, or smaller k better?

🎓

Neither: the right answer is "tune it to the data". Look at the LOO accuracy card: it removes one training point at a time, predicts it from the rest, and reports the rate of correct predictions. On the seed-42 dataset it sits around 90% near k=5. In practice you sweep k = 1, 3, 5, … and pick the k with the highest accuracy. Odd values are conventional because they rule out ties between two classes; with three classes, as here, ties can still occur, and the simulator breaks them by mean distance.

🙋

When I push the data spread σ up to 3.0, the clusters overlap and the accuracy drops a lot.

🎓

Right, and that is both a weakness and a strength of k-NN. When the data separates cleanly, accuracy is great. When clusters overlap, no choice of k saves you. In practice, "engineer features that separate the classes" comes first, and tuning the classifier comes second. The reason plain k-NN is a strong baseline is that it shows you, honestly, how good (or bad) your features really are.

Frequently Asked Questions

Euclidean distance is not the only option; you choose the metric by use case. This simulator uses the Euclidean distance (L2 norm) because it is intuitive in 2D, but practice often uses the Manhattan distance (L1), cosine similarity or the Mahalanobis distance. Cosine similarity is standard for high-dimensional sparse vectors such as document representations, and the Mahalanobis distance is preferred when features are correlated or differ widely in scale, because it rescales by the covariance of the data.
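For reference, the four metrics can be written out in plain Python. This is an illustrative sketch: the function names are made up, the Mahalanobis version handles only the 2D case, and the covariance matrix is passed in by the caller rather than estimated from data.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; depends on direction only, not on vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def mahalanobis_2d(a, b, cov):
    """cov is a 2x2 covariance matrix ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, expanded for the 2x2 case.
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b), manhattan(a, b))
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))
print(mahalanobis_2d(a, b, ((9.0, 0.0), (0.0, 16.0))))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean one; with a diagonal matrix it is equivalent to dividing each feature by its standard deviation first.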

The biggest weakness is prediction-time cost. For every new query, a brute-force search computes the distance to every training point, which is O(N) distance computations for N points. On large data this becomes slow and unsuitable for real-time use. In practice you speed it up with spatial data structures such as k-d trees or ball trees, or with approximate nearest-neighbour libraries like FAISS or Annoy.
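To show where the speed-up comes from, here is a small pure-Python k-d tree for 2D points. It is a teaching sketch with hypothetical function names, not a substitute for a tuned library: the tree splits alternately on x and y, and the search skips a whole subtree whenever the splitting line is farther away than the current k-th best neighbour.

```python
import heapq
import math

def build_kdtree(points, depth=0):
    """Split on x at even depth and on y at odd depth, at the median point."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def kd_knn(node, query, k, heap=None):
    """Collect the k nearest points in a heap of (-distance, point) pairs."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, axis, left, right = node
    d = math.dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:                 # closer than the current k-th best
        heapq.heapreplace(heap, (-d, point))
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    kd_knn(near, query, k, heap)
    # The far side can only help if the splitting line is within the k-th best distance.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        kd_knn(far, query, k, heap)
    return heap

def nearest_k(tree, query, k):
    """Return [(distance, point), ...] sorted from nearest to farthest."""
    return sorted((-neg_d, p) for neg_d, p in kd_knn(tree, query, k))

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest_k(tree, (9, 2), k=2))
```

Pruning works well in two or three dimensions; as the number of dimensions grows, fewer subtrees can be skipped and the search drifts back towards a full scan, which is one reason approximate methods are used for high-dimensional vectors.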

Feature scaling is essential. Because k-NN works on distance, a large-scale feature (such as annual income) will dominate a small-scale one (such as age). Always standardize (mean 0, variance 1) or apply min-max normalization before computing distances. This simulator already uses 2D data on a single common scale, so no scaling is needed here.
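The effect is easy to reproduce with made-up numbers. In the sketch below the four (income, age) rows are hypothetical, and `standardize` and `nearest_other` are illustrative helper names; the point is only that the nearest neighbour of the first person changes once both columns are put on the same scale.

```python
import math
import statistics

def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def nearest_other(rows, i):
    """Index of the row closest to rows[i], excluding rows[i] itself."""
    return min((j for j in range(len(rows)) if j != i),
               key=lambda j: math.dist(rows[i], rows[j]))

# Hypothetical (annual income, age) records.
people = [(30_000, 25), (31_000, 60), (34_000, 26), (90_000, 45)]

print(nearest_other(people, 0))               # raw features
print(nearest_other(standardize(people), 0))  # standardized features
```

On the raw data the 35-year age gap to person 1 is invisible next to an income gap of 1,000, so person 1 looks closest. After standardization person 2, who is almost the same age and earns only slightly more, becomes the nearest neighbour.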

As k approaches the number of training points, every query returns "the majority of the entire dataset", that is, the most frequent class. The decision boundary nearly vanishes and minority classes are never predicted. At the other extreme, k=1 fits the training data perfectly and tends to overfit. Watching the LOO accuracy card while choosing k is the practical fix.

Real-World Applications

Recommendation systems: In movie or product recommendation, users and items are represented as multi-dimensional vectors and "similar users" or "similar items" are retrieved by k-NN. Amazon's "customers who bought this also bought…" and Netflix's collaborative filtering build on this idea. Production systems use approximate nearest-neighbour libraries to pull k neighbours instantly from hundreds of millions of vectors.

Anomaly detection: On manufacturing sensor data or network traffic, the "mean distance to the k nearest neighbours" can be used as an anomaly score. In a space where normal data is dense, a point whose neighbours are all far away is "isolated", that is, anomalous. The "mean distance to k nearest neighbours" card in this simulator is exactly that quantity.
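A minimal sketch of that score, assuming Euclidean distance and a reference set of normal points (the function name and the sample grid are illustrative):

```python
import math

def knn_anomaly_score(data, point, k):
    """Mean Euclidean distance from `point` to its k nearest neighbours in `data`."""
    dists = sorted(math.dist(point, other) for other in data)
    return sum(dists[:k]) / k

# Hypothetical "normal" data: a dense 5 x 5 grid of points between 0 and 2.
normal = [(0.5 * x, 0.5 * y) for x in range(5) for y in range(5)]

inlier_score = knn_anomaly_score(normal, (1.1, 0.9), k=5)
outlier_score = knn_anomaly_score(normal, (8.0, 8.0), k=5)
print(inlier_score, outlier_score)
```

The score is only a ranking signal; turning it into an alarm needs a threshold, which in practice is usually chosen from the distribution of scores on data known to be normal.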

Image classification and OCR baselines: For classic benchmarks such as handwritten-digit recognition (MNIST), k-NN was a strong baseline in the pre-deep-learning era. Treating raw pixel values as a 784-dimensional vector and running k-NN already reaches a misclassification rate around 3%. It still serves as a "minimum bar" sanity check for new models.

Geographic and spatial search: "Five cafes nearest to my current location" is intrinsically a k-NN query. Latitude and longitude form a 2D coordinate, the distance is the great-circle distance, and k-NN accelerated by a spatial index such as an R-tree powers many geographical services.
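As a rough illustration, the great-circle distance can be computed with the haversine formula and combined with a plain linear scan. The cafe names and coordinates below are invented, the Earth is treated as a sphere of radius 6371 km, and no spatial index is used, so this shows the query itself rather than how a production service would accelerate it.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(a, b):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def k_nearest(places, here, k):
    """places: list of (name, (lat, lon)); returns the k closest, nearest first."""
    return heapq.nsmallest(k, places, key=lambda p: haversine_km(p[1], here))

# Hypothetical cafes around a hypothetical current location.
here = (35.6812, 139.7671)
cafes = [("Cafe A", (35.6813, 139.7680)),
         ("Cafe B", (35.6900, 139.7000)),
         ("Cafe C", (35.6586, 139.7454)),
         ("Cafe D", (35.6800, 139.7690))]

for name, coords in k_nearest(cafes, here, k=2):
    print(name, round(haversine_km(coords, here), 3), "km")
```

`heapq.nsmallest` avoids sorting the whole list when k is small, but it still evaluates the distance to every place; an index such as an R-tree exists precisely to avoid that full scan.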

Common Misconceptions and Cautions

The most common misconception is to think that "k-NN is easy because it requires no training". The training phase is indeed just "save the data", but prediction is heavy and feature engineering is everything. Push the "training data spread σ" up to 3.0 in the simulator. The clusters overlap, and no choice of k can recover the LOO accuracy. This says "the classes are not separated in feature space", which k-NN alone cannot fix. Even when training looks unnecessary, a human has to do real work designing features.

The next pitfall is assuming "larger k is always better". Yes, k=1 overfits, but as you keep raising k, the opposite problem appears: minority classes get swallowed by the majority. Sweep k from 1 to 25 in the simulator while watching LOO accuracy. The optimum is somewhere in the middle, and it shifts with the data distribution. k is a hyperparameter that must be tuned to the data, ideally via systematic cross-validation. Both "always make it large" and "always use 1" are wrong.
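The sweep can be automated along these lines. This is a sketch under stated assumptions: a plain majority-vote classifier, a tiny hypothetical dataset, and a tie rule of my own choosing (when two values of k score the same, the smaller one wins).

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(train, k):
    hits = sum(knn_predict(train[:i] + train[i + 1:], p, k) == y
               for i, (p, y) in enumerate(train))
    return hits / len(train)

def best_k(train, candidates):
    """Return (best k, {k: LOO accuracy}); equal scores favour the smaller k."""
    scores = {k: loo_accuracy(train, k) for k in candidates}
    best = max(scores, key=lambda k: (scores[k], -k))
    return best, scores

# Hypothetical data: two clusters of three points each.
train = [((0.0, 0.0), 0), ((0.5, 0.2), 0), ((0.1, 0.6), 0),
         ((3.0, 3.0), 1), ((3.2, 2.7), 1), ((2.8, 3.3), 1)]

best, scores = best_k(train, candidates=(1, 3, 5))
print(best, scores)
```

On real data the accuracy curve is usually noisy, so it is worth looking at the whole `scores` dictionary rather than only the winning k: a broad plateau is more trustworthy than a single sharp peak.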

Finally, do not assume that a high LOO accuracy will carry over unchanged to production. LOO cross-validation implicitly assumes that the training distribution matches the operating distribution. In reality, "domain shift" caused by changes in collection time or sensors is common, and an LOO score of 90% can drop to, say, 70% in production. The LOO accuracy shown by the simulator is the value under ideal conditions; always supplement it with hold-out or time-series cross-validation when deploying.
