The query point sweeps the plane automatically, drawing live links to its k neighbours, the enclosing circle and the majority-vote class colour. Data is generated deterministically with an LCG of seed 42 (15 points per class, 45 in total). The Euclidean distance is used.
While paused, move the sliders to update the result instantly.
Red = class 0 / blue = class 1 / green = class 2 / white links = the k neighbours / dashed circle = distance to the k-th neighbour / filled dot = query (majority-vote colour)
k-NN is a lazy supervised learner that simply memorizes the training data and, at query time, selects the k closest training points and predicts the majority class among them.
Two-dimensional Euclidean distance. The training point is $\mathbf{x}_i = (x_{i1}, x_{i2})$ and the query is $\mathbf{x} = (x_1, x_2)$:
$$d(\mathbf{x},\mathbf{x}_i) = \sqrt{(x_1 - x_{i1})^2 + (x_2 - x_{i2})^2}$$Predicted class. $\mathcal{N}_k(\mathbf{x})$ is the set of k nearest neighbours and $y_i$ is the class of training point $\mathbf{x}_i$:
$$\hat{y}(\mathbf{x}) = \arg\max_{c}\;\sum_{\mathbf{x}_i \in \mathcal{N}_k(\mathbf{x})}\mathbb{1}[y_i = c]$$Leave-one-out cross-validation accuracy. $\hat{y}^{(-i)}$ is the prediction made by the classifier trained without $\mathbf{x}_i$:
$$\text{Acc}_\text{LOO} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[\hat{y}^{(-i)}(\mathbf{x}_i) = y_i]$$When the majority vote ties, the simulator compares the mean distance to the k neighbours for each candidate class and picks the smaller one.