Data is generated deterministically with an LCG of seed 42 (15 points per class, 45 in total). The Euclidean distance is used.
Red = class 0 / blue = class 1 / green = class 2 / black × = query / dashed circle = distance to the k-th nearest neighbour
k-NN is a lazy supervised learner that simply memorizes the training data and, at query time, selects the k closest training points and predicts the majority class among them.
Two-dimensional Euclidean distance. The training point is $\mathbf{x}_i = (x_{i1}, x_{i2})$ and the query is $\mathbf{x} = (x_1, x_2)$:
$$d(\mathbf{x},\mathbf{x}_i) = \sqrt{(x_1 - x_{i1})^2 + (x_2 - x_{i2})^2}$$Predicted class. $\mathcal{N}_k(\mathbf{x})$ is the set of k nearest neighbours and $y_i$ is the class of training point $\mathbf{x}_i$:
$$\hat{y}(\mathbf{x}) = \arg\max_{c}\;\sum_{\mathbf{x}_i \in \mathcal{N}_k(\mathbf{x})}\mathbb{1}[y_i = c]$$Leave-one-out cross-validation accuracy. $\hat{y}^{(-i)}$ is the prediction made by the classifier trained without $\mathbf{x}_i$:
$$\text{Acc}_\text{LOO} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[\hat{y}^{(-i)}(\mathbf{x}_i) = y_i]$$When the majority vote ties, the simulator compares the mean distance to the k neighbours for each candidate class and picks the smaller one.