Rudolf Adamkovič Personal site


K-nearest neighbors

a.k.a. KNN or \(k\)-NN.

Define

… as a nonparametric classification method that assigns to \(x\) the
most common class among the \(k\) nearest neighbors of \(x\), as
measured by Euclidean distance.
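
The definition can be sketched directly in NumPy. A minimal
illustration, not the library implementation; the helper name
knn_classify is hypothetical:

```python
import numpy as np
from collections import Counter

def knn_classify(x, data, target, k=3):
    """Assign to x the most common class among its k nearest neighbors."""
    distances = np.linalg.norm(data - x, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(distances)[:k]           # indices of the k nearest neighbors
    return Counter(target[nearest]).most_common(1)[0][0]
```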

Discuss

Close in precision to the optimal Bayes classifier: asymptotically,
the error rate of the 1-nearest-neighbor rule is at most twice the
Bayes error rate (Cover and Hart, 1967).

Parameterize
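
The main parameter is the neighborhood size \(k\): small values track
noise, large values blur class boundaries. A sketch of choosing \(k\)
by cross-validation; the synthetic blob data set is an assumption for
illustration only:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data, for illustration only.
data, target = make_blobs(n_samples=100, centers=2, random_state=0)

# Mean 5-fold cross-validation accuracy for each candidate k.
for k in (1, 3, 5, 7, 9):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), data, target, cv=5)
    print(k, round(scores.mean(), 3))
```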

Explore

Import NumPy, Matplotlib, and Scikit-learn.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

Ensure reproducibility by seeding NumPy's random number generator.
(Seeding the standard library's ‘random’ module would not affect
NumPy or Scikit-learn.)

np.random.seed(0)

Use a dark theme. ☺

plt.style.use('dark_background')
plt.rcParams.update({'savefig.transparent': True})

Define the training data: feature vectors and their class labels.
(In Scikit-learn, these go by ‘data’ and ‘target’, respectively.)

training_data = np.array([[1, 1],
                          [1, 2],
                          [2, 1],
                          [6, 8],
                          [7, 7],
                          [8, 6]])
training_target = np.array([0, 0, 0, 1, 1, 1])
training = np.c_[training_data, training_target]

training
 x  y  class
 1  1  0
 1  2  0
 2  1  0
 6  8  1
 7  7  1
 8  6  1

Visualize the training data.

training_class_0_mask = training[:, 2] == 0
training_class_0_x = training[training_class_0_mask][:, 0]
training_class_0_y = training[training_class_0_mask][:, 1]

training_class_1_mask = training[:, 2] == 1
training_class_1_x = training[training_class_1_mask][:, 0]
training_class_1_y = training[training_class_1_mask][:, 1]

plt.figure(figsize=(5, 5))
plt.scatter(training_class_0_x, training_class_0_y, marker='D')
plt.scatter(training_class_1_x, training_class_1_y, marker='s')
plt.grid(True, alpha = 0.25)

[Figure: scatter plot of the training data; class 0 as diamonds, class 1 as squares]

Define the test data.

test_data = np.array([[1, 6],
                      [1, 8],
                      [3, 8],
                      [6, 1],
                      [8, 1],
                      [8, 3]])

test_data
 x  y
 1  6
 1  8
 3  8
 6  1
 8  1
 8  3

Visualize the test data.

test_data_x = test_data[:, 0]
test_data_y = test_data[:, 1]

plt.figure(figsize=(5, 5))
plt.scatter(test_data_x, test_data_y, marker='*')
plt.grid(True, alpha = 0.25)

[Figure: scatter plot of the test data as stars]
  1. Create a KNN model with the neighborhood size \(k = 3\).
  2. Train, or fit, the model to the training data.

knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(training_data, training_target)
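
As a cross-check, the kneighbors method reports the distances and
indices of the nearest training points. The setup is repeated here so
the sketch runs on its own:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

training_data = np.array([[1, 1], [1, 2], [2, 1], [6, 8], [7, 7], [8, 6]])
training_target = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(training_data, training_target)

# The three nearest training points to (1, 6) are at indices 1, 0, 2,
# i.e. the class-0 points (1, 2), (1, 1), and (2, 1), at distances
# 4, 5, and sqrt(26).
distances, indices = knn.kneighbors([[1, 6]])
print(indices)
print(distances)
```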

Classify the test data.

test_target = knn.predict(test_data)
test = np.c_[test_data, test_target]

test
 x  y  class
 1  6  0
 1  8  1
 3  8  1
 6  1  0
 8  1  1
 8  3  1
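
The vote behind each prediction can be inspected with predict_proba,
which for KNN reports the fraction of the \(k\) neighbors belonging
to each class. A self-contained sketch that repeats the setup above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

training_data = np.array([[1, 1], [1, 2], [2, 1], [6, 8], [7, 7], [8, 6]])
training_target = np.array([0, 0, 0, 1, 1, 1])
test_data = np.array([[1, 6], [1, 8], [3, 8], [6, 1], [8, 1], [8, 3]])

knn = KNeighborsClassifier(n_neighbors = 3).fit(training_data, training_target)

# Each row: fraction of the 3 nearest neighbors in class 0 and class 1.
# Unanimous neighborhoods give 1.0; split votes, as for (1, 8), give 1/3 vs 2/3.
print(knn.predict_proba(test_data))
```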

Visualize the test data next to the training data.

both = np.r_[training, test]

both_class_0_mask = both[:, 2] == 0
both_class_0_x = both[both_class_0_mask][:, 0]
both_class_0_y = both[both_class_0_mask][:, 1]

both_class_1_mask = both[:, 2] == 1
both_class_1_x = both[both_class_1_mask][:, 0]
both_class_1_y = both[both_class_1_mask][:, 1]

plt.figure(figsize=(5, 5))
plt.scatter(both_class_0_x, both_class_0_y, marker='D')
plt.scatter(both_class_1_x, both_class_1_y, marker='s')
plt.grid(True, alpha = 0.25)

[Figure: combined scatter plot of training and test data, grouped by class]


© 2024 Rudolf Adamkovič under GNU General Public License version 3.
Made with Emacs and secret alien technologies of yesteryear.