… as a nonparametric method of classification that assigns to \(x\) the class most common among the \(k\) nearest neighbors
of \(x\) by Euclidean distance.
Its asymptotic error rate is close to that of the optimal Bayes classifier: for the 1-nearest-neighbor rule, it is at most twice the Bayes error rate.
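To make the rule concrete before turning to Scikit-learn, here is a minimal from-scratch sketch in NumPy. The function name `knn_predict`, the toy points, and the tie-breaking choice (vote ties go to the smallest label) are assumptions made for illustration; this is not Scikit-learn's implementation.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    # Pairwise Euclidean distances, shape (n_query, n_train).
    distances = np.linalg.norm(query[:, None, :] - train_x[None, :, :], axis=2)
    # Indices of the k smallest distances in each row (stable sort keeps ties in index order).
    nearest = np.argsort(distances, axis=1, kind='stable')[:, :k]
    predictions = []
    for row in train_y[nearest]:
        labels, counts = np.unique(row, return_counts=True)
        # np.argmax returns the first maximum, so vote ties go to the smallest label.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Hypothetical toy data: two points per class, two query points.
toy_points = [[0, 0], [0, 1], [5, 5], [5, 6]]
toy_labels = [0, 0, 1, 1]
toy_queries = [[1, 0], [4, 5]]
print(knn_predict(toy_points, toy_labels, toy_queries, k=3))  # prints [0 1]
```

With \(k = 3\), each toy query has two neighbors from the nearby class and one from the far class, so the vote is 2 to 1 in both cases.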
Import NumPy, Matplotlib, and Scikit-learn.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
Ensure reproducibility.
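import random
random.seed(0)
# Python's `random` seed does not affect NumPy, so seed NumPy's global generator too.
np.random.seed(0)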
Use a dark theme. ☺
plt.style.use('dark_background')
plt.rcParams.update({'savefig.transparent': True})
Define the training data as six points \((x, y)\) with their class labels.
(In Scikit-learn, these go by ‘data’ and ‘target’, respectively.)
training_data = np.array([[1, 1], [1, 2], [2, 1], [6, 8], [7, 7], [8, 6]])
training_target = np.array([0, 0, 0, 1, 1, 1])
# Append the class labels as a third column.
training = np.c_[training_data, training_target]
training
x | y | class |
---|---|---|
1 | 1 | 0 |
1 | 2 | 0 |
2 | 1 | 0 |
6 | 8 | 1 |
7 | 7 | 1 |
8 | 6 | 1 |
Visualize the training data.
# Split the training rows by class (column 2 holds the class label).
training_class_0_mask = training[:, 2] == 0
training_class_0_x = training[training_class_0_mask][:, 0]
training_class_0_y = training[training_class_0_mask][:, 1]
training_class_1_mask = training[:, 2] == 1
training_class_1_x = training[training_class_1_mask][:, 0]
training_class_1_y = training[training_class_1_mask][:, 1]
# Diamonds for class 0, squares for class 1.
plt.figure(figsize=(5, 5))
plt.scatter(training_class_0_x, training_class_0_y, marker='D')
plt.scatter(training_class_1_x, training_class_1_y, marker='s')
plt.grid(True, alpha=0.25)
plt.show()
Define the test data.
test_data = np.array([[1, 6], [1, 8], [3, 8], [6, 1], [8, 1], [8, 3]])
test_data
x | y |
---|---|
1 | 6 |
1 | 8 |
3 | 8 |
6 | 1 |
8 | 1 |
8 | 3 |
Visualize the test data.
test_data_x = test_data[:, 0]
test_data_y = test_data[:, 1]
# Stars mark the test points, which have no class yet.
plt.figure(figsize=(5, 5))
plt.scatter(test_data_x, test_data_y, marker='*')
plt.grid(True, alpha=0.25)
plt.show()
Fit a 3-nearest-neighbors classifier to the training data.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(training_data, training_target)
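The classifier above relies on Scikit-learn's defaults. As a sketch, the same model can be written with those defaults spelled out: the Minkowski metric with `p=2` is the Euclidean distance, and `weights='uniform'` gives every neighbor an equal vote. The names `knn_explicit`, `knn_default`, and `same_predictions` are introduced here only for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same training and test points as in this walkthrough.
training_data = np.array([[1, 1], [1, 2], [2, 1], [6, 8], [7, 7], [8, 6]])
training_target = np.array([0, 0, 0, 1, 1, 1])
test_data = np.array([[1, 6], [1, 8], [3, 8], [6, 1], [8, 1], [8, 3]])

# Defaults made explicit: Euclidean distance (Minkowski, p=2) and equal votes.
knn_explicit = KNeighborsClassifier(
    n_neighbors=3, metric='minkowski', p=2, weights='uniform'
)
knn_explicit.fit(training_data, training_target)

# The default configuration, for comparison.
knn_default = KNeighborsClassifier(n_neighbors=3).fit(training_data, training_target)

same_predictions = np.array_equal(
    knn_explicit.predict(test_data), knn_default.predict(test_data)
)
print(same_predictions)
```

Changing `p` to 1 would switch to the Manhattan distance, which no longer matches the Euclidean definition used here.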
Classify the test data.
test_target = knn.predict(test_data)
# Append the predicted classes as a third column.
test = np.c_[test_data, test_target]
test
x | y | class |
---|---|---|
1 | 6 | 0 |
1 | 8 | 1 |
3 | 8 | 1 |
6 | 1 | 0 |
8 | 1 | 1 |
8 | 3 | 1 |
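To see where each prediction comes from, the neighbors behind every vote can be inspected with the classifier's `kneighbors` method, which returns the distances to the nearest training points and their row indices. The sketch below refits the same model so that it runs on its own; `neighbor_classes` and `votes` are names introduced here for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same training and test points as in this walkthrough.
training_data = np.array([[1, 1], [1, 2], [2, 1], [6, 8], [7, 7], [8, 6]])
training_target = np.array([0, 0, 0, 1, 1, 1])
test_data = np.array([[1, 6], [1, 8], [3, 8], [6, 1], [8, 1], [8, 3]])

knn = KNeighborsClassifier(n_neighbors=3).fit(training_data, training_target)

# For each test point: distances to, and row indices of, its 3 nearest training points.
distances, indices = knn.kneighbors(test_data)
# Look up the class of each neighbor, shape (6, 3).
neighbor_classes = training_target[indices]
# Majority vote per test point.
votes = np.array([np.bincount(row).argmax() for row in neighbor_classes])

for point, dist, classes, vote in zip(test_data, distances, neighbor_classes, votes):
    print(point, np.round(dist, 2), classes, '->', vote)
```

The points (1, 8) and (8, 1) are each decided by a 2-to-1 vote for class 1, since one of their three nearest neighbors belongs to class 0; the other four test points have unanimous neighbors.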
Visualize the test data next to the training data.
# Stack the training rows and the classified test rows.
both = np.r_[training, test]
both_class_0_mask = both[:, 2] == 0
both_class_0_x = both[both_class_0_mask][:, 0]
both_class_0_y = both[both_class_0_mask][:, 1]
both_class_1_mask = both[:, 2] == 1
both_class_1_x = both[both_class_1_mask][:, 0]
both_class_1_y = both[both_class_1_mask][:, 1]
# Diamonds for class 0, squares for class 1.
plt.figure(figsize=(5, 5))
plt.scatter(both_class_0_x, both_class_0_y, marker='D')
plt.scatter(both_class_1_x, both_class_1_y, marker='s')
plt.grid(True, alpha=0.25)
plt.show()