
Understanding the k-Nearest Neighbour (kNN) Algorithm in OpenCV

Goal

In this chapter, we will gain an understanding of:

  • The concept of the k-Nearest Neighbour (kNN) algorithm

Theory

kNN is one of the simplest classification algorithms available for supervised learning. The idea is to search for the nearest neighbours of the test data in the feature space. We will study it with the help of the image below.

In the image, there are two families: blue squares and red triangles. We call each of these a Class. Their houses are shown on their town map, which we call the Feature Space. (One can think of the feature space as the space onto which all data is projected. For example, consider a 2D coordinate space. Each data point has two features, its x and y coordinates, so it can be represented in a 2D coordinate space. If there are three features, a 3D space is required; with N features, an N-dimensional space is required, and this N-dimensional space is the feature space. The figure above can be considered a 2D case, since there are two features.) Now a new member enters the town and builds a new house, shown as a green circle. It should be added to one of the blue/red families. We call that process Classification. How should this new member be categorized? Since we are dealing with kNN, let us apply this algorithm to the problem.
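To make the idea concrete, a feature space is simply an array whose rows are samples (houses) and whose columns are features. The following is only a minimal NumPy illustration; the array name and coordinate values are made up:

import numpy as np

# Each row is one house (sample); each column is one feature (x, y).
# Three houses in a 2D feature space (coordinates are illustrative).
houses = np.array([[10, 20],
                   [35, 60],
                   [80, 15]], dtype=np.float32)
print(houses.shape)  # (3, 2): 3 samples, 2 features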

One way to do this is to check who its nearest neighbour is. It is clear from the image that this is the red triangle family, so the newcomer is also added to the red triangles. This method is called simply Nearest Neighbour classification, because the classification depends only on the nearest neighbour.

But there is a problem with that. The red triangle may be the nearest, but what if there are a lot of blue squares in the neighbourhood? Then the blue squares have more strength in that locality than the red triangles, so checking only the closest one is not sufficient. Instead, we check the k nearest families; whoever is in the majority among them, the new sample belongs to that class. In the figure above, suppose we take k=3, i.e. the 3 nearest families. The newcomer has two red neighbours and one blue (there are two blues equidistant, but since k=3 only one of them is taken), so it should join the red family. However, if we take k=7, it has 5 blue families and 2 red families, so it should join the blue family. The classification result therefore varies with the value of k. What is even more interesting is the case k=4: the newcomer has 2 red neighbours and 2 blue neighbours, a tie! It is therefore best to set k to an odd number. Since the classification depends on the k nearest neighbours, the method is called k-Nearest Neighbour.

Again, in kNN we consider k neighbours, but is it fair to give all of them equal weight? Take the k=4 case as an example, which by count is a tie. But the two red families are closer to the newcomer than the two blue families, so it is more deserving of being added to the red family. How is that expressed mathematically? Give each family a weight based on its distance to the newcomer: those close to it get higher weights, and those far away get lower weights. Then sum the total weight of each family separately; whichever family gets the highest total weight, the new sample is categorized into that family. This is called modified kNN or weighted kNN (a minimal sketch of this weighting appears after the list below). The important things to know when using the kNN algorithm are therefore as follows:

  • Information about all the houses in the town is needed, because the distance from the newcomer to every existing house must be checked to find the nearest neighbours. If there are many houses and families, this requires a lot of memory and more computation time.
  • Any kind of "training" or preparation takes almost zero time: "learning" consists only of memorizing (storing) the data prior to testing and classification.
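
OpenCV's kNN does not expose distance weighting directly; the following is only a minimal NumPy sketch of weighted kNN, where the function name, the 1/distance weighting scheme, and the sample values are illustrative assumptions, not part of the OpenCV API:

import numpy as np

def weighted_knn(train, labels, sample, k=4):
    # Euclidean distances from the newcomer to every training point
    d = np.linalg.norm(train - sample, axis=1)
    idx = np.argsort(d)[:k]            # indices of the k nearest neighbours
    w = 1.0 / (d[idx] + 1e-8)          # nearer neighbours get larger weights
    # Sum the weights per class; the class with the largest total wins
    classes = np.unique(labels[idx])
    totals = np.array([w[labels[idx] == c].sum() for c in classes])
    return classes[np.argmax(totals)]

# Tiny example: two red (0) and two blue (1) neighbours; the reds are
# closer, so the newcomer joins red even though the raw vote is a tie.
train = np.array([[48, 50], [52, 50], [40, 50], [60, 50]], dtype=np.float32)
labels = np.array([0, 0, 1, 1])
print(weighted_knn(train, labels, np.array([50, 50], dtype=np.float32)))  # -> 0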

kNN in OpenCV

Just as above, a simple example will be done here with two families (classes). The red family is labeled Class-0 (denoted by 0) and the blue family Class-1 (denoted by 1). Create 25 families, i.e. 25 training data points, and label each one as class 0 or class 1. Do all of this with the help of NumPy's random number generator, then plot the points with the help of Matplotlib: the red family is shown as red triangles and the blue family as blue squares.

import cv2
import numpy as np
from matplotlib import pyplot as plt

# Feature set containing (x,y) values of 25 known/training data
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
# Label each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)
# Take Red neighbours and plot them
red = trainData[responses.ravel() == 0]
plt.scatter(red[:, 0], red[:, 1], 80, 'r', '^')
# Take Blue neighbours and plot them
blue = trainData[responses.ravel() == 1]
plt.scatter(blue[:, 0], blue[:, 1], 80, 'b', 's')
plt.show()

Since a random number generator is used, each run of the code produces different data. Next, the kNN algorithm is initiated and trainData and responses are passed to train the kNN (it constructs a search tree). A new sample is then classified with the help of kNN in OpenCV. Before using kNN, it is necessary to know about the test data (the new sample data). The data should be a floating-point array of size number_of_testdata × number_of_features. Then find the nearest neighbours of the newcomer. It is possible to specify how many neighbours we want, k (set to 3 here). It returns:

  • The label given to the new sample, depending on the kNN theory seen earlier. To use the plain Nearest Neighbour algorithm, just specify k=1, where k is the number of neighbours.
  • The labels of the k nearest neighbours.
  • The corresponding distances from the new sample to each nearest neighbour.

Here's a look at how it works, with the new sample labeled green.

# Feature set containing (x,y) values of 25 known/training data
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
# Label each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)
# Take Red neighbours and plot them
red = trainData[responses.ravel() == 0]
plt.scatter(red[:, 0], red[:, 1], 80, 'r', '^')
# Take Blue neighbours and plot them
blue = trainData[responses.ravel() == 1]
plt.scatter(blue[:, 0], blue[:, 1], 80, 'b', 's')
# The newcomer, shown as a green circle
newcomer = np.random.randint(0, 100, (1, 2)).astype(np.float32)
plt.scatter(newcomer[:, 0], newcomer[:, 1], 80, 'g', 'o')
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
print("result: {}\n".format(results))
print("neighbours: {}\n".format(neighbours))
print("distance: {}\n".format(dist))
plt.show()

The following results were obtained:
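
Since the data is random, the exact numbers differ on every run. An illustrative output consistent with the case described next (two blue neighbours, one red, so label 1) could look like the following; the distance values are made up for illustration:

result: [[1.]]

neighbours: [[1. 1. 0.]]

distance: [[53. 58. 61.]]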

It says that our new sample has 3 nearest neighbours: one from the Red family and two from the Blue family. Therefore, it is labeled as belonging to the Blue family.

If there is a large amount of test data, it can all be passed as a single array. The corresponding results are then also obtained as arrays.

# Feature set containing (x,y) values of 25 known/training data
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
# Label each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)
# Take Red neighbours and plot them
red = trainData[responses.ravel() == 0]
plt.scatter(red[:, 0], red[:, 1], 80, 'r', '^')
# Take Blue neighbours and plot them
blue = trainData[responses.ravel() == 1]
plt.scatter(blue[:, 0], blue[:, 1], 80, 'b', 's')
# newcomer = np.random.randint(0, 100, (1, 2)).astype(np.float32)
newcomers = np.random.randint(0, 100, (10, 2)).astype(np.float32)  # 10 new-comers
plt.scatter(newcomers[:, 0], newcomers[:, 1], 80, 'g', 'o')
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
ret, results, neighbours, dist = knn.findNearest(newcomers, 3)  # results will contain 10 labels
plt.show()
print("result: {}\n".format(results))
print("neighbours: {}\n".format(neighbours))
print("distance: {}\n".format(dist))

Additional resources

  • NPTEL notes on Pattern Recognition, Chapter 11
  • Wikipedia article on Nearest neighbor search
  • Wikipedia article on k-d tree

The above is a detailed explanation of understanding the kNN (k-Nearest Neighbour) algorithm in OpenCV. For more information about kNN in OpenCV, please refer to my other related articles!