brief description
The algorithm classifies new data by measuring the distance between its feature values and the feature values of samples whose class is already known.
It is called k-nearest neighbors, abbreviated kNN.
Known: a training set, and a label for each sample in the training set.
Next: compare the new data with every sample in the training set and find the k samples at the smallest distances, i.e. the k most similar ones. Take the label that occurs most often among those k samples and use it as the classification for the new data.
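The steps just described can be traced by hand on a handful of points. The sketch below uses only the standard library; the sample points, the query point and the value of k are made up for illustration, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter

# Made-up training samples as (feature vector, label) pairs, for illustration only
training = [((1.0, 1.1), 'A'), ((1.0, 1.0), 'A'), ((0.0, 0.0), 'B'), ((0.0, 0.1), 'B')]
query = (0.2, 0.1)  # new data point to classify (assumed value)
k = 3               # number of neighbors that vote (assumed value)

# Step 1: Euclidean distance from the query to every training sample
scored = []
for features, label in training:
    dist = math.sqrt(sum((q - f) ** 2 for q, f in zip(query, features)))
    scored.append((dist, label))

# Step 2: keep the k samples with the smallest distances
nearest = sorted(scored)[:k]

# Step 3: majority vote among the labels of those k samples
votes = Counter(label for _, label in nearest)
predicted = votes.most_common(1)[0][0]
print(predicted)  # prints B: two of the three nearest samples are labeled 'B'
```

With k = 3 the two 'B' points at distances of about 0.20 and 0.22 outvote the single 'A' point at about 1.20.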
python example
# -*- coding: cp936 -*-
# Windows systems use cp936 encoding; on Linux it is better to use utf-8.
from numpy import *  # Import the scientific computing package
import operator  # Operator module from the Python standard library
# Create a dataset
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels
# Core of the algorithm
# inX: input vector to be classified
# dataSet: training sample set
# labels: label vector
# k: number of nearest neighbors that vote
def classfy0(inX, dataSet, labels, k):
    # Distance calculation
    dataSetSize = dataSet.shape[0]  # Number of rows in the array, i.e. how many training samples there are
    # tile: a numpy function. Here it repeats inX once per training sample
    # (4 rows for the dataset above) so that it has the same shape as dataSet.
    # diffMat holds the differences between the target and each training sample.
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2  # Square each element separately
    sqDistances = sqDiffMat.sum(axis=1)  # Sum along each row to get the squared distance to each sample
    distances = sqDistances ** 0.5  # Take the square root to get the distances
    sortedDistIndicies = distances.argsort()  # Indices that sort the distances in ascending order
    # Select the k points with the smallest distance and count their labels
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort the labels by vote count, largest first
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
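
As a usage illustration, here is a self-contained variant of the classifier above, run on the same four sample points. It is a sketch rather than the original code: the name knn_classify is hypothetical, NumPy broadcasting is swapped in for tile, and collections.Counter replaces the manual vote counting and sort. The query points are chosen only for demonstration.

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data_set, labels, k):
    """Hypothetical helper: label in_x by majority vote of its k nearest rows."""
    data_set = np.asarray(data_set, dtype=float)
    # Broadcasting subtracts in_x from every row, so no explicit tile() is needed
    diff = data_set - np.asarray(in_x, dtype=float)
    distances = np.sqrt((diff ** 2).sum(axis=1))  # Euclidean distance to each row
    nearest = distances.argsort()[:k]             # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Same sample data as createDataSet() in the example
group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

print(knn_classify([0, 0], group, labels, 3))      # prints B
print(knn_classify([1.0, 1.2], group, labels, 3))  # prints A
```

Broadcasting gives the same difference matrix as tile without building the repeated copy of the input vector.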
bonus
To add your own module to Python's default search path, create a file in the python/lib/-packages directory and write in it the path to the module you wrote.
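
The sketch below assumes the file meant here is a path configuration file (one with a .pth extension), which the standard site module reads, and that the target directory is Python's site-packages directory. To avoid touching a real installation it uses two temporary directories as stand-ins; the file name mymodules.pth and both directories are hypothetical.

```python
import os
import site
import sys
import tempfile

# Hypothetical directories created only for this demonstration
module_dir = tempfile.mkdtemp()  # stands in for the folder that holds your module
site_dir = tempfile.mkdtemp()    # stands in for the site-packages directory

# A .pth file lists one directory per line
with open(os.path.join(site_dir, 'mymodules.pth'), 'w') as f:
    f.write(module_dir + '\n')

# addsitedir() adds site_dir to sys.path and processes the .pth files in it,
# appending each listed directory that exists
site.addsitedir(site_dir)

print(module_dir in sys.path)
```

In a real setup the .pth file would be read automatically at interpreter startup, so no call to site.addsitedir would be needed.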