I. Brief summary of the algorithm
We would like a function that accepts inputs and predicts their categories, so that it can be used for classification. The sigmoid function from mathematics serves this purpose. Its expression and graph are as follows:
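g(z) = 1 / (1 + e^(-z))

Its graph is an S-shaped curve passing through (0, 0.5), approaching 0 for large negative z and 1 for large positive z.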
As the graph makes clear, when the input x is less than 0 the function value is below 0.5 and the predicted class is 0; when the input x is greater than 0 the function value is above 0.5 and the predicted class is 1.
1.1 Representation of the prediction function
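For an input feature vector x and a parameter (weight) vector θ, the prediction function applies the sigmoid to a linear combination of the features:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(-θ^T x))

The value h_θ(x) is read as the probability that the example belongs to class 1, so we predict class 1 when h_θ(x) > 0.5 and class 0 otherwise. In the code below, the 1.0 that loadDataSet prepends to every example plays the role of the intercept term in θ^T x.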
1.2 Solving for the parameters
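The parameters are fit by maximizing the log-likelihood of the training set,

l(θ) = Σ_i [ y_i · log h_θ(x_i) + (1 - y_i) · log(1 - h_θ(x_i)) ],

whose gradient takes the simple form Σ_i (y_i - h_θ(x_i)) · x_i. Gradient ascent therefore updates the parameters by

θ := θ + α · X^T (y - h)

where α is the step size. This is exactly the line weights = weights + alpha * dataMatrix.transpose() * error in gradAscent below; the stochastic variants apply the same update one example at a time.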
II. Code Implementation
The function sigmoid computes the sigmoid value for its input. gradAscent implements batch gradient ascent, meaning that every example in the dataset contributes to each iteration's update. In stoGradAscent0, by contrast, the weights are updated one example at a time, so the per-update cost is greatly reduced. stoGradAscent1 improves on stochastic gradient ascent in two ways: the step size alpha decreases as the iterations proceed, and the example used for each update is selected at random, without replacement within each pass.
from numpy import *
import matplotlib.pyplot as plt

def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('')  # path to the data file (left blank in the original)
    for line in fr.readlines():
        lineArr = line.strip('\n').split('\t')
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])  # prepend 1.0 as the intercept term
        labelMat.append(int(lineArr[2]))
    fr.close()
    return dataMat, labelMat

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

def gradAscent(dataMatIn, classLabels):
    # Batch gradient ascent: every example contributes to each update.
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).transpose()
    m, n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n, 1))
    errors = []
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)
        error = labelMat - h
        errors.append(sum(error))
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights, errors

def stoGradAscent0(dataMatIn, classLabels):
    # Stochastic gradient ascent: one example per update, a single pass in order.
    m, n = shape(dataMatIn)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatIn[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatIn[i]
    return weights

def stoGradAscent1(dataMatrix, classLabels, numIter=150):
    # Improved stochastic gradient ascent: alpha shrinks over time and the
    # example for each update is drawn at random, without replacement per pass.
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))
            example = dataIndex[randIndex]  # index through dataIndex so deletions are respected
            h = sigmoid(sum(dataMatrix[example] * weights))
            error = classLabels[example] - h
            weights = weights + alpha * error * dataMatrix[example]
            del(dataIndex[randIndex])
    return weights

def plotError(errs):
    k = len(errs)
    x = range(1, k + 1)
    plt.plot(x, errs, 'g--')
    plt.show()

def plotBestFit(wei):
    weights = wei.getA()  # convert the weight matrix to a plain array
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]  # decision boundary: w0 + w1*x1 + w2*x2 = 0
    ax.plot(x, y)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0

def colicTest(ftr, fte, numIter):
    frTrain = open(ftr)
    frTest = open(fte)
    trainingSet = []
    trainingLabels = []
    for line in frTrain.readlines():
        currLine = line.strip('\n').split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    frTrain.close()
    trainWeights = stoGradAscent1(array(trainingSet), trainingLabels, numIter)
    errorCount = 0
    numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip('\n').split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        if int(classifyVector(array(lineArr), trainWeights)) != int(float(currLine[21])):
            errorCount += 1
    frTest.close()
    errorRate = float(errorCount) / numTestVec
    return errorRate

def multiTest(ftr, fte, numT, numIter):
    errors = []
    for k in range(numT):
        error = colicTest(ftr, fte, numIter)
        errors.append(error)
    print("There are " + str(len(errors)) + " tests with " + str(numIter) + " iterations in all!")
    for i in range(numT):
        print("The " + str(i + 1) + "th testError is: " + str(errors[i]))
    print("Average testError: ", float(sum(errors)) / len(errors))

'''
data, labels = loadDataSet()
weights0 = stoGradAscent0(array(data), labels)
weights, errors = gradAscent(data, labels)
weights1 = stoGradAscent1(array(data), labels, 500)
print(weights)
plotBestFit(weights)
print(weights0)
weights00 = []
for w in weights0:
    weights00.append([w])
plotBestFit(mat(weights00))
print(weights1)
weights11 = []
for w in weights1:
    weights11.append([w])
plotBestFit(mat(weights11))
'''
multiTest(r"", r"", 10, 500)  # training-file and test-file paths (left blank in the original)
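For reference, a minimal sketch of how these functions fit together (the data-file paths were left blank above and must be filled in; loadDataSet expects tab-separated rows of two features plus a 0/1 label):

data, labels = loadDataSet()

# Batch gradient ascent returns an (n, 1) weight matrix plus the error history.
weights, errors = gradAscent(data, labels)
plotError(errors)     # summed error per iteration
plotBestFit(weights)  # decision boundary over the scatter plot

# The stochastic variants return 1-D arrays, so wrap them in a column matrix
# before handing them to plotBestFit, which calls .getA() on its argument.
weights1 = stoGradAscent1(array(data), labels, 500)
plotBestFit(mat(weights1).transpose())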
III. Summary
That is the entire content of this article on the classic machine learning algorithm, logistic regression, and its code details. I hope it helps you. Interested readers can also refer to these other articles on this site:
Implementing the k-means clustering algorithm in Python in detail
Python Programming Implementation of Particle Swarm Algorithm (PSO) Details
Python Programming Implementation of Ant Colony Algorithm Details
If anything here is lacking, you are welcome to leave a comment pointing it out. Thank you all for supporting this site!