I. Brief summary of the algorithm
We would like a function that accepts inputs and predicts their categories, so that it can be used for classification. The sigmoid function from mathematics serves this purpose. Its expression and graph are as follows:
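g(z) = 1 / (1 + e^(-z))

Its graph is an S-shaped curve passing through (0, 0.5), approaching 0 for large negative z and 1 for large positive z.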
As the graph makes clear, when the input x is less than 0 the function value is below 0.5 and the predicted class is 0; when the input x is greater than 0 the function value is above 0.5 and the predicted class is 1.
1.1 Representation of the prediction function
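For an input feature vector x and a parameter (weight) vector θ, the prediction function applies the sigmoid to a linear combination of the features:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(-θ^T x))

The value h_θ(x) is read as the probability that the example belongs to class 1, so we predict class 1 when h_θ(x) > 0.5 and class 0 otherwise. In the code below, the 1.0 that loadDataSet prepends to every example plays the role of the intercept term in θ^T x.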
1.2 Solving for the parameters
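The parameters are fit by maximizing the log-likelihood of the training set,

l(θ) = Σ_i [ y_i · log h_θ(x_i) + (1 - y_i) · log(1 - h_θ(x_i)) ],

whose gradient takes the simple form Σ_i (y_i - h_θ(x_i)) · x_i. Gradient ascent therefore updates the parameters by

θ := θ + α · X^T (y - h)

where α is the step size. This is exactly the line weights = weights + alpha * dataMatrix.transpose() * error in gradAscent below; the stochastic variants apply the same update one example at a time.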
II. Code Implementation
The function sigmoid computes the sigmoid value for its input. gradAscent implements batch gradient ascent, meaning that every example in the dataset contributes to each iteration's update. In stoGradAscent0, by contrast, the weights are updated one example at a time, so the per-update cost is greatly reduced. stoGradAscent1 improves on stochastic gradient ascent in two ways: the step size alpha decreases as the iterations proceed, and the example used for each update is selected at random, without replacement within each pass.
from numpy import *
import matplotlib.pyplot as plt

def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('')  # path to the data file (left blank in the original)
    for line in fr.readlines():
        lineArr = line.strip('\n').split('\t')
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])  # prepend 1.0 as the intercept term
        labelMat.append(int(lineArr[2]))
    fr.close()
    return dataMat, labelMat

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

def gradAscent(dataMatIn, classLabels):
    # Batch gradient ascent: every example contributes to each update.
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).transpose()
    m, n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n, 1))
    errors = []
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)
        error = labelMat - h
        errors.append(sum(error))
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights, errors

def stoGradAscent0(dataMatIn, classLabels):
    # Stochastic gradient ascent: one example per update, a single pass in order.
    m, n = shape(dataMatIn)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatIn[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatIn[i]
    return weights

def stoGradAscent1(dataMatrix, classLabels, numIter=150):
    # Improved stochastic gradient ascent: alpha shrinks over time and the
    # example for each update is drawn at random, without replacement per pass.
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))
            example = dataIndex[randIndex]  # index through dataIndex so deletions are respected
            h = sigmoid(sum(dataMatrix[example] * weights))
            error = classLabels[example] - h
            weights = weights + alpha * error * dataMatrix[example]
            del(dataIndex[randIndex])
    return weights

def plotError(errs):
    k = len(errs)
    x = range(1, k + 1)
    plt.plot(x, errs, 'g--')
    plt.show()

def plotBestFit(wei):
    weights = wei.getA()  # convert the weight matrix to a plain array
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]  # decision boundary: w0 + w1*x1 + w2*x2 = 0
    ax.plot(x, y)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0

def colicTest(ftr, fte, numIter):
    frTrain = open(ftr)
    frTest = open(fte)
    trainingSet = []
    trainingLabels = []
    for line in frTrain.readlines():
        currLine = line.strip('\n').split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    frTrain.close()
    trainWeights = stoGradAscent1(array(trainingSet), trainingLabels, numIter)
    errorCount = 0
    numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip('\n').split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        if int(classifyVector(array(lineArr), trainWeights)) != int(float(currLine[21])):
            errorCount += 1
    frTest.close()
    errorRate = float(errorCount) / numTestVec
    return errorRate

def multiTest(ftr, fte, numT, numIter):
    errors = []
    for k in range(numT):
        error = colicTest(ftr, fte, numIter)
        errors.append(error)
    print("There are " + str(len(errors)) + " tests with " + str(numIter) + " iterations in all!")
    for i in range(numT):
        print("The " + str(i + 1) + "th testError is: " + str(errors[i]))
    print("Average testError: ", float(sum(errors)) / len(errors))

'''
data, labels = loadDataSet()
weights0 = stoGradAscent0(array(data), labels)
weights, errors = gradAscent(data, labels)
weights1 = stoGradAscent1(array(data), labels, 500)
print(weights)
plotBestFit(weights)
print(weights0)
weights00 = []
for w in weights0:
    weights00.append([w])
plotBestFit(mat(weights00))
print(weights1)
weights11 = []
for w in weights1:
    weights11.append([w])
plotBestFit(mat(weights11))
'''
multiTest(r"", r"", 10, 500)  # training-file and test-file paths (left blank in the original)
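For reference, a minimal sketch of how these functions fit together (the data-file paths were left blank above and must be filled in; loadDataSet expects tab-separated rows of two features plus a 0/1 label):

data, labels = loadDataSet()

# Batch gradient ascent returns an (n, 1) weight matrix plus the error history.
weights, errors = gradAscent(data, labels)
plotError(errors)     # summed error per iteration
plotBestFit(weights)  # decision boundary over the scatter plot

# The stochastic variants return 1-D arrays, so wrap them in a column matrix
# before handing them to plotBestFit, which calls .getA() on its argument.
weights1 = stoGradAscent1(array(data), labels, 500)
plotBestFit(mat(weights1).transpose())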
III. Summary
That is the entire content of this article on the classic machine learning algorithm, logistic regression, and its code details. I hope it helps you. Interested readers can also refer to these other articles on this site:
Implementing the k-means clustering algorithm in Python in detail
Python Programming Implementation of Particle Swarm Algorithm (PSO) Details
Python Programming Implementation of Ant Colony Algorithm Details
If anything here is lacking, you are welcome to leave a comment pointing it out. Thank you all for supporting this site!