
Python Machine Learning Theory and Practice (VI): Support Vector Machines

The previous section essentially completed the theoretical derivation of the SVM: the goal of maximizing the margin was converted into solving for the Lagrange multipliers alpha; once the alphas are found, the SVM weights W follow, and with the weights we have the maximum-margin classifier. But the previous section rested on one assumption: that the training set is linearly separable, so that the solved alphas lie in [0, infinity). What if the data is not linearly separable? In that case we must allow some samples to cross the classifier while keeping the form of the objective function unchanged, which is done by introducing slack variables. A slack variable denotes the cost of misclassifying a sample point: it equals 0 when the sample is classified correctly and is positive when it is misclassified, where Tn denotes the sample's true label, -1 or 1. Recall that in the previous section we fixed the distance from the support vectors to the classifier at 1, so when a sample is misclassified its slack must be greater than 1, as shown in (Figure V) (both the formula and figure numbering here continue from the previous section).

(Figure V)
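Concretely, the slack variable can be written in the standard hinge form (a reconstruction in Bishop's notation, where $y(\mathbf{x}_n)$ is the classifier output and $t_n$ the true label):

$$\xi_n = \max\left(0,\; 1 - t_n\, y(\mathbf{x}_n)\right)$$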

Thus, with the misclassification cost in hand, we add it to the objective function of the previous section (Formula IV) to obtain the form in (Formula VIII):

(Formula VIII)
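That is, the standard soft-margin objective (a reconstruction, consistent with the notation above):

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; C\sum_{n=1}^{N}\xi_n + \frac{1}{2}\lVert\mathbf{w}\rVert^{2} \quad \text{s.t.} \quad t_n\, y(\mathbf{x}_n) \ge 1 - \xi_n,\;\; \xi_n \ge 0,\;\; n = 1,\dots,N$$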

Repeating the Lagrange-multiplier steps of the previous section, we obtain (Formula IX):


(Formula IX)
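That is, with multipliers $a_n \ge 0$ for the margin constraints and $\mu_n \ge 0$ for the slack constraints (reconstructed in the same notation):

$$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{n=1}^{N}\xi_n - \sum_{n=1}^{N}a_n\left\{t_n\, y(\mathbf{x}_n) - 1 + \xi_n\right\} - \sum_{n=1}^{N}\mu_n \xi_n$$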

With the additional multipliers μn (one for each slack constraint ξn ≥ 0), our job is of course to continue solving this objective function; repeating the steps of the previous section yields the derivation in (Formula X):

 

(Formula X)
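Setting the derivatives with respect to $\mathbf{w}$, $b$ and $\xi_n$ to zero and substituting back gives the dual (reconstructed in the same notation):

$$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N}a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}a_n a_m t_n t_m\, \mathbf{x}_n^{T}\mathbf{x}_m \quad \text{s.t.} \quad 0 \le a_n \le C,\;\; \sum_{n=1}^{N}a_n t_n = 0$$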

Again, setting the derivative of (Formula IX) with respect to ξn to zero gives an = C - μn; since an ≥ 0 and μn ≥ 0, it follows that 0 ≤ an ≤ C. To make the explanation clearer, we also write out the KKT conditions of (Formula IX) (the third type of optimization problem in the previous section); note that μn is greater than or equal to 0:
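In the same notation, these are:

$$a_n \ge 0, \quad t_n\, y(\mathbf{x}_n) - 1 + \xi_n \ge 0, \quad a_n\left\{t_n\, y(\mathbf{x}_n) - 1 + \xi_n\right\} = 0, \quad \mu_n \ge 0, \quad \xi_n \ge 0, \quad \mu_n \xi_n = 0$$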

 

Up to this point the form of the optimization function is basically unchanged: there is just one extra misclassification cost and one extra condition, 0 ≤ alpha ≤ C. Here C is a constant that controls the trade-off between maximizing the margin and allowing misclassification: too large a C leads to overfitting, too small a C to underfitting. The next steps should be familiar: with the extra constraint from the constant C, we continue to use the SMO algorithm to solve the quadratic program. But first I want to cover the kernel function as well. If the samples are not linearly separable, we introduce a kernel function; after being mapped to a high-dimensional space the samples can become linearly separable, as with the linearly inseparable samples shown in (Figure VI):


(Figure VI)

In (Figure VI) the samples are obviously not linearly separable, but suppose we perform some operations between the existing sample features X, as shown on the right side of (Figure VI), and let f serve as new samples (or new features). Wouldn't that be better? X has now been projected to a higher dimension, but we do not yet know f; this is where the kernel function comes into play. Take the Gaussian kernel as an example: a few sample points are selected as reference points, and the kernel function is used to compute f, as shown in (Figure VII):


(Figure VII)
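To make this concrete, here is a minimal sketch of computing the new features f from a set of reference (landmark) points with a Gaussian kernel; the landmark positions and sigma value are illustrative assumptions, not part of the book's code:

from numpy import array, exp, sum

def gaussian_features(x, landmarks, sigma=1.0):
    # f_i = exp(-||x - l_i||^2 / (2*sigma^2)) for each reference point l_i
    return array([exp(-sum((x - l)**2) / (2 * sigma**2)) for l in landmarks])

# hypothetical usage: three landmarks picked from among the training samples
landmarks = [array([1.0, 1.0]), array([-1.0, 1.0]), array([0.0, -1.0])]
f = gaussian_features(array([0.9, 1.1]), landmarks)  # near the first landmark, so f[0] is close to 1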

With f in hand, the kernel function at this point acts as a similarity measure between a sample X and the reference points, performing a weight decay to form the new features f that depend on x. Putting f into the SVM described above, we continue to solve for alpha and then derive the weights; the principle really is that simple. To give it a slightly more academic flavor, the kernel function is also written into the objective function, as shown in (Formula XI):

 

(Formula XI)
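In the notation of (Formula X), with the inner product replaced by the kernel, this reads:

$$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N}a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}a_n a_m t_n t_m\, K(\mathbf{x}_n, \mathbf{x}_m)$$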

where K(Xn,Xm) is the kernel function. Compared with (Formula X) the objective has hardly changed, and we solve it with SMO as before. The outer loop of the full Platt SMO looks like this (shown here in its non-kernel form; the complete listing, including the kernel version, follows below):

def smoPK(dataMatIn, classLabels, C, toler, maxIter): #full Platt SMO, non-kernel version 
 oS = optStructK(mat(dataMatIn),mat(classLabels).transpose(),C,toler) 
 iter = 0 
 entireSet = True; alphaPairsChanged = 0 
 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): 
  alphaPairsChanged = 0 
  if entireSet: #go over all 
   for i in range(oS.m):   
    alphaPairsChanged += innerLK(i,oS) 
    print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
   iter += 1 
  else: #go over non-bound (railed) alphas 
   nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] 
   for i in nonBoundIs: 
    alphaPairsChanged += innerLK(i,oS) 
    print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
   iter += 1 
  if entireSet: entireSet = False #toggle entire set loop 
  elif (alphaPairsChanged == 0): entireSet = True 
  print "iteration number: %d" % iter 
 return oS.b,oS.alphas 
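A minimal usage sketch (assuming a tab-separated data file named testSet.txt with two features and a label per line, as used in Machine Learning in Action; loadDataSet and calcWs appear in the full listing below):

dataArr, labelArr = loadDataSet('testSet.txt')
b, alphas = smoPK(dataArr, labelArr, 0.6, 0.001, 40)  # C=0.6, tolerance 0.001, at most 40 iterations
ws = calcWs(alphas, dataArr, labelArr)                # recover the weight vector W from the alphas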

A small example of handwriting recognition is demonstrated below.

(1) Collect data: text files are provided.

(2) Prepare data: construct vectors from the binary images.

(3) Analyze data: visually inspect the image vectors.

(4) Train the algorithm: run the SMO algorithm with two different kernels and different settings for the radial basis function.

(5) Test the algorithm: write a function to test different kernels and compute the error rate.

(6) Use the algorithm: a complete image-recognition application would also require some image processing; that part is omitted from this demo.

The full code is below:

from numpy import * 
from time import sleep 
 
def loadDataSet(fileName): 
 dataMat = []; labelMat = [] 
 fr = open(fileName) 
 for line in fr.readlines(): 
  lineArr = line.strip().split('\t') 
  dataMat.append([float(lineArr[0]), float(lineArr[1])]) 
  labelMat.append(float(lineArr[2])) 
 return dataMat,labelMat 
 
def selectJrand(i,m): 
 j=i #we want to select any J not equal to i 
 while (j==i): 
  j = int(random.uniform(0,m)) 
 return j 
 
def clipAlpha(aj,H,L): 
 if aj > H: 
  aj = H 
 if L > aj: 
  aj = L 
 return aj 
 
def smoSimple(dataMatIn, classLabels, C, toler, maxIter): 
 dataMatrix = mat(dataMatIn); labelMat = mat(classLabels).transpose() 
 b = 0; m,n = shape(dataMatrix) 
 alphas = mat(zeros((m,1))) 
 iter = 0 
 while (iter < maxIter): 
  alphaPairsChanged = 0 
  for i in range(m): 
   fXi = float(multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[i,:].T)) + b 
   Ei = fXi - float(labelMat[i])#if checks if an example violates KKT conditions 
   if ((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or ((labelMat[i]*Ei > toler) and (alphas[i] > 0)): 
    j = selectJrand(i,m) 
    fXj = float(multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[j,:].T)) + b 
    Ej = fXj - float(labelMat[j]) 
    alphaIold = alphas[i].copy(); alphaJold = alphas[j].copy(); 
    if (labelMat[i] != labelMat[j]): 
     L = max(0, alphas[j] - alphas[i]) 
     H = min(C, C + alphas[j] - alphas[i]) 
    else: 
     L = max(0, alphas[j] + alphas[i] - C) 
     H = min(C, alphas[j] + alphas[i]) 
    if L==H: print "L==H"; continue 
    eta = 2.0 * dataMatrix[i,:]*dataMatrix[j,:].T - dataMatrix[i,:]*dataMatrix[i,:].T - dataMatrix[j,:]*dataMatrix[j,:].T 
    if eta >= 0: print "eta>=0"; continue 
    alphas[j] -= labelMat[j]*(Ei - Ej)/eta 
    alphas[j] = clipAlpha(alphas[j],H,L) 
    if (abs(alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; continue 
     alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j]) #update i by the same amount as j, in the opposite direction 
    b1 = b - Ei- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[i,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[i,:]*dataMatrix[j,:].T 
    b2 = b - Ej- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[j,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[j,:]*dataMatrix[j,:].T 
    if (0 < alphas[i]) and (C > alphas[i]): b = b1 
    elif (0 < alphas[j]) and (C > alphas[j]): b = b2 
    else: b = (b1 + b2)/2.0 
    alphaPairsChanged += 1 
    print "iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
  if (alphaPairsChanged == 0): iter += 1 
  else: iter = 0 
  print "iteration number: %d" % iter 
 return b,alphas 
 
def kernelTrans(X, A, kTup): #calc the kernel or transform data to a higher dimensional space 
 m,n = shape(X) 
 K = mat(zeros((m,1))) 
 if kTup[0]=='lin': K = X * A.T #linear kernel 
 elif kTup[0]=='rbf': 
  for j in range(m): 
   deltaRow = X[j,:] - A 
   K[j] = deltaRow*deltaRow.T 
  K = exp(K/(-1*kTup[1]**2)) #divide in NumPy is element-wise not matrix like Matlab 
 else: raise NameError('Houston We Have a Problem -- That Kernel is not recognized') 
 return K 
 
class optStruct: 
 def __init__(self,dataMatIn, classLabels, C, toler, kTup): # Initialize the structure with the parameters 
  self.X = dataMatIn 
  self.labelMat = classLabels 
  self.C = C 
  self.tol = toler 
  self.m = shape(dataMatIn)[0] 
  self.alphas = mat(zeros((self.m,1))) 
  self.b = 0 
  self.eCache = mat(zeros((self.m,2))) #first column is valid flag 
  self.K = mat(zeros((self.m,self.m))) #precomputed kernel matrix 
  for i in range(self.m): 
   self.K[:,i] = kernelTrans(self.X, self.X[i,:], kTup) 
   
def calcEk(oS, k): #prediction error on example k 
 fXk = float(multiply(oS.alphas,oS.labelMat).T*oS.K[:,k] + oS.b) 
 Ek = fXk - float(oS.labelMat[k]) 
 return Ek 
   
def selectJ(i, oS, Ei):   #this is the second-choice heuristic, and calcs Ej 
 maxK = -1; maxDeltaE = 0; Ej = 0 
 oS.eCache[i] = [1,Ei] #set valid; choose the alpha that gives the maximum delta E 
 validEcacheList = nonzero(oS.eCache[:,0].A)[0] 
 if (len(validEcacheList)) > 1: 
  for k in validEcacheList: #loop through valid Ecache values and find the one that maximizes delta E 
   if k == i: continue #don't calc for i, waste of time 
   Ek = calcEk(oS, k) 
   deltaE = abs(Ei - Ek) 
   if (deltaE > maxDeltaE): 
    maxK = k; maxDeltaE = deltaE; Ej = Ek 
  return maxK, Ej 
 else: #in this case (first time around) we don't have any valid eCache values 
  j = selectJrand(i, oS.m) 
  Ej = calcEk(oS, j) 
 return j, Ej 
 
def updateEk(oS, k): #after any alpha has changed update the new value in the cache 
 Ek = calcEk(oS, k) 
 oS.eCache[k] = [1,Ek] 
   
def innerL(i, oS): 
 Ei = calcEk(oS, i) 
 if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)): 
  j,Ej = selectJ(i, oS, Ei) #this has been changed from selectJrand 
  alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy(); 
  if (oS.labelMat[i] != oS.labelMat[j]): 
   L = max(0, oS.alphas[j] - oS.alphas[i]) 
   H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) 
  else: 
   L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) 
   H = min(oS.C, oS.alphas[j] + oS.alphas[i]) 
  if L==H: print "L==H"; return 0 
  eta = 2.0 * oS.K[i,j] - oS.K[i,i] - oS.K[j,j] #changed for kernel 
  if eta >= 0: print "eta>=0"; return 0 
  oS.alphas[j] -= oS.labelMat[j]*(Ei - Ej)/eta 
  oS.alphas[j] = clipAlpha(oS.alphas[j],H,L) 
  updateEk(oS, j) #added this for the Ecache 
  if (abs(oS.alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; return 0 
  oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j]) #update i by the same amount as j, in the opposite direction 
  updateEk(oS, i) #added this for the Ecache 
  b1 = oS.b - Ei - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.K[i,i] - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.K[i,j] 
  b2 = oS.b - Ej - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.K[i,j] - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.K[j,j] 
  if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1 
  elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 
  else: oS.b = (b1 + b2)/2.0 
  return 1 
 else: return 0 
 
def smoP(dataMatIn, classLabels, C, toler, maxIter,kTup=('lin', 0)): #full Platt SMO 
 oS = optStruct(mat(dataMatIn),mat(classLabels).transpose(),C,toler, kTup) 
 iter = 0 
 entireSet = True; alphaPairsChanged = 0 
 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): 
  alphaPairsChanged = 0 
  if entireSet: #go over all 
   for i in range(oS.m):   
    alphaPairsChanged += innerL(i,oS) 
    print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
   iter += 1 
  else: #go over non-bound (railed) alphas 
   nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] 
   for i in nonBoundIs: 
    alphaPairsChanged += innerL(i,oS) 
    print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
   iter += 1 
  if entireSet: entireSet = False #toggle entire set loop 
  elif (alphaPairsChanged == 0): entireSet = True 
  print "iteration number: %d" % iter 
 return oS.b,oS.alphas 
 
def calcWs(alphas,dataArr,classLabels): 
 X = mat(dataArr); labelMat = mat(classLabels).transpose() 
 m,n = shape(X) 
 w = zeros((n,1)) 
 for i in range(m): 
  w += multiply(alphas[i]*labelMat[i],X[i,:].T) 
 return w 
 
def testRbf(k1=1.3): 
 dataArr,labelArr = loadDataSet('testSetRBF.txt') #training data file from Machine Learning in Action 
 b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1)) #C=200 important 
 datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 
 svInd=nonzero(alphas.A>0)[0] 
 sVs=datMat[svInd] #get matrix of only support vectors 
 labelSV = labelMat[svInd]; 
 print "there are %d Support Vectors" % shape(sVs)[0] 
 m,n = shape(datMat) 
 errorCount = 0 
 for i in range(m): 
  kernelEval = kernelTrans(sVs,datMat[i,:],('rbf', k1)) 
  predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 
  if sign(predict)!=sign(labelArr[i]): errorCount += 1 
 print "the training error rate is: %f" % (float(errorCount)/m) 
 dataArr,labelArr = loadDataSet('testSetRBF2.txt') #test data file from Machine Learning in Action 
 errorCount = 0 
 datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 
 m,n = shape(datMat) 
 for i in range(m): 
  kernelEval = kernelTrans(sVs,datMat[i,:],('rbf', k1)) 
  predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 
  if sign(predict)!=sign(labelArr[i]): errorCount += 1  
 print "the test error rate is: %f" % (float(errorCount)/m)  
  
def img2vector(filename): 
 returnVect = zeros((1,1024)) 
 fr = open(filename) 
 for i in range(32): 
  lineStr = fr.readline() 
  for j in range(32): 
   returnVect[0,32*i+j] = int(lineStr[j]) 
 return returnVect 
 
def loadImages(dirName): 
 from os import listdir 
 hwLabels = [] 
 trainingFileList = listdir(dirName)   #load the training set 
 m = len(trainingFileList) 
 trainingMat = zeros((m,1024)) 
 for i in range(m): 
  fileNameStr = trainingFileList[i] 
  fileStr = fileNameStr.split('.')[0]  #take off .txt 
  classNumStr = int(fileStr.split('_')[0]) #the digit label is encoded in the file name 
  if classNumStr == 9: hwLabels.append(-1) #binary problem: digit 9 gets label -1 
  else: hwLabels.append(1) 
  trainingMat[i,:] = img2vector('%s/%s' % (dirName, fileNameStr)) 
 return trainingMat, hwLabels  
 
def testDigits(kTup=('rbf', 10)): 
 dataArr,labelArr = loadImages('trainingDigits') 
 b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup) 
 datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 
 svInd=nonzero(alphas.A>0)[0] 
 sVs=datMat[svInd] #get matrix of only support vectors 
 labelSV = labelMat[svInd]; 
 print "there are %d Support Vectors" % shape(sVs)[0] 
 m,n = shape(datMat) 
 errorCount = 0 
 for i in range(m): 
  kernelEval = kernelTrans(sVs,datMat[i,:],kTup) 
  predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 
  if sign(predict)!=sign(labelArr[i]): errorCount += 1 
 print "the training error rate is: %f" % (float(errorCount)/m) 
 dataArr,labelArr = loadImages('testDigits') 
 errorCount = 0 
 datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 
 m,n = shape(datMat) 
 for i in range(m): 
  kernelEval = kernelTrans(sVs,datMat[i,:],kTup) 
  predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 
  if sign(predict)!=sign(labelArr[i]): errorCount += 1  
 print "the test error rate is: %f" % (float(errorCount)/m) 
 
 
#######******************************** 
# Non-Kernel Versions below 
#######******************************** 
 
class optStructK: 
 def __init__(self,dataMatIn, classLabels, C, toler): # Initialize the structure with the parameters 
  self.X = dataMatIn 
  self.labelMat = classLabels 
  self.C = C 
  self.tol = toler 
  self.m = shape(dataMatIn)[0] 
  self.alphas = mat(zeros((self.m,1))) 
  self.b = 0 
  self.eCache = mat(zeros((self.m,2))) #first column is valid flag 
   
def calcEkK(oS, k): #prediction error on example k, computed with raw inner products 
 fXk = float(multiply(oS.alphas,oS.labelMat).T*(oS.X*oS.X[k,:].T)) + oS.b 
 Ek = fXk - float(oS.labelMat[k]) 
 return Ek 
   
def selectJK(i, oS, Ei):   #this is the second-choice heuristic, and calcs Ej 
 maxK = -1; maxDeltaE = 0; Ej = 0 
 oS.eCache[i] = [1,Ei] #set valid; choose the alpha that gives the maximum delta E 
 validEcacheList = nonzero(oS.eCache[:,0].A)[0] 
 if (len(validEcacheList)) > 1: 
  for k in validEcacheList: #loop through valid Ecache values and find the one that maximizes delta E 
   if k == i: continue #don't calc for i, waste of time 
   Ek = calcEkK(oS, k) 
   deltaE = abs(Ei - Ek) 
   if (deltaE > maxDeltaE): 
    maxK = k; maxDeltaE = deltaE; Ej = Ek 
  return maxK, Ej 
 else: #in this case (first time around) we don't have any valid eCache values 
  j = selectJrand(i, oS.m) 
  Ej = calcEkK(oS, j) 
 return j, Ej 
 
def updateEkK(oS, k): #after any alpha has changed update the new value in the cache 
 Ek = calcEkK(oS, k) 
 oS.eCache[k] = [1,Ek] 
   
def innerLK(i, oS): 
 Ei = calcEkK(oS, i) 
 if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)): 
  j,Ej = selectJK(i, oS, Ei) #this has been changed from selectJrand 
  alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy(); 
  if (oS.labelMat[i] != oS.labelMat[j]): 
   L = max(0, oS.alphas[j] - oS.alphas[i]) 
   H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) 
  else: 
   L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) 
   H = min(oS.C, oS.alphas[j] + oS.alphas[i]) 
  if L==H: print "L==H"; return 0 
  eta = 2.0 * oS.X[i,:]*oS.X[j,:].T - oS.X[i,:]*oS.X[i,:].T - oS.X[j,:]*oS.X[j,:].T 
  if eta >= 0: print "eta>=0"; return 0 
  oS.alphas[j] -= oS.labelMat[j]*(Ei - Ej)/eta 
  oS.alphas[j] = clipAlpha(oS.alphas[j],H,L) 
  updateEkK(oS, j) #added this for the Ecache 
  if (abs(oS.alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; return 0 
  oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j]) #update i by the same amount as j, in the opposite direction 
  updateEkK(oS, i) #added this for the Ecache 
  b1 = oS.b - Ei - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[i,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[i,:]*oS.X[j,:].T 
  b2 = oS.b - Ej - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[j,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[j,:]*oS.X[j,:].T 
  if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1 
  elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 
  else: oS.b = (b1 + b2)/2.0 
  return 1 
 else: return 0 
 
def smoPK(dataMatIn, classLabels, C, toler, maxIter): #full Platt SMO, non-kernel version 
 oS = optStructK(mat(dataMatIn),mat(classLabels).transpose(),C,toler) 
 iter = 0 
 entireSet = True; alphaPairsChanged = 0 
 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): 
  alphaPairsChanged = 0 
  if entireSet: #go over all 
   for i in range(oS.m):   
    alphaPairsChanged += innerLK(i,oS) 
    print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
   iter += 1 
  else: #go over non-bound (railed) alphas 
   nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] 
   for i in nonBoundIs: 
    alphaPairsChanged += innerLK(i,oS) 
    print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 
   iter += 1 
  if entireSet: entireSet = False #toggle entire set loop 
  elif (alphaPairsChanged == 0): entireSet = True 
  print "iteration number: %d" % iter 
 return oS.b,oS.alphas 

The result of the run is shown in (Figure VIII):


(Figure VIII)

The code above is worth reading through if you are interested in the details, but if you want to use an SVM in practice, it is recommended to use libsvm.
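For example, here is a minimal sketch using scikit-learn's SVC, which wraps libsvm; the data file and parameter values are illustrative assumptions (note that in scikit-learn's RBF kernel, gamma plays the role of 1/sigma^2 in the kernelTrans above):

from sklearn.svm import SVC

dataArr, labelArr = loadDataSet('testSetRBF.txt')  # reuse loadDataSet from the listing above

clf = SVC(C=200, kernel='rbf', gamma=0.5)  # RBF kernel, same C as in testRbf
clf.fit(dataArr, labelArr)
print "number of support vectors: %d" % len(clf.support_)
print "training accuracy: %f" % clf.score(dataArr, labelArr)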

References:

    [1] Machine Learning in Action, Peter Harrington.

    [2] Pattern Recognition and Machine Learning, Christopher M. Bishop.

    [3] Machine Learning (course notes), Andrew Ng.

This concludes the article.