As mentioned in a previous post, the images crawled with scrapy were collected so that they could later be categorized.
This post uses those previously crawled images to do a simple classification based on their color characteristics.
The implementation steps are as follows:
1: Collecting the image paths
2: Contrast processing
3: Gaussian filtering
4: Data extraction and feature vectorization
5: Clustering the pictures
6: Saving each picture into a folder according to its cluster (a minimal single-image sketch of steps 2 to 4 follows this list)
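Before the full class-based version below, here is a minimal sketch (not from the original script) of what steps 2 to 4 do to a single image. The file name sample.jpg is an assumption for illustration; the rescale range (20, 220), sigma=3 and the 10-bin histogram follow the full code further down.

import numpy as np
from skimage import io, exposure, img_as_float
from skimage.filters import gaussian

# Step 2: stretch the contrast, clipping pixel values to the 20-220 range
img = io.imread('sample.jpg')                       # hypothetical input image
high_contrast = exposure.rescale_intensity(img, in_range=(20, 220))

# Step 3: smooth the image with a Gaussian filter (sigma=3, as in the full code)
smoothed = gaussian(high_contrast, sigma=3)

# Step 4: build a 10-bin intensity histogram and turn it into a feature vector
img_float = img_as_float(smoothed)
hist, bin_centers = exposure.histogram(img_float, nbins=10)
feature = hist * bin_centers                        # weight each bin count by its centre value
print(feature.shape)                                # one 10-dimensional vector per image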
The amount of code is medium; it could be shorter, but I wrapped each step in its own class in order to practice using classes. There are some class-inheritance issues in it, and the problems I ran into are explained in a previous article. The content may be a bit cumbersome, especially the handling of files and paths (adjust them to your own setup); I have tried to optimize the code.
The raw data crawled is below:
Straight to the code:
import os
import numpy as np
from skimage import io                              # read and save images
from skimage import exposure                        # rescale_intensity / equalize_hist for contrast
from skimage.filters import gaussian                # Gaussian filter
from skimage import img_as_float, img_as_ubyte      # convert images between uint8 and float
from scipy.cluster.vq import kmeans, vq, whiten     # clustering algorithm
import shutil                                       # delete folder contents


class Path(object):
    def __init__(self):
        self.path = r"D:\PYscrapy\get_lixiaoran\picture"
        self.pathlist = []                  # list of original image paths
        self.page = 0

    def append(self):                       # load the path of each image into the list
        much = os.listdir(self.path)
        for i in range(len(much)):
            repath = os.path.join(self.path, str(self.page) + '.jpg')
            self.page += 1
            self.pathlist.append(repath)
        return self.pathlist


class Contrast(object):
    def __init__(self, pathlist):
        self.pathlist = pathlist
        self.imglist = []                   # list of images after changing the contrast
        self.path2 = r"D:\PYscrapy\get_lixiaoran\picture2"
        self.page2 = 0

    def balance(self):
        # two ways to handle the contrast of each image:
        # 1: histogram equalization  2: clip the intensities at chosen extremes
        if os.path.exists(self.path2) == False:
            os.makedirs(self.path2)
        # for lis in self.pathlist:
        #     data = io.imread(lis)
        #     equalized = exposure.equalize_hist(data)   # method 1: equalization
        #     self.imglist.append(equalized)
        for lis in self.pathlist:
            data = io.imread(lis)
            high_contrast = exposure.rescale_intensity(data, in_range=(20, 220))  # method 2: clip at 20 and 220
            self.imglist.append(high_contrast)
        for img in self.imglist:            # save the modified images
            repath = os.path.join(self.path2, str(self.page2) + '.jpg')
            io.imsave(repath, img)
            self.page2 += 1


class Filter(Contrast):
    def __init__(self, pathlist):
        super().__init__(pathlist)
        self.path31 = self.path2
        self.path32 = r"D:\PYscrapy\get_lixiaoran\picture3"
        self.page3 = 0
        self.gaslist = []

    def filte_r(self):
        files = os.listdir(self.path31)     # read the contents of the folder
        if os.path.exists(self.path32) == False:
            os.makedirs(self.path32)
        for lis in range(len(files)):       # Gaussian-filter each image
            path = os.path.join(self.path31, str(lis) + r'.jpg')
            img = io.imread(path)
            gas = gaussian(img, sigma=3)    # Gaussian blur with sigma=3 (multichannel handling left at the default)
            self.gaslist.append(gas)
            path_gas = os.path.join(self.path32, str(self.page3) + r'.jpg')
            io.imsave(path_gas, img_as_ubyte(gas))   # convert back to uint8 before saving
            self.page3 += 1
        return self.path32


class Vectoring(object):
    def __init__(self, filter_path):
        self.path41 = filter_path
        self.diff = []
        self.calculate = []

    def vector(self):
        numbers = os.listdir(self.path41)   # get the folder contents
        os.chdir(self.path41)               # switch the working directory
        for i in range(len(numbers)):
            self.diff.append([])
            for j in range(4):
                self.diff[i].append([])     # diff = [[number], [img_float], [bin_centers], [hist]]
        for cnt, number in enumerate(numbers):
            img_float = img_as_float(io.imread(number))                   # image ndarray uint8 -> float
            hist, bin_centers = exposure.histogram(img_float, nbins=10)   # pixel counts per intensity interval
            self.diff[cnt][0] = number
            self.diff[cnt][1] = img_float
            self.diff[cnt][2] = bin_centers                               # add the data to diff
            self.diff[cnt][3] = hist
        for i, j in enumerate(self.diff):
            # multiply hist by bin_centers for dimensionality reduction and vectorization;
            # this is the line that takes some reading, there are a few indices involved
            self.calculate.append([y * self.diff[i][3][x] for x, y in enumerate(self.diff[i][2])])
        for i in range(len(self.calculate)):
            self.diff[i].append(self.calculate[i])                        # add the feature vector to diff as well
        return self.diff                    # diff = [[number], [img_float], [bin_centers], [hist], [calculate]]


class Modeling(Vectoring):
    def __init__(self, filter_path, K):
        super().__init__(filter_path)
        self.K = K

    def model(self):
        diff = self.vector()
        calculate = []
        for i in range(len(diff)):
            calculate.append(diff[i][4])
        spot = whiten(calculate)
        # scipy's k-means is used here to cluster the images;
        # if you are not familiar with k-means in scipy, there is a dedicated section on it up front
        center, _ = kmeans(spot, self.K)
        cluster, _ = vq(spot, center)       # obtain the predicted labels
        return diff, cluster


class Predicting(object):
    def __init__(self, predicted_diff, predicted_cluster, K):
        self.diff = predicted_diff
        self.cluster = predicted_cluster
        self.path42 = r'D:\PYscrapy\get_lixiaoran\picture4'
        self.K = K

    def predicted(self):
        if os.path.exists(self.path42) == True:
            shutil.rmtree(self.path42)      # delete the old folder contents
            os.makedirs(self.path42)
        else:
            os.makedirs(self.path42)
        os.chdir(self.path42)
        for i in range(self.K):             # create K folders
            os.makedirs('classify{}'.format(i))
        for i, j in enumerate(self.cluster):
            # save each image into the folder of its cluster
            io.imsave('classify{}\\{}'.format(j, self.diff[i][0]), img_as_ubyte(self.diff[i][1]))


if __name__ == "__main__":
    # file path add
    start = Path()
    pathlist = start.append()
    # Contrast class
    second = Contrast(pathlist)
    second.balance()                        # produce the images with changed contrast
    # Gaussian filtering
    filte = Filter(pathlist)
    filter_path = filte.filte_r()
    # data extraction and vectorization
    vectoring = Vectoring(filter_path)
    # K value customization
    K = 3
    # modeling
    modeling = Modeling(filter_path, K)
    predicted_diff, predicted_cluster = modeling.model()
    # predictions
    predicted = Predicting(predicted_diff, predicted_cluster, K)
    predicted.predicted()
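The densest part of the code above is the feature step (hist multiplied by bin_centers) and the scipy clustering calls, so here is a small standalone sketch of just that part. The 12x10 random feature matrix is made up purely for the demo; it only shows what whiten, kmeans and vq return.

import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Pretend we already have one 10-dimensional feature vector per image
# (hist * bin_centers from the Vectoring class); 12 images here, made up for the demo.
features = np.random.rand(12, 10)

# whiten() rescales each column to unit variance so no single bin dominates the distance
spot = whiten(features)

# kmeans() returns the K cluster centres (and the final distortion, discarded here)
K = 3
centers, _ = kmeans(spot, K)

# vq() assigns every feature vector to its nearest centre; these labels are what the
# Predicting class uses to decide which classifyN folder an image goes into
labels, _ = vq(spot, centers)
print(labels)          # an array of 12 integers in {0, 1, 2}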
The saved files are shown below:
With K=3 the results are categorized as follows (picture4):
The white ones basically end up in one class,
and the black ones in another.
The sorted images look blurry because I sorted the processed images, not the originals.
Looking closely, the effect is actually there, it is just not very obvious, and the content of the images is fairly complex. The general framework is in place; it is mostly a matter of tuning and of improving the feature vectorization to get better results. You could also swap in better processing methods; I have only used a few common image-processing techniques here, so the results are average.
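As one concrete example of the "better processing" mentioned above (my own suggestion, not part of the original script): since the goal is to group by color, a per-channel histogram keeps color information that the single flattened intensity histogram throws away. The function name color_feature and the 8 bins per channel are arbitrary choices, and it assumes RGB input.

import numpy as np
from skimage import io, img_as_float

def color_feature(path, nbins=8):
    # A sketch of an alternative feature: one histogram per colour channel,
    # concatenated into a single vector. Assumes an RGB image; nbins=8 is arbitrary.
    img = img_as_float(io.imread(path))
    channels = []
    for c in range(img.shape[-1]):                       # R, G, B
        hist, _ = np.histogram(img[..., c], bins=nbins, range=(0.0, 1.0))
        channels.append(hist / hist.sum())               # normalise so image size does not matter
    return np.concatenate(channels)                      # 3 * nbins values per image

Feeding these vectors into the same whiten/kmeans/vq steps in the Modeling class would be a drop-in change.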
There are quite a few classes here, but they run from top to bottom in order, so it is not hard to follow step by step. If you have any good suggestions, feel free to share them.
That is all I have to share on classifying images by color with Python; I hope it serves as a useful reference, and I appreciate your support.