Goal
In this chapter, you will learn:
- How to use kNN to build a basic OCR application
- How to use the digit and letter datasets that come with OpenCV
OCR of handwritten digits
The goal is to build an application that can read handwritten digits. To do this, we need some training data (train_data) and some test data (test_data).
The OpenCV git repository includes an image (in opencv/samples/data/) that contains 5000 handwritten digits (500 per digit). Each digit is a 20x20 image.
So, the first step is to split this image into 5000 (500*10) individual digits. Each digit is then flattened into a single row of 400 pixels. This row is the feature set, i.e. the intensity values of all the pixels, and it is the simplest feature set that can be created. The first 250 samples of each digit are used as the training set (train_data), and the remaining 250 samples are used as the test set (test_data).
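The split-and-flatten step described here can be tried on a small synthetic array first, which makes the intermediate shapes easy to check. This is only a sketch: the array fake_sheet and its 4x6 grid of 2x2 cells are made up for illustration and stand in for the real 50x100 grid of 20x20 digits.

```python
import numpy as np

# Hypothetical stand-in for the digit sheet: 4 rows x 6 columns of 2x2 cells.
cell_h, cell_w, n_rows, n_cols = 2, 2, 4, 6
fake_sheet = np.arange(n_rows * cell_h * n_cols * cell_w, dtype=np.uint8)
fake_sheet = fake_sheet.reshape(n_rows * cell_h, n_cols * cell_w)

# Cut the sheet into horizontal strips, then cut every strip into cells.
cells = [np.hsplit(strip, n_cols) for strip in np.vsplit(fake_sheet, n_rows)]
x = np.array(cells)  # shape (n_rows, n_cols, cell_h, cell_w)

# Left half of the columns becomes training rows, right half test rows.
half = n_cols // 2
train = x[:, :half].reshape(-1, cell_h * cell_w).astype(np.float32)
test = x[:, half:].reshape(-1, cell_h * cell_w).astype(np.float32)

print(x.shape, train.shape, test.shape)
```

Each row of train and test is one flattened cell, in the same way that each row of the real training set is one flattened 20x20 digit.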
    import cv2
    import numpy as np

    # Assumed file name: the digit sheet from opencv/samples/data/
    img = cv2.imread('digits.png')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Now we split the image to 5000 cells, each 20x20 size
    cells = [np.hsplit(row, 100) for row in np.vsplit(gray, 50)]

    # Make it into a numpy array: its size will be (50, 100, 20, 20)
    x = np.array(cells)

    # Now we prepare the training data and test data
    train = x[:, :50].reshape(-1, 400).astype(np.float32)    # Size = (2500, 400)
    test = x[:, 50:100].reshape(-1, 400).astype(np.float32)  # Size = (2500, 400)

    # Create labels for train and test data
    k = np.arange(10)
    train_labels = np.repeat(k, 250)[:, np.newaxis]
    test_labels = train_labels.copy()

    # Initiate kNN, train it on the training data, then test it with the test data with k=5
    knn = cv2.ml.KNearest_create()
    knn.train(train, cv2.ml.ROW_SAMPLE, train_labels)
    ret, result, neighbours, dist = knn.findNearest(test, k=5)

    # Now we check the accuracy of classification
    # For that, compare the result with test_labels and check which are wrong
    matches = result == test_labels
    correct = np.count_nonzero(matches)
    accuracy = correct * 100.0 / result.size
    print(accuracy)  # 91.76
As you can see, this gives a basic handwritten-digit OCR application that is ready to use. The accuracy in this particular example is 91.76%.
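To make clear what a k-nearest-neighbours classifier does conceptually, here is a minimal brute-force sketch in plain NumPy. It is a simplified illustration and not OpenCV's implementation: the function knn_predict and the toy data are hypothetical, and details such as tie-breaking may differ from cv2.ml.

```python
import numpy as np

def knn_predict(train, train_labels, samples, k=5):
    """Brute-force kNN: majority vote among the k closest training rows."""
    # Squared Euclidean distance between every sample and every training row.
    diff = samples[:, np.newaxis, :] - train[np.newaxis, :, :]
    dist = np.sum(diff * diff, axis=2)
    # Indices of the k nearest training rows for each sample.
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = train_labels.ravel()[nearest]
    # Most frequent label among the neighbours (ties go to the smaller label).
    return np.array([np.bincount(v).argmax() for v in votes])

# Tiny made-up data set: two clusters labelled 0 and 1.
toy_train = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 8], [8, 9]], dtype=np.float32)
toy_labels = np.array([0, 0, 0, 1, 1, 1])
toy_samples = np.array([[0.5, 0.5], [8.5, 8.5]], dtype=np.float32)

toy_pred = knn_predict(toy_train, toy_labels, toy_samples, k=3)
print(toy_pred)
```

The first sample sits inside the cluster labelled 0 and the second inside the cluster labelled 1, so the vote assigns them to those classes.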
Ways to improve accuracy:
- Add more training data, especially samples that were misclassified.
- Replace the algorithm with a better one.
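Before adding more data, it helps to know which samples are misclassified and which classes get confused with each other. The sketch below assumes predictions and true labels in the (n, 1) column shape used in this chapter. The helper confusion_matrix and the toy arrays are hypothetical and only illustrate the idea.

```python
import numpy as np

def confusion_matrix(predicted, actual, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    rows = actual.ravel().astype(int)
    cols = predicted.ravel().astype(int)
    np.add.at(cm, (rows, cols), 1)  # count every (true, predicted) pair
    return cm

# Made-up labels and predictions for three classes.
actual = np.array([[0], [0], [1], [1], [2], [2]])
predicted = np.array([[0.], [0.], [1.], [2.], [2.], [1.]], dtype=np.float32)

cm = confusion_matrix(predicted, actual, n_classes=3)
wrong = np.flatnonzero(predicted.ravel() != actual.ravel())

print(cm)     # off-diagonal entries are the confused class pairs
print(wrong)  # indices of the misclassified samples
```

The indices in wrong point at the samples worth inspecting or adding to the training set with their correct labels.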
Rather than rebuilding this training data every time the application starts, it is better to save it so that next time you can read the data directly from a file and start classifying. This can be done with a few NumPy functions (e.g., np.savez, np.load, etc.).
    # Save the data
    np.savez('knn_dight_data.npz', train=train, train_labels=train_labels)

    # Now load the data
    with np.load('knn_dight_data.npz') as data:
        print(data.files)
        train = data['train']
        train_labels = data['train_labels']
On Windows, the saved file takes about 3.82 MB. Since only intensity values (uint8 data) are used as features, if size is a concern you can first convert the data to np.uint8 and then save the file. In this case, it only takes up 0.98 MB. When loading, the data can be converted back to float32.
    # Convert to uint8 before saving to reduce the file size
    train_uint8 = train.astype(np.uint8)
    train_labels_uint8 = train_labels.astype(np.uint8)
    np.savez('knn_dight_data_int8.npz', train=train_uint8, train_labels=train_labels_uint8)
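The loading half of this round trip could look like the sketch below. It uses made-up data and a hypothetical temporary file name so that it runs on its own. The conversion is lossless here only because the values are whole numbers in the range 0 to 255.

```python
import os
import tempfile
import numpy as np

# Made-up stand-in for the training data: float32 rows of 0..255 intensities.
rng = np.random.default_rng(0)
demo_train = rng.integers(0, 256, size=(20, 400)).astype(np.float32)
demo_labels = np.repeat(np.arange(10), 2)[:, np.newaxis]

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'knn_demo_uint8.npz')
np.savez(path,
         train=demo_train.astype(np.uint8),
         train_labels=demo_labels.astype(np.uint8))

with np.load(path) as data:
    # Convert the features back to float32, the type used for training above.
    train_loaded = data['train'].astype(np.float32)
    labels_loaded = data['train_labels'].astype(np.int32)

print(train_loaded.dtype, np.array_equal(demo_train, train_loaded))
```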
The classifier can also be used to predict a single digit:
    # Take one element of the test set
    single_data = test[0].reshape(-1, 400)
    single_label = test_labels[0]
    ret, result, neighbours, dist = knn.findNearest(single_data, k=5)
    print(result)                  # [[0.]]
    print(single_label)            # [0]
    print(result == single_label)  # [[ True]]
OCR of English letters
Next, perform the same operation for the English alphabet, but with a slightly different set of data and features. Instead of an image, OpenCV provides a data file ( /data/samples/data/ ). If you open it, you will see 20,000 rows that at first glance may look like garbage numbers.
In fact, in each row the first column is a letter, which is the label. The next 16 numbers are its features, which were obtained from the UCI Machine Learning Repository. Detailed information about these features can be found on the repository's page for this dataset.
There are 20,000 samples available. The first 10,000 are used as training samples and the remaining 10,000 as test samples. The letters must be converted to numbers because they cannot be used directly as labels.
    import numpy as np
    import cv2

    # Load the data and convert the letters to numbers
    # Assumed file name: the letter dataset shipped with the OpenCV samples
    data = np.loadtxt('letter-recognition.data', dtype='float32', delimiter=',',
                      converters={0: lambda ch: ord(ch) - ord('A')})

    # Split the dataset in two, with 10000 samples each for training and test sets
    train, test = np.vsplit(data, 2)

    # Split trainData and testData into features and responses
    responses, trainData = np.hsplit(train, [1])
    labels, testData = np.hsplit(test, [1])

    # Initiate the kNN, classify, measure accuracy
    knn = cv2.ml.KNearest_create()
    knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
    ret, result, neighbours, dist = knn.findNearest(testData, k=5)

    correct = np.count_nonzero(result == labels)
    accuracy = correct * 100.0 / result.size
    print(accuracy)  # 93.06
This gives an accuracy of 93.06%. Again, to improve accuracy, you can iteratively add misclassified samples to each class.
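Because the labels were encoded as ord(ch) - ord('A'), the classifier returns numbers rather than letters. A minimal sketch of the mapping and its inverse follows. The helper names and the sample result array are hypothetical, with the array shaped like the (n, 1) float32 output described above.

```python
import numpy as np

def letter_to_index(ch):
    """'A' -> 0, 'B' -> 1, ..., 'Z' -> 25."""
    return ord(ch) - ord('A')

def index_to_letter(idx):
    """Inverse mapping; accepts the float values a classifier may return."""
    return chr(int(idx) + ord('A'))

# Made-up classifier output, one predicted class index per row.
demo_result = np.array([[19.], [8.], [3.]], dtype=np.float32)
decoded = [index_to_letter(r) for r in demo_result.ravel()]

print(decoded)
```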
Additional resources
- /4.5.5/d8/d4…
- /ml/datasets…
- /wiki/Optica…