How to implement a free, high-accuracy OCR tool in Python

A Python-based screenshot tool called Textshot was recently open-sourced on GitHub, and it collected more than 500 stars in less than half a month.

I've spent the past couple of days reading Textshot's source code, and it is indeed a project worth introducing.

Compared with most OCR tools, which are complex to engineer and give poor results, Textshot has two distinct advantages:

Project simplicity
Abundance of technical points

Project simplicity

The entire Textshot project consists of a single Python file with 139 lines of code. It does not rely on complex third-party libraries, and it makes very few back-end algorithm calls.

Abundance of technical points

The Textshot project is only 139 lines of code, but it draws on several areas of Python knowledge:

UI development
Screenshot tool development
Back-end engine calls

With this short project, you will learn how to build a user interface with PyQt5, how to develop your own screenshot tool with pyscreenshot, and how to call the tesseract back end.

In other words, these 139 lines cover the entire process from front end to back end and connect two tools, screenshot and OCR. So although Textshot is not a big project, it is complete and well worth studying.

This article analyzes the project's source code and walks you step by step through building your own permanently free screenshot and OCR tool.

tesseract

There are countless OCR tools today, but most of them are just different wrappers around the same back-end algorithm. If one engine truly stands out at the core of OCR and deserves a write-up of its own, it is none other than tesseract.

Development of tesseract began at HP Labs in 1985, and in 1995 it was rated one of the three most accurate OCR engines. It was later open-sourced, and after continuous optimization and upgrades by Google it has become a benchmark tool in OCR. Many open-source and paid OCR tools call tesseract directly or with only minor optimizations.

Textshot, the subject of this article, calls the tesseract back-end engine directly for OCR. In other words, Textshot only implements a screenshot tool that links the front end to the back end; it does no work on the OCR algorithm itself.

tesseract installation

Since Textshot's OCR recognition requires a call to the tesseract back-end engine, you first need to install tesseract.

On Windows, download the installer directly from the download link [1].

On macOS, it can be installed with Homebrew:

brew install tesseract
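
After installing, it can help to confirm that Python can actually find the tesseract executable before running Textshot. The sketch below is not part of Textshot: the helper names find_tesseract and tesseract_version are hypothetical, and it assumes the installer placed tesseract on your PATH.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    completed = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some builds print the version banner to stderr instead of stdout.
    output = completed.stdout or completed.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; install it first.")
    else:
        print(path, "->", tesseract_version(path))
```

If the executable is installed but not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path.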

Textshot

Textshot is an OCR tool for recognizing text in screenshots, so it involves two main stages:

Taking a screenshot
OCR recognition

Textshot first obtains the image to be text-recognized by taking a screenshot, and then performs OCR text recognition on this image and outputs the recognition result.

As already described, Textshot's OCR recognition stage calls tesseract, so it only requires 1 line of code to complete.

As a result, most of Textshot's code goes into implementing the front-end window and the screenshot tool.

Screenshot Tools

We use screenshot tools all the time, but how do you implement one?

Many people assume it is very complicated. In fact, Python has plenty of libraries that can take screenshots, for example pyscreenshot or the ImageGrab module in Pillow, which is called as follows.

shot = ImageGrab.grab(bbox=(x1, y1, x2, y2))

In other words, we only need to pass the start and end coordinates of the mouse box selection to the grab method to realize the screenshot function.
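
One detail worth sketching: the user may drag in any direction, so the two corner points have to be ordered before they are passed as a bounding box. The helper names normalize_bbox and grab_region below are hypothetical and not taken from Textshot; the sketch assumes Pillow is installed when grab_region is called.

```python
def normalize_bbox(start, end):
    """Turn two (x, y) corner points into a (left, top, right, bottom) box."""
    x1, x2 = sorted((start[0], end[0]))
    y1, y2 = sorted((start[1], end[1]))
    return (x1, y1, x2, y2)


def grab_region(start, end):
    """Capture the screen region spanned by two corner points."""
    # Imported lazily so normalize_bbox can be used without Pillow installed.
    from PIL import ImageGrab

    return ImageGrab.grab(bbox=normalize_bbox(start, end))


if __name__ == "__main__":
    # Dragging from bottom-right to top-left gives the same box as the reverse drag.
    print(normalize_bbox((300, 200), (100, 50)))  # (100, 50, 300, 200)
```

Keeping the coordinate handling separate from the capture call makes it easy to check the geometry without a display.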

So the question becomes: how do we get the start and end points of the mouse selection?

Textshot gets these points with PyQt5: it subclasses QWidget and overrides several of its methods to track the selection as it is drawn.

The QWidget methods that Textshot overrides are mainly the following.

keyPressEvent(self, event): keyboard response function
paintEvent(self, event): UI paint function
mousePressEvent(self, event): mouse click event
mouseMoveEvent(self, event): mouse move event
mouseReleaseEvent(self, event): mouse release event

As you can see, the overridden methods cover each action involved in taking a screenshot:

Click the mouse
Drag to draw the screenshot frame
Release the mouse

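import sys

from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtCore import Qt


class Snipper(QtWidgets.QWidget):
    def __init__(self, parent=None, flags=Qt.WindowFlags()):
        super().__init__(parent=parent, flags=flags)

        self.setWindowTitle("TextShot")
        self.setWindowFlags(
            Qt.FramelessWindowHint | Qt.WindowStaysOnTopHint | Qt.Dialog
        )

        # Cover the whole screen with the overlay window.
        self.is_macos = sys.platform.startswith("darwin")
        if self.is_macos:
            self.setWindowState(self.windowState() | Qt.WindowMaximized)
        else:
            self.setWindowState(self.windowState() | Qt.WindowFullScreen)

        # Semi-transparent black overlay with a crosshair cursor.
        self.setStyleSheet("background-color: black")
        self.setWindowOpacity(0.5)

        QtWidgets.QApplication.setOverrideCursor(QtGui.QCursor(Qt.CrossCursor))

        # Start and end points of the selection, in global screen coordinates.
        self.start, self.end = QtCore.QPoint(), QtCore.QPoint()

    def keyPressEvent(self, event):
        # Escape cancels the screenshot.
        if event.key() == Qt.Key_Escape:
            QtWidgets.QApplication.quit()

        return super().keyPressEvent(event)

    def paintEvent(self, event):
        # Nothing to draw until the user has dragged out a region.
        if self.start == self.end:
            return super().paintEvent(event)

        painter = QtGui.QPainter(self)
        painter.setPen(QtGui.QPen(QtGui.QColor(255, 255, 255), 3))
        painter.setBrush(QtGui.QColor(255, 255, 255, 100))

        # On macOS the window is only maximized, so map to widget coordinates.
        if self.is_macos:
            start, end = (self.mapFromGlobal(self.start), self.mapFromGlobal(self.end))
        else:
            start, end = self.start, self.end

        painter.drawRect(QtCore.QRect(start, end))
        return super().paintEvent(event)

    def mousePressEvent(self, event):
        self.start = self.end = QtGui.QCursor.pos()
        self.update()
        return super().mousePressEvent(event)

    def mouseMoveEvent(self, event):
        self.end = QtGui.QCursor.pos()
        self.update()
        return super().mouseMoveEvent(event)

    def mouseReleaseEvent(self, event):
        # A click without dragging selects nothing.
        if self.start == self.end:
            return super().mouseReleaseEvent(event)

        # Order the corners so the box is valid for any drag direction.
        x1, x2 = sorted((self.start.x(), self.end.x()))
        y1, y2 = sorted((self.start.y(), self.end.y()))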
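
The selection logic inside these mouse handlers can be modeled without any GUI, which makes it easier to see what each event does to the start and end points. The class below is a hypothetical illustration (SelectionTracker is not a Textshot name) that uses plain tuples in place of Qt point objects.

```python
class SelectionTracker:
    """GUI-free model of the press / drag / release selection logic."""

    def __init__(self):
        self.start = (0, 0)
        self.end = (0, 0)

    def press(self, pos):
        # Pressing the button anchors both corners at the same point.
        self.start = self.end = pos

    def move(self, pos):
        # Dragging only moves the free corner; the anchor stays put.
        self.end = pos

    def release(self):
        # A click without dragging leaves an empty selection, so report nothing.
        if self.start == self.end:
            return None
        return (self.start, self.end)


if __name__ == "__main__":
    tracker = SelectionTracker()
    tracker.press((10, 20))
    tracker.move((110, 80))
    print(tracker.release())  # ((10, 20), (110, 80))
```

In the real widget, press and move also request a repaint so that the frame follows the cursor.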

Then launch the screenshot window:

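QtCore.QCoreApplication.setAttribute(Qt.AA_DisableHighDpiScaling)
app = QtWidgets.QApplication(sys.argv)
window = QtWidgets.QMainWindow()
snipper = Snipper(window)
snipper.show()
# Start the Qt event loop so the window can receive mouse and key events.
sys.exit(app.exec_())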

Once the user has dragged out a selection box, we have the coordinates of its start and end points. At this point, the following statement takes the screenshot and yields the image whose text will be recognized by OCR.

shot = ImageGrab.grab(bbox=(x1, y1, x2, y2))

OCR Text Recognition

With the captured image in hand, the next step is to feed it to the tesseract back-end engine, which converts the image into a string. If a first command-line argument is given, it is passed along as the recognition language.

result = pytesseract.image_to_string(img, timeout=2, lang=(sys.argv[1] if len(sys.argv) > 1 else None))
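
The call above can fail, for example when recognition takes longer than the timeout. The following is a minimal sketch of wrapping it, under the assumption that pytesseract signals a timeout with RuntimeError (as its documentation describes); the function names pick_language and recognize_text are hypothetical and not taken from Textshot.

```python
import sys


def pick_language(argv):
    """Return the language code given as the first CLI argument, or None."""
    return argv[1] if len(argv) > 1 else None


def recognize_text(img, argv=None):
    """Run OCR on a PIL image and return the text, or None on failure."""
    # Imported lazily so pick_language works without pytesseract installed.
    import pytesseract

    argv = sys.argv if argv is None else argv
    try:
        text = pytesseract.image_to_string(img, timeout=2, lang=pick_language(argv))
    except RuntimeError as error:
        # pytesseract raises RuntimeError when the timeout expires.
        print(f"OCR failed: {error}")
        return None
    # Drop the leading and trailing whitespace that OCR output often carries.
    return text.strip()
```

With no argument, lang is None and tesseract falls back to its default language. Passing a code such as chi_sim selects another language, provided the matching language data is installed.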

At this point, a highly accurate and permanently free OCR tool is complete.

Looking back at the Textshot project, capturing the image inside the selected coordinates and running OCR on it take only 2 lines of code; most of the development goes into obtaining the start and end coordinates of the selection. In other words, Textshot makes no changes to the core OCR part and instead does some clever work on the product packaging.

That is all for this article.