I. Downloading and installing tesseract-ocr
1. Download
Commonly used Tesseract links:
Download Address:/tesseract/
Official website:/tesseract-ocr/tesseract
Official Documentation:/tesseract-ocr/tessdoc
Language pack address:/tesseract-ocr/tessdata
2. Install tesseract-ocr
(1) Select the installer language
(2) Start the installation
(3) Accept the license
(4) Choose which users to install for
(5) Select the language packs to install.
Any language pack ticked here is downloaded from the server during installation. Ticking the "Download language packs" box is not recommended because the download is very slow; this tutorial explains how to add language packs manually later. If you have an unrestricted internet connection, you can ignore this advice.
The default selection is fine.
(6) Choose the installation location
(7) Start installing
(8) Installation complete
3. Installation of language packs
(1) Download and install
/tesseract-ocr/tessdata
The repository is large, so download only the files you need, for example the Simplified Chinese language pack.
Save the downloaded file in this directory: D:\Program Files\Tesseract-OCR\tessdata
Note: If the address above is not reachable from your network, you can download the Simplified Chinese language pack from here: https:///softs/
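A quick way to confirm the file landed in the right place is to look for `<language>.traineddata` in the tessdata folder. The sketch below assumes the install directory used in this tutorial; `missing_traineddata` is a hypothetical helper name, not part of Tesseract or pytesseract.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir, langs):
    """Return the language codes whose .traineddata file is absent."""
    folder = Path(tessdata_dir)
    return [lang for lang in langs
            if not (folder / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    # Assumption: the install path used in this tutorial; adjust as needed.
    tessdata = r"D:\Program Files\Tesseract-OCR\tessdata"
    print(missing_traineddata(tessdata, ["eng", "chi_sim"]))
```

An empty list means every requested language file was found.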
(2) Testing
Go to the Tesseract OCR installation directory:
    # View the version
    PS D:\Program Files\Tesseract-OCR> .\tesseract.exe -v
    tesseract v5.3.0.20221214
     leptonica-1.78.0
      libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
     Found AVX2
     Found AVX
     Found FMA
     Found SSE4.1
     Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
     Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

    # View the installed language packs
    PS D:\Program Files\Tesseract-OCR> .\tesseract.exe --list-langs
    List of available languages in "D:\Program Files\Tesseract-OCR/tessdata/" (4):
    chi_sim
    chi_sim_vert
    eng
    osd
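The same check can be scripted. The sketch below runs the executable with `--list-langs` through `subprocess` and parses the output format shown above (a header line followed by one language code per line). The helper names are assumptions for illustration, and the parser is demonstrated on the sample output rather than on a live installation.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip blank lines and the "List of available languages ..." header.
    return [line for line in lines
            if line and not line.startswith("List of available languages")]


def installed_languages(tesseract_cmd):
    """Run the Tesseract executable and return its installed languages."""
    result = subprocess.run([tesseract_cmd, "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_list_langs(result.stdout)


sample = '''List of available languages in "D:\\Program Files\\Tesseract-OCR/tessdata/" (4):
chi_sim
chi_sim_vert
eng
osd
'''
print(parse_list_langs(sample))
```

If `chi_sim` is missing from the returned list, the language file is probably not in the tessdata directory.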
II. Recognizing text in a screenshot with Python
1. Install the required packages
    pip install pyautogui
    pip install pytesseract
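To confirm the installs worked before running the OCR script, one option is to ask Python whether each module can be found. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# PIL (Pillow) is included because pytesseract works on Pillow images.
print(missing_packages(["pyautogui", "pytesseract", "PIL"]))
```

An empty list means all three modules are available.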
2. Take a screenshot and recognize the text
    import pyautogui
    import pytesseract

    # Set the path to the Tesseract executable (only needed if it is not on the system PATH)
    pytesseract.pytesseract.tesseract_cmd = 'D:/Program Files/Tesseract-OCR/tesseract.exe'

    # Take a screenshot of the whole screen
    screenshot = pyautogui.screenshot()

    # Define the area (top-left x, top-left y, bottom-right x, bottom-right y)
    region = (100, 100, 300, 200)

    # Crop the screenshot to the specified area
    custom_screenshot = screenshot.crop(region)

    # Convert the image to grayscale, which can help recognition accuracy
    custom_screenshot = custom_screenshot.convert('L')

    # Recognize the text with pytesseract
    text = pytesseract.image_to_string(custom_screenshot)

    # Print the recognized text
    print(text)
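The area above is given as (left, top, right, bottom) corner coordinates, which is the form Pillow's `Image.crop` expects. `pyautogui.screenshot` also accepts a `region` argument, but in the form (left, top, width, height), and it then captures only that area instead of the full screen. The sketch below converts between the two; `box_to_region` and `capture_text` are hypothetical helper names, and the pyautogui and pytesseract imports are deferred so the conversion can be tried without a display.

```python
def box_to_region(box):
    """Convert (left, top, right, bottom) to (left, top, width, height)."""
    left, top, right, bottom = box
    if right <= left or bottom <= top:
        raise ValueError("box must have positive width and height")
    return (left, top, right - left, bottom - top)


def capture_text(box):
    """Grab only the given screen area and run OCR on it."""
    import pyautogui      # imported lazily: needs a graphical session
    import pytesseract

    image = pyautogui.screenshot(region=box_to_region(box))
    return pytesseract.image_to_string(image.convert("L"))


print(box_to_region((100, 100, 300, 200)))  # (100, 100, 200, 100)
```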
3. Accuracy
Accuracy on English text is acceptable. Accuracy on Chinese text is much less reliable and hard to characterize. Training a custom model should make it possible to improve accuracy.
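Before turning to training, note that `image_to_string` uses English unless told otherwise, so Chinese text needs the `lang` argument. The sketch below builds the arguments for a mixed Chinese and English screenshot. `ocr_kwargs` and `recognize` are hypothetical helpers, and `--psm 6` (treat the image as a single uniform block of text) is an assumption about the screenshot layout, not a setting taken from this article.

```python
def ocr_kwargs(langs=("chi_sim", "eng"), psm=6):
    """Build keyword arguments for pytesseract.image_to_string."""
    return {
        "lang": "+".join(langs),      # Tesseract joins languages with "+"
        "config": f"--psm {psm}",     # page segmentation mode
    }


def recognize(image, langs=("chi_sim", "eng")):
    """Run OCR on a Pillow image with the given language packs."""
    import pytesseract  # imported lazily so ocr_kwargs works without it

    return pytesseract.image_to_string(image, **ocr_kwargs(langs))


print(ocr_kwargs())
```

With the script from the previous section, this would be called as `recognize(custom_screenshot)`; each language named must already be installed in the tessdata directory.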