Updated on 2024-11-10

Image text recognition in Python with PaddleOCR

Preamble

What is OCR?

Optical Character Recognition (OCR) is the process of analyzing image files that contain text in order to obtain the text and its layout information. In short, it detects the text in an image and recognizes what that text says.

So what are the application scenarios?

In fact, OCR shows up everywhere in daily life: ID card recognition for entering personal information during the epidemic, vehicle license plate recognition, autonomous driving, and so on. Machine learning plays an increasingly important role in our lives and is no longer something mysterious.

What is the technical route to OCR?

An OCR system works in the following stages: Input -> Image Preprocessing -> Text Detection -> Text Recognition -> Output.
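
The stages can be pictured as a chain of functions. The following is a toy sketch, not PaddleOCR's implementation: the stage functions (`preprocess`, `detect_text`, `recognize_text`), the `run_ocr` wrapper, and the tiny grayscale grid are all hypothetical and only illustrate how data flows from one stage to the next.

```python
from typing import List, Tuple

Grid = List[List[int]]           # grayscale pixels, 0 (black) to 255 (white)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def preprocess(image: Grid, threshold: int = 128) -> Grid:
    """Binarize: dark pixels become 1 (ink), light pixels become 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def detect_text(binary: Grid) -> List[Box]:
    """Toy detector: return one bounding box around all ink pixels."""
    points = [(x, y) for y, row in enumerate(binary)
              for x, value in enumerate(row) if value]
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [(min(xs), min(ys), max(xs), max(ys))]


def recognize_text(binary: Grid, box: Box) -> str:
    """Placeholder recognizer: a real one would run a trained model."""
    left, top, right, bottom = box
    ink = sum(binary[y][x] for y in range(top, bottom + 1)
              for x in range(left, right + 1))
    return f"<{ink} ink pixels>"


def run_ocr(image: Grid) -> List[Tuple[Box, str]]:
    """Input -> preprocessing -> detection -> recognition -> output."""
    binary = preprocess(image)
    return [(box, recognize_text(binary, box)) for box in detect_text(binary)]


sample = [
    [255, 255, 255, 255],
    [255,  10,  20, 255],
    [255, 255,  30, 255],
    [255, 255, 255, 255],
]
print(run_ocr(sample))
```

In a real system the detector returns many boxes (one per text line) and the recognizer returns a string with a confidence score for each box, which is the shape of output PaddleOCR prints later in this article.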

The main purpose of this article is to introduce a good open source OCR project that I use and want to share here: PaddleOCR.

Project GitHub address: PaddleOCR

I'll walk through how I verified and used the project, in the order I followed when I first approached it.

Project use

First clone the project from GitHub, then go through it step by step.

Project structure

First let's look at the structure of the project.

The project has an introduction in Chinese, which is very convenient. Click through and follow the official instructions to get started.

Environment deployment

Open the documentation tutorial: the first step explains how to install the environment.

Since there is a lot to cover, I'll summarize the steps so you can get started right away.

1. Install Anaconda and create a virtual environment

You can refer to my other post, which is very detailed: Python Machine Learning Chapter 1 Environment Configuration Diagram Flow

The official instructions use a Python 3.8 virtual environment, so let's create one too. Open Anaconda Prompt.

Enter the command:

conda create -n paddle_env python=3.8

Activate the environment:

conda activate paddle_env
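
After activating the environment, a quick check like the one below can confirm that the interpreter really is Python 3.8. This is a minimal sketch; the helper name `is_expected_python` is hypothetical and not part of conda or PaddleOCR.

```python
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 8)):
    """Return True when the major.minor version matches the expected pair."""
    return tuple(version_info[:2]) == tuple(expected)


if is_expected_python():
    print("Running on Python 3.8, as the official instructions suggest.")
else:
    print("Running on Python %d.%d, not 3.8; check that paddle_env is active."
          % sys.version_info[:2])
```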

2. Download the dependency packages

paddlepaddle installation

pip install paddlepaddle -i /pypi/simple

layoutparser installation

pip3 install -U /whl/layoutparser-0.0.

Shapely installation. The wheel file needs to be downloaded first; download address: Shapely download address

I chose this one.

Installation commands:

pip install Shapely-1.8.0-cp38-cp38-win_amd64.whl

paddleocr installation

pip install paddleocr -i /pypi/simple

Okay, the environment is all installed and ready for hands-on use.

Test code

The official documentation gives two modes: command line execution and code execution. To make the configuration easier to see, I use the code mode here.
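
Before moving on, it can help to confirm that the packages installed above are visible from the active environment. The sketch below only uses the standard library; the helper name `find_missing` is hypothetical, and the list assumes the usual import names (`paddle` for paddlepaddle, `shapely` for Shapely).

```python
import importlib.util

# Import names of the packages installed above. The PyPI name and the
# import name differ for some of them (paddlepaddle -> paddle,
# Shapely -> shapely).
REQUIRED_MODULES = ["paddle", "layoutparser", "shapely", "paddleocr"]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only looks the module up without importing it, so the check stays fast even though paddle itself is slow to load.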

Prepare a picture with text

The test code is as follows

#!/usr/bin/env python
# coding=utf-8
"""
@project : ocr_paddle
@author  : huyi
@file   : 
@ide    : PyCharm
@time   : 2021-11-15 14:56:20
"""
from paddleocr import PaddleOCR, draw_ocr
 
# Paddleocr currently supports multiple languages that can be switched by modifying the lang parameter
# For example, `ch`, `en`, `fr`, `german`, `korean`, `japan`.
ocr = PaddleOCR(use_angle_cls=True, use_gpu=False,
                lang="ch")  # need to run only once to download and load model into memory
img_path = './data/'
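# Run detection, angle classification and recognition on the image
result = ocr.ocr(img_path, cls=True)
for line in result:
    # print(line[-1][0], line[-1][1])
    print(line)

# Show results
from PIL import Image

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]      # corner points of each text box
txts = [line[1][0] for line in result]    # recognized text
scores = [line[1][1] for line in result]  # recognition confidence
# font_path must point to a font file that can render the recognized characters
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/')
im_show = Image.fromarray(im_show)        # draw_ocr returns an array
im_show.save('')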

Code Description

1. I set use_gpu=False because my computer does not have a graphics card.

2. The result display part draws a box around each piece of recognized text and shows the recognition result next to it.

Let's verify it.

As we can see, the printout contains the position of each recognized sentence in the image, along with the recognized text and its confidence. The result image also has a box around the text of each sentence. The results are pretty good!
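
Instead of hard-coding `use_gpu`, the flag could be derived at run time. The sketch below is an assumption on my part rather than something from the official tutorial: the helper name `gpu_build_available` is hypothetical, and `paddle.device.is_compiled_with_cuda()` only reports whether the installed paddle package was built with CUDA support, not whether a usable GPU is actually present.

```python
def gpu_build_available():
    """Return True if the installed paddle package was built with CUDA."""
    try:
        import paddle
    except (ImportError, OSError):
        return False  # paddle missing or failed to load: fall back to CPU
    return bool(paddle.device.is_compiled_with_cuda())


use_gpu = gpu_build_available()
print("use_gpu =", use_gpu)
# The flag can then be passed on, for example:
# ocr = PaddleOCR(use_angle_cls=True, use_gpu=use_gpu, lang="ch")
```

With the CPU package installed as described earlier, this prints `use_gpu = False`, which matches the setting used in the test code.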
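
The printed lines can be turned into named records, which makes it easy to drop low-confidence text. This is a minimal sketch assuming each line has the shape `[box, (text, score)]`, the same shape the test code indexes with `line[0]`, `line[1][0]` and `line[1][1]`. The names `OcrLine` and `to_records` are hypothetical and the sample values are made up.

```python
from typing import List, NamedTuple, Sequence


class OcrLine(NamedTuple):
    box: List[List[float]]  # four [x, y] corner points
    text: str
    score: float


def to_records(result: Sequence, min_score: float = 0.0) -> List[OcrLine]:
    """Convert raw lines into OcrLine records, dropping low-confidence ones."""
    records = [OcrLine(line[0], line[1][0], float(line[1][1])) for line in result]
    return [rec for rec in records if rec.score >= min_score]


# Made-up sample in the same shape as the printed lines.
sample_result = [
    [[[10.0, 10.0], [90.0, 10.0], [90.0, 30.0], [10.0, 30.0]], ("Hello", 0.98)],
    [[[10.0, 40.0], [60.0, 40.0], [60.0, 60.0], [10.0, 60.0]], ("w0rld", 0.41)],
]

for rec in to_records(sample_result, min_score=0.5):
    print(rec.text, rec.score)
```

If the installed paddleocr release nests the lines one level deeper (one list per input image), pass `result[0]` instead of `result`.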

Additional parameters

The official documentation also lists parameters for adjusting the output; refer to it for details. For example:

- Detection on its own: set `--rec` to `false`.
- Recognition on its own: set `--det` to `false`.
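
These flags belong to the command line mode. As a rough sketch, the command could be assembled from Python like this. The helper name `build_paddleocr_command` and the image path are hypothetical, and the `--image_dir` and `--lang` flags are assumed from the 2.x command line tool; check them against the installed version.

```python
import shlex


def build_paddleocr_command(image_dir, det=True, rec=True, lang="ch"):
    """Build the argument list for the paddleocr command-line tool."""
    command = ["paddleocr", "--image_dir", image_dir, "--lang", lang]
    if not det:
        command += ["--det", "false"]  # recognition only
    if not rec:
        command += ["--rec", "false"]  # detection only
    return command


# "./imgs/example.jpg" is a placeholder path.
detection_only = build_paddleocr_command("./imgs/example.jpg", rec=False)
print(shlex.join(detection_only))
```

The returned list can be passed to `subprocess.run(command, check=True)` to actually run the tool.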

The official documentation also defines a standard structure for the output data.

The result of PP-Structure is a list of dicts, as shown in the following example.

```shell
[{ 'type': 'Text',
'bbox': [34, 432, 345, 462],
'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
[('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])
}
]
```
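
A structure like this is straightforward to walk in Python. The sketch below assumes every region's `res` is a pair of (boxes, list of (text, score) tuples), exactly as in the example; regions whose `res` has a different shape are not handled. The function name `extract_text` is hypothetical.

```python
def extract_text(regions, min_score=0.0):
    """Collect (region type, text, score) rows from the list of dicts."""
    rows = []
    for region in regions:
        _boxes, recognized = region["res"]
        for text, score in recognized:
            if score >= min_score:
                rows.append((region["type"], text, score))
    return rows


# The example result from the documentation, copied as a Python literal.
example = [{
    "type": "Text",
    "bbox": [34, 432, 345, 462],
    "res": (
        [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
         [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
        [("Tigure-6. The performance of CNN and IPT models using difforen", 0.90060663),
         ("Tent ", 0.465441)],
    ),
}]

for region_type, text, score in extract_text(example, min_score=0.5):
    print(region_type, repr(text), score)
```

With the 0.5 threshold only the first line is kept; the low-confidence fragment `'Tent '` is filtered out.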

Summary

Overall, the project is interesting. I won't go into the training part; after all, preparing the data is quite a hassle. I'll keep thinking about this project and see whether it can be adapted into a handy tool.

Share:

We don't need to end up where we end up at all, just keep moving forward, and the road will keep stretching as long as we don't stop. -- Attack on Titan

If this article was helpful to you, please don't be stingy with your likes, thanks!
