SoFunction
Updated on 2024-11-20

How to convert images to text in Python

Python image to text

An image-to-text widget built with Python and Tesseract-OCR; the GUI is built from tkinter widgets.

The interface and results are shown below:

# Further optimization: 1. add labels at the bottom; 2. post-process the recognized text to remove spaces


from PIL import Image as PImage
from PIL import ImageTk
import pytesseract
from tkinter import *
from tkinter import filedialog
from tkinter.scrolledtext import ScrolledText
import re

# Recognize the text in the selected image and show it in the text box
def trans():
    contents.delete('1.0', END)
    transTxt = pytesseract.image_to_string(PImage.open(filePath.get()), lang='chi_sim')
    # Post-process transTxt: trim line breaks at both ends, then remove spaces and repeated line breaks
    transTxt = transTxt.strip('\n\r')   # With no argument, strip() would also remove leading/trailing spaces and tabs
    print(transTxt)
    contents.insert(INSERT, transTxt.replace(' ','').replace('\n\n','\n').replace('\r',''))

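# Open an image file, show its path, and display the image
def openfile():
    filename.delete('1.0', END)
    filePath.set(filedialog.askopenfilename())
    filename.insert('1.0', filePath.get())
    org_img = PImage.open(filePath.get())
    # Scale the image down to fit within 600x800, keeping the aspect ratio
    w,h = org_img.size
    if w>600:
        h=int(h*600/w)
        w=600
    if h>800:
        w=int(w*800/h)
        h=800
    img = ImageTk.PhotoImage(org_img.resize((w,h)))
    showPic.config(image=img)
    showPic.image = img       # Keep a reference so the image is not garbage-collected (a tkinter quirk)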
    

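# Set up the main window
top = Tk()
top.title("OCR Image to Text Engine: Tesseract-OCR Made by: kaivis")
#("./pic/")
top.geometry("1200x800")

filePath=StringVar()

# Button icons (the first file name is missing in the source)
bt_img1 = ImageTk.PhotoImage(file="./pic/")
bt_img2 = ImageTk.PhotoImage(file="./pic/bt_img2.png")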

# First frame: image path and buttons
frame1 = Frame(top, relief=RAISED, borderwidth=2)
frame1.pack(side=TOP, fill=BOTH, ipady=5, expand=0)
Label(frame1,height=1,text="Image path:").pack(side=LEFT)
filename = Text(frame1,height=2)
filename.pack(side=LEFT,padx=1, pady=0,expand=True, fill=X)
Button(frame1,text="Open file", image=bt_img1, command=openfile).pack(side=LEFT,padx=5, pady=0)
Button(frame1,text="Chinese Recognition", image=bt_img2, command=trans).pack(side=LEFT,padx=5, pady=0)

# Second frame: picture display
frame2 = Frame(top, relief=RAISED, borderwidth=2)
frame2.pack(side=LEFT, fill=BOTH, expand=1)
Label(frame2,text='Picture display:',borderwidth=5).pack(side=TOP,padx=20,pady=5)
showPic = Label(frame2,text='Picture display area')
showPic.pack(side=BOTTOM,expand=1,fill=BOTH)

# Third frame: recognition results
frame3 = Frame(top)
frame3.pack(side=RIGHT, fill=BOTH, expand=1)
#contents = ScrolledText(frame3)
Label(frame3,text='Identification results:',borderwidth=5).pack(side=TOP,padx=20,pady=10)
contents = Text(frame3,font=('Arial',15))
contents.pack(side=TOP,expand=1,fill=BOTH)
Label(frame3,text='Copyright 2021  ALL Rights Reserved',borderwidth=5).pack(side=BOTTOM,padx=20,pady=10)

top.mainloop()

Open problems:

  • Recognition accuracy is low, and densely set Chinese characters are especially hard to recognize accurately. Is there a better OCR engine?
  • Spaces are already removed from the recognized text, but the output could be cleaned up further; redundant line breaks in particular still need handling.
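
The line-break cleanup mentioned in the second point could look something like the sketch below. It is not part of the original program: the function name clean_ocr_text and the chosen CJK character ranges are assumptions made for illustration. Unlike the replace(' ','') call in trans(), it only removes spaces between two CJK characters, so spaces between Latin words survive.

```python
import re

# Assumed character ranges: CJK punctuation, CJK Unified Ideographs,
# and full-width forms. Adjust to the text you actually recognize.
_CJK = r"\u3001-\u303f\u4e00-\u9fff\uff00-\uffef"


def clean_ocr_text(text: str) -> str:
    """Tidy raw OCR output (hypothetical helper, not from the original code)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove spaces or tabs that sit between two CJK characters.
    text = re.sub(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])", "", text)
    # Remove trailing spaces or tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse any run of line breaks into a single line break.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


if __name__ == "__main__":
    print(clean_ocr_text("你 好\r\n\r\n\r\n世 界  \nhello world\n"))
```

Collapsing every run of blank lines also discards paragraph breaks; if those matter, replacing the last substitution with one that keeps a single blank line may be a better fit.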

Python screenshot-to-text function

When looking up information online, I often run into articles whose text cannot be copied. To copy the text I want quickly, I decided to write a Python program that turns a screenshot into text.

1. Ideas

The program needs to do three things:

  • Detect keyboard input, so that it knows when a screenshot is being taken: the keyboard library
  • Receive the image once the screenshot has been taken: the ImageGrab module (part of Pillow)
  • Recognize the text in that image: the Baidu AI Text Recognition API

2. Realization

2.1 Importing relevant libraries

2.2 Create the class and write the screenshot-saving function

I use the screenshot tool built into Windows 10, whose hotkey is 'win+shift+s'; change the hotkey to match your own screenshot software.
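
The original code for this step is not reproduced here, so the following is only a minimal sketch of how it might work, assuming the keyboard library and Pillow's ImageGrab.grabclipboard(). The class name ScreenshotGrabber, the helper wait_for_image, and the file name screenshot.png are all hypothetical.

```python
import time


def wait_for_image(grab, timeout=10.0, interval=0.2):
    """Call grab() repeatedly until it returns an image-like object.

    grab is any zero-argument callable. An object counts as an image if it
    has a save() method, as PIL images do. Returns None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        candidate = grab()
        if hasattr(candidate, "save"):
            return candidate
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


class ScreenshotGrabber:
    """Hypothetical helper class; names and defaults are illustrative."""

    def __init__(self, hotkey="win+shift+s", path="screenshot.png"):
        self.hotkey = hotkey
        self.path = path

    def capture(self):
        # Third-party imports stay local so the module loads without them.
        import keyboard              # pip install keyboard
        from PIL import ImageGrab    # pip install pillow

        keyboard.wait(self.hotkey)   # block until the hotkey is pressed
        image = wait_for_image(ImageGrab.grabclipboard)
        if image is None:
            raise RuntimeError("no image appeared on the clipboard")
        image.save(self.path)
        return self.path
```

One limitation of this sketch: if the clipboard already holds an older image, it is returned immediately, before the new screenshot is finished. Waiting for a second key press or a short delay after the hotkey are possible refinements.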

2.3 Write image-to-text functions

First, go to the official website of Baidu Intelligent Cloud to apply for an API for image recognition.

Write the parameters to the program:

Write the transcription function:
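
The original parameters and function are not reproduced here. As a rough sketch only, the step could be written with the baidu-aip SDK as follows; the function names extract_text and image_to_text are hypothetical, the credentials are placeholders, and the response shape ({"words_result": [{"words": ...}]}, with error_code and error_msg on failure) is an assumption that should be checked against the Baidu documentation.

```python
# Placeholder credentials; replace with the values issued for your application.
APP_ID = "YOUR_APP_ID"
API_KEY = "YOUR_API_KEY"
SECRET_KEY = "YOUR_SECRET_KEY"


def extract_text(result):
    """Join the recognized lines from a response dictionary.

    Assumes the shape {"words_result": [{"words": "..."}, ...]} and that
    failures carry "error_code" and "error_msg" keys.
    """
    if "error_code" in result:
        message = result.get("error_msg", result["error_code"])
        raise RuntimeError(f"OCR request failed: {message}")
    return "\n".join(item["words"] for item in result.get("words_result", []))


def image_to_text(path, app_id=APP_ID, api_key=API_KEY, secret_key=SECRET_KEY):
    """Send the image at path to Baidu OCR and return the recognized text."""
    # Imported locally so the module loads without the SDK installed.
    from aip import AipOcr    # pip install baidu-aip

    client = AipOcr(app_id, api_key, secret_key)
    with open(path, "rb") as f:
        return extract_text(client.basicGeneral(f.read()))
```

Keeping the response parsing in its own function makes it possible to check that part without network access or real credentials.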

2.5 Operation

To use it, just create the class and call the two functions:

2.6 Effects

Run the program and take a screenshot of an arbitrary article on Baidu Wenku:

The results are as follows:

Note:

As the run in 2.6 shows, the results are quite good, and the program fully meets my current needs.

Summary

The above is my personal experience. I hope it serves as a useful reference, and I would appreciate your continued support.