Preface
When automating website logins, you often run into a CAPTCHA after entering the user name and password. This article introduces ddddocr, a general-purpose OCR library for CAPTCHA recognition that takes most of the pain out of the task (the name reads roughly as "bring-your-little-brother OCR"). The examples here focus on alphanumeric CAPTCHAs.
Project address: /sml2h3/ddddocr
I. Installation of ddddocr
The following command installs the latest ddddocr release that matches your computer's environment.
pip install ddddocr
If the installation is slow, you can install from a domestic mirror with the following command:
pip install ddddocr -i /simple/
II. Use of ddddocr
1. Examples of use
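import ddddocr

ocr = ddddocr.DdddOcr()
# The image file name was lost in the original text; replace '' with the path to your CAPTCHA image.
with open('', 'rb') as f:
    img_bytes = f.read()
res = ocr.classification(img_bytes)
print('The recognized CAPTCHA is:' + res)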
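When several images need recognizing, the same steps can be wrapped in a small helper so that the recognizer is built once and reused. The following is a minimal sketch, assuming only that the recognizer exposes a `classification(bytes)` method as ddddocr's `DdddOcr` does. `recognize_file` is a hypothetical name, not part of ddddocr.

```python
from pathlib import Path


def recognize_file(ocr, image_path):
    """Read an image from disk and return the text the recognizer reports.

    `ocr` is any object with a classification(bytes) method, such as an
    instance of ddddocr.DdddOcr. This helper is illustrative only.
    """
    img_bytes = Path(image_path).read_bytes()
    return ocr.classification(img_bytes)


# Usage (requires ddddocr and a real image file; the path is a placeholder):
#   import ddddocr
#   ocr = ddddocr.DdddOcr()  # build once, reuse for every image
#   print(recognize_file(ocr, 'path/to/captcha.png'))
```

Passing the recognizer in as an argument also makes the helper easy to try out with a stand-in object before wiring it to a browser.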
2. Full code
import os
from time import sleep

import ddddocr
from PIL import Image
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


class GetVerificationCode:
    def __init__(self):
        self.res = None
        url = 'Address to log in to'
        # The browser class was lost in the original text; Chrome is assumed here.
        self.driver = webdriver.Chrome()
        self.driver.maximize_window()  # Maximize the browser
        self.driver.get(url)

    # Get CAPTCHA information
    def getVerification(self):
        # Directory of the current file, and the directory where screenshots are saved.
        current_location = os.path.dirname(__file__)
        screenshot_path = os.path.join(current_location, "..", "VerificationCode")
        # Capture the current web page, which contains the CAPTCHA we need, and save it
        # in the screenshot directory (the original comment calls it printscreen).
        # NOTE: the image file names (the '' literals below) were lost in the original
        # text; supply your own, one for the full screenshot and one for the crop.
        sleep(1)
        self.driver.save_screenshot(screenshot_path + '//' + '')
        sleep(1)
        # Locate the CAPTCHA image
        imgelement = self.driver.find_element(By.XPATH, 'Xpath localization of captcha images')
        # Get the x, y coordinates of the CAPTCHA
        location = imgelement.location
        # Get the width and height of the CAPTCHA
        size = imgelement.size
        # Build the crop box (left, upper, right, lower). The fixed offsets are the
        # author's and probably depend on screen and zoom; adjust them for your page.
        rangle = (int(location['x'] + 430), int(location['y'] + 200),
                  int(location['x'] + size['width'] + 530),
                  int(location['y'] + size['height'] + 250))
        # Open the screenshot
        i = Image.open(screenshot_path + '//' + '')
        # Use Image's crop function to cut the CAPTCHA area out of the screenshot
        fimg = i.crop(rangle)
        fimg = fimg.convert('RGB')
        # Save the cropped CAPTCHA image and read its content
        fimg.save(screenshot_path + '//' + '')
        ocr = ddddocr.DdddOcr()
        with open(screenshot_path + '//' + '', 'rb') as f:
            img_bytes = f.read()
        self.res = ocr.classification(img_bytes)
        print('The recognized CAPTCHA is:' + self.res)

    # Check whether the error prompt shown for a wrong CAPTCHA exists
    def isElementPresent(self, by, value):
        try:
            self.driver.find_element(by=by, value=value)
        except NoSuchElementException:
            # The element was not found on the page
            return False
        else:
            # No exception occurred, so the element is on the page
            return True

    # Login
    def login(self):
        self.getVerification()
        self.driver.find_element(By.XPATH, 'Username input box Xpath positioning').send_keys('Username')
        self.driver.find_element(By.XPATH, 'Password input box Xpath positioning').send_keys('Password')
        self.driver.find_element(By.XPATH, 'Captcha input box Xpath positioning').send_keys(self.res)
        sleep(1)
        self.driver.find_element(By.XPATH, 'Login button Xpath positioning').click()
        sleep(2)
        isFlag = True
        while isFlag:
            try:
                isPresent = self.isElementPresent(By.XPATH, 'Prompt message on captcha error Xpath localization')
                if isPresent is True:
                    codeText = self.driver.find_element(By.XPATH, 'Prompt message on captcha error Xpath localization').text
                    if codeText == "Authentication code incorrect.":
                        self.getVerification()
                        sleep(2)
                        self.driver.find_element(By.XPATH, 'Captcha input box Xpath positioning').clear()
                        sleep(1)
                        self.driver.find_element(By.XPATH, 'Captcha input box Xpath positioning').send_keys(self.res)
                        sleep(1)
                        self.driver.find_element(By.XPATH, 'Login button Xpath positioning').click()
                        sleep(2)
                    tips = self.driver.find_element(By.XPATH, 'Prompt message Xpath location when captcha is not entered').text
                    if tips == "Please enter the verification code.":
                        self.getVerification()
                        sleep(2)
                        self.driver.find_element(By.XPATH, 'Captcha input box Xpath positioning').click()
                        sleep(1)
                        self.driver.find_element(By.XPATH, 'Captcha input box Xpath positioning').send_keys(self.res)
                        sleep(1)
                        self.driver.find_element(By.XPATH, 'Login button Xpath positioning').click()
                        sleep(2)
                    continue
                else:
                    print("The verification code is correct, login successful!")
            except NoSuchElementException:
                pass
            else:
                isFlag = False
        sleep(5)
        self.driver.quit()


if __name__ == '__main__':
    GetVerificationCode().login()
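The fixed pixel offsets in the crop box of the full code depend on the author's screen and browser zoom. Selenium's `WebElement` also exposes a `screenshot_as_png` property that returns an image of just that element as PNG bytes, which can be passed straight to the recognizer. The following is a minimal sketch under that assumption. `read_captcha` is a hypothetical helper name, and the XPath in the usage note is the same placeholder used in the full code.

```python
def read_captcha(element, ocr):
    """Return the recognized text for a CAPTCHA image element.

    `element` is expected to be a Selenium WebElement (only its
    screenshot_as_png property is used) and `ocr` an object with a
    classification(bytes) method, such as ddddocr.DdddOcr().
    """
    png_bytes = element.screenshot_as_png  # PNG bytes of this element only
    return ocr.classification(png_bytes).strip()


# Usage inside getVerification (hypothetical wiring):
#   imgelement = self.driver.find_element(By.XPATH, 'Xpath localization of captcha images')
#   self.res = read_captcha(imgelement, ddddocr.DdddOcr())
```

This skips the full-page screenshot and the intermediate image files, so no file names or crop coordinates are needed.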
3. Sample Captcha
4. Recognition results
As the results show, when a CAPTCHA is recognized incorrectly, the script recognizes the new CAPTCHA and tries again.
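The loop in the full code keeps retrying for as long as the error prompt is present, so it never ends if recognition keeps failing. One option is to cap the number of attempts. The following is a minimal sketch, not the author's method. `login_with_retries` and `attempt_login` are hypothetical names, where `attempt_login` stands for one round of "recognize, fill in, click, check for the error prompt" and returns True on success.

```python
def login_with_retries(attempt_login, max_attempts=5):
    """Call attempt_login() until it returns True or attempts run out.

    Returns the number of attempts used on success, or None if every
    attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_login():
            return attempt
    return None
```

The limit of 5 is an arbitrary default. Pick a value that suits the site's lockout policy.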
III. Notes on the code
The code in this article uses forced waits (sleep). If necessary, you can change it to use explicit waits instead. For Selenium's three kinds of waits (explicit, implicit, and forced), see other articles on the topic.
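Selenium provides `WebDriverWait` (in `selenium.webdriver.support.ui`) for explicit waits. The plain-Python sketch below only illustrates the polling idea behind an explicit wait: check a condition repeatedly and stop as soon as it holds, instead of always sleeping for a fixed time. It is not Selenium's implementation, and `wait_until` is a hypothetical helper name.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or timeout expires.

    Returns the truthy value, or raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage with Selenium (find_elements returns an empty list when nothing matches):
#   wait_until(lambda: driver.find_elements(By.XPATH, 'Login button Xpath positioning'))
```

Compared with `sleep(2)`, this returns as soon as the element appears and fails loudly if it never does.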
Summary
ddddocr has some recognition ability for many of the CAPTCHA types in common use today. In short, it makes CAPTCHA recognition simple and easy to use: it can quickly detect text, digits, or icons in an image, so that an automated login script can get past a website's login CAPTCHA with little effort.