Anyone who writes crawlers will sooner or later run into anti-scraping measures. The first line of defense usually sits at the login page: to keep crawlers from logging in automatically, every site goes to great lengths, and the result is a constant arms race between the two sides.
Today I'd like to share a simple example of how to handle a sliding-image CAPTCHA.
Login verification of this kind, where the user drags a slider until a puzzle piece overlaps the gap in the image, is common on many websites and apps because it is easy for real users to recognize and complete. It also blocks most basic crawlers.
As a Python crawler developer, how do you automate this verification correctly?
First, the analysis: the core problem is finding the position of the target gap. Once we know that position, we can use Selenium or a similar tool to perform the drag.
We can use OpenCV to solve this problem. The main steps are Gaussian blurring, Canny edge detection, contour detection, and filtering the contours by area and perimeter.
What is OpenCV?
OpenCV (Open Source Computer Vision Library) is an open-source computer vision library. Its algorithms cover image processing, computer vision, and machine learning, and it can be used to build real-time image processing, computer vision, and pattern recognition programs.
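Direct Installation
OpenCV can be installed directly with pip; the Python bindings are published on PyPI as opencv-python (pip install opencv-python).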
First, the image is Gaussian-blurred. This is a preprocessing step whose main purpose is to reduce image noise.
Result after processing
Canny edge detection is then applied to obtain a binary image of thin edges. A binary image contains only two pixel values: black and white.
Contour Detection
Find all the contours and outline them in red. There turn out to be dozens of contours, large and small.
The rest is easy: as long as we know the size of the target contour in advance, we can filter the contours by area or perimeter to locate it.
Here the target contour has an area between 6000 and 8000 and a perimeter between 300 and 500. Its bounding rectangle then gives the contour's coordinates, width, and height.
With the steps above, the target position has been found. All that remains is to move the slider to that position.
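A hedged sketch of the final drag with Selenium is shown below. Both helper names are hypothetical, and the scale parameter is an assumption: it covers the common case where the image is rendered on the page at a different size than the file that was analyzed. Real sites may also check how the slider moves, which this sketch does not address.

```python
def slide_distance(gap_x: int, slider_x: int, scale: float = 1.0) -> int:
    """Convert the gap position in image pixels to a drag distance in page pixels."""
    # scale = rendered image width / analyzed image width (assumed 1.0 by default).
    return round((gap_x - slider_x) * scale)


def drag_slider(driver, slider, distance: int) -> None:
    """Hold the slider element, move it distance pixels to the right, then release."""
    # Imported here so slide_distance can be used without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains

    ActionChains(driver).click_and_hold(slider).move_by_offset(distance, 0).release().perform()
```

Here driver is a Selenium WebDriver instance and slider is the slider's WebElement, both obtained by the caller; the locator for the slider depends on the target page.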