SoFunction
Updated on 2024-11-21

PyTorch-based YOLOv5 slider CAPTCHA cracking: idea and details

Preamble

In this article, we implement slider CAPTCHA cracking with the object detection capability of the PyTorch framework; the algorithm chosen here is YOLOv5.

Example: input image

(image)

Output image:

(image)

As you can see, after detection we can accurately locate the gap and obtain its coordinates, which makes cracking the slider CAPTCHA straightforward.

I. Preliminary work

The YOLO series are commonly used object detection algorithms. YOLOv5 is not only easy to configure but also considerably faster than its predecessors, and it makes training on our own dataset easy.
YOLOv5 PyTorch version GitHub URL; thanks to the author for the code.

After downloading, the data directory has this layout:

---data/
	Annotations/  stores the image annotation files (.xml)
	images/       stores the images to be trained on
	ImageSets/    stores the dataset split files
	labels/       stores the box information for each image

Only two of these folders, Annotations and images, need to be touched.
First, put the images to be trained on into images.

The dataset exists thanks to this contributor's organizing work (tzutalin/labelImg); on top of it I added 50 CAPTCHA images from Tencent.

The dataset has been uploaded to Baidu Cloud:

Link: /s/1XS5KVoXqGHglfP0mZ3HJLQ

Extraction code: wqi8

(image)

Then we need to annotate the images to tell the computer what we want it to recognize. For this we use the labeling software Wizard Labeling. It's free and powerful. Five stars!

(image)

The first step is to select the images folder; the second step is to enter the categories (English names are recommended). Here there is only one category, the location of the missing block, named target. Note that when labeling, the box edges should fit the gap exactly, otherwise the coordinates obtained will be inaccurate.

When the labeling is done, click Export. The file format does not need changing; just click OK, and the annotation files will be generated in the images/outputs folder. Copy them all into the Annotations folder.
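The exported annotation files follow the Pascal VOC XML layout. As a quick sanity check that a labeled box reads back correctly, here is a minimal sketch; the sample XML below is a hand-made assumption of what such a file looks like, not an actual export from the tool:

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC style annotation. The exact tags your labeling tool
# emits may differ slightly; this sample is an assumption for illustration.
SAMPLE_XML = """
<annotation>
    <size><width>680</width><height>390</height><depth>3</depth></size>
    <object>
        <name>target</name>
        <bndbox>
            <xmin>300</xmin><ymin>120</ymin><xmax>368</xmax><ymax>188</ymax>
        </bndbox>
    </object>
</annotation>
"""

def read_boxes(xml_text):
    """Return (class_name, (xmin, ymin, xmax, ymax)) for every labeled object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        coords = tuple(int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes

print(read_boxes(SAMPLE_XML))  # [('target', (300, 120, 368, 188))]
```

If the box you read back does not match what you drew, the export went wrong and training labels will be off.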

Back in the main directory, run makeTxt.py and voc_label.py. makeTxt.py can be run as-is; in voc_label.py you need to change the value of classes, which this time contains only target.
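makeTxt.py is not shown in the original; it just shuffles the image names and writes the train/val/test split lists into ImageSets. A rough, illustrative equivalent of what such a script does (the 9:1 ratios and the function name are my assumptions, not the repo's exact values):

```python
import random

def split_dataset(image_names, seed=0):
    """Split image name stems into train/val/test lists, makeTxt-style.
    The 9:1 trainval/test and 9:1 train/val ratios are assumptions for
    illustration; the repo's script may use different values."""
    random.seed(seed)
    names = sorted(image_names)
    random.shuffle(names)
    n_trainval = len(names) * 9 // 10
    trainval, test = names[:n_trainval], names[n_trainval:]
    n_train = len(trainval) * 9 // 10
    train, val = trainval[:n_train], trainval[n_train:]
    return train, val, test

train, val, test = split_dataset([f"img_{i:03d}" for i in range(100)])
print(len(train), len(val), len(test))  # 81 9 10
```

The voc_label.py fragment shown next then converts the annotations themselves.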

import xml.etree.ElementTree as ET
import pickle
import os
# os.listdir() returns a list of the names of the files or folders contained in the specified folder.
from os import listdir, getcwd
from os.path import join


sets = ['train', 'test', 'val']
classes = ['target']  # however many classes you labeled earlier, enter them here; we have only one

"""

............  

"""
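The elided body of voc_label.py typically centers on the standard VOC-to-YOLO coordinate conversion: absolute corner coordinates become a normalized center point plus width and height. A sketch of that conversion, matching the widely circulated convert() helper (treat it as illustrative, not the repo's verbatim code):

```python
def convert(size, box):
    """VOC box (xmin, xmax, ymin, ymax) in pixels -> YOLO (x_center, y_center, w, h),
    each normalized to [0, 1] by the image size (width, height)."""
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0   # box center x in pixels
    y = (box[2] + box[3]) / 2.0   # box center y in pixels
    w = box[1] - box[0]           # box width in pixels
    h = box[3] - box[2]           # box height in pixels
    return x * dw, y * dh, w * dw, h * dh

# A 68x68 gap whose top-left corner is at (300, 120) in a 680x390 image:
print(convert((680, 390), (300, 368, 120, 188)))
```

Each line of the resulting labels/*.txt file is then `class_id x_center y_center w h` with these normalized values.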

Next, go to the data folder and modify the dataset configuration .yaml file:

# COCO 2017 dataset
# Download command: bash yolov5/data/get_coco2017.sh
# Train command: python train.py --data ./data/coco.yaml
# Dataset should be placed next to yolov5 folder:
#  /parent_folder
#   /coco
#   /yolov5


# train and val datasets (image directory or *.txt file with image paths)
train: ../coco/train2017.txt  # 118k images
val: ../coco/val2017.txt  # 5k images
test: ../coco/test-dev2017.txt  # 20k images for submission to /competitions/20794

# number of classes
nc: 1

# class names
names: ['target']

# Print classes
# with open('data/coco.yaml') as f:
#   d = yaml.load(f, Loader=yaml.FullLoader)  # dict
#   for i, x in enumerate(d['names']):
#     print(i, x)

Then go to the models folder and modify the model configuration file (the multiples below correspond to yolov5s):

nc: 1  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# ... (rest of the model yaml unchanged)

At this point the configuration session is finally over and training can begin!

To start training, open train.py; usually we only need to change a few settings: --weights, --cfg, --data, --epochs.

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='', help='initial weights path')
parser.add_argument('--cfg', type=str, default='models/', help='model.yaml path')
parser.add_argument('--data', type=str, default='data/', help='data.yaml path')
parser.add_argument('--hyp', type=str, default='data/', help='hyperparameters path')
parser.add_argument('--epochs', type=int, default=300)
parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs')
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='[train, test] image sizes')
parser.add_argument('--rect', action='store_true', help='rectangular training')
parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
parser.add_argument('--notest', action='store_true', help='only test final epoch')
parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')
parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')
parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')
parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
parser.add_argument('--log-artifacts', action='store_true', help='log artifacts, i.e. final trained model')
parser.add_argument('--workers', type=int, default=4, help='maximum number of dataloader workers')
parser.add_argument('--project', default='runs/train', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
opt = parser.parse_args()

Run it right away and training begins!

Once training is done, go to runs/train/exp/weights and copy the trained weights (best.pt) to the main directory.

Finally, open detect.py and change a couple of settings:

parser = argparse.ArgumentParser()
parser.add_argument('--weights', nargs='+', type=str, default='', help='model.pt path(s)')
parser.add_argument('--source', type=str, default='', help='source')  # file/folder, 0 for webcam
parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
parser.add_argument('--device', default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--view-img', action='store_true', help='display results')
parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
parser.add_argument('--augment', action='store_true', help='augmented inference')
parser.add_argument('--update', action='store_true', help='update all models')
parser.add_argument('--project', default='runs/detect', help='save results to project/name')
parser.add_argument('--name', default='exp', help='save results to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
opt = parser.parse_args()

The --source setting can be changed to data/images to run recognition on your own dataset and check that detection works.
A small tip: if execution reports no error but no detection box appears, try changing --device to cpu; a CUDA version that is too low can make GPU inference silently produce no boxes (I was plagued by this little problem for a long time --_-).

Finally, around line 112 of detect.py, add a print statement:

(image)

Now the program prints the position information and confidence of each detected box when it runs:
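Later we will read this printed line back out of the process output with a regular expression, so it is worth checking the parsing logic in isolation. The sample line below is an assumption about what the added print produces (label, xyxy tensor list, confidence); adjust the pattern to whatever you actually printed:

```python
import re

# Hypothetical line printed by the modified detect.py:
# "<label>:[tensor(x1), tensor(y1), tensor(x2), tensor(y2)] <confidence>"
sample = "target:[tensor(300.), tensor(120.), tensor(368.), tensor(188.)] 0.87\n"

# Same idea as the article's regex: split into label, tensor list, confidence.
match = re.findall(r"(.*):(.*\])\s(.*)\n", sample)
label, tensor_str, conf = match[0]

# Strip the brackets, then pull the number out of each "tensor(...)" chunk.
coords = [float(re.findall(r"tensor\((.*?)\)", part)[0])
          for part in tensor_str[1:-1].split(", ")]
print(label, coords, conf)
```

With the format confirmed, the crawler below can rely on the same extraction.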

(image)

Our preliminary work is finally done~

II. Writing a crawler

1. Finding the right website

After a lot of searching, I finally settled on /

This is because its page structure is easy for us to automate.

2. Import dependent libraries

Here we use selenium to simulate human actions.
How to install selenium and the matching webdriver is not covered in this article.

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import requests
import re
import os
import time

3. Writing cracking programs

Visiting the website, we find that before reaching the CAPTCHA we have to click these elements in turn:

(image)

Write code

def run():
	driver = webdriver.Chrome()

	headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36"}
	# Disguise the request headers

	driver.get('/')  # Visit the site

	driver.find_element_by_xpath('/html/body/div[1]/section[1]/div/div/div/div[2]/div[1]/a[2]').click()
	driver.find_element_by_xpath('//*[@]').click()
	# Simulate the clicking actions

Continuing:

(image)

This is the image we want to recognize, but it cannot be located directly, because the element is wrapped in an iframe; we need to switch into the iframe first!

	time.sleep(2)  # Sleep for 2 seconds to prevent errors
	driver.switch_to_frame("tcaptcha_iframe")  # Switch into the iframe by its id
	target = driver.find_element_by_xpath("/html/body/div/div[3]/div[2]/div[1]/div[2]/img").get_attribute("src")
	# Get the original address of the image

	response = requests.get(target, headers=headers)  # Request the image address

	img = response.content
	with open('', 'wb') as f:  # the saved filename was elided in the original
	  f.write(img)  # Save the image to the main directory

Now that we have the image and the detection program is ready, let's put them together!

'''
os.popen() simply executes a cmd command and captures its return output.
Here it runs the detection script detect.py.
'''

	result = os.popen("python detect.py").readlines()  # Run the target detection program
	lines = []
	for line in result:
	  lines.append(line)  # Store the cmd output in a list
	print(lines)
	a = re.findall("(.*):(.*]).(.*)\\n", lines[-4])  # Extract the box's location info
	print(a)
	print(len(a))
	if len(a) != 0:  # If a box was detected
	  tensor = a[0][1]
	  pro = a[0][2]
	  list_ = tensor[2:-1].split(",")

	  location = []
	  for i in list_:
	    print(i)
	    b = re.findall("tensor(.*)", i)[0]
	    location.append(b[1:-2])
	  # location now holds the box's upper-left xy and lower-right xy
	  drag1 = driver.find_element_by_xpath('/html/body/div/div[3]/div[2]/div[2]/div[2]/div[1]')
	  # Locate the drag button

	  action_chains = ActionChains(driver)  # Instantiate the mouse action class
	  action_chains.drag_and_drop_by_offset(drag1, int(int(location[2])/2 - 85), 0).perform()
	  # Simulate pressing the mouse, dragging it X pixels, then releasing
	  input("Waiting for operation")
	  driver.quit()
	else:
	  driver.quit()
	  print("Failed to recognize")

Here's the key point.

action_chains.drag_and_drop_by_offset(drag1, int(int(location[2])/2-85), 0).perform()

Why do we drag a distance of int(int(location[2])/2-85)?

First, the format of the location list is [upper-left x, upper-left y, lower-right x, lower-right y], so location[2] is the x value of the box's lower-right corner.

The resolution of the CAPTCHA image we saved locally is as follows:

(image)

However, the image displayed on the website has this size:

(image)

The on-page width is exactly half that of the local image, so int(location[2]/2) gives the on-page x coordinate.

(image)

However, the square to be dragged itself starts some distance from the left edge. Analyzing the page, we found:

(image)

The distance from the leftmost edge of this small square to the leftmost edge of the picture is the 26 in the red box, i.e.:

(image)

26 + 68 - 10 = 84; the 10 here was found by trial and error, so we settle on 85 for this distance.

This explains where int(int(location[2])/2-85) comes from.
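Putting the scale correction and the left offset together, the drag distance can be expressed as a small pure function, which also makes the magic numbers easy to tweak. The 2x scale factor and the 85-pixel offset come from the measurements above; for a different site they would have to be re-measured:

```python
def drag_distance(location, scale=2, left_offset=85):
    """Horizontal drag distance for the slider.
    location = [x1, y1, x2, y2] from the detector, in local-image pixels;
    scale maps local pixels to on-page pixels (the local image is 2x wider here);
    left_offset is the measured gap between the puzzle piece and the image's left edge."""
    x2 = int(location[2])            # lower-right x of the detected gap
    return int(x2 / scale) - left_offset

# Example: gap's lower-right x detected at 680 local pixels -> drag 255 on-page pixels.
print(drag_distance([300, 120, 680, 188]))  # 255
```

This is equivalent to the inline int(int(location[2])/2-85) used above, just with the assumptions named.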
That's it; let's watch the demo!

(image)

The full selenium code is as follows

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import requests
import re
import os
import time

def run():
	driver = webdriver.Chrome()

	headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36"}
	# Disguise the request headers
	driver.get('/')  # Visit the site
	driver.find_element_by_xpath('/html/body/div[1]/section[1]/div/div/div/div[2]/div[1]/a[2]').click()
	driver.find_element_by_xpath('//*[@]').click()
	# Simulate the clicking actions
	time.sleep(2)  # Sleep for 2 seconds to prevent errors
	driver.switch_to_frame("tcaptcha_iframe")  # Switch into the iframe by its id
	target = driver.find_element_by_xpath("/html/body/div/div[3]/div[2]/div[1]/div[2]/img").get_attribute("src")
	# Get the original address of the image

	response = requests.get(target, headers=headers)  # Request the image address

	img = response.content
	with open('', 'wb') as f:  # the saved filename was elided in the original
	  f.write(img)  # Save the image to the main directory
	'''
	os.popen() simply executes a cmd command and captures its return output.
	Here it runs the detection script detect.py.
	'''
	result = os.popen("python detect.py").readlines()  # Run the target detection program
	lines = []
	for line in result:
	  lines.append(line)  # Store the cmd output in a list
	print(lines)
	a = re.findall("(.*):(.*]).(.*)\\n", lines[-4])  # Extract the box's location info
	print(a)
	print(len(a))
	if len(a) != 0:  # If a box was detected
	  tensor = a[0][1]
	  pro = a[0][2]
	  list_ = tensor[2:-1].split(",")

	  location = []
	  for i in list_:
	    print(i)
	    b = re.findall("tensor(.*)", i)[0]
	    location.append(b[1:-2])
	  # location now holds the box's upper-left xy and lower-right xy
	  drag1 = driver.find_element_by_xpath('/html/body/div/div[3]/div[2]/div[2]/div[2]/div[1]')
	  # Locate the drag button
	  action_chains = ActionChains(driver)  # Instantiate the mouse action class
	  action_chains.drag_and_drop_by_offset(drag1, int(int(location[2])/2 - 85), 0).perform()
	  # Simulate pressing the mouse, dragging it X pixels, then releasing
	  input("Waiting for operation")
	  driver.quit()
	else:
	  driver.quit()
	  print("Failed to recognize")

while True:
	run()

This concludes this detailed article on PyTorch-based YOLOv5 slider CAPTCHA cracking. For more on cracking slider CAPTCHAs with PyTorch, please search my previous articles or continue browsing the related articles below. I hope you will support me in the future!