Updated on 2025-05-16

Python Selenium: connecting to a browser on a specified port to continue an existing session

When crawling data with Selenium, the usual workflow lets Selenium handle everything, starting with opening the browser. Sometimes, however, we want the user to open the browser first, navigate to the target page, and complete login and authentication by hand (user name, password, SMS verification code, and the various graphic CAPTCHAs that are hard to automate), and only then let Selenium take over from the logged-in page to crawl data. How can the manual steps and the automated steps be connected?

The usual approach

The routine approach looks like the following: set the initial options, start the browser through Selenium, and open the page directly with the get method.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service


class DriverClass:
    def __init__(self):
        self.driver = self._init_driver()

    def _init_driver(self):
        try:
            option = webdriver.ChromeOptions()
            # Hide the automation banner and the automation extension
            option.add_experimental_option('excludeSwitches', ['enable-automation'])
            option.add_experimental_option('useAutomationExtension', False)
            # Turn off the password-manager prompts
            prefs = dict()
            prefs['credentials_enable_service'] = False
            prefs['profile.password_manager_enabled'] = False
            # The key is blank in the source; 'profile.name' is an assumption
            prefs['profile.name'] = "Person 1"
            option.add_experimental_option('prefs', prefs)
            option.add_argument('--disable-gpu')
            option.add_argument("--disable-blink-features=AutomationControlled")
            option.add_argument('--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"')
            option.add_argument('--no-sandbox')
            option.add_argument('--ignore-certificate-errors')
            # The chromedriver path is truncated in the source; point it at your chromedriver binary
            service = Service(executable_path=r"./driver/")
            driver = webdriver.Chrome(service=service, options=option)
            driver.implicitly_wait(2)
            driver.maximize_window()
            return driver
        except Exception as e:
            raise e

    def get_driver(self) -> webdriver.Chrome:
        if isinstance(self.driver, webdriver.Chrome):
            return self.driver
        raise Exception('Initialization of the browser failed')


if __name__ == '__main__':
    dc = DriverClass()
    driver = dc.get_driver()
    print(driver)
    # The URL is blank in the source; put the target page here
    driver.get("")

Continuing from an open browser

The hand-over works by using the same debugging port on both sides: the user opens the browser with that port, and Selenium connects to it. Without the port, Selenium cannot know which browser to attach to.

The user opens the browser

When opening the browser manually, the user specifies a remote debugging port (9527 here) and a user data directory (any directory of your choice).

C:\Program Files\Google\Chrome\Application> chrome.exe --remote-debugging-port=9527 --user-data-dir="E:\lky_project\tmp_project\handle_qcc_data\chrome_user_data"

Running this command opens a new browser window.

The user can then navigate to the target page by hand and complete login, authentication, and any other manual steps.
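The same launch can also be issued from Python. The following is a minimal sketch, assuming Chrome sits at the default Windows location; `build_chrome_command`, `launch_chrome`, and the data directory `E:\chrome_user_data` are illustrative names that do not appear in the original setup.

```python
import subprocess


def build_chrome_command(chrome_path, port, user_data_dir):
    """Return the argument list that starts Chrome with remote debugging enabled."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


def launch_chrome(chrome_path, port, user_data_dir):
    """Start Chrome in the background and return the process handle."""
    return subprocess.Popen(build_chrome_command(chrome_path, port, user_data_dir))


# Hypothetical values; replace them with your own installation and data directory.
command = build_chrome_command(
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    9527,
    r"E:\chrome_user_data",
)
print(command)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces. Calling `launch_chrome` with the same values would open the browser, after which the user logs in by hand as described.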

Connecting the program to the browser

Adding the following option tells Selenium to attach to the browser that the user already opened on the specified port, instead of starting a new one:

option.add_experimental_option("debuggerAddress", "127.0.0.1:9527")

The program can then carry on with the remaining tasks through the returned driver handle.
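Attaching fails if no browser is listening on the port, so it can help to check the port first. The sketch below uses only the standard library; `is_debug_port_open` and `split_debugger_address` are hypothetical helper names, not Selenium functions.

```python
import socket


def split_debugger_address(address):
    """Split a 'host:port' string such as '127.0.0.1:9527' into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_debug_port_open(host="127.0.0.1", port=9527, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

A program could call `is_debug_port_open(*split_debugger_address("127.0.0.1:9527"))` before creating the driver and ask the user to start the browser when it returns False. An open port only shows that some process is listening there; it does not prove that the process is Chrome.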

driver_class.py

from selenium import webdriver
from selenium.webdriver.chrome.service import Service


class DriverClass:
    def __init__(self):
        self.driver = self._init_driver()

    def _init_driver(self):
        try:
            option = webdriver.ChromeOptions()
            # These settings are disabled because the browser is already running
            # option.add_experimental_option('excludeSwitches', ['enable-automation'])
            # option.add_experimental_option('useAutomationExtension', False)
            # prefs = dict()
            # prefs['credentials_enable_service'] = False
            # prefs['profile.password_manager_enabled'] = False
            # prefs['profile.name'] = "Person 1"  # key assumed; blank in the source
            # option.add_experimental_option('prefs', prefs)
            option.add_argument('--disable-gpu')
            option.add_argument("--disable-blink-features=AutomationControlled")
            option.add_argument('--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"')
            option.add_argument('--no-sandbox')
            option.add_argument('--ignore-certificate-errors')
            # Attach to the browser the user opened on port 9527
            option.add_experimental_option("debuggerAddress", "127.0.0.1:9527")
            # The chromedriver path is truncated in the source; point it at your chromedriver binary
            service = Service(executable_path=r"./driver/")
            driver = webdriver.Chrome(service=service, options=option)
            driver.implicitly_wait(2)
            # driver.maximize_window()
            return driver
        except Exception as e:
            raise e

    def get_driver(self) -> webdriver.Chrome:
        if isinstance(self.driver, webdriver.Chrome):
            return self.driver
        raise Exception('Initialization of the browser failed')


if __name__ == '__main__':
    dc = DriverClass()
    driver = dc.get_driver()
    print(driver)
    # From here on, the program continues its work through the driver handle

Things to note

Some of the option settings in the attach version of driver_class.py above are commented out. Because the program connects to a browser that is already running, those settings were fixed when the user started the browser, and setting them again through the connection is not supported.
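One way to keep a single code path for both modes is to choose the settings by whether a debugger address is given. This is only a sketch: `configure_options` is a hypothetical helper, and it assumes the object passed in offers `add_argument` and `add_experimental_option`, as selenium's `webdriver.ChromeOptions` does.

```python
def configure_options(options, debugger_address=None):
    """Fill a ChromeOptions-like object for attach mode or launch mode."""
    if debugger_address:
        # Attach mode: the browser is already running, so only say where it is
        options.add_experimental_option("debuggerAddress", debugger_address)
        return options

    # Launch mode: Selenium starts the browser, so launch-time settings apply
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    options.add_argument("--disable-blink-features=AutomationControlled")
    return options
```

With Selenium installed, `configure_options(webdriver.ChromeOptions(), "127.0.0.1:9527")` would produce options for attaching, while omitting the address produces options for a fresh launch.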

A practical example

For example, after manually opening the browser on port 9527, log in to Qichacha and go to advanced search, then let the program collect the number of companies holding each qualification. Be cautious: operating too frequently may trigger verification or blocking. The program finally writes a result file. Because the run may be interrupted partway, the script resumes from where it stopped: results are saved on exit, and a later run queries only the qualifications that have no stored count yet.
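The resume idea can be shown on its own, without a browser. The sketch below is a simplified, flat version of that pattern; `load_progress`, `save_progress`, `crawl_with_resume`, and the `fetch` callback are hypothetical names standing in for the page queries.

```python
import json


def load_progress(path):
    """Load earlier results, or start empty if the file is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        return {}


def save_progress(path, data):
    """Write the results collected so far as readable JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def crawl_with_resume(path, names, fetch):
    """Query only the names that have no stored result yet."""
    data = load_progress(path)
    try:
        for name in names:
            if data.get(name):
                continue  # already queried in an earlier run
            data[name] = fetch(name)
    finally:
        # Runs on success and on failure, so partial results survive an interruption
        save_progress(path, data)
    return data
```

Saving inside `finally` is what makes the second run cheap: whatever was collected before an exception is already on disk, so only the missing entries are fetched again.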

driver_class.py is the attach version listed earlier.

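import json
import re
import time

from selenium.webdriver.common.by import By
from driver_class import DriverClass

dc = DriverClass()
driver = dc.get_driver()
# The label texts in the XPaths ("Qualification Certificate", "Sure", "Cancel", "clear")
# appear here in English; match them to the text the page actually shows.
xpath_prefix = '//div/div/div/div/span[text()="Qualification Certificate"]/following-sibling::div'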
 
def checkbox_select(element_checkbox):
    """Select the checkbox if it is not already selected."""
    class_attribute = element_checkbox.get_attribute("class")
    if "checked" not in class_attribute:
        element_checkbox.find_element(By.XPATH, './span[@class="qccd-tree-checkbox-inner"]').click()


def checkbox_unselect(element_checkbox):
    """Clear the checkbox if it is currently selected."""
    class_attribute = element_checkbox.get_attribute("class")
    if "checked" in class_attribute:
        element_checkbox.find_element(By.XPATH, './span[@class="qccd-tree-checkbox-inner"]').click()
 
 
def get_amount(element_checkbox):
    """Return the number of enterprises matching the given checkbox."""
    checkbox_select(element_checkbox)
    xpath_confirm = xpath_prefix + '/div/div/div/div/div[text()="Sure"]'
    driver.find_element(By.XPATH, xpath_confirm).click()
    time.sleep(0.5)
    try:
        xpath_result = '//div/div/div[@class="search-btn limit-svip"]'
        result = str(driver.find_element(By.XPATH, xpath_result).text)
    except Exception as e:
        print(f"abnormal: {str(e)}")
        result = "0"
    # Drop thousands separators, then take the first run of digits
    result = result.replace(",", "")
    match_object = re.search(r"(\d+)", result)
    # Fall back to "0" when the result text contains no digits
    amount = match_object.group(1) if match_object else "0"
    print(f"number:{amount}")
    # Clear the result so the next click on the selector does not close it by accident
    xpath_clear = '//div/div/a[contains(text(), "clear")]'
    try:
        driver.find_element(By.XPATH, xpath_clear).click()
    except Exception:
        pass
    xpath_select = xpath_prefix + '[@class="trigger-container"]'
    driver.find_element(By.XPATH, xpath_select).click()
    time.sleep(0.2)
    checkbox_unselect(element_checkbox)
    return amount
 
 
def extend_options():
    """Expand the collapsed tree items and collect data, at most three levels deep."""
    # The result file name is blank in the source; set your own JSON path here
    result_path = ""
    try:
        with open(result_path, encoding="utf-8") as f:
            data = json.load(f)
    except Exception:
        # No earlier result (or an unreadable file): start from scratch
        data = {}
    try:
        xpath_first_class = xpath_prefix + '//div/ul/li[@role="treeitem"]'
        # xpath_first_class = xpath_prefix + '//div/ul/li/span[contains(@class, "qccd-tree-switcher")]'
        first_item_list = driver.find_elements(By.XPATH, xpath_first_class)
        for item_li in first_item_list:
            text_dk1 = item_li.find_element(By.XPATH, './span/span/div/span[@class="text-dk"]').text
            data[text_dk1] = data.get(text_dk1, {})
            print(f"{text_dk1}")
            switcher = item_li.find_element(By.XPATH, './span[contains(@class, "qccd-tree-switcher")]')
            class_attribute = switcher.get_attribute("class")
            element_checkbox = item_li.find_element(By.XPATH, './span[contains(@class, "checkbox")]')
            if "close" in class_attribute:
                switcher.click()
                time.sleep(0.1)
            elif "noop" in class_attribute:
                # The current node has no child nodes
                if not data[text_dk1]:
                    amount = get_amount(element_checkbox)
                    data[text_dk1] = amount
                continue
            # After the click, the next-level ul/li elements become visible
            second_item_list = item_li.find_elements(By.XPATH, "./ul/li")
            for second_item_li in second_item_list:
                text_dk2 = second_item_li.find_element(By.XPATH, './span/span/div/span[@class="text-dk"]').text
                data[text_dk1][text_dk2] = data[text_dk1].get(text_dk2, {})
                print(f"--{text_dk2}")
                switcher = second_item_li.find_element(By.XPATH, './span[contains(@class, "qccd-tree-switcher")]')
                class_attribute = switcher.get_attribute("class")
                element_checkbox = second_item_li.find_element(By.XPATH, './span[contains(@class, "checkbox")]')
                if "close" in class_attribute:
                    switcher.click()
                    time.sleep(0.1)
                elif "noop" in class_attribute:
                    # The current node has no child nodes
                    if not data[text_dk1][text_dk2]:
                        amount = get_amount(element_checkbox)
                        data[text_dk1][text_dk2] = amount
                    continue
                # After the click, the next-level ul/li elements become visible
                third_item_list = second_item_li.find_elements(By.XPATH, "./ul/li")
                for third_item_li in third_item_list:
                    text_dk3 = third_item_li.find_element(By.XPATH, './span/span/div/span[@class="text-dk"]').text
                    data[text_dk1][text_dk2][text_dk3] = data[text_dk1][text_dk2].get(text_dk3, {})
                    print(f"----{text_dk3}")
                    # At the third level, stop expanding and select the checkbox directly
                    element_checkbox = third_item_li.find_element(By.XPATH, './span[contains(@class, "checkbox")]')
                    if not data[text_dk1][text_dk2][text_dk3]:
                        amount = get_amount(element_checkbox)
                        data[text_dk1][text_dk2][text_dk3] = amount
    finally:
        # Save whatever was collected, even after an interruption, so the next run can resume
        with open(result_path, 'w', encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
 
 
def spider_data():
    # Try to close the qualification certificate selection box and clear the options
    xpath_close = xpath_prefix + '/div/div/div/a[@class="nclose"]'
    xpath_clear = '//div/div/a[contains(text(), "clear")]'
    try:
        driver.find_element(By.XPATH, xpath_close).click()
    except Exception:
        pass
    try:
        driver.find_element(By.XPATH, xpath_clear).click()
    except Exception:
        pass
    # Click the qualification certificate selection box
    xpath_select = xpath_prefix + '[@class="trigger-container"]'
    driver.find_element(By.XPATH, xpath_select).click()
    time.sleep(2)
    extend_options()
    # Cancel button (not used below)
    xpath_cancel = xpath_prefix + '/div/div/div/div/div[text()="Cancel"]'
    # OK button
    xpath_confirm = xpath_prefix + '/div/div/div/div/div[text()="Sure"]'
    driver.find_element(By.XPATH, xpath_confirm).click()


if __name__ == '__main__':
    spider_data()

The generated result file looks like this:

{
  "Construction Qualification": {
    "Engineering Design Qualification Certificate": {
      "Special qualification for engineering design": "26329",
      "Architectural Engineering Design Firm": "356",
      "Engineering Design Industry Qualification": "4487",
      "Professional Qualification for Engineering Design": "19902",
      "Comprehensive Qualification for Engineering Design": "98"
    },
    "Engineering Survey Qualification Certificate": {
      "Comprehensive Qualification for Engineering Survey": "377",
      "Professional Qualification for Engineering Survey": "7464",
      "Engineering Survey Labor Qualification": "3019"
    },
...
  },
  "Food and Agricultural Product Certification": {
    "Organic products(OGA)": "49868",
    "Good agricultural practices(GAP)": "6449",
    "Food quality certification(Wine)": "151",
    "Green food certification": "34723",
    "Green Market Certification": "318",
    "Pollution-free agricultural products": "31067",
    "Food Safety Management System Certification": "72075",
    "Hazard Analysis and Critical Control Point Certification": "51844",
    "Good production specification certification for dairy production enterprises": "445",
    "Hazard analysis and key control points of dairy production enterprises(HACCP)System certification": "570",
    "Feed products": "85"
  },
  "Other qualifications": {
    "School License": "192010",
    "Agent Accounting License": "34588",
    "Accounting Firm Practice Certificate": "12252",
    "DOC Certificate": "982",
    "SMC Certificate": "1886",
    "Famous and Special New Agricultural Products Certificate": "1818",
    "Comprehensive Qualifications for Bidding": "36317",
    "Blockchain Information Service Filing": "2765",
    "Medical Institution Practice License": "570877",
    "CCC Factory Certification": "16154",
    "Sanitation License": "3244"
  }
}
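A nested result like this is easier to sort or total once it is flattened. The following sketch assumes the file has been loaded into a dictionary whose leaves are count strings; `flatten_counts` is a hypothetical helper, and `sample` reuses a few entries from the output above.

```python
def flatten_counts(tree, prefix=()):
    """Yield (path, count) pairs for every leaf of the nested result dictionary."""
    for name, value in tree.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            # Descend into the next level of the qualification tree
            yield from flatten_counts(value, path)
        else:
            yield path, int(value)


sample = {
    "Other qualifications": {"DOC Certificate": "982", "SMC Certificate": "1886"},
    "Construction Qualification": {
        "Engineering Survey Qualification Certificate": {
            "Engineering Survey Labor Qualification": "3019"
        }
    },
}
rows = list(flatten_counts(sample))
for path, count in rows:
    print(" / ".join(path), count)
```

Each row carries the full path from the top-level category down to the leaf, so the rows can be written to CSV or sorted by count without walking the tree again.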
