Updated on 2024-11-14

Python + Selenium + Chrome: batch file download with automatic folder creation (example)

Goal: each URL is bound to a keyword. For every URL, create a directory named after its keyword, visit the page, and download the page's files into that directory.

Code:

Here data[i][0] is the keyword (the name of the directory where the files are stored) and data[i][1] is the website link (the page from which the files are downloaded).

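import re
from selenium import webdriver

# reCount (number of entries) and data (list of [keyword, url] pairs)
# are assumed to be defined elsewhere in the script.

def getDriverHttp():
    for i in range(reCount):
        # Create an instance of the Chrome configuration object
        chromeOptions = webdriver.ChromeOptions()
        # Save downloaded files under e:\tudi\<keyword> on the E: drive.
        # If the directory does not exist, it will be created automatically
        prefs = {"download.default_directory": "e:\\tudi\\{0}".format(data[i][0]), "profile.default_content_setting_values.automatic_downloads": 1}
        # Add customized settings to the Chrome configuration object instance
        chromeOptions.add_experimental_option("prefs", prefs)
        # Launch Chrome with customized settings
        # (older Selenium releases used chrome_options= instead of options=)
        # driver = webdriver.Chrome(executable_path="e:\\chromedriver", chrome_options=chromeOptions)
        driver = webdriver.Chrome(options=chromeOptions)

        # Open the page that hosts the files
        driver.get(data[i][1])

        # Collect the JavaScript in each download link's onclick attribute
        # (re.S is an assumption; the original flag argument is missing)
        info2 = re.findall(r'<a href="#" rel="external nofollow" onclick="(.*?)" cssclass="xz_pic">', driver.page_source, re.S)
        print(len(info2))
        # Run each snippet to trigger the corresponding download
        for js in info2:
            driver.execute_script(js)

def main():
    getDriverHttp()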
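
The link-extraction step can be tried on its own, without launching a browser. The following is a minimal sketch, assuming the page markup matches the pattern used in the example; the sample HTML, the download('...') handlers, and the helper name extract_onclick_scripts are made up for illustration.

```python
import re

# Hypothetical page fragment; in the real script this text
# comes from driver.page_source.
SAMPLE_HTML = '''
<a href="#" rel="external nofollow" onclick="download('a.zip')" cssclass="xz_pic">A</a>
<a href="#" rel="external nofollow" onclick="download('b.zip')" cssclass="xz_pic">B</a>
'''

PATTERN = r'<a href="#" rel="external nofollow" onclick="(.*?)" cssclass="xz_pic">'

def extract_onclick_scripts(page_source):
    """Return the JavaScript snippets found in matching onclick attributes."""
    return re.findall(PATTERN, page_source, re.S)

scripts = extract_onclick_scripts(SAMPLE_HTML)
print(scripts)
```

Each returned string is what would be handed to driver.execute_script to trigger one download.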

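Note: when Python downloads files through Selenium and a page triggers several downloads, Chrome may ask whether to allow downloading multiple files ("Download multiple files"). The automatic_downloads entry in prefs avoids that prompt:

prefs = {"download.default_directory": "e:\\tudi\\{0}".format(data[i][0]), "profile.default_content_setting_values.automatic_downloads":1}

Setting profile.default_content_setting_values.automatic_downloads to 1 allows multiple file downloads.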
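
As a rough illustration of how the per-keyword download folder and the prefs dictionary fit together, here is a small sketch. The helper name build_download_prefs is hypothetical, and a temporary directory stands in for e:\tudi; the sketch also creates the folder explicitly rather than relying on Chrome to do so.

```python
import os
import tempfile

def build_download_prefs(base_dir, keyword):
    """Build Chrome prefs that save downloads under base_dir/keyword."""
    target_dir = os.path.join(base_dir, keyword)
    # Create the folder up front instead of relying on Chrome to do it.
    os.makedirs(target_dir, exist_ok=True)
    return {
        "download.default_directory": target_dir,
        # 1 = allow multiple automatic downloads without prompting
        "profile.default_content_setting_values.automatic_downloads": 1,
    }

# Usage with a temporary base directory standing in for e:\tudi
base = tempfile.mkdtemp()
prefs = build_download_prefs(base, "keyword1")
print(prefs["download.default_directory"])
```

The returned dictionary can be passed to add_experimental_option("prefs", prefs) exactly as in the example.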

Additional knowledge: unified configuration management in a Python project

A larger project always involves many parameters, and the best way to manage them is to keep them all in one place. I have been reading quite a few Python projects lately and have summarized two interesting approaches to configuration management.

The first approach is based on easydict.

First install numpy, easydict, and PyYAML (the package that provides the yaml module):

pip install numpy
pip install easydict
pip install pyyaml

That's all.

Then define the configuration module (config.py):

import numpy as np
from easydict import EasyDict as edict
import yaml

# Create dict
__C = edict()
cfg = __C

# Define the configuration dict
# ('name' matches the YAML file; 'age' is an assumed key name,
# the original attribute names are missing from the source)
__C.dev = edict()
__C.dev.name = 'dev-xingoo'
__C.dev.age = 20

__C.test = edict()
__C.test.name = 'test-xingoo'
__C.test.age = 30

# Internal helper that merges the dict loaded from a YAML file into the defaults
def _merge_a_into_b(a, b):
    """Merge config dictionary a into config dictionary b, clobbering the
    options in b whenever they are also specified in a.
    """
    if type(a) is not edict:
        return

    for k, v in a.items():
        # a must specify keys that are in b
        if k not in b:
            raise KeyError('{} is not a valid config key'.format(k))

        # the types must match, too
        old_type = type(b[k])
        if old_type is not type(v):
            if isinstance(b[k], np.ndarray):
                v = np.array(v, dtype=b[k].dtype)
            else:
                raise ValueError(('Type mismatch ({} vs. {}) '
                                  'for config key: {}').format(type(b[k]),
                                                               type(v), k))

        # recursively merge dicts
        if type(v) is edict:
            try:
                _merge_a_into_b(a[k], b[k])
            except Exception:
                print('Error under config key: {}'.format(k))
                raise
        else:
            b[k] = v

# Load a YAML file and merge it into the default options
def cfg_from_file(filename):
    """Load a config file and merge it into the default options."""
    with open(filename, 'r', encoding='utf-8') as f:
        # safe_load parses plain YAML without executing arbitrary tags
        yaml_cfg = edict(yaml.safe_load(f))

    _merge_a_into_b(yaml_cfg, __C)

It's simple to use:

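from config import cfg_from_file
from config import cfg

# 'config.yml' is an assumed file name; pass the path of your own YAML file
cfg_from_file('config.yml')
print(cfg.dev.name)
print(cfg.test.name)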

Create a YAML configuration file in the same directory (this is the file passed to cfg_from_file):

dev:
  name: xingoo-from-yml

Output:

xingoo-from-yml
test-xingoo
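
To see why only the dev name changes, the merge can be reproduced with plain dictionaries. This is a simplified sketch, not the module itself: it leaves out the type checks and the numpy handling, the function name merge_into is hypothetical, and the 'age' key is an assumed name modelled on the dev/test defaults in this section.

```python
def merge_into(overrides, defaults):
    """Recursively copy values from overrides into defaults (in place)."""
    for key, value in overrides.items():
        # Overrides may only set keys that already exist in the defaults.
        if key not in defaults:
            raise KeyError('{} is not a valid config key'.format(key))
        if isinstance(value, dict) and isinstance(defaults[key], dict):
            merge_into(value, defaults[key])
        else:
            defaults[key] = value

# Default options ('age' is an assumed key name)
defaults = {
    'dev': {'name': 'dev-xingoo', 'age': 20},
    'test': {'name': 'test-xingoo', 'age': 30},
}
# Equivalent of the YAML file: only dev.name is overridden
overrides = {'dev': {'name': 'xingoo-from-yml'}}

merge_into(overrides, defaults)
print(defaults['dev']['name'])   # xingoo-from-yml
print(defaults['test']['name'])  # test-xingoo
```

Keys that the file does not mention keep their default values, and a key that is missing from the defaults raises KeyError, which catches typos in the configuration file early.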

Summary

The advantage of this approach is that any Python file can use the configuration just by writing from config import cfg.

The second approach is class-based.

This approach uses an ordinary Python class to hold the configuration (here in a module named config2.py).

class Config:
    def __init__(self):
        # attribute names are assumed; the originals are missing from the source
        self.name = 'xingoo-config2'
        self.age = 100

Create a new object where you need it. If several Python modules need to share the same configuration, you have to pass the configuration object from one to the other:

import config2

cfg2 = config2.Config()
print(cfg2.name)
print(cfg2.age)

The output is:

xingoo-config2
100
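
What passing the configuration object looks like in practice can be sketched as follows. The describe function is a hypothetical consumer, and the attribute names name and age are assumptions made for this sketch.

```python
class Config:
    def __init__(self):
        # Attribute names are assumptions made for this sketch.
        self.name = 'xingoo-config2'
        self.age = 100

def describe(cfg):
    """A consumer that receives the configuration object explicitly."""
    return '{} ({})'.format(cfg.name, cfg.age)

cfg2 = Config()
message = describe(cfg2)
print(message)
```

Every function that needs a setting has to accept the object as a parameter, which is the extra plumbing this approach requires.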

Summary

The second method is simple and crude, but passing the configuration object around every time is a hassle. I still prefer the first approach.
