SoFunction
Updated on 2024-11-13

Using proxy IPs in a Python crawler to avoid IP blocking

A crawler is a program that automatically fetches information from the Internet and extracts the parts that are valuable to us. Python crawlers often need to send their requests through a proxy IP address (for example, one from Flying Pig IP), but a plain urlopen call has no parameter for specifying a proxy IP; it only picks up proxy settings from the system or the environment. Here I'll share my experience of using a proxy IP in a Python crawler. (Recommended: Flying Pig proxy IP can be used for free after registering, and a browser search will find it.)

1. Note that I am using Python 3, so we import urllib.request and then call ProxyHandler, which accepts the proxy IP as its argument. Choose a proxy according to your own needs. Free ones are available too, but you can imagine how low their availability rate is. (Flying Pig IP)

2. Then pass the IP address in as a dictionary. The address used here is one I made up, just as an example. Set the key to http (or https for some proxies), and set the value to the IP address followed by the port number (9000 here). Use whichever scheme matches the type of proxy you have. Port numbers differ between proxies, so go by the port shown for the proxy you extracted from Flying Pig.
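Steps 1 and 2 can be sketched roughly as follows. The address 203.0.113.10 is an assumption: it comes from a range reserved for documentation and is not a working proxy, so substitute the IP and port you actually obtained.

```python
import urllib.request

# Assumption: placeholder host and port for illustration only; not a real proxy.
PROXY_HOST = "203.0.113.10"
PROXY_PORT = 9000

# Keys are the URL schemes to route through the proxy; values are proxy URLs.
proxies = {
    "http": f"http://{PROXY_HOST}:{PROXY_PORT}",
    "https": f"http://{PROXY_HOST}:{PROXY_PORT}",
}

# ProxyHandler takes the dictionary and applies it to matching requests.
proxy_handler = urllib.request.ProxyHandler(proxies)
```

Creating the handler does not contact the proxy yet; nothing is sent until a request is opened.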

3. Then use build_opener() to build an opener object.

4. Then call the open method of the opener object we built to send the request. In fact, urlopen works in much the same way, using an opener that is defined internally; here we are effectively writing our own version of it.
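A minimal sketch of steps 3 and 4, assuming the same placeholder proxy address as before. The helper name fetch is hypothetical and only there to keep the network call out of module import; the real call is left commented out because it needs a working proxy.

```python
import urllib.request

# Assumption: placeholder proxy from a documentation-only range; replace it.
PROXY_URL = "http://203.0.113.10:9000"

handler = urllib.request.ProxyHandler({"http": PROXY_URL, "https": PROXY_URL})

# build_opener() returns an OpenerDirector that uses our handler
# in place of the default proxy handling.
opener = urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Send one request through the proxy-aware opener and return the body."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Example call (requires a valid proxy, so it is not executed here):
# html = fetch("http://example.com/")
```

Only requests sent through this opener use the proxy; a plain urlopen call elsewhere in the program is unaffected.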

5. Of course, if we call install_opener(), we can make the customized opener the global default.

6. Once it is set as the global opener, any request we then send with urlopen goes out through the proxy IP rather than the local IP address.
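Steps 5 and 6 might look like the sketch below. The function name install_global_proxy and the proxy address are assumptions made for illustration.

```python
import urllib.request


def install_global_proxy(proxy_url):
    """Build a proxy-aware opener and make it the default used by urlopen."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # After this call, urllib.request.urlopen() sends requests through `opener`.
    urllib.request.install_opener(opener)
    return opener


# Assumption: placeholder proxy address; replace it with a working one.
opener = install_global_proxy("http://203.0.113.10:9000")

# From here on, an ordinary call such as
#     urllib.request.urlopen("http://example.com/")
# would be routed through the proxy (not executed here: it needs a live proxy).
```

Because the setting is process-wide, it also affects any other code in the same program that calls urlopen, which is worth keeping in mind when only some requests should use the proxy.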

7. Finally, a word about an error you may run into when using a proxy: a message saying that the target machine actively refused the connection. This means the proxy IP may be invalid or the port number is wrong, so you need to use a valid IP. (The IP address filled in here is one I made up.) You can choose a proxy IP from Flying Pig.
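One way to deal with dead proxies is to test each one before using it. The sketch below is an assumption about how that could be done, not part of the original steps: the helper proxy_works, the test URL, and the timeout are all hypothetical choices. A refused connection surfaces in urllib as a URLError, which is caught here along with other network failures.

```python
import http.client
import urllib.error
import urllib.request


def proxy_works(proxy_url, test_url="http://example.com/", timeout=5):
    """Return True if a request to test_url succeeds through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, http.client.HTTPException) as exc:
        # Covers refused connections, timeouts, and malformed proxy responses.
        print(f"proxy {proxy_url} failed: {exc}")
        return False


# Example: keep only the proxies that respond (addresses are placeholders).
# candidates = ["http://203.0.113.10:9000", "http://203.0.113.11:9000"]
# usable = [p for p in candidates if proxy_works(p)]
```

A proxy that passes this check can still fail later, so a crawler would normally keep the same error handling around its real requests as well.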

Summary: That is all for this article on using proxy IPs in a Python crawler to avoid IP blocking. Thanks for reading and for your support.