Requirement: a machine has multiple NICs. How do you send data through a specific NIC when accessing a specific URL?
$ curl --interface eth0 # curl's --interface option lets you specify the network card
Reading the urllib source code, the trace leads from open_http to a class whose _connection_class = HTTPConnection.
HTTPConnection accepts a source_address when it is created,
and passes it on when it calls HTTPConnection._create_connection, which is socket.create_connection.
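For a single connection, the same source_address parameter can be passed directly, without touching the library. The following is a minimal sketch assuming Python 3, where httplib is named http.client. The helper fetch_via, the EchoClientIP handler and the throwaway local server are illustrative names of my own, and 127.0.0.1 stands in for a real NIC address:

```python
import http.client
import http.server
import threading


def fetch_via(source_ip, host, port, path="/"):
    """GET http://host:port/path with the socket bound to source_ip first."""
    # Port 0 lets the OS pick a free local port for the outgoing socket.
    conn = http.client.HTTPConnection(
        host, port, timeout=5, source_address=(source_ip, 0)
    )
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


class EchoClientIP(http.server.BaseHTTPRequestHandler):
    """Replies with the client address the server saw."""

    def do_GET(self):
        body = self.client_address[0].encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    # Local stand-in for the remote site, listening on an OS-chosen port.
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoClientIP)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_via("127.0.0.1", "127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

On a real machine you would pass the address of en0 or en1 as source_ip and a public host instead of the local server.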
# Take a look at the local NIC information first
$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        nd6 options=1<PERFORMNUD>
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ether c8:e0:eb:17:3a:73
        inet6 fe80::cae0:ebff:fe17:3a73%en0 prefixlen 64 scopeid 0x4
        inet 192.168.20.2 netmask 0xffffff00 broadcast 192.168.20.255
        nd6 options=1<PERFORMNUD>
        media: autoselect
        status: active
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=4<VLAN_MTU>
        ether 0c:5b:8f:27:9a:64
        inet6 fe80::e5b:8fff:fe27:9a64%en8 prefixlen 64 scopeid 0xa
        inet 192.168.8.100 netmask 0xffffff00 broadcast 192.168.8.255
        nd6 options=1<PERFORMNUD>
        media: autoselect (100baseTX <full-duplex>)
        status: active
Both en0 and en1 have access to the public network; lo0 is the local loopback.
Modify socket.create_connection directly and test it.
def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
                      source_address=None):
    """If *source_address* is set it must be a tuple of (host, port)
    for the socket to bind as a source address before making the
    connection. An host of '' or port 0 tells the OS to use the default.

    In other words: source_address, if given, must be a (host, port)
    tuple; ("", 0) leaves the choice to the OS.
    """
    host, port = address
    err = None
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket(af, socktype, proto)
            # Uncomment one of these lines to force a NIC while testing:
            # sock.bind(("192.168.20.2", 0))   # en0
            # sock.bind(("192.168.8.100", 0))  # en1
            # sock.bind(("127.0.0.1", 0))      # lo0
            if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                # Wrap the tuple so that %s formats it as a single value
                print "socket bind source_address: %s" % (source_address,)
                sock.bind(source_address)
            sock.connect(sa)
            return sock
        except error as _:
            err = _
            if sock is not None:
                sock.close()
    if err is not None:
        raise err
    else:
        raise error("getaddrinfo returns an empty list")
Following the docstring, run the test three times, each time binding the IP address of a different NIC, with the port set to 0.
# Test en0
$ python -c 'import urllib as u;print u.urlopen("").read()'
.148.245.16
# Test en1
$ python -c 'import urllib as u;print u.urlopen("").read()'
.94.115.227
# Test lo0
$ python -c 'import urllib as u;print u.urlopen("").read()'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 87, in urlopen
    return (url)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 213, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 350, in open_http
    (data)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 1049, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 893, in _send_output
    (msg)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 855, in send
    ()
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 832, in connect
    , self.source_address)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 578, in create_connection
    raise err
IOError: [Errno socket error] [Errno 49] Can't assign requested address
The test passes: en0 and en1 each return a different public IP, and lo0 fails as expected for the loopback address. So with multiple NICs, it is sufficient to bind the IP of the chosen NIC when creating the socket, with the port set to 0 so that the OS picks a free one. If the port is fixed to a non-zero value, the second request throws an exception because the port is still occupied:
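The difference between port 0 and a fixed port can be reproduced without any network access. This is a small sketch assuming Python 3; connect_from and demo are hypothetical helper names, and a local listener on 127.0.0.1 stands in for the remote server:

```python
import errno
import socket


def connect_from(source_ip, dest, source_port=0):
    """Connect to dest with the local end bound to (source_ip, source_port)."""
    return socket.create_connection(
        dest, timeout=5, source_address=(source_ip, source_port)
    )


def demo():
    # A local listener stands in for the remote web server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    dest = listener.getsockname()

    # Port 0: every connection gets its own OS-assigned local port.
    first = connect_from("127.0.0.1", dest)
    second = connect_from("127.0.0.1", dest)
    ports = (first.getsockname()[1], second.getsockname()[1])

    # Fixed port that another socket still holds: bind() is refused.
    address_in_use = False
    try:
        connect_from("127.0.0.1", dest, source_port=ports[0]).close()
    except OSError as exc:
        address_in_use = exc.errno == errno.EADDRINUSE

    for sock in (first, second, listener):
        sock.close()
    return ports, address_in_use
```

The errno number itself differs by platform (48 in the macOS traceback here), which is why the sketch compares against errno.EADDRINUSE instead of a literal.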
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 87, in urlopen
    return (url)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 213, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 350, in open_http
    (data)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 1049, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 893, in _send_output
    (msg)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 855, in send
    ()
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 832, in connect
    , self.source_address)
  File "/System/Library/Frameworks//Versions/2.7/lib/python2.7/", line 577, in create_connection
    raise err
IOError: [Errno socket error] [Errno 48] Address already in use
In a real project, instead of editing the standard library, wrap socket.create_connection so that its source_address is always the (IP, 0) of the desired network card.
# test-interface_urllib.py
import socket
import urllib, urllib2

# Keep a reference to the original function before replacing it
_create_socket = socket.create_connection

SOURCE_ADDRESS = ("127.0.0.1", 0)
#SOURCE_ADDRESS = ("172.28.153.121", 0)
#SOURCE_ADDRESS = ("172.16.30.41", 0)

def create_connection(*args, **kwargs):
    # source_address is the third positional parameter; override it
    # wherever the caller put it.
    in_args = False
    if len(args) >= 3:
        args = list(args)
        args[2] = SOURCE_ADDRESS
        args = tuple(args)
        in_args = True
    if not in_args:
        kwargs["source_address"] = SOURCE_ADDRESS
    print "args", args
    print "kwargs", str(kwargs)
    return _create_socket(*args, **kwargs)

socket.create_connection = create_connection

print urllib.urlopen("").read()
Testing confirms that the data is sent through the specified NIC: the IP address reported back corresponds to the one assigned to that NIC.
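If a permanent, process-wide patch is too blunt, the same idea can be scoped to a block of code. The following is only a sketch, assuming Python 3; bound_to and demo are hypothetical names, and the local listener again stands in for a real server:

```python
import contextlib
import socket


@contextlib.contextmanager
def bound_to(source_ip):
    """Temporarily make socket.create_connection bind to source_ip."""
    original = socket.create_connection

    def patched(address, *args, **kwargs):
        # Positional layout is (address, timeout, source_address): keep
        # the timeout, drop a positional source_address, force our own.
        kwargs["source_address"] = (source_ip, 0)
        return original(address, *args[:1], **kwargs)

    socket.create_connection = patched
    try:
        yield
    finally:
        # Always restore the real function, even if the block raised.
        socket.create_connection = original


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    original = socket.create_connection
    with bound_to("127.0.0.1"):
        # The caller passes source_address=None; the patch overrides it.
        sock = socket.create_connection(listener.getsockname(), 5, None)
        local_ip = sock.getsockname()[0]
        sock.close()
    listener.close()
    return local_ip, socket.create_connection is original
```

Only connections opened inside the with block are affected; code running before or after it sees the untouched function.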
The next question: crawlers often use requests, so does the same patch work there? Testing shows that it does not, because requests does not go through python's built-in socket.create_connection.
Tracing how requests creates its socket connection works the same way as for urllib, so I will skip the details.
On my python 2.7 setup, the trace shows that requests uses the create_connection of its bundled urllib3.
The patch is much the same as for urllib.
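# urllib3 as bundled with requests
from requests.packages.urllib3.util import connection as urllib3_connection

_create_socket = urllib3_connection.create_connection
# ... (create_connection wrapper defined as for urllib)
urllib3_connection.create_connection = create_connection
# ...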
Running this may throw an exception: Max retries exceeded with .. Invalid argument
The exception does not occur every time. It depends on the IP range and appears to be caused by too many levels of recursive jumps; removing socket_options from the kwargs avoids it. Binding 127.0.0.1 always throws the exception.
import socket
import urllib
import urllib2
# urllib3 as bundled with requests
from requests.packages.urllib3.util import connection as urllib3_connection
import requests as req

_default_create_socket = socket.create_connection
_urllib3_create_socket = urllib3_connection.create_connection

SOURCE_ADDRESS = ("127.0.0.1", 0)
#SOURCE_ADDRESS = ("172.28.153.121", 0)
#SOURCE_ADDRESS = ("172.16.30.41", 0)

def default_create_connection(*args, **kwargs):
    # socket.create_connection does not accept socket_options
    kwargs.pop("socket_options", None)
    in_args = False
    if len(args) >= 3:
        args = list(args)
        args[2] = SOURCE_ADDRESS
        args = tuple(args)
        in_args = True
    if not in_args:
        kwargs["source_address"] = SOURCE_ADDRESS
    print "args", args
    print "kwargs", str(kwargs)
    return _default_create_socket(*args, **kwargs)

def urllib3_create_connection(*args, **kwargs):
    in_args = False
    if len(args) >= 3:
        args = list(args)
        args[2] = SOURCE_ADDRESS
        args = tuple(args)
        in_args = True
    if not in_args:
        kwargs["source_address"] = SOURCE_ADDRESS
    print "args", args
    print "kwargs", str(kwargs)
    return _urllib3_create_socket(*args, **kwargs)

socket.create_connection = default_create_connection
# Use the wrapper around the default socket.create_connection here too,
# because urllib3's own create_connection can be problematic on occasion
# urllib3_connection.create_connection = urllib3_create_connection
urllib3_connection.create_connection = default_create_connection

print " *** test requests: " + req.get("").content
print " *** test urllib: " + urllib.urlopen("").read()
print " *** test urllib2: " + urllib2.urlopen("").read()
Caution.Use it. It doesn't seem to be working.
A slight refinement would be to automatically get the IP based on the network card name.
import random
import subprocess

def get_all_net_devices():
    sub = subprocess.Popen("ls /sys/class/net", shell=True, stdout=subprocess.PIPE)
    # communicate() reads the output and waits for the process to exit
    net_devices = sub.communicate()[0].strip().splitlines()  # ['eth0', 'eth1', 'lo']
    # Simply filter the card names here, and change the filter as needed.
    net_devices = [i for i in net_devices if "ppp" in i]
    return net_devices

ALL_DEVICES = get_all_net_devices()

def get_local_ip(device_name):
    # Query the requested device and take the address from its "inet" line
    sub = subprocess.Popen("/sbin/ifconfig %s | grep 'inet ' | awk '{print $2}'" % device_name,
                           shell=True, stdout=subprocess.PIPE)
    ip = sub.communicate()[0].strip()
    return ip

def random_local_ip():
    return get_local_ip(random.choice(ALL_DEVICES))

# code ...
Then replace SOURCE_ADDRESS in args[2] = SOURCE_ADDRESS and kwargs["source_address"] = SOURCE_ADDRESS with (random_local_ip(), 0) or (get_local_ip("eth0"), 0), since source_address must be a (host, port) tuple.
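As an alternative to the grep and awk pipeline, the ifconfig output can be parsed in Python itself, which makes the lookup easy to test. This is a rough sketch assuming Python 3; parse_ifconfig and source_address_for are hypothetical names, and SAMPLE is an abridged copy of the ifconfig output shown earlier:

```python
import re

SAMPLE = """\
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet6 fe80::cae0:ebff:fe17:3a73%en0 prefixlen 64 scopeid 0x4
        inet 192.168.20.2 netmask 0xffffff00 broadcast 192.168.20.255
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.8.100 netmask 0xffffff00 broadcast 192.168.8.255
"""

# Matches "inet 1.2.3.4" and the older "inet addr:1.2.3.4", but not inet6.
_INET = re.compile(r"\binet (?:addr:)?(\d+(?:\.\d+){3})")


def parse_ifconfig(text):
    """Map interface name -> first IPv4 address found in ifconfig output."""
    addresses = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line such as "en0: flags=..." starts a device.
            current = line.split()[0].rstrip(":")
        match = _INET.search(line)
        if match and current and current not in addresses:
            addresses[current] = match.group(1)
    return addresses


def source_address_for(device_name, text):
    """Return the (ip, 0) tuple to pass as source_address."""
    return (parse_ifconfig(text)[device_name], 0)
```

In real use the text argument would come from running ifconfig, for example through subprocess.check_output.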
What you use it for is up to your imagination.
That is all for this example of sending HTTP requests through a specified network card in Python.