Using Python's urllib.parse
Parsing
urlparse(url, scheme='', allow_fragments=True)
Basic usage of urlparse:
from urllib import parse
# Parse the URL
print(parse.urlparse('/'))
print(parse.urlparse('/', scheme='http'))
print(parse.urlparse('/', scheme='http'))
# Here are the results
ParseResult(scheme='https', netloc='', path='/', params='', query='', fragment='')
ParseResult(scheme='https', netloc='', path='/', params='', query='', fragment='')
ParseResult(scheme='http', netloc='', path='/', params='', query='', fragment='')
As you can see, the results differ depending on whether the scheme parameter is supplied.
The scheme parameter only acts as a default: when the URL itself already contains a scheme, the scheme argument is ignored.
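To make the effect of the scheme parameter concrete, here is a minimal sketch. The host www.example.com is a placeholder chosen for illustration and is not a URL from this article.

```python
from urllib.parse import urlparse

# www.example.com is a placeholder host used only for illustration.
full = urlparse("https://www.example.com/index.html;user?id=5#comment")
print(full.scheme, full.netloc, full.path, full.params, full.query, full.fragment)

# The URL already carries a scheme, so scheme='http' is ignored.
overridden = urlparse("https://www.example.com/", scheme="http")
print(overridden.scheme)  # https

# No scheme in the URL: the default from scheme= is used.
defaulted = urlparse("//www.example.com/", scheme="http")
print(defaulted.scheme, defaulted.netloc)  # http www.example.com

# Without the leading '//', the host is treated as part of the path.
no_slashes = urlparse("www.example.com/", scheme="http")
print(repr(no_slashes.netloc), no_slashes.path)
```

Note the last case: urlparse only recognises a network location when it is introduced by '//', which is an easy thing to trip over when the scheme is supplied separately.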
Parsing has an inverse, of course: unparsing, which joins the individual components back into a single URL.
from urllib import parse
# Join the list elements into a URL
url = ['http', 'www', 'baidu', 'com', 'dfdf', 'eddffa']  # Exactly six elements are needed here
print(parse.urlunparse(url))
# Here are the results
http://www/baidu;com?dfdf#eddffa
urlunparse() takes an iterable of URL components, and its length is fixed: it must contain exactly six elements, otherwise a ValueError is raised.
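The six positions correspond to scheme, netloc, path, params, query and fragment. The sketch below shows a more realistic set of components and what happens when the length is wrong; the host and query values are made up for illustration.

```python
from urllib.parse import urlparse, urlunparse

# (scheme, netloc, path, params, query, fragment); values are illustrative only.
parts = ("https", "www.example.com", "/search", "", "q=python", "top")
rebuilt = urlunparse(parts)
print(rebuilt)  # https://www.example.com/search?q=python#top

# Parsing and unparsing are inverses for a URL like this one.
round_trip = urlunparse(urlparse(rebuilt))
print(round_trip == rebuilt)

# A sequence with the wrong number of elements is rejected.
error_message = None
try:
    urlunparse(("https", "www.example.com", "/search"))
except ValueError as exc:
    error_message = str(exc)
    print("rejected:", error_message)
```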
urljoin(): fills in the parts missing from the URL in the second argument using the URL in the first argument.
from urllib import parse
# Join two URLs: parts missing from the second argument are filled in from the first;
# if the second argument already has a complete path, the second one takes precedence
print(parse.urljoin('/', 'index'))
print(parse.urljoin('/', '/login'))
# Here are the results
/index
/login
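The same rules are easier to see with a full base URL. This is a small sketch using placeholder hosts (www.example.com and other.example.org are assumptions, not addresses from the article).

```python
from urllib.parse import urljoin

# Placeholder base URL used only for illustration.
base = "https://www.example.com/docs/guide/intro.html"

# A bare relative name replaces the last path segment of the base.
relative = urljoin(base, "setup.html")
print(relative)  # https://www.example.com/docs/guide/setup.html

# A path starting with '/' replaces the whole path of the base.
rooted = urljoin(base, "/login")
print(rooted)  # https://www.example.com/login

# A complete URL in the second argument wins outright.
absolute = urljoin(base, "https://other.example.org/a")
print(absolute)  # https://other.example.org/a
```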
urlencode
The urllib library has a urlencode function that converts key-value pairs into the format we want, returning a string like a=1&b=2. For example (this is Python 2, where these functions live directly in urllib; in Python 3 they are in urllib.parse):
>>> from urllib import urlencode
>>> data = {
...     'a': 'test',
...     'name': '魔兽'
... }
>>> print urlencode(data)
a=test&name=%C4%A7%CA%DE
If you only want to URL-encode a single string, urllib provides another function, quote():
>>> from urllib import quote
>>> quote('魔兽')
'%C4%A7%CA%DE'
urldecode
A string that was passed through urlencode has to be decoded on the receiving side, i.e. urldecode. urllib provides unquote() for this; there is no urldecode() function.
>>> from urllib import unquote
>>> unquote('%C4%A7%CA%DE')
'\xc4\xa7\xca\xde'
>>> print unquote('%C4%A7%CA%DE')
魔兽
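The interactive session above uses Python 2 syntax. As a rough Python 3 equivalent, the sketch below uses urllib.parse and passes encoding="gbk" explicitly; that encoding is an assumption, inferred from the byte values in the output above, since Python 3 would otherwise default to UTF-8.

```python
from urllib.parse import quote, unquote, urlencode

# The percent-encoded value shown in the article's output.
encoded_from_article = "%C4%A7%CA%DE"

# Assumption: the bytes are GBK; Python 3 defaults to UTF-8 unless told otherwise.
name = unquote(encoded_from_article, encoding="gbk")

data = {"a": "test", "name": name}

# Same encoding as the article's session.
gbk_query = urlencode(data, encoding="gbk")
print(gbk_query)  # a=test&name=%C4%A7%CA%DE

# Default behaviour in Python 3: UTF-8, so the escapes differ.
utf8_query = urlencode(data)
print(utf8_query)

# quote() on a single string, again with the explicit encoding.
requoted = quote(name, encoding="gbk")
print(requoted == encoded_from_article)
```

Whichever encoding is used, the decoding side has to use the same one, otherwise unquote() will produce replacement characters instead of the original text.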
Module
The urllib.parse module provides functions for encoding and decoding, namely urlencode() and unquote().
Encoding: urlencode()
# Import the parse module
from urllib import parse
# Call the parse module's urlencode() to encode the parameters
query_string = {'wd': 'Crawler'}
result = parse.urlencode(query_string)
# Use format() to splice the encoded string into the URL
url = '/s?{}'.format(result)
print(url)
Encoding operations on url addresses
Encoding quote(string)
from urllib import parse
url = "/s?wd={}"
words = input('Please enter content')
# quote() encodes a single string
query_string = parse.quote(words)
url = url.format(query_string)
print(url)
quote() encodes a single string value, while urlencode() encodes a whole set of key-value pairs into a query string.
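The difference shows up most clearly with characters such as spaces, '/' and '&'. The following sketch compares quote(), quote_plus() and urlencode() on one made-up value; the sample string and the page parameter are illustrative only.

```python
from urllib.parse import quote, quote_plus, urlencode

# Illustrative value containing a space, a slash and an ampersand.
value = "a b/c&d"

# quote() leaves '/' alone by default and encodes the space as %20.
default_quoted = quote(value)
print(default_quoted)  # a%20b/c%26d

# safe='' makes quote() encode '/' as well.
strict_quoted = quote(value, safe="")
print(strict_quoted)  # a%20b%2Fc%26d

# quote_plus() is the form-style variant: spaces become '+'.
plus_quoted = quote_plus(value)
print(plus_quoted)  # a+b%2Fc%26d

# urlencode() applies quote_plus() to every key and value and joins with '&'.
query = urlencode({"q": value, "page": 2})
print(query)  # q=a+b%2Fc%26d&page=2
```

In practice, quote() suits path segments, while urlencode() is the more convenient choice whenever there is more than one parameter.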
Decode unquote(string)
from urllib import parse
string = '%E7%88%AC%E8%99%AB'
result = parse.unquote(string)
print(result)
Decoding restores the encoded URL to its original form.
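When a whole query string needs decoding rather than a single value, parse_qs() from the same module can split and decode it in one step. A minimal sketch, assuming a placeholder host and an extra page parameter that are not part of the article:

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Placeholder URL; only the wd value comes from the article.
url = "https://www.example.com/s?wd=%E7%88%AC%E8%99%AB&page=2"

# Take just the query part of the URL.
query = urlsplit(url).query

# parse_qs() splits on '&' and '=', and percent-decodes keys and values (UTF-8 by default).
params = parse_qs(query)
print(sorted(params))     # ['page', 'wd']
print(params["page"])     # ['2']

# Each value is a list, because a key may appear more than once.
keyword = params["wd"][0]
print(quote(keyword))     # %E7%88%AC%E8%99%AB
```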
URL address splicing method
String concatenation
query1 = '/s?'
query2 = 'wd=%E7%88%AC%E8%99%AB'
url = query1 + query2
String Formatting
query2 = 'wd=%E7%88%AC%E8%99%AB'
url = '/s?%s' % query2
format()
# Import the parse module
from urllib import parse
# Call the parse module's urlencode() to encode the parameters
query_string = {'wd': 'Crawler'}
result = parse.urlencode(query_string)
# Use format() to splice the encoded string into the URL
url = '/s?{}'.format(result)
print(url)
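As an alternative to the three string-based approaches, the URL can also be assembled from its components with urlunsplit(), which takes care of the '?' separator. This is only a sketch: build_url is a hypothetical helper, and the host and the page parameter are placeholders.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(scheme, host, path, params):
    """Hypothetical helper: assemble a URL from parts and a dict of query parameters."""
    query = urlencode(params)
    # urlunsplit() takes (scheme, netloc, path, query, fragment).
    return urlunsplit((scheme, host, path, query, ""))


# Placeholder host; 'wd' mirrors the parameter name used in the article.
search_url = build_url("https", "www.example.com", "/s", {"wd": "Crawler", "page": 1})
print(search_url)  # https://www.example.com/s?wd=Crawler&page=1
```

For a one-off URL the string approaches are perfectly adequate; a helper like this mainly pays off when the parameters vary from request to request.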
Summary
The above is based on my personal experience. I hope it serves as a useful reference, and I appreciate your continued support.