Explanation of Chinese encoding in URL links
Chinese gbk (GB2312) encoding: one character corresponds to two sets of %xx, i.e. %xx%xx
Chinese UTF-8 encoding: one character corresponds to three sets of %xx, i.e. %xx%xx%xx%xx
You can use Baidu for URL encoding and decoding default gbk
/s?wd=%E4%B8%AD%E5%9B%BD
python3 encoding and decoding examples
# -*- coding: utf-8 -*- # @File : urldecode_demo.py # @Date : 2018-05-11 from import quote, unquote # Coding url1 = "/s?wd=sino" # utf8 encoding, specify security characters ret1 = quote(url1, safe=";/?:@&=+$,", encoding="utf-8") print(ret1) # /s?wd=%E4%B8%AD%E5%9B%BD # gbk encoding ret2 = quote(url1, encoding="gbk") print(ret2) # https%3A///s%3Fwd%3D%D6%D0%B9%FA # Decoding url3 = "/s?wd=%E4%B8%AD%E5%9B%BD" ret3 = unquote(url3, encoding='utf-8') print(ret3) # /s?wd=sino
The examples use the urllib module and the () function.
import urllib rawurl=xxx url=(rawurl)
Module used: urllib
Functions used: ()
case (law)
import urllib rawurl = "%E6%B2%B3%E6%BA%90" url = (rawurl) print url
exports
Heyuan prefecture level city in Guangdong
() aims to decode the url encoding, and the counterpart of this function is the encoding function ()
>>> import urllib >>> ("Heyuan.") '%E6%B2%B3%E6%BA%90
Expansion of the problem
Why do URLs need to be encoded and decoded?
Usually if something needs to be encoded, it means that such a thing is not suitable for transmission. There are various reasons for this, such as Size being too large and containing private data. In the case of Url, the reason for encoding is that there are some characters in the Url that can cause ambiguity.
For example, Url parameter strings use key=value key-value pairs to pass parameters in the form of key-value pairs separated by & symbols, such as /s?q=abc&ie=utf-8. If your value string contains = or &, it will inevitably result in parsing errors by the server receiving the Url, and so you must escape the ambiguous & and = symbols must be escaped, that is, encoded.
Another example is that the encoding format of the Url is in ASCII, not Unicode, which means you can't include any non-ASCII characters in the Url, such as Chinese. Otherwise Chinese may cause problems if the client browser and server browser support different character sets.
For more examples of URL encoding and decoding in Python using the urllib module, check out the following related links