1. Introduction
Hello everyone, I'm Mr. Fei. In scenarios such as web crawling and web application development, we often need to use Python to parse, build, and otherwise manipulate large numbers of urls. The Python ecosystem offers a rich set of tools for handling urls efficiently, from standard-library modules such as urllib to a wide variety of third-party libraries. Today I'd like to introduce the url-processing library that, after weighing ease of use against runtime speed in real-world use, I'm most satisfied with.
2. Efficient url processing in Python using yarl
This third-party library for processing urls efficiently and easily is called yarl. After installing it with `pip install yarl`, let's quickly walk through its main features:
2.1 Parsing url information with yarl
With yarl's `URL()` we can parse any legal url into its individual components. Let's start with a simple example, parsing the github repository url where I keep the attachments for each of my blog posts:
```python
from yarl import URL

url = URL('https://github.com/CNFeffery/DataScienceStudyNotes/tree/master/%E5%8E%86%E5%8F%B2%E6%96%87%E7%AB%A0%E9%99%84%E4%BB%B6%E5%88%97%E8%A1%A8')
```
Because the original url contains Chinese and other non-ASCII characters, it became url-encoded when pasted into the code; calling the `human_repr()` method decodes it back to a readable form:
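A minimal sketch of the decoding step, using the same repository url as above:

```python
from yarl import URL

# the percent-encoded github repository url from above
url = URL('https://github.com/CNFeffery/DataScienceStudyNotes/tree/master/'
          '%E5%8E%86%E5%8F%B2%E6%96%87%E7%AB%A0%E9%99%84%E4%BB%B6%E5%88%97%E8%A1%A8')

# human_repr() decodes the percent-encoded parts back to readable characters
print(url.human_repr())
```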
And by accessing the attribute named after each part of the url, we can extract the components individually:
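For example, continuing with the url above, each part is exposed as a plain attribute:

```python
from yarl import URL

url = URL('https://github.com/CNFeffery/DataScienceStudyNotes/tree/master/'
          '%E5%8E%86%E5%8F%B2%E6%96%87%E7%AB%A0%E9%99%84%E4%BB%B6%E5%88%97%E8%A1%A8')

# each component of the url has a correspondingly named attribute
print(url.scheme)        # 'https'
print(url.host)          # 'github.com'
print(url.path)          # the (decoded) path part
print(url.query_string)  # '' — this url carries no query parameters
```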
Note that the port information here is inferred from the scheme according to the usual conventions: http is assumed to be 80, and https to be 443.
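A quick sketch of this inference (example.com here is just a made-up host for illustration):

```python
from yarl import URL

# neither url writes out a port, so yarl falls back to the
# scheme's conventional default
print(URL('http://example.com').port)   # 80
print(URL('https://example.com').port)  # 443
```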
If instead you need the port information that appears explicitly in the url, use `explicit_port`:
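For instance (again with a made-up example host):

```python
from yarl import URL

# explicit_port only reports a port literally written in the url
print(URL('https://example.com').explicit_port)       # None
print(URL('https://example.com:8080').explicit_port)  # 8080
```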
The hash (anchor) part of a url can be retrieved through the `fragment` attribute:
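A minimal sketch (the url is an invented example):

```python
from yarl import URL

url = URL('https://example.com/guide#section-2')

# fragment holds everything after the '#'
print(url.fragment)  # 'section-2'
```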
If you want to parse the query parameters contained in a url, you can access `query` directly, which returns a `MultiDict`: a special dictionary type that allows duplicate keys. For keys without duplicates you can index it like a normal dictionary; for duplicated keys you need the `getall()` method, which returns a list of all values for the given key:
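Here is a small sketch of both access patterns (the url and parameter names are invented for illustration):

```python
from yarl import URL

url = URL('https://example.com/search?keyword=yarl&tag=python&tag=url')

# query returns a MultiDict, which tolerates duplicate keys;
# non-duplicated keys index like a normal dict
print(url.query['keyword'])     # 'yarl'

# for a duplicated key, getall() returns every value as a list
print(url.query.getall('tag'))  # ['python', 'url']
```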
As you can see, parsing urls with yarl is very convenient.
2.2 Constructing a url with yarl
When we need to construct a url from its individual parts, yarl is also much more convenient than the basic approaches. The most basic way is the `build()` method, which takes each part of the target url as a keyword argument:
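A minimal sketch of `URL.build()` (all the part values here are made-up examples):

```python
from yarl import URL

# assemble a url from its parts via keyword arguments
url = URL.build(
    scheme='https',
    host='example.com',
    path='/search',
    query={'keyword': 'yarl'},
    fragment='results',
)
print(str(url))  # 'https://example.com/search?keyword=yarl#results'
```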
And if you already have a concrete `URL` object on which you want to set some other part, you can use the family of methods named `with_xxx()`, where `xxx` corresponds to the name of the part to replace:
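For example (the starting url is invented; each `with_xxx()` call returns a new `URL` with just that part swapped):

```python
from yarl import URL

url = URL('https://example.com/search?keyword=yarl')

# replace one part at a time; the original object is left untouched
print(url.with_scheme('http'))      # http://example.com/search?keyword=yarl
print(url.with_host('example.org')) # https://example.org/search?keyword=yarl
print(url.with_fragment('top'))     # adds '#top' at the end
```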
In particular, for the query-parameter part there is also a dedicated `update_query()` method for appending parameters; the difference between it and `with_query()` can be seen in the following example:
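A sketch of the contrast (url and parameters are made-up examples): `with_query()` replaces the entire query string, while `update_query()` merges the new parameters into the existing ones.

```python
from yarl import URL

url = URL('https://example.com/search?keyword=yarl')

# with_query() REPLACES the whole query string
print(url.with_query({'page': '2'}))    # the keyword parameter is gone

# update_query() MERGES new parameters into the existing query
print(url.update_query({'page': '2'}))  # keyword=yarl is kept, page=2 added
```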
2.3 Quickly composing urls with the / and % operators
In yarl, the `/` and `%` operators have been overloaded to support shortcut operations like those in the example below, which is very convenient:
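A minimal sketch of both operators (the host and parameter names are invented): `/` appends path segments, and `%` is a shorthand for `update_query()`.

```python
from yarl import URL

base = URL('https://example.com')

# / appends path segments one at a time
joined = base / 'api' / 'v1' / 'users'
print(joined)  # https://example.com/api/v1/users

# % merges query parameters, like update_query()
with_params = joined % {'page': '1'}
print(with_params.query['page'])  # '1'
```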
Besides the commonly used features above, yarl also offers other handy functionality, such as the `is_absolute()` method for judging whether a url is an absolute path; interested readers can learn more in the official documentation.
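As a quick illustration of `is_absolute()` (the urls are made-up examples):

```python
from yarl import URL

# an absolute url carries a scheme and host; a bare path does not
print(URL('https://example.com/a/b').is_absolute())  # True
print(URL('/a/b').is_absolute())                     # False
```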
That's all for this article on using Python's yarl to work with urls easily. For more on manipulating urls with yarl in Python, please search my earlier posts or continue browsing the related articles below, and I hope you'll keep supporting me in the future!