SoFunction
Updated on 2024-11-15

Python using yarl to achieve easy manipulation of url

1. Introduction

Hello everyone I'm Mr. Fei, in such as web crawlers,webIn scenarios such as application development, we need to utilizePythonComplete a large number ofurlParsing, generating and other operations.

as well asPythonecology, whether it is using a method such asurllibor a variety of third-party libraries that can be used to efficiently handleurlThe methods are all very rich. And today Mr. Fei I want to introduce to youurlThe processing library, on the other hand, is the one I'm most satisfied with after considering the simplicity of use and the speed of operation in actual use.

2. Efficient url processing in Python using yarl

This can be used to efficiently and easily processurlThe third-party library calledyarlUsepip install yarlAfter completing the installation, let's quickly learn some of the main ways it functions:

2.1 Parsing url information with yarl

on the basis ofyarlhit the nail on the headURL()We can get the information from any of the legalurlThe components shown in the figure below are parsed out in the

Let's start by looking at a simple example, where for each blog post attachment I keep thegithubWarehouse PathurlPerform parsing:

from yarl import URL

url = URL('/CNFeffery/DataScienceStudyNotes/tree/master/%E5%8E%86%E5%8F%B2%E6%96%87%E7%AB%A0%E9%99%84%E4%BB%B6%E5%88%97%E8%A1%A8')

The original URLs, because they contain Chinese and other nonASCIIcharacter, so it becomes url-encoded after pasting it into the code, and directly calling thehuman_repr()The method can be decoded and restored:

And by getting the correspondingurlThe attributes of each part name can be extracted separately:

where the port information is based on theschemeinformation is extrapolated according to routine circumstances.httpbe considered to be80httpsbe considered to be443If you need to get theurlThe port information that appears explicitly in theexplicit_port

be directed againsturlhit the nail on the headhashLabel information can then be accessed through thefragmentAcquisition:

If you want to parse theurlincludequeryparameter information, then you can directly callquerycaptureMultiDicttype, which is a special type of dictionary that allows for duplicate keys and indexes values like a normal dictionary for key-value pairs that do not have duplicates.if notThen you need to pass thegetall()method to return a list of all values corresponding to the passed-in key:

It can be felt through theyarlanalyzeurlIt's very convenient.

2.2 Constructing a url with yarl

When we need to construct information based on the existing parts of theurlwhenyarlIt's much more convenient that the base approach is based on the()method, defined as a function passing a parameterurl

And if you already have a concrete presence ofobject, on which you want to set other parts of the content, you can use a series of names in the format ofwith_xxx()The method in which thexxxIt corresponds to the names of the individual parts:

In particular, for the query parameter section, there are also specializedupdate_query()method for parameter appending, which is the same as thewith_query()The difference can be appreciated in the following example:

2.3 Quickly synthesize url using /, % operators

existyarlin the context of/%operators have been rewritten to support shortcut operations like the example below, which is very convenient:

In addition to the aboveyarlIn addition to the commonly used functions, there are also functions such as utilizing theis_absolute()methodological judgmenturlWhether the absolute path and other useful functions, interested readers can go to the official documentation to learn more (/en/latest/)。

to this article on the use of Python yarl to achieve easy operation of url to this article, more related Python yarl operation of url content, please search for my previous posts or continue to browse the following related articles I hope you will support me in the future more!