1. Installing the beautifulsoup4 library
Step one: Install the beautifulsoup4 library by entering the following command in the console:
pip install beautifulsoup4
Step two: In PyCharm, go to File > Settings > Project > Python Interpreter, click the + sign, search for beautifulsoup4, and click Install Package.
This installs the package into the project's interpreter, so you can import the module in your .py files.
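To confirm the installation worked, a quick check like the following can be run as a script or in PyCharm's Python console. It only imports the package, prints its version (the exact string depends on what pip installed), and parses a tiny made-up snippet.

```python
# Verify that beautifulsoup4 is importable from the current interpreter
import bs4
from bs4 import BeautifulSoup

# The distribution is named beautifulsoup4, but the import name is bs4
print("bs4 version:", bs4.__version__)

# One-line smoke test using Python's built-in html.parser
soup = BeautifulSoup("<p>installed</p>", "html.parser")
print(soup.p.string)
```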
2. Using the beautifulsoup4 library
import requests
# Although the library is called beautifulsoup4, it is imported under the abbreviation bs4; BeautifulSoup is a class name.
from bs4 import BeautifulSoup

# Only the path of the target page is given here; replace it with the full URL of the page you want to fetch.
url = '/s?'
# Websites are meant to be visited by people. If the User-Agent suggests a crawler, the server may refuse access, so we simulate a browser here.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# The page contains Chinese text; set the encoding to utf-8 to avoid garbled characters
response.encoding = 'utf-8'
# print(response.text)
# The parser used here is html.parser. Note the dot in the name.
soup = BeautifulSoup(response.text, 'html.parser')
# Print the parsed result
print(soup.prettify())
The explanations are in the code comments.
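To experiment without a network request, a minimal sketch is to feed BeautifulSoup a local byte string. Here the from_encoding='utf-8' argument plays the same role as setting the response encoding in the code above; the HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page content containing Chinese text, encoded as UTF-8 bytes
raw = "<html><head><title>你好</title></head><body><p>Hello</p></body></html>".encode("utf-8")

# Tell BeautifulSoup how to decode the bytes so the Chinese text is not garbled
soup = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

print(soup.title.string)
print(soup.prettify())
```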
3. Basic elements of the beautifulsoup4 library
The beautifulsoup4 library is a library for parsing, traversing, and maintaining the "tag tree" of HTML and XML documents.
First, let's look at the parsers supported by the BeautifulSoup library; the first two are the most commonly used.
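The parser is chosen by the second argument to BeautifulSoup. The sketch below tries the names Beautiful Soup recognises for its usual parsers: 'html.parser' (built into Python), 'lxml' and 'xml' (both require pip install lxml), and 'html5lib' (requires pip install html5lib). A parser that is not installed raises bs4.FeatureNotFound, which the code catches instead of crashing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately unclosed tags, to show that each parser repairs markup its own way
markup = "<p>Hello<b>bs4"


def try_parser(name):
    """Return the parsed document as a string, or None if the parser is not installed."""
    try:
        return str(BeautifulSoup(markup, name))
    except FeatureNotFound:
        return None


results = {name: try_parser(name) for name in ("html.parser", "lxml", "xml", "html5lib")}
for name, output in results.items():
    print(f"{name:12} -> {output if output is not None else 'not installed'}")
```

html.parser is always available, which makes it a safe default; lxml is generally faster once installed.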
Building on the code in section 2, add the following lines; combined with the basic elements, they produce the result shown in the figure.
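As a self-contained sketch of the basic elements, the example below uses a small made-up HTML document instead of a live page and reads a Tag, its .name, its .attrs, and its .string (a NavigableString).

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Small hypothetical document used only for illustration
html = (
    '<html><head><title>Demo page</title></head><body>'
    '<p class="title"><b>The demo</b></p>'
    '<a href="http://example.com/1" class="py1" id="link1">Basic Python</a>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a             # Tag: soup.<name> returns the FIRST matching tag
print(type(tag))         # the Tag class
print(tag.name)          # Name: the tag's own name
print(tag.parent.name)   # Name of the enclosing tag
print(tag.attrs)         # Attributes: a dict; "class" is multi-valued, so it is a list
print(tag["href"])       # A single attribute value
print(tag.string)        # NavigableString: the text inside the tag
print(type(tag.string))
```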
Note that .string can reach across tags: if a tag's only child is another tag, .string returns that child's string. As a result, the value may also be a comment. To tell a string inside a tag apart from a comment, print its type.
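A minimal sketch of that check, on a made-up snippet: the text of a comment comes back as a Comment object, ordinary text as a NavigableString, and .string reaches through a single nested tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

html = "<b><!--This is a comment--></b><p>This is not a comment</p><div><i>nested text</i></div>"
soup = BeautifulSoup(html, "html.parser")

b_text = soup.b.string      # the <b> tag's only child is a comment
p_text = soup.p.string      # ordinary text
div_text = soup.div.string  # .string reaches through the single child <i>

for value in (b_text, p_text, div_text):
    print(repr(str(value)), type(value))


def is_comment(value):
    # Comment is a subclass of NavigableString, so test for Comment explicitly
    return isinstance(value, Comment)
```

Because every Comment is also a NavigableString, isinstance(value, NavigableString) cannot tell them apart; checking for Comment (or comparing type(value)) can.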
Next, let's look at traversal in the BeautifulSoup library. The traversal attributes marked with red boxes in the figure return iterators, so they can be used in a for ... in loop.
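The sketch below walks a small made-up document in all three directions. It relies on standard bs4 attributes: .contents (a list), and .children, .descendants, .parents and .next_siblings, which are iterators and therefore suit a for ... in loop or a comprehension.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = (
    "<html><body>"
    "<p class='title'><b>Title</b></p>"
    "<p class='course'>Courses: <a id='link1'>Python</a> and <a id='link2'>Advanced</a>.</p>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Downward: .contents is a list; .children and .descendants are iterators
print(soup.body.contents)
child_names = [child.name for child in soup.body.children]
descendant_count = sum(1 for _ in soup.body.descendants)

# Upward: .parents yields every ancestor, ending with the BeautifulSoup object itself
parent_names = [parent.name for parent in soup.a.parents]

# Sideways: .next_siblings yields both strings and tags, so keep only Tag objects
sibling_ids = [sib["id"] for sib in soup.a.next_siblings if isinstance(sib, Tag)]

print(child_names, descendant_count, parent_names, sibling_ids)
```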
4. HTML search methods of the beautifulsoup4 library
find_all(name, attrs, recursive, string, **kwargs)
The find_all() method searches all descendant tags of the current tag and returns those that match the filter conditions.
The name parameter searches for tags with the given name.
The attrs parameter searches for tags whose attribute values match attrs.
The recursive parameter controls whether all descendants (children, grandchildren, and so on) are searched. It defaults to True; to search only the direct children of the current node, set it to False.
The string parameter searches for matching strings within the content of the tags.
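Here is a short sketch of each parameter on a made-up document. It also shows two standard bs4 conveniences not described above: passing an attribute as a keyword argument (id="link1", via **kwargs) and passing a compiled regular expression as the string filter.

```python
import re

from bs4 import BeautifulSoup

html = """<html><body>
<p class="title"><b>The demo</b></p>
<p class="course">Courses:
<a href="http://example.com/1" class="py1" id="link1">Basic Python</a>
<a href="http://example.com/2" class="py2" id="link2">Advanced Python</a>
</p></body></html>"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                            # name: every <a> tag
links_and_bold = soup.find_all(["a", "b"])            # name can also be a list of names
course = soup.find_all("p", attrs={"class": "course"})  # attrs: filter on attribute values
by_id = soup.find_all(id="link1")                     # **kwargs: attribute as a keyword
top_level = soup.find_all("a", recursive=False)       # only direct children of soup
exact = soup.find_all(string="Basic Python")          # string: exact text match
pattern = soup.find_all(string=re.compile("Python"))  # string: regular-expression match

print(len(links), [t.name for t in links_and_bold])
print(course[0]["class"], by_id[0].string, top_level, exact, pattern)
```

The recursive=False search returns an empty list because the only direct child of the BeautifulSoup object is <html>; the <a> tags sit several levels deeper.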
5. Supplement: JSON (JavaScript Object Notation)
Anyone who has studied JavaScript or Java should already be familiar with JSON.
JSON is a typed key-value pair format.
Keys and string values must be enclosed in double quotes (""); numbers are written without quotes, as are true, false and null.
A value with multiple items is written as an array, [value1, value2]; a value that is itself a set of key-value pairs is written as an object, {"key1": value1, "key2": value2}. Arrays and objects can be nested.
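A quick way to see the typing rules is Python's standard json module. The JSON text below is hypothetical; note how a quoted number stays a string while an unquoted one becomes an int, and how arrays and nested objects map to lists and dicts.

```python
import json

# Hypothetical JSON text: string values are quoted, numbers and booleans are not,
# [...] holds multiple values and {...} holds nested key-value pairs
text = """
{
    "name": "demo",
    "pages": 3,
    "quoted_pages": "3",
    "active": true,
    "tags": ["html", "parser"],
    "owner": {"id": 7, "role": "admin"}
}
"""
data = json.loads(text)

print(type(data["pages"]))         # a JSON number becomes a Python int
print(type(data["quoted_pages"]))  # a quoted number stays a string
print(data["tags"][0], data["owner"]["role"])

# Serialising back: json.dumps writes keys and strings in double quotes
print(json.dumps(data["owner"]))
```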
JSON is generally used for interfaces (APIs), whereas YAML, an untyped key-value format, is generally used for configuration files.
That concludes this tutorial on using the beautifulsoup4 library in PyCharm.