I. Installation of Beautiful Soup
Beautiful Soup is a Python library for parsing HTML and XML that makes it easy to extract data from web pages. It can work with several underlying parsers; this article uses lxml, so make sure the lxml library is installed first.
This article uses Windows 10 (64-bit) with Python 3.11, so the installation steps below are for Windows.
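Because the right lxml wheel depends on both the Python version and whether the interpreter is a 32-bit or 64-bit build, it can help to print those details first. The sketch below uses only the standard library; the function name describe_environment is hypothetical. Note that it reports the bitness of the Python build (which is what matters when choosing a wheel), not of Windows itself.

```python
import platform
import struct
import sys


def describe_environment():
    """Return the OS, interpreter bitness, and Python version (hypothetical helper)."""
    return {
        "os": f"{platform.system()} {platform.release()}",
        # Size of a C pointer in bits: 64 on a 64-bit Python build, 32 on a 32-bit one.
        "bits": struct.calcsize("P") * 8,
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```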
1.1 Installing the lxml library
First, try installing lxml with pip:
pip install lxml
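To confirm that the pip install actually produced a working lxml, you could run a small check like the one below. It is a sketch built on the version attributes listed in the lxml FAQ (etree.LXML_VERSION and etree.LIBXML_VERSION, both tuples); the helper name lxml_versions is hypothetical. It returns None instead of crashing when lxml cannot be imported, which is the signal to fall back to the wheel method.

```python
def lxml_versions():
    """Return lxml and libxml2 version tuples, or None if lxml cannot be imported."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,       # version of lxml itself
        "libxml2": etree.LIBXML_VERSION,  # version of the libxml2 library in use
    }


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is None:
        print("lxml is not installed or failed to import")
    else:
        print(versions)
```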
If pip reports errors, such as a missing libxml2 library, install lxml from a wheel instead.
To use the wheel method, first install the wheel package:
pip install wheel
Then open the lxml project page on the official website (/project/lxml/) and download a wheel of lxml. The latest version at the time of writing is lxml 4.9.1; click Download files.
Among the listed files, pick the one that matches your environment. For example, with Python 3.10 on 64-bit Windows, choose lxml-4.9.1-cp310-cp310-win_amd64.whl.
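Rather than working out the tags by hand, you can ask the interpreter which ones apply. This is a rough sketch, not the full tag logic pip uses: it assumes a standard (non-free-threaded) CPython 3.8 or newer, where the ABI tag equals the interpreter tag, and expected_wheel_tag is a hypothetical helper. On 64-bit Windows with Python 3.10 it should yield cp310-cp310-win_amd64, the tag in the example filename above.

```python
import sys
import sysconfig


def expected_wheel_tag():
    """Return '<python>-<abi>-<platform>' for the running CPython (hypothetical helper).

    Assumes CPython 3.8+; older versions used an ABI tag with a trailing 'm'
    (e.g. cp37m), and free-threaded builds use a trailing 't'.
    """
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig.get_platform() returns e.g. 'win-amd64'; wheel tags use '_' for '-' and '.'.
    plat_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return f"{py_tag}-{py_tag}-{plat_tag}"


if __name__ == "__main__":
    print(expected_wheel_tag())
```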
One pitfall here: the latest Python release is already 3.11, but at the time of writing lxml did not provide an official Windows wheel for Python 3.11, only Linux ones. You can downgrade Python, for example to 3.10.
Alternatively, on the /~gohlke/pythonlibs/ page you can find an unofficial Python 3.11 Windows wheel of lxml and try it yourself.
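Whichever source you use, you can check a downloaded filename against your Python version before installing it. The sketch below splits a name according to the wheel naming convention {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl. The helpers parse_wheel_filename and supports_python are hypothetical, and the check only looks for CPython-specific tags such as cp311, which is what lxml's binary wheels use.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into name, version, and tag fields (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 fields normally, 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def supports_python(filename, major, minor):
    """True if the wheel's python tag lists CPython major.minor (e.g. cp311)."""
    wanted = f"cp{major}{minor}"
    # A tag field may hold several '.'-separated values, e.g. 'cp310.cp311'.
    return wanted in parse_wheel_filename(filename)["python"].split(".")


if __name__ == "__main__":
    print(supports_python("lxml-4.9.1-cp310-cp310-win_amd64.whl", 3, 11))
```

For example, supports_python("lxml-4.9.1-cp310-cp310-win_amd64.whl", 3, 11) returns False, which is the Python 3.11 mismatch described above.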
To install the wheel file, change to the directory that contains it and run pip, or pass the full path to the file:
pip install lxml-4.9.0-cp311-cp311-win_amd64.whl
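If several Python versions are installed, the pip on your PATH may belong to a different interpreter than the one you run. One common safeguard, sketched below, is to invoke pip as a module through sys.executable and give it an absolute path to the wheel. pip_install_command is a hypothetical helper, and the script only prints the command unless you uncomment the subprocess call.

```python
import sys
from pathlib import Path


def pip_install_command(wheel_path):
    """Build a pip command that installs a local wheel into this interpreter (hypothetical helper)."""
    # Resolve to an absolute path so the command works from any directory.
    path = Path(wheel_path).expanduser().resolve()
    return [sys.executable, "-m", "pip", "install", str(path)]


if __name__ == "__main__":
    cmd = pip_install_command("lxml-4.9.0-cp311-cp311-win_amd64.whl")
    print(" ".join(cmd))
    # To actually run the installation, uncomment the next two lines:
    # import subprocess
    # subprocess.run(cmd, check=True)
```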
1.2 Installing beautifulsoup4
Installing with pip is recommended; run the following command:
pip install beautifulsoup4
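Note that the name you install (beautifulsoup4) differs from the name you import (bs4). A minimal way to confirm the installation from Python, assuming Python 3.8+ for importlib.metadata and using a hypothetical helper name, is:

```python
from importlib import metadata


def beautifulsoup_version():
    """Return the installed beautifulsoup4 version, or None if it is missing.

    The distribution is named 'beautifulsoup4', but the package you import is 'bs4'.
    """
    try:
        return metadata.version("beautifulsoup4")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(beautifulsoup_version() or "beautifulsoup4 is not installed")
```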
1.3 Verifying that beautifulsoup4 works
Run the following code. If it prints hello, beautifulsoup4 can parse documents successfully.
If beautifulsoup4 installed correctly but lxml did not, the code fails: Beautiful Soup raises a FeatureNotFound error because it cannot find the requested lxml parser.
from bs4 import BeautifulSoup as bs
soup = bs('<p>hello</p>', 'lxml')
print(soup.p.string)  # prints: hello
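A slightly more forgiving variant of this check, sketched below, tries lxml first and falls back to Python's built-in html.parser, so it tells you whether the problem lies in beautifulsoup4 or only in lxml. The function verify_beautifulsoup is hypothetical; it relies on Beautiful Soup raising FeatureNotFound when a requested parser is unavailable.

```python
def verify_beautifulsoup(markup="<p>hello</p>"):
    """Parse a tiny document, preferring lxml and falling back to html.parser.

    Returns (parser_name, text of the first <p>), or (None, reason) if
    beautifulsoup4 itself cannot be imported. Hypothetical helper.
    """
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return None, "beautifulsoup4 is not installed"
    for parser in ("lxml", "html.parser"):
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # Raised when the requested parser (here lxml) is not available.
            continue
        return parser, str(soup.p.string)
    return None, "no usable parser found"


if __name__ == "__main__":
    print(verify_beautifulsoup())
```

If it reports html.parser rather than lxml, beautifulsoup4 itself is fine and only the lxml installation from section 1.1 needs attention.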
Supplementary: fixing a failed beautifulsoup4 installation or import error in Python
1. First, download the beautifulsoup4 source package from the official website.
2. Unzip it into the G:\python\Lib\site-packages\bs4 directory, open a cmd window, and change to the unzipped directory, for example G:\python\Lib\site-packages\bs4\beautifulsoup4-4.3.2\beautifulsoup4-4.3.2.
3. In that directory, run:
python setup.py build
python setup.py install
You may encounter this error: error in pymmseg setup command: use_2to3 is invalid.
The fix is to downgrade setuptools: support for use_2to3 was removed in setuptools 58, and the last release before 58 is 57.5.0, so install that version with pip:
pip install setuptools==57.5.0
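Before rerunning the build you can confirm which setuptools version is active. This sketch assumes Python 3.8+ (importlib.metadata) and reads only the major version number; the helper names are hypothetical, and the cutoff of 58 reflects the setuptools release that dropped use_2to3.

```python
from importlib import metadata


def setuptools_major():
    """Return setuptools' major version as an int, or None if it is not installed."""
    try:
        version = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


def supports_use_2to3(major):
    """use_2to3 was removed in setuptools 58, so only majors below 58 accept it."""
    return major is not None and major < 58


if __name__ == "__main__":
    major = setuptools_major()
    print(f"setuptools major version: {major}, use_2to3 supported: {supports_use_2to3(major)}")
```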
Then rerun the installation commands from step 3.
If the installation runs without errors, go straight to the following step.
Importing the module (to test whether the import succeeds)
Open a cmd window, start python, and enter:
from bs4 import BeautifulSoup
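Note: the names are case-sensitive. The package is imported as lowercase bs4, while the class must be written exactly as BeautifulSoup; writing from bs4 import beautifulsoup raises an ImportError.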
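To see the case sensitivity for yourself without triggering an ImportError, the sketch below checks which spellings the bs4 package actually exposes. check_import_names is a hypothetical helper; attribute lookup with hasattr follows the same case-sensitive rules as from bs4 import ....

```python
import importlib


def check_import_names():
    """Map each candidate spelling to whether bs4 exposes it, or return None if bs4 is missing."""
    try:
        bs4 = importlib.import_module("bs4")
    except ImportError:
        return None
    # Attribute lookup is case-sensitive, just like `from bs4 import ...`.
    return {name: hasattr(bs4, name) for name in ("BeautifulSoup", "beautifulsoup")}


if __name__ == "__main__":
    print(check_import_names())
```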
With that, the problem was solved.
PS: This is a problem I ran into myself. It took me a long time, and asking many people did not solve it; in the end I worked it out on my own after searching through a lot of material on Baidu. I hope this helps you.
Summary
This concludes the detailed steps for installing the Python parsing library Beautiful Soup.