Updated on 2024-11-15

Detailed steps for installing the Python parsing library Beautiful Soup

I. Installation of Beautiful Soup

Beautiful Soup is a Python library for parsing HTML and XML that makes it easy to extract data from web pages. It can work with several parsers; this article uses the lxml parser, so make sure lxml is installed beforehand.

This article uses Windows 10 64-bit with Python 3.11, so the steps below take a Windows installation as the example.
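Since the later steps depend on the Python version and on whether the interpreter is 32-bit or 64-bit, it can help to confirm both before downloading anything. The following is a minimal sketch using only the standard library; the function name describe_environment is a hypothetical helper, not part of any tool mentioned in this article.

```python
import platform
import struct
import sys


def describe_environment():
    """Return (python_version, pointer_bits, os_name) for the running interpreter."""
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    # The size of a C pointer tells us whether this interpreter is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    return version, bits, platform.system()


if __name__ == "__main__":
    print(describe_environment())
```

Note that the bitness reported here is that of the Python interpreter, which is what matters when choosing a wheel, and it may differ from the bitness of the operating system.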

1.1 Installing the lxml library

First try to install the lxml library with pip:

pip install lxml

If pip reports errors, such as a missing libxml2 library, you can install lxml from a wheel file instead.

To install from a wheel file, first install the wheel package:

pip install wheel


Then go to the lxml project page on the official website (/project/lxml/) and download the wheel build of lxml. At the time of writing, the latest version is lxml 4.9.1. Click Download files.


From the listed files, pick the one that matches your setup. For example, if your Python version is 3.10 and your machine runs 64-bit Windows, pick lxml-4.9.1-cp310-cp310-win_amd64.whl
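The parts of that filename can be derived from the running interpreter. The sketch below builds the expected name under two assumptions: the ABI tag equals the interpreter tag (as in the cp310-cp310 example above), and the platform tag is the sysconfig platform string with "-" and "." replaced by "_". That holds for Windows wheels such as win_amd64; Linux wheels usually carry manylinux tags instead. All three function names are hypothetical helpers.

```python
import sys
import sysconfig


def cpython_tag(version_info=None):
    """Return the CPython tag, e.g. 'cp310' for Python 3.10."""
    info = sys.version_info if version_info is None else version_info
    return "cp{}{}".format(info[0], info[1])


def platform_tag(platform=None):
    """Normalize a platform string such as 'win-amd64' to 'win_amd64'."""
    raw = sysconfig.get_platform() if platform is None else platform
    return raw.replace("-", "_").replace(".", "_")


def expected_wheel_name(project, version, py_tag, plat_tag):
    """Assemble a wheel filename, assuming the ABI tag equals the Python tag."""
    return "{}-{}-{}-{}-{}.whl".format(project, version, py_tag, py_tag, plat_tag)


if __name__ == "__main__":
    # Shows which filename to look for on the current machine.
    print(expected_wheel_name("lxml", "4.9.1", cpython_tag(), platform_tag()))
```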


One pitfall here: the latest Python version is already 3.11, but at the time of writing lxml has no official cp311 wheel for Windows, only one for Linux. One option is to downgrade Python, for example to version 3.10.

Alternatively, the page at /~gohlke/pythonlibs/ offers a cp311 wheel for Windows, which you can try on your own.


To install the wheel file, change to the directory that contains it and run the pip command there, or pass the full path to the wheel file.

pip install lxml-4.9.0-cp311-cp311-win_amd64.whl
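If several Python versions are installed, a bare pip command may belong to a different interpreter than the one you intend to use. One common way around this is to call pip as a module of a specific interpreter. The sketch below only builds the command; pip_install_command is a hypothetical helper name.

```python
import sys
from pathlib import Path


def pip_install_command(wheel_path):
    """Build a pip command bound to the interpreter running this script."""
    # "python -m pip" guarantees that the package lands in this interpreter's
    # site-packages rather than in whichever pip happens to be first on PATH.
    return [sys.executable, "-m", "pip", "install", str(Path(wheel_path))]


if __name__ == "__main__":
    print(pip_install_command("lxml-4.9.0-cp311-cp311-win_amd64.whl"))
```

To actually run it, the returned list can be passed to subprocess.run(command, check=True) from the directory that holds the wheel file.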


1.2 Install beautifulsoup4

Installing with pip is recommended. Run the following command:

pip install beautifulsoup4


1.3 Verifying that beautifulsoup4 works

Run the following code. If it prints hello, beautifulsoup4 is ready to use for parsing.

If beautifulsoup4 is installed but the lxml library is not installed correctly, the following code will fail.

from bs4 import BeautifulSoup as bs

soup = bs('<p>hello</p>', 'lxml')
print(soup.p.string)
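When lxml cannot be installed, Beautiful Soup can still be checked with Python's built-in html.parser. The following sketch tries lxml first and falls back to html.parser; the function name first_paragraph_text is a hypothetical helper, and it returns None when beautifulsoup4 itself is missing.

```python
def first_paragraph_text(html, parsers=("lxml", "html.parser")):
    """Return the text of the first <p> element, or None if it cannot be read."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return None  # beautifulsoup4 itself is not installed

    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
        paragraph = soup.find("p")
        return None if paragraph is None else paragraph.get_text()
    return None


if __name__ == "__main__":
    print(first_paragraph_text("<p>hello</p>"))
```

If this prints hello while the lxml version of the check fails, the problem lies with the lxml installation rather than with beautifulsoup4.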


Supplementary: fixing a failed beautifulsoup4 installation or an import error in Python

1. First download the beautifulsoup4 package from the official BeautifulSoup4 website.

2. Unzip it into the G:\python\Lib\site-packages\bs4 directory, then open a cmd window and change to the unzipped directory, G:\python\Lib\site-packages\bs4\beautifulsoup4-4.3.2\beautifulsoup4-4.3.2

3. Run the following commands in that directory:

python setup.py build
python setup.py install

An error you may encounter: error in pymmseg setup command: use_2to3 is invalid.

The solution: downgrade setuptools. Support for use_2to3 was removed in setuptools 58, and the last version below 58 is 57.5.0, so install that version with pip:

pip install setuptools==57.5.0
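The version threshold above can be checked before reinstalling. This is a simplified sketch that assumes the version string starts with a plain integer major component (true for setuptools releases such as 57.5.0); the function names are hypothetical helpers.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '57.5.0'."""
    return int(version.split(".")[0])


def supports_use_2to3(setuptools_version):
    """True if this setuptools version is below 58, the threshold named above."""
    return major_version(setuptools_version) < 58


def installed_setuptools_version():
    """Return the installed setuptools version, or None if it is not installed."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_setuptools_version()
    if current is not None:
        print(current, supports_use_2to3(current))
```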

Then reinstall the library.

If you did not get any errors, go straight to the next step.

Import the module (to test whether the import succeeds)

Open cmd, start python, and enter:

from bs4 import BeautifulSoup

Note: the name is case-sensitive. It must be written as BeautifulSoup; an all-lowercase name does not work.
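To check whether a package can be found without actually importing it, the standard library offers importlib.util.find_spec. A minimal sketch follows, where is_importable is a hypothetical helper name and the argument is assumed to be a top-level module name such as bs4.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this exact name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("bs4", "lxml"):
        print(name, is_importable(name))
```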

Problem solved!

PS: I ran into this problem myself and spent a long time on it. After asking many people without success, I finally solved it by searching Baidu and reading a lot of material. I hope this helps you.

Summary

This concludes the detailed steps for installing the Python parsing library Beautiful Soup.