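I was crawling short-term rental listings from the Xiaozhu ("Piggy") website, but the Chinese text in the output came out as \uXXXX escape sequences instead of readable characters.
I searched Baidu: handling encodings in Python 2.7 on Windows is indeed a pitfall!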
The solution is as follows:
If the data is a dictionary, first convert it to a string, which requires importing the json library.
Then print it like this: print(json.dumps(data).decode("unicode-escape"))
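To see what this conversion does, here is a minimal sketch using a made-up one-entry dictionary (the sample data and variable names are assumptions, not taken from the crawl). It adds an .encode('ascii') step so that the same lines should also run on Python 3, and it shows the ensure_ascii=False option of json.dumps as an alternative that avoids producing the escapes in the first place.

```python
import json

# Hypothetical sample data: the value is "Beijing" written in Chinese,
# spelled with escapes so this file itself stays pure ASCII.
data = {u'address': u'\u5317\u4eac'}

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(data)

# Decoding with "unicode-escape" turns the \uXXXX sequences back into text.
# The .encode('ascii') step is a no-op on Python 2 and is needed on Python 3,
# where json.dumps returns text rather than bytes.
readable = escaped.encode('ascii').decode('unicode-escape')

# Alternative: ask json.dumps not to escape non-ASCII characters at all.
direct = json.dumps(data, ensure_ascii=False)

print(escaped)
print(readable)
```

Note that "unicode-escape" also interprets other backslash sequences such as \n and \", so the decoded string is suitable for display but is not guaranteed to be valid JSON any more.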
The complete code:
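# -*- coding: UTF-8 -*-
# Piggy crawl (Python 2.7)
import json

import requests
from bs4 import BeautifulSoup


def get_xinxi(i):
    # Path of results page i (the site host is not included here)
    url = '/search-duanzufang-p%d-0/' % i
    html = requests.get(url)
    # The parser name is an assumption; 'lxml' also works if it is installed
    soup = BeautifulSoup(html.text, 'html.parser')
    # Get addresses
    dizhis = soup.select(' div > a > span')
    # Get prices
    prices = soup.select(' span.result_price')
    # Get the short descriptions
    ems = soup.select(' div > em')
    datas = []
    for dizhi, price, em in zip(dizhis, prices, ems):
        data = {
            'Price': price.get_text(),
            'Information': em.get_text().replace('\n', '').replace(' ', ''),
            'address': dizhi.get_text()
        }
        # json.dumps escapes Chinese as \uXXXX; decoding restores the characters
        print(json.dumps(data).decode("unicode-escape"))


i = 1
while i < 12:
    get_xinxi(i)
    i = i + 1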
This crawls pages 1 through 11, since the loop stops once i reaches 12.
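The loop starts at i = 1 and runs while i < 12, so it visits pages 1 to 11. As a small sketch of the same paging, assuming only the path pattern shown in the code (the name urls is hypothetical), a range-based version makes the bounds easier to check:

```python
# range(1, 12) yields 1, 2, ..., 11: the same values the while loop visits.
urls = ['/search-duanzufang-p%d-0/' % i for i in range(1, 12)]

print(len(urls))   # number of pages that would be requested
print(urls[0])     # first page path
print(urls[-1])    # last page path
```

To include page 12 as well, the upper bound would have to be 13 (or the while condition i <= 12).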
Summary:
Points to note:
Creating the soup object from the response text (the parser argument shown is one possible choice):
soup = BeautifulSoup(html.text, 'html.parser')
Iterating over several lists in parallel:
for dizhi,price,em in zip(dizhis,prices,ems):
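As a quick illustration of how zip behaves here, the following sketch uses made-up lists in place of the scraped elements (all values are hypothetical). It also shows a detail worth knowing when scraping: zip stops at the shortest list, so if one listing lacks a field, the remaining rows are silently dropped or misaligned.

```python
# Hypothetical stand-ins for the three lists of scraped values.
dizhis = ['address A', 'address B', 'address C']
prices = ['100', '200', '300']
ems = ['info A', 'info B']  # deliberately one item short

# zip pairs up the items position by position.
rows = [
    {'address': dizhi, 'Price': price, 'Information': em}
    for dizhi, price, em in zip(dizhis, prices, ems)
]

for row in rows:
    print(row)
```

Only two rows come out, because ems has only two items. Comparing the lengths of the lists before zipping is a cheap way to catch this.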
Fixing the encoding when printing a dictionary:
json.dumps(data).decode("unicode-escape")
If you want the details of each listing, get the value of the href attribute of its link, for example with this selector:
#page_list > ul > li:nth-of-type(1) > a
Then call get('href') on the matched element to obtain the URL of the detail page, request and parse that page, and add the information you want to the data dictionary.
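A minimal sketch of the href step, run against a small inline HTML snippet rather than the live site. The markup and the link paths are invented for illustration; only the #page_list > ul > li > a structure comes from the selector above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page.
html = """
<div id="page_list"><ul>
  <li><a href="/detail/1.html">Room one</a></li>
  <li><a href="/detail/2.html">Room two</a></li>
</ul></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Link of the first listing only, using the selector from the text.
first = soup.select('#page_list > ul > li:nth-of-type(1) > a')[0].get('href')

# Links of all listings: drop the nth-of-type(1) restriction.
links = [a.get('href') for a in soup.select('#page_list > ul > li > a')]

print(first)
print(links)
```

Each collected link can then be requested and parsed in the same way as the results page, and the extra fields added to the data dictionary for that listing.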
That is all I have to share about converting Python's \u-escaped output into Chinese. I hope it serves as a useful reference.