SoFunction
Updated on 2024-11-12

Python Crawler Practice: Crawling Ctrip Reviews

I. Analyzing data sources

Does the data live in the HTML page itself, or is it loaded by an Ajax asynchronous request? Crawler beginners often don't know how to tell the difference, so let's walk through it once here.

Note: none of the steps below require logging in (though logging in works too).

Open Ctrip in your browser and search for any attraction. Here we'll use Changlong Wildlife World as the example for crawling Ctrip review data.

 Image

At the bottom of the page is the review data

 Image

Image

As the two screenshots above show, clicking to the next page of comments leaves the browser URL unchanged, which tells us the data is loaded by an Ajax asynchronous request. Since the data is loaded in asynchronously, it's time to open the Network tab of the developer tools and inspect the packets.

II. Analysis of data packages

Find the following request in the Network tab:

 Image

View the content under the Preview tab (the response body returned by the request):

Image

You can see that the data has been returned. Now check that it is correct, i.e. consistent with the comments shown on the page.

 Image

OK, once everything checks out, we can start writing a Python program to request the data.

1. Request address

Image

From here we can obtain the request URL and the request method.

Image

A request can be made here without adding any request headers. Here, postUrl is the request URL and data_1 holds the request parameters.
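As a minimal sketch, assuming the requests library, the request might look like the following. The endpoint URL is illustrative only; copy the real one from your own Network capture, since the service number in the path can differ:

```python
import requests

# Illustrative endpoint -- replace with the URL copied from the Network tab.
postUrl = "https://m.ctrip.com/restapi/soa2/13444/json/getCommentCollapseList"

def fetch_comment_page(payload: dict) -> dict:
    """POST the captured payload to the endpoint and return the parsed JSON."""
    resp = requests.post(postUrl, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()
```

No headers are passed, matching the observation above that the endpoint answers without them.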

2. Request parameters

You can see the request parameters in the Network tab:

Image

They are built in the program as follows:

Image

Pay particular attention to pageIndex (the page number) and pageSize (the number of entries per page) inside arg.
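As a sketch, the payload can be built like this. Only pageIndex and pageSize are spelled out; any other fields (such as a poiId identifying the attraction, or a head block) are assumptions here and should be copied verbatim from the Request Payload shown in the Network tab:

```python
def build_payload(page_index: int, page_size: int = 10) -> dict:
    """Build the POST body; pageIndex and pageSize live inside "arg"."""
    return {
        "arg": {
            "pageIndex": page_index,  # which page of comments to fetch
            "pageSize": page_size,    # number of comments per page
            # ...remaining arg fields copied from the Network tab (e.g. poiId)
        },
        # ...remaining top-level fields copied from the capture (e.g. "head")
    }
```

Bumping page_index by one per request is all that's needed later to page through the comments.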

Image

The final results are as follows:

Image

The reviews for that attraction can then be successfully crawled down.
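The exact shape of the returned JSON should be read off the Preview tab; as a hedged sketch, assuming the comments sit under result.items with the text in a content field (field names are assumptions, not confirmed by the source), extraction could look like:

```python
def extract_comments(page_json: dict) -> list[str]:
    """Pull the comment texts out of one page of the JSON response.
    The "result"/"items"/"content" keys are assumed -- verify in Preview."""
    items = page_json.get("result", {}).get("items") or []
    return [item.get("content", "") for item in items]

# Tiny made-up response in the assumed shape, for illustration:
sample = {"result": {"items": [{"content": "The animals were great!"}]}}
```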

III. Capture all comments

The code above only fetches the first page of comment data; by changing pageIndex inside arg, we can iterate through and crawl all the comments.

Image

Suppose this attraction has 300 pages of comments in total. Now we add the loop.

The final complete code is as follows:

Image
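As a hedged sketch of what the complete program might look like: the endpoint URL, the extra payload fields, and the JSON field names are all assumptions to be verified against your own Network capture, and the 300-page total comes from the page itself:

```python
import time
import requests

# Illustrative endpoint -- copy the real URL from the Network tab.
postUrl = "https://m.ctrip.com/restapi/soa2/13444/json/getCommentCollapseList"

def build_payload(page_index: int, page_size: int = 10) -> dict:
    """POST body sketch; copy the remaining fields from your capture."""
    return {"arg": {"pageIndex": page_index, "pageSize": page_size}}

def crawl_all(total_pages: int = 300) -> list[str]:
    """Loop over pageIndex and collect every comment's text."""
    comments = []
    for page in range(1, total_pages + 1):
        resp = requests.post(postUrl, json=build_payload(page), timeout=10)
        resp.raise_for_status()
        data = resp.json()
        # Field path is an assumption -- check the real structure in Preview.
        items = data.get("result", {}).get("items") or []
        if not items:
            break  # ran out of comments early
        comments.extend(item.get("content", "") for item in items)
        time.sleep(1)  # pause between requests to be polite to the server
    return comments

if __name__ == "__main__":
    all_comments = crawl_all()
    print(f"crawled {len(all_comments)} comments")
```

The early break and the one-second pause are design choices of this sketch rather than part of the original screenshots: the break stops the loop if the site returns fewer pages than expected, and the pause reduces the risk of being rate-limited.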

That concludes this article on crawling Ctrip reviews with a Python crawler. For more on crawling Ctrip comments with Python, please search my earlier posts or browse the related articles below. I hope you'll continue to support me!