SoFunction
Updated on 2024-11-12

Python Crawler Practice: Crawling Ctrip Reviews

I. Analyzing data sources

Does the data live in the HTML page itself, or is it loaded by an Ajax asynchronous request? Crawler beginners often don't know how to tell the difference, so let's walk through it once here.

Note: none of the steps below require logging in (though logging in works too).

Open Ctrip in your browser and search for any attraction. Here we'll use Changlong Wildlife World as the example for crawling Ctrip review data.

 Image

At the bottom of the page is the review data

 Image

Image

As the two screenshots above show, clicking to the next page of comments leaves the browser URL unchanged, which tells us the data is loaded by an Ajax asynchronous request. Since the data is loaded in asynchronously, it's time to open the Network tab of the developer tools and inspect the packets.

II. Analysis of data packages

Find the following request in the Network tab:

 Image

View the content under the Preview tab (the response body returned by the request):

Image

You can see that the data has been returned. Now check that it is correct, i.e. consistent with the comments shown on the page.

 Image

OK, once everything checks out, we can start writing a Python program to request the data.

1. Request address

Image

From here we can obtain the request URL and the request method.

Image

A request can be made here without adding any request headers. Here, postUrl is the request URL and data_1 holds the request parameters.
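As a minimal sketch, assuming the requests library, the request might look like the following. The endpoint URL is illustrative only; copy the real one from your own Network capture, since the service number in the path can differ:

```python
import requests

# Illustrative endpoint -- replace with the URL copied from the Network tab.
postUrl = "https://m.ctrip.com/restapi/soa2/13444/json/getCommentCollapseList"

def fetch_comment_page(payload: dict) -> dict:
    """POST the captured payload to the endpoint and return the parsed JSON."""
    resp = requests.post(postUrl, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()
```

No headers are passed, matching the observation above that the endpoint answers without them.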

2. Request parameters

You can see the request parameters in the Network tab:

Image

They are built in the program as follows:

Image

Pay particular attention to pageIndex (the page number) and pageSize (the number of entries per page) inside arg.
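As a sketch, the payload can be built like this. Only pageIndex and pageSize are spelled out; any other fields (such as a poiId identifying the attraction, or a head block) are assumptions here and should be copied verbatim from the Request Payload shown in the Network tab:

```python
def build_payload(page_index: int, page_size: int = 10) -> dict:
    """Build the POST body; pageIndex and pageSize live inside "arg"."""
    return {
        "arg": {
            "pageIndex": page_index,  # which page of comments to fetch
            "pageSize": page_size,    # number of comments per page
            # ...remaining arg fields copied from the Network tab (e.g. poiId)
        },
        # ...remaining top-level fields copied from the capture (e.g. "head")
    }
```

Bumping page_index by one per request is all that's needed later to page through the comments.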

Image

The final results are as follows:

Image

The reviews for that attraction can then be successfully crawled down.
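The exact shape of the returned JSON should be read off the Preview tab; as a hedged sketch, assuming the comments sit under result.items with the text in a content field (field names are assumptions, not confirmed by the source), extraction could look like:

```python
def extract_comments(page_json: dict) -> list[str]:
    """Pull the comment texts out of one page of the JSON response.
    The "result"/"items"/"content" keys are assumed -- verify in Preview."""
    items = page_json.get("result", {}).get("items") or []
    return [item.get("content", "") for item in items]

# Tiny made-up response in the assumed shape, for illustration:
sample = {"result": {"items": [{"content": "The animals were great!"}]}}
```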

III. Capture all comments

The code above only fetches the first page of comment data; by changing pageIndex inside arg, we can iterate through and crawl all the comments.

Image

Suppose this attraction has 300 pages of comments in total. Now we add the loop.

The final complete code is as follows:

Image
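As a hedged sketch of what the complete program might look like: the endpoint URL, the extra payload fields, and the JSON field names are all assumptions to be verified against your own Network capture, and the 300-page total comes from the page itself:

```python
import time
import requests

# Illustrative endpoint -- copy the real URL from the Network tab.
postUrl = "https://m.ctrip.com/restapi/soa2/13444/json/getCommentCollapseList"

def build_payload(page_index: int, page_size: int = 10) -> dict:
    """POST body sketch; copy the remaining fields from your capture."""
    return {"arg": {"pageIndex": page_index, "pageSize": page_size}}

def crawl_all(total_pages: int = 300) -> list[str]:
    """Loop over pageIndex and collect every comment's text."""
    comments = []
    for page in range(1, total_pages + 1):
        resp = requests.post(postUrl, json=build_payload(page), timeout=10)
        resp.raise_for_status()
        data = resp.json()
        # Field path is an assumption -- check the real structure in Preview.
        items = data.get("result", {}).get("items") or []
        if not items:
            break  # ran out of comments early
        comments.extend(item.get("content", "") for item in items)
        time.sleep(1)  # pause between requests to be polite to the server
    return comments

if __name__ == "__main__":
    all_comments = crawl_all()
    print(f"crawled {len(all_comments)} comments")
```

The early break and the one-second pause are design choices of this sketch rather than part of the original screenshots: the break stops the loop if the site returns fewer pages than expected, and the pause reduces the risk of being rate-limited.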

That concludes this article on crawling Ctrip reviews with a Python crawler. For more on crawling Ctrip comments with Python, please search my earlier posts or browse the related articles below. I hope you'll continue to support me!