Mention Python and crawlers usually come up too. I think crawlers have become popular recently mainly because of big data: the data we need no longer lives only on our own servers. Python's simplicity has also made it the primary language for crawler tools. In this article we write a crawler of our own and use it to crawl Sina News!
1. As you know, a crawler simulates a browser request, receives the response data, and then parses that data to extract the content we want. That is the whole implementation of a crawler.
2. To write a crawler we can lean on a few existing tools, so let's start with a short introduction. For sending requests, Python's requests library is very easy to use. For analyzing and parsing the response, we use bs4 (Beautiful Soup). Both can be installed directly with pip, for example: pip install requests beautifulsoup4. If you installed Python 3, you can use pip3 instead.
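A quick way to confirm that the installation worked is to import both libraries and print their versions. This is only a small sanity check, not one of the original steps:

```python
# Both imports succeed only if the two libraries are installed.
import requests
import bs4

print("requests", requests.__version__)
print("bs4", bs4.__version__)
```

If either import raises ModuleNotFoundError, the pip command was probably run for a different Python installation.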
3. With the two libraries installed, we can first request the page and look at the news content. At this point the output may well be garbled!
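The request step might look like the sketch below. The URL is an assumption, since the article does not say which Sina News page it requests, and `fetch_page` is a hypothetical helper name:

```python
import requests

# Assumed URL: the article does not name the exact page it crawls.
NEWS_URL = "https://news.sina.com.cn/china/"


def fetch_page(url, timeout=10):
    """Send a GET request, as a browser would, and return the Response."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return res
```

Calling `print(fetch_page(NEWS_URL).text)` prints the page source. This is the point where the Chinese text may come out garbled.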
4. How do we deal with the garbled text? Open the page in a browser, right-click and view the page source, and we can see that the page's encoding is utf-8.
5. Then we set that encoding on the response before printing it, and the data displays correctly.
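A minimal sketch of the encoding fix follows. It assumes the response comes from requests, which may fall back to a different encoding when the server's headers do not declare one. `decode_page` is a hypothetical helper, and the byte string is a stand-in for real page content:

```python
def decode_page(res, encoding="utf-8"):
    """Return the response body decoded with the encoding the page declares."""
    res.encoding = encoding  # override whatever requests guessed
    return res.text


# Why the text looked garbled: the same bytes read with the wrong decoder.
raw = "新浪新闻".encode("utf-8")       # bytes as the server would send them
garbled = raw.decode("iso-8859-1")     # wrong encoding gives unreadable text
readable = raw.decode("utf-8")         # the declared encoding restores it
print(garbled)
print(readable)
```

With a real response `res`, `print(decode_page(res))` should show the Chinese text correctly.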
6. With the data in hand, we first need to analyze it to find where the content we want lives. Open the page in a browser, right-click and choose Inspect, then follow the sample figure to see which tag holds our news. On Windows, the same view is available under the browser's developer tools.
7. Once we know which tag the news belongs to, we use bs4 to parse the page and get the data we want.
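The parsing step can be sketched as below. The article does not name the tag it found, so the `.news-item` class and the sample markup are assumptions made purely for illustration. Substitute whatever the browser inspector shows for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page source.
SAMPLE_HTML = """
<div class="news-item">
  <h2><a href="https://news.sina.com.cn/example-1.shtml">First headline</a></h2>
  <div class="time">10:30</div>
</div>
<div class="news-item">
  <h2><a href="https://news.sina.com.cn/example-2.shtml">Second headline</a></h2>
  <div class="time">11:05</div>
</div>
"""


def find_news_items(html, selector=".news-item"):
    """Parse the HTML and return every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select(selector)


items = find_news_items(SAMPLE_HTML)
print(len(items), "news blocks found")
```

Using a CSS selector keeps the code close to what the inspector displays: a class becomes `.name` and an id becomes `#name`.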
8. To get each news item's title, time, and address, we need to dig deeper into the elements. Using the same method as before, we find the tag that holds the title, and likewise for the time and the address.
9. Then we write the Python code for the title, time, and address, and the crawler pulls out the corresponding title, time, and address of each news item.
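Putting the pieces together, the extraction might look like this sketch. The `.news-item`, `h2`, and `.time` selectors and the sample markup are assumptions, since the article's own tags are not given, and `extract_news` is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the last block has no headline and is skipped.
SAMPLE_HTML = """
<div class="news-item">
  <h2><a href="https://news.sina.com.cn/example-1.shtml">First headline</a></h2>
  <div class="time">10:30</div>
</div>
<div class="news-item">
  <h2><a href="https://news.sina.com.cn/example-2.shtml">Second headline</a></h2>
  <div class="time">11:05</div>
</div>
<div class="news-item"></div>
"""


def extract_news(html):
    """Return a list of dicts with the title, time, and address of each item."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select(".news-item"):
        heading = item.select_one("h2")
        if heading is None:
            continue  # placeholder block without a headline
        link = heading.select_one("a")
        time_tag = item.select_one(".time")
        results.append({
            "title": heading.get_text(strip=True),
            "time": time_tag.get_text(strip=True) if time_tag else None,
            "url": link.get("href") if link else None,
        })
    return results


for news in extract_news(SAMPLE_HTML):
    print(news["title"], news["time"], news["url"])
```

On the live page, the same function would be fed the decoded page text from the request step instead of `SAMPLE_HTML`.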
10. That's all it takes to crawl news with a simple Python script!
Summary: Those are the steps for getting Sina News content with a Python crawler. Thanks for reading and for your support.