Prerequisites
Familiarity with basic HTML
Familiarity with basic XPath expressions
Introduction
Python is a cross-platform, high-level programming language that combines interpreted, interactive, and object-oriented features. It was originally designed for writing automation scripts, and as new versions have added features it has increasingly been used for large, stand-alone projects.
Requests is a widely used Python HTTP client library.
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time-series data both easy and intuitive.
time is a module in the Python standard library, so it needs no separate installation; it is mainly used for working with times and dates.
lxml is a high-performance XML and HTML parser for Python whose main job is to parse and extract data from XML and HTML documents. Like Python's regular-expression engine, it is implemented in C, and it supports XPath syntax for locating specific elements and node information.
HTML (HyperText Markup Language) is mainly used to display data; its focus is on how the data looks. XML (Extensible Markup Language) is mainly used to transmit and store data; its focus is on the content of the data.
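As a rough illustration of how lxml treats the two formats, the sketch below parses a small HTML fragment and a small XML fragment. Both fragments are invented for this example and do not come from the target site. It assumes the usual behaviour of the lxml parsers: etree.HTML repairs sloppy markup such as unclosed tags, while etree.fromstring expects well-formed XML.

```python
from lxml import etree

# Hypothetical sample markup, invented for illustration only.
sloppy_html = '<ul><li>Comedy<li>Drama</ul>'          # the <li> tags are never closed
strict_xml = '<movie><title>Example</title></movie>'  # well-formed XML

# The HTML parser repairs the unclosed tags and still finds both items.
html_root = etree.HTML(sloppy_html)
genres = html_root.xpath('//li/text()')

# The XML parser handles well-formed input ...
xml_root = etree.fromstring(strict_xml)
title = xml_root.findtext('title')

# ... but rejects the sloppy markup instead of repairing it.
try:
    etree.fromstring(sloppy_html)
    xml_accepts_sloppy = True
except etree.XMLSyntaxError:
    xml_accepts_sloppy = False

print(genres, title, xml_accepts_sloppy)
```

Real web pages are rarely well-formed XML, which is why the HTML parser is the natural choice for scraping.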
Experimental objective: use Python to crawl 2022 Spring Festival movie information
Experimental environment
Python (object-oriented high-level language)
Requests 2.14.2 (Python third-party library)
pandas 1.1.0 (Python third-party library)
time (Python standard library)
lxml (Python third-party library)
Steps
Target website
/cinema/later/shenzhen/
Analyzing the website
Press F12 to open the browser's developer tools
Press Ctrl+Shift+C and click a movie title on the page to select its element
Press Ctrl+F so that a search box appears in the developer tools
Copy the XPath of the selected element
The copied XPath is //*[@id="showing-soon"]/div[1]/div/h3/a
Paste it into the search box to verify the XPath
Compare the HTML of the entries to find what they have in common
Every target element sits inside its own div box, so remove the index from the XPath
The XPath becomes //*[@id="showing-soon"]/div/div/h3/a
The remaining target elements are located in the same way
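The effect of dropping the [1] index can be checked offline. The sketch below runs both XPaths against a small, made-up HTML fragment that only mimics the structure described above; the class names and film titles are hypothetical.

```python
from lxml import etree

# Hypothetical markup that imitates the page structure; not real site data.
sample = '''
<div id="showing-soon">
  <div class="item"><div class="intro"><h3><a href="/subject/1/">Film A</a></h3></div></div>
  <div class="item"><div class="intro"><h3><a href="/subject/2/">Film B</a></h3></div></div>
  <div class="item"><div class="intro"><h3><a href="/subject/3/">Film C</a></h3></div></div>
</div>
'''
html = etree.HTML(sample)

# With div[1], only the first entry is matched.
first_only = html.xpath('//*[@id="showing-soon"]/div[1]/div/h3/a/text()')

# Without the index, every entry under the container is matched.
all_titles = html.xpath('//*[@id="showing-soon"]/div/div/h3/a/text()')

# Appending /@href selects the link attribute instead of the link text.
all_links = html.xpath('//*[@id="showing-soon"]/div/div/h3/a/@href')

print(first_only)
print(all_titles)
print(all_links)
```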
Finally, save the results as a CSV file with pandas
# Save the results using pandas
df = pd.DataFrame()
df['Release date'] = Ondate
df['Title'] = name
df['Type'] = movie_class
df['Country/region of production'] = area
df['Would like to see the number of people'] = num
df['Hyperlink'] = href
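Assigning lists to DataFrame columns only works when all the lists have the same length. As a minimal sketch with made-up values (the lists below are placeholders, not scraped data), the same table can also be built in one step from a dictionary and written as CSV. An in-memory buffer is used here so the example runs without touching the disk.

```python
import io

import pandas as pd

# Placeholder values standing in for the scraped lists.
dates = ['02-01', '02-01']
titles = ['Film A', 'Film B']
links = ['/subject/1/', '/subject/2/']

# Build the table in one step; pandas raises ValueError if the lengths differ.
df = pd.DataFrame({
    'Release date': dates,
    'Title': titles,
    'Hyperlink': links,
})

# Write CSV text to an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name instead of the buffer writes the same content to disk.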
Code implementation
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 25 10:07:11 2022

@author: TFX
"""
import time

import requests  # HTTP request library
import pandas as pd
from lxml import etree  # HTML parsing and XPath extraction

# Date stamp for the output file name
today = time.strftime('%Y{y}%m{m}%d{d}', time.localtime()).format(y='Year', m='Month', d='Day')

# Target URL (only the path is given here; prepend the site's scheme and host before running)
url = '/cinema/later/shenzhen/'

# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}

# Send the request
response = requests.get(url=url, headers=headers)

# Parse the response; the XPaths can be obtained by inspecting elements in the browser
html = etree.HTML(response.text)  # convert the page text into an element tree

# Movie detail hyperlinks
href = html.xpath('//*[@id="showing-soon"]/div/div/h3/a/@href')
# Release date
Ondate = html.xpath('//*[@id="showing-soon"]/div/div/ul/li[1]/text()')
# Title
name = html.xpath('//*[@id="showing-soon"]/div/div/h3/a/text()')
# Type
movie_class = html.xpath('//*[@id="showing-soon"]/div/div/ul/li[2]/text()')
# Country/region of production
area = html.xpath('//*[@id="showing-soon"]/div/div/ul/li[3]/text()')
# Number of people who want to see the film
num = html.xpath('//*[@id="showing-soon"]/div/div/ul/li[4]/span/text()')

# Save the results using pandas
df = pd.DataFrame()
df['Release date'] = Ondate
df['Title'] = name
df['Type'] = movie_class
df['Country/region of production'] = area
df['Would like to see the number of people'] = num
df['Hyperlink'] = href
df.to_csv('2022 Spring Festival Movie_' + today + '.csv', mode='w', index=None, encoding='gbk')
print('Save complete!')
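One possible weakness of page-wide XPaths is that each list is collected independently: if a single entry lacks a field, the lists end up with different lengths and the columns can no longer be assigned. A hedged alternative, not part of the original script, is to loop over each entry and query it with relative XPaths. The sample markup and the first_text helper below are hypothetical and exist only for this sketch.

```python
import pandas as pd
from lxml import etree

# Hypothetical markup; the second entry has no "want to see" count.
sample = '''
<div id="showing-soon">
  <div class="item"><div class="intro">
    <h3><a href="/subject/1/">Film A</a></h3>
    <ul><li>02-01</li><li>Comedy</li><li>China</li><li><span>1000</span></li></ul>
  </div></div>
  <div class="item"><div class="intro">
    <h3><a href="/subject/2/">Film B</a></h3>
    <ul><li>02-01</li><li>Drama</li><li>China</li></ul>
  </div></div>
</div>
'''


def first_text(node, path):
    """Return the first XPath match, stripped, or '' when nothing matches."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else ''


html = etree.HTML(sample)
rows = []
# Each direct child div of the container is one film entry.
for item in html.xpath('//*[@id="showing-soon"]/div'):
    rows.append({
        'Release date': first_text(item, './div/ul/li[1]/text()'),
        'Title': first_text(item, './div/h3/a/text()'),
        'Would like to see the number of people': first_text(item, './div/ul/li[4]/span/text()'),
        'Hyperlink': first_text(item, './div/h3/a/@href'),
    })

df = pd.DataFrame(rows)
print(df)
```

Because every row is assembled from a single entry, a missing field becomes an empty cell rather than shifting the values of later entries.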
Output result
Summary
This article has shown how to crawl 2022 Spring Festival movie information with Python: Requests fetches the page, lxml extracts the fields with XPath, and pandas saves the result as a CSV file.