requests
With urllib, handling web authentication and cookies means writing Openers and Handlers, which is inconvenient. Here we learn a more powerful library: requests.
get()
Example:
import requests  # Import requests

html = requests.get('/')  # Use the get method to get the page information
print(html.text)  # Call the text attribute to view the page code
Add URL parameters using params and a dictionary
import requests  # Import requests

data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('/', params=data)  # Add the parameters to the request
print(html.text)  # Call the text attribute to view the page code
To add request headers, use the headers argument with a dictionary
import requests  # Import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('/', headers=headers, params=data)  # Add the headers and parameters
print(html.text)  # Call the text attribute to view the page code
Advanced Usage
Cookie settings, proxy settings, etc.
Cookies
Get cookies.
import requests  # Import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('https://blog.csdn.net/qq_40966461/article/details/104974998', headers=headers, params=data)  # Add the headers and parameters
print(html.text)  # Call the text attribute to view the page code
for key, value in html.cookies.items():  # Iterate over the cookies returned with the response
    print(key + '=' + value)
It is easy: just read the cookies attribute of the response directly.
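If you also need to send cookies with a request (for example, to reuse a login copied from the browser), they can be passed through the cookies parameter as a plain dictionary. A minimal sketch, where the cookie names, values, and URL are only placeholders:

import requests  # Import requests

# Hypothetical cookie names and values; in practice copy them from the browser
cookies = {'sessionid': 'xxxxxxxx', 'token': 'yyyyyyyy'}
response = requests.get('https://www.example.com/', cookies=cookies)  # Send the cookies with the request
print(response.status_code)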
Maintaining a session: Session()
In requests, you can make page requests directly with get() or post(), but each call is effectively a separate session, as if you opened two pages in two different browsers. To keep the same conversation going (for example, to stay logged in), use a Session object.
import requests  # Import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
session = requests.Session()  # Create a session object so cookies are kept between requests
html = session.get('https://blog.csdn.net/qq_40966461/article/details/104974998', headers=headers, params=data)  # Add the headers and parameters
print(html.text)  # Call the text attribute to view the page code
for key, value in html.cookies.items():  # Iterate over the cookies returned with the response
    print(key + '=' + value)
Create a Session object and call its get method instead of calling requests.get directly.
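To see why this matters, here is a minimal sketch using the public httpbin.org test service (not part of the original example): a cookie set by the first request is still sent with the second one, because both go through the same Session object.

import requests  # Import requests

session = requests.Session()  # One session shared by both requests
session.get('http://httpbin.org/cookies/set/number/123456789')  # The server sets a cookie named number
response = session.get('http://httpbin.org/cookies')  # The same session sends the cookie back automatically
print(response.text)  # The returned JSON should contain "number": "123456789"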
SSL Certificate Validation
import requests  # Import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
response = requests.get('http://', headers=headers, verify=False)  # verify=False skips SSL certificate verification
print(response.status_code)
Just add verify=False.
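Note that with verify=False requests still prints an InsecureRequestWarning for every call. If you want to silence it, one extra step (not shown in the original example, URL is a placeholder) is to disable the warning through urllib3, the library requests uses underneath:

import requests  # Import requests
import urllib3  # The HTTP library underneath requests

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)  # Suppress the warning caused by verify=False
response = requests.get('https://www.example.com/', verify=False)  # Certificate check skipped
print(response.status_code)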
Proxy Settings
import requests  # Import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
proxies = {
    "http": "http://183.166.132.176",
    "https": "https://183.166.132.176"
}
response = requests.get('http://', headers=headers, proxies=proxies, verify=False)  # Route the request through the proxies
print(response.status_code)
Just add the proxies argument; usable proxy IPs can be found by searching for free proxy lists online.
Timeout Setting
Add the parameter timeout=1 (in seconds); if the server does not respond within that time, requests raises an exception.
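A minimal sketch (the URL is only a placeholder): catch the Timeout exception so the crawler can handle slow servers gracefully.

import requests  # Import requests

try:
    response = requests.get('https://www.example.com/', timeout=1)  # Give up if no response within 1 second
    print(response.status_code)
except requests.exceptions.Timeout:
    print('The request timed out')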
Authentication
Add the parameter auth=('username', 'password') to get() for HTTP Basic authentication.
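A minimal sketch using the public httpbin.org test service, whose /basic-auth endpoint accepts whatever username and password you embed in the URL (here 'user' and 'passwd' are just placeholders):

import requests  # Import requests

# httpbin.org/basic-auth/{user}/{passwd} returns 200 only when the credentials match
response = requests.get('http://httpbin.org/basic-auth/user/passwd', auth=('user', 'passwd'))
print(response.status_code)  # 200 if authentication succeeded, 401 otherwise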
There is also an OAuth authentication method.
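OAuth is not built into requests itself; it is provided by the separate requests-oauthlib package (pip install requests-oauthlib). A minimal OAuth 1 sketch following the pattern in the requests documentation, with all credentials and the API URL as placeholders:

import requests  # Import requests
from requests_oauthlib import OAuth1  # OAuth support comes from the requests-oauthlib package

# All four values are placeholders for your own application credentials
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
response = requests.get('https://api.twitter.com/1.1/account/verify_credentials.json', auth=auth)
print(response.status_code)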
This concludes the introduction to the basic use of the requests library for Python crawlers. For more on the requests library, please search my previous posts or browse the related articles below. I hope you will continue to support me!