
Basic use of the Python crawler library requests

requests

With urllib, handling web authentication and cookies requires writing your own Opener and Handler classes, which is inconvenient. Here we learn a more powerful library: requests.

get()

Example:

import requests  # Import requests
html = requests.get('/')  # Use the get method to get page information
print(html.text)  # Call the text attribute to view the page source

To add parameters, pass a dictionary via the params argument

import requests  # Import requests
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('/', params=data)  # Add parameters
print(html.text)  # Call the text attribute to view the page source

To add headers, pass a dictionary via the headers argument

import requests  # Import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('/', headers=headers, params=data)  # Add headers and parameters
print(html.text)  # Call the text attribute to view the page source

Advanced Usage

Cookie settings, proxy settings, etc.

Cookies

Get cookies.

import requests  # Import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('/qq_40966461/article/details/104974998', headers=headers, params=data)  # Add parameters
print(html.text)  # Call the text attribute to view the page source
for key, value in html.cookies.items():
    print(key + '=' + value)

It's easy: just read the response's cookies attribute directly. It behaves like a dictionary, so items() iterates over name/value pairs.
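
Cookies can also be sent with a request via the cookies argument. A minimal sketch, assuming httpbin.org (a public echo service) and a made-up sessionid value:

import requests  # Import requests

cookies = {'sessionid': '123456'}  # placeholder name/value pair
response = requests.get('https://httpbin.org/cookies', cookies=cookies)  # Send cookies with the request
print(response.text)  # httpbin echoes back the cookies it received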

Maintaining a session with Session()

In requests, calling get() or post() directly is enough to simulate a web request, but each call is effectively a separate session, like opening pages in two different browsers. To carry cookies from one request to the next, use a Session object.

import requests  # Import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.Session().get('/qq_40966461/article/details/104974998', headers=headers, params=data)  # Add parameters
print(html.text)  # Call the text attribute to view the page source
for key, value in html.cookies.items():
    print(key + '=' + value)

Here a Session object is created and its get method is called instead of the module-level requests.get. Note that to actually maintain a session, the Session object must be created once and reused for every request; creating a new one per request discards the shared cookies.
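
A minimal sketch of reusing one Session, assuming httpbin.org's cookie endpoints for demonstration: the first request sets a cookie and the second shows that the session sent it back.

import requests  # Import requests

s = requests.Session()  # Create the session once and reuse it
s.get('https://httpbin.org/cookies/set/number/123456789')  # Server sets a cookie
response = s.get('https://httpbin.org/cookies')  # The session sends the cookie back
print(response.text)  # {"cookies": {"number": "123456789"}}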

SSL Certificate Validation

import requests  # Import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
response = requests.get('http://', headers=headers, verify=False)
print(response.status_code)

Passing verify=False is sufficient to skip SSL certificate verification, e.g. for sites with self-signed certificates.
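
Note that requests still emits an InsecureRequestWarning when verification is skipped. A minimal sketch of silencing it via urllib3 (which requests uses underneath); the test URL is an assumption, using badssl.com's public self-signed-certificate host:

import requests  # Import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)  # Silence the warning
response = requests.get('https://self-signed.badssl.com/', verify=False)  # Would fail without verify=False
print(response.status_code)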

Proxy Settings

import requests  # Import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
proxies = {
    "http":"http://183.166.132.176",
    "https":"https://183.166.132.176"
}
response = requests.get('http://', headers=headers, proxies=proxies, verify=False)
print(response.status_code)

Just add the proxies argument; usable proxy addresses like the ones above can be found on free proxy-list sites such as Kuaidaili.
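
If a proxy requires HTTP Basic authentication, the credentials can be embedded in the proxy URL. A minimal sketch, where the credentials, the proxy address, and the httpbin.org test URL are all placeholders:

import requests  # Import requests

proxies = {
    'http': 'http://user:password@127.0.0.1:8888',   # placeholder credentials and address
    'https': 'http://user:password@127.0.0.1:8888'
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)  # httpbin reports the caller's IP
print(response.text)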

Timeout Setting

Add the parameter timeout=1 to get(); if the server does not respond within one second, requests raises a requests.exceptions.Timeout exception, as shown below.
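
A minimal sketch, assuming httpbin.org's /delay endpoint (which waits the given number of seconds before replying) to force a timeout:

import requests  # Import requests

try:
    response = requests.get('https://httpbin.org/delay/3', timeout=1)  # Server waits 3 s, we allow 1 s
    print(response.status_code)
except requests.exceptions.Timeout:
    print('The request timed out')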

Authentication

For HTTP Basic authentication, add the parameter auth=('username', 'password') to get().
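
A minimal sketch, assuming httpbin.org's /basic-auth endpoint, which accepts whatever user/password pair appears in its URL:

import requests  # Import requests

response = requests.get('https://httpbin.org/basic-auth/user/passwd', auth=('user', 'passwd'))
print(response.status_code)  # 200 when the credentials match, 401 otherwise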

For OAuth authentication, the separate requests_oauthlib library can be used.
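
A minimal sketch using requests_oauthlib (installed separately with pip install requests-oauthlib); the four keys and the Twitter endpoint are placeholders you would replace with your own provider's values:

import requests  # Import requests
from requests_oauthlib import OAuth1  # pip install requests-oauthlib

# All four keys below are placeholders obtained from the API provider
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
response = requests.get('https://api.twitter.com/1.1/account/verify_credentials.json', auth=auth)
print(response.status_code)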

This concludes the article on the basic use of the Python crawler library requests. For more on the requests library, please search my previous posts or browse the related articles below, and I hope you will support me in the future!