urllib is Python's module for working with URLs (Uniform Resource Locators). You can use it to fetch remote data and save it locally. This article collects common uses of urllib: setting headers, using proxies, timeouts, authentication, and exception handling.
1. Basic method
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
- url: the URL to open
- data: data to submit with a POST request
- timeout: timeout in seconds for the request (data and timeout are shown in a short sketch after the basic example below)
Call the module's urlopen() directly to fetch a page. The returned data is of type bytes and has to be decode()d into a str.
from urllib import request

response = request.urlopen(r'http://example.com/')  # placeholder URL; returns an http.client.HTTPResponse object
page = response.read()            # bytes
page = page.decode('utf-8')       # str
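To illustrate the data and timeout parameters from the signature above, here is a minimal sketch. It assumes httpbin.org as a test endpoint that echoes POST data back; any reachable URL accepting POST would do.

import urllib.parse
import urllib.request

# data must be bytes: urlencode() builds the form body, encode() converts it to bytes
post_data = urllib.parse.urlencode({'key': 'value'}).encode('utf-8')

# passing data turns this into a POST request; timeout is in seconds
response = urllib.request.urlopen('http://httpbin.org/post', data=post_data, timeout=5)
print(response.read().decode('utf-8'))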
The object returned by urlopen() provides the following methods (a short sketch follows the list):
- read(), readline(), readlines(), fileno(), close(): operate on the HTTPResponse data
- info(): returns an HTTPMessage object representing the headers sent by the remote server
- getcode(): returns the HTTP status code, e.g. 200 for a successful request, 404 for URL not found
- geturl(): returns the URL of the resource actually retrieved (useful for detecting redirects)
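A minimal sketch of these accessors, using http://example.com/ as a stand-in URL:

import urllib.request

response = urllib.request.urlopen('http://example.com/')  # placeholder URL
print(response.getcode())   # e.g. 200
print(response.geturl())    # the final URL, after any redirects
print(response.info())      # HTTPMessage with the response headers
response.close()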
1. Simple reading of web page information
import urllib.request

response = urllib.request.urlopen('http://example.com/')  # placeholder URL
html = response.read()
2. Using Request
urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
Use Request() to wrap the request, then pass it to urlopen() to fetch the page.
import urllib.request

req = urllib.request.Request('http://example.com/')  # placeholder URL
response = urllib.request.urlopen(req)
the_page = response.read()
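The headers and method parameters shown in the signature above let you attach request headers (for example a User-Agent) and force a specific HTTP method. A minimal sketch, again with a placeholder URL:

import urllib.request

req = urllib.request.Request(
    'http://example.com/',                                  # placeholder URL
    headers={'User-Agent': 'Mozilla/5.0 (compatible; demo)'},
    method='HEAD'                                           # fetch only the headers
)
response = urllib.request.urlopen(req)
print(response.getcode())
print(response.info())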
3. Sending data: logging in to Zhihu as an example
'''
Created on May 31, 2016
@author: gionee
'''
import gzip
import re
import http.cookiejar
import urllib.request
import urllib.parse

def ungzip(data):
    try:
        print("Trying to decompress...")
        data = gzip.decompress(data)
        print("Decompression complete.")
    except OSError:
        print("Not compressed, no need to decompress")
    return data

def getXSRF(data):
    cer = re.compile('name="_xsrf" value="(.*)"', flags=0)
    strlist = cer.findall(data)
    return strlist[0]

def getOpener(head):
    # cookie handling
    cj = http.cookiejar.CookieJar()
    pro = urllib.request.HTTPCookieProcessor(cj)
    opener = urllib.request.build_opener(pro)
    header = []
    for key, value in head.items():
        elem = (key, value)
        header.append(elem)
    opener.addheaders = header
    return opener

# header information can be obtained via Firebug
header = {
    'Connection': 'Keep-Alive',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0',
    'Accept-Encoding': 'gzip, deflate',
    'Host': 'www.zhihu.com',
    'DNT': '1'
}

url = 'http://www.zhihu.com/'
opener = getOpener(header)
op = opener.open(url)
data = op.read()
data = ungzip(data)
_xsrf = getXSRF(data.decode())

url += "login/email"
email = "login account"
password = "login password"
postDict = {
    '_xsrf': _xsrf,
    'email': email,
    'password': password,
    'rememberme': 'y'
}
postData = urllib.parse.urlencode(postDict).encode()
op = opener.open(url, postData)
data = op.read()
data = ungzip(data)
print(data.decode())
4. HTTP error
import urllib.request
import urllib.error

req = urllib.request.Request('http://www.example.com/no-such-page')  # placeholder URL that returns an error
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    print(e.code)
    print(e.read().decode("utf8"))
5. Exception handling
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request("http://www.example.com/")  # placeholder URL
try:
    response = urlopen(req)
except HTTPError as e:
    print("The server couldn't fulfill the request.")
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
6. HTTP authentication
import urllib.request

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "https://example.com/"  # placeholder URL
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx')

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create an "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
a_url = "https://example.com/"  # placeholder URL
x = opener.open(a_url)
print(x.read())

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
a = urllib.request.urlopen(a_url).read().decode('utf8')
print(a)
7. Use of proxies
import urllib.request

proxy_support = urllib.request.ProxyHandler({'sock5': 'localhost:1080'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

a = urllib.request.urlopen("http://example.com/").read().decode("utf8")  # placeholder URL
print(a)
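Note that ProxyHandler matches its dictionary keys against the URL scheme of each request ('http', 'https'), and urllib has no built-in SOCKS support, so the 'sock5' key above will never actually be consulted. A sketch for an ordinary HTTP proxy, assuming a proxy listening at 127.0.0.1:8087:

import urllib.request

# keys are URL schemes; requests to http:// and https:// URLs go through the proxy
proxy_support = urllib.request.ProxyHandler({
    'http': 'http://127.0.0.1:8087',
    'https': 'http://127.0.0.1:8087',
})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

print(urllib.request.urlopen('http://example.com/').read().decode('utf8'))  # placeholder URL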
8. Timeout
import socket
import urllib.request

# timeout in seconds
timeout = 2
socket.setdefaulttimeout(timeout)

# this call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('https://example.com/')  # placeholder URL
a = urllib.request.urlopen(req).read()
print(a)
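setdefaulttimeout() changes the timeout globally for every new socket; if you only want a limit on one request, the timeout argument of urlopen() (see the signature in section 1) is usually enough. A minimal sketch, with a placeholder URL:

import socket
import urllib.request
import urllib.error

try:
    # only this call is limited to 2 seconds; other sockets are unaffected
    response = urllib.request.urlopen('https://example.com/', timeout=2)  # placeholder URL
    print(response.read())
except urllib.error.URLError as e:
    # a timeout during connection surfaces as a URLError wrapping socket.timeout
    print('Request failed:', e.reason)
except socket.timeout:
    # read() after a successful connect can also time out
    print('The request timed out.')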
That's all for this article.