Let's start with the reasons a URLError might be raised:
- No network connection, i.e., the machine cannot access the Internet
- Cannot connect to a specific server
- Server does not exist
In code, we wrap the request in a try-except statement and catch the appropriate exception. Here is an example to get a feel for it:
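    import urllib2

    # The target URL is elided in the original; use an address whose host does not exist
    request = urllib2.Request('')
    try:
        urllib2.urlopen(request)
    except urllib2.URLError as e:
        # reason explains why the connection could not be made
        print(e.reason)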
Here urlopen is used to access a non-existent URL, and the output is as follows:
[Errno 11004] getaddrinfo failed
In other words, the error code is 11004 and the reason is "getaddrinfo failed", meaning the host name could not be resolved.
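urllib2 exists only in Python 2. In Python 3 it was split into urllib.request and urllib.error, and URLError and HTTPError now live in urllib.error. The sketch below is a Python 3 translation of the same try-except pattern, not the original code. So that it behaves the same with or without network access, it opens a hypothetical file:// path that should not exist, which urllib also reports as a URLError; in that case reason is an OSError rather than a getaddrinfo failure.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Hypothetical path, chosen only because it should not exist on the machine
MISSING = "file:///nonexistent-dir/nonexistent-file.txt"


def try_open(url):
    """Return the URLError reason if opening url fails, otherwise None."""
    try:
        with urlopen(url):
            return None
    except URLError as e:
        # e.reason explains why the resource could not be reached
        return e.reason


if __name__ == "__main__":
    print(try_open(MISSING))
```

Swapping MISSING for an http:// URL on a host that does not exist exercises the DNS failure described above, at the cost of depending on the network.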
HTTPError is a subclass of URLError. Every HTTP response from the server carries a numeric "status code". Some responses urllib2 handles by itself; for example, if the response is a "redirect" to a different address for the document, urllib2 follows it.
For responses it cannot handle, urlopen raises an HTTPError carrying the corresponding status code. An HTTP status code indicates the status of the response returned under the HTTP protocol. Common status codes are summarized below:
- 100 Continue: The client should continue with its request, sending the remainder of the request, or ignore this response if the request has already been completed.
- 101 Switching Protocols: After sending the blank line that ends this response, the server switches to the protocols defined in the Upgrade header. This should be done only when switching to the new protocol is advantageous.
- 102 Processing: A status code added by WebDAV (RFC 2518) indicating that the server is still processing the request.
- 200 OK: The request succeeded. Handling: read the response content and process it.
- 201 Created: The request has been fulfilled and a new resource has been created; the URI of the new resource is given in the response entity. Handling: not encountered by the crawler.
- 202 Accepted: The request has been accepted, but processing is not yet complete. Handling: block and wait.
- 204 No Content: The server has fulfilled the request but returned no new information. A user agent does not need to update its document view in response. Handling: discard.
- 300 Multiple Choices: More than one representation of the requested resource is available. HTTP/1.0 applications do not use this code directly; it serves only as the default interpretation of a 3xx response. Handling: process further if the program can handle it, otherwise drop it.
- 301 Moved Permanently: The requested resource has been assigned a new permanent URL, and future requests should use that URL. Handling: redirect to the new URL.
- 302 Found: The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.
- 304 Not Modified: The requested resource has not been modified. Handling: discard.
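- 400 Bad Request: The request is malformed. Handling: discard.
- 401 Unauthorized: The request lacks valid authentication. Handling: discard.
- 403 Forbidden: The server refuses to fulfill the request. Handling: discard.
- 404 Not Found: The requested resource was not found. Handling: discard.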
- 500 Internal Server Error: The server encountered an unexpected condition that prevented it from completing the request. This usually happens when the server-side code has a bug.
- 501 Not Implemented: The server does not support the functionality required to fulfill the request, for example when it does not recognize the request method and cannot support it for any resource.
- 502 Bad Gateway: A server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfill the request.
- 503 Service Unavailable: The server is temporarily unable to handle the request because of maintenance or overload. The condition is temporary, and the service will recover after some time.
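A crawler can turn the "Handling" notes in the list above into a lookup table. The sketch below assumes hypothetical action labels ("process", "redirect", and so on) that are not part of any library; codes the list gives no handling for (201 and the 5xx codes) simply map to None.

```python
# Crawler handling per status code, transcribed from the list above.
# The action labels are hypothetical names chosen for this sketch.
HANDLING = {
    200: "process",          # read the response content and process it
    202: "wait",             # accepted but not finished: block and wait
    204: "discard",          # no new content returned
    300: "process_or_drop",  # process if the program can, otherwise drop
    301: "redirect",         # follow the new permanent URL
    302: "redirect",         # follow the temporary URL
    304: "discard",          # resource not modified
    400: "discard",
    401: "discard",
    403: "discard",
    404: "discard",
}


def handling_for(status):
    """Return the crawler action for a status code, or None if the list gives none."""
    return HANDLING.get(status)


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, handling_for(code))
```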
An HTTPError instance has a code attribute, which holds the status code sent by the server.
Because urllib2's default handlers follow redirects (codes beginning with 3), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
Let's write an example to get a feel for it. The exception caught here is HTTPError, whose code attribute holds the status code; we also print the reason attribute, which URLError defines and HTTPError also provides.
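    import urllib2

    # Only the path '/cqcre' survives in the source; the scheme and host are elided
    req = urllib2.Request('/cqcre')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        print(e.code)    # numeric status code sent by the server
        print(e.reason)  # reason phrase for that status code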
Running it prints:
403 Forbidden
The error code is 403 and the cause of the error is Forbidden, indicating that the server prohibits access.
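The code and reason attributes can also be inspected without a live server. This Python 3 sketch builds an HTTPError by hand with urllib.error's constructor (url, code, msg, hdrs, fp); the URL is a placeholder and no request is made.

```python
from urllib.error import HTTPError, URLError

# HTTPError subclasses URLError, so an "except URLError" clause also catches it
print(issubclass(HTTPError, URLError))  # True

# Build the exception directly instead of contacting a server
err = HTTPError("http://example.com/", 403, "Forbidden", None, None)

print(err.code)    # 403: the status code
print(err.reason)  # Forbidden: HTTPError exposes its msg as reason
```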
The parent class of HTTPError is URLError. As a rule, the except clause for a parent class goes after the clause for its subclass, so that anything the subclass clause does not catch is caught by the parent-class clause. The code above can therefore be rewritten as follows:
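    import urllib2

    # Only the path '/cqcre' survives in the source; the scheme and host are elided
    req = urllib2.Request('/cqcre')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        # The subclass comes first so that HTTP errors are handled here
        print(e.code)
    except urllib2.URLError as e:
        # Any other URLError, such as a failed DNS lookup
        print(e.reason)
    else:
        print("OK")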
If an HTTPError is raised, its code is printed and the URLError clause does not run. If a URLError other than an HTTPError occurs, the second clause catches it and prints the reason. If no exception occurs, the else branch prints "OK".
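The ordering rule can be checked offline. In this Python 3 sketch, run and raising are hypothetical helpers: action stands in for the urlopen call, and the exceptions are constructed by hand (the URL is a placeholder that is never contacted).

```python
from urllib.error import HTTPError, URLError


def run(action):
    """Call action() and report the outcome with the same except/else chain."""
    try:
        action()
    except HTTPError as e:
        # Checked first because HTTPError is a subclass of URLError
        return "code: %d" % e.code
    except URLError as e:
        # Reached only for URLErrors that are not HTTPErrors
        return "reason: %s" % e.reason
    else:
        return "OK"


def raising(exc):
    """Build a zero-argument callable that raises exc when called."""
    def action():
        raise exc
    return action


print(run(raising(HTTPError("http://example.com/", 403, "Forbidden", None, None))))
print(run(raising(URLError("getaddrinfo failed"))))
print(run(lambda: None))
```

The three calls print "code: 403", "reason: getaddrinfo failed" and "OK". If the two except clauses were swapped, the URLError clause would also catch the HTTPError and the code branch would never run.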
Alternatively, you can catch only URLError and use hasattr to check which attributes the exception has before printing them. The code is rewritten as follows:
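    import urllib2

    # Only the path '/cqcre' survives in the source; the scheme and host are elided
    req = urllib2.Request('/cqcre')
    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        # Only HTTPError instances have a code attribute
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    else:
        print("OK")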
Checking the exception's attributes first avoids an AttributeError when printing an attribute the exception does not have.
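The same hasattr checks can be packaged as a small Python 3 helper. error_details is a hypothetical name, and the exceptions are again built by hand rather than raised by a real request.

```python
from urllib.error import HTTPError, URLError


def error_details(e):
    """Collect code and reason from a URLError without risking AttributeError."""
    details = []
    if hasattr(e, "code"):
        # Only HTTPError instances carry a status code
        details.append(e.code)
    if hasattr(e, "reason"):
        details.append(e.reason)
    return details


print(error_details(HTTPError("http://example.com/", 404, "Not Found", None, None)))
print(error_details(URLError("getaddrinfo failed")))
```

The HTTPError yields both the code and the reason, while the plain URLError yields only its reason.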
That covers URLError and HTTPError and the corresponding ways to handle them. Keep at it, everyone!