Handling URLError Exceptions in Python

Let's start by explaining why a URLError might be generated:

No network connection, i.e., the machine cannot access the Internet
Cannot connect to a specific server
Server does not exist

In code, we wrap the request in a try-except statement and catch the corresponding exception. Here is an example to get a feel for it:

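import urllib2

# The target URL is elided in the source; it should be an address that does not exist
request = urllib2.Request('')
try:
    urllib2.urlopen(request)
except urllib2.URLError, e:
    # reason describes why the connection failed
    print e.reason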

Here we use urlopen to access a non-existent URL. The output is:

[Errno 11004] getaddrinfo failed

It shows that the error number is 11004 and the reason is getaddrinfo failed, meaning the host name could not be resolved to an address.
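
The reason printed above is itself a socket error, so its number and message can be read separately. The sketch below rebuilds that exception by hand instead of touching the network; the Python 2/3 import fallback and the helper name split_reason are assumptions for illustration, and 11004 is simply copied from the output above.

```python
import socket

try:
    from urllib2 import URLError  # Python 2, as in the article
except ImportError:
    from urllib.error import URLError  # Python 3 equivalent


def split_reason(err):
    """Return (errno, message) for a URLError.

    split_reason is a hypothetical helper, not part of urllib2.
    Socket errors expose errno/strerror; plain string reasons do not.
    """
    reason = err.reason
    return getattr(reason, "errno", None), getattr(reason, "strerror", str(reason))


# Rebuild the DNS failure shown above without making a request
err = URLError(socket.gaierror(11004, "getaddrinfo failed"))
print(err.reason)         # [Errno 11004] getaddrinfo failed
print(split_reason(err))  # (11004, 'getaddrinfo failed')
```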

HTTPError is a subclass of URLError. When you make a request with urlopen, the server's response carries a numeric "status code". Some responses urllib2 can handle on its own: for example, if the response is a "redirect" telling the client to fetch the document from a different address, urllib2 follows it automatically.
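
Because HTTPError subclasses URLError, an except clause for URLError also catches HTTP errors. A quick offline check of that relationship, assuming a Python 2/3 import fallback for convenience; the example.com URL is a placeholder and nothing is fetched:

```python
try:
    from urllib2 import URLError, HTTPError  # Python 2
except ImportError:
    from urllib.error import URLError, HTTPError  # Python 3

# HTTPError is declared as a subclass of URLError
print(issubclass(HTTPError, URLError))  # True

# So an HTTPError instance also counts as a URLError
# (placeholder URL; the exception is built by hand, no request is made)
err = HTTPError("http://example.com/", 404, "Not Found", None, None)
print(isinstance(err, URLError))  # True
```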

For responses it cannot handle, urlopen raises an HTTPError carrying the corresponding status code. An HTTP status code indicates the status of the response returned by the server. The common codes are summarized below:

100: Continue. The client should continue sending the request: it should send the remainder of the request, or ignore this response if the request has already been completed.
101: Switching Protocols. After sending the final blank line of this response, the server switches to the protocols named in the Upgrade header. This should only be done when switching to the new protocol is beneficial.
102: Processing. A status code added by WebDAV (RFC 2518) to indicate that processing will continue.
200: OK. The request succeeded. Handling: read the response content and process it.
201: Created. The request has been fulfilled and a new resource has been created; the URI of the new resource is available in the response entity. Handling: not encountered in crawlers.
202: Accepted. The request has been accepted, but processing is not yet complete. Handling: block and wait.
204: No Content. The server fulfilled the request but returned no new information. If the client is a user agent, it does not need to update its document view. Handling: discard.
300: Multiple Choices. This status code is not used directly by HTTP/1.0 applications; it serves only as the default interpretation of a 3xx response. More than one representation of the requested resource is available. Handling: process further if the program can handle it, otherwise discard.
301: Moved Permanently. The requested resource has been assigned a new permanent URL, and future requests should use that URL. Handling: redirect to the assigned URL.
302: Found. The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.
304: Not Modified. The requested resource has not been modified. Handling: discard.
400: Bad Request. Handling: discard.
401: Unauthorized. Handling: discard.
403: Forbidden. Handling: discard.
404: Not Found. Handling: discard.
500: Internal Server Error. The server encountered an unexpected condition that prevented it from completing the request. This typically happens when there is a bug in the server-side code.
501: Not Implemented. The server does not support the functionality required to fulfill the request, for example when it does not recognize the request method and cannot support it for any resource.
502: Bad Gateway. A server acting as a gateway or proxy received an invalid response from the upstream server it contacted while trying to fulfill the request.
503: Service Unavailable. The server is temporarily unable to handle the request because of maintenance or overload. The condition is temporary and service will be restored after some time.
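
The "Handling" notes in the list above can be captured in a lookup table for a crawler. This is a minimal sketch: the strategy labels, the helper name handling_for, and the fallback rule for codes the list does not cover are all assumptions, not part of urllib2.

```python
# Handling strategy per status code, taken from the list above.
# The labels ("process", "redirect", ...) are illustrative names only.
HANDLING = {
    200: "process",
    202: "wait",
    204: "discard",
    300: "process if possible, else discard",
    301: "redirect",
    302: "redirect",
    304: "discard",
    400: "discard",
    401: "discard",
    403: "discard",
    404: "discard",
}


def handling_for(code):
    """Look up the crawler's handling for a status code (hypothetical helper).

    Codes not in the table fall back by class, which is an assumption:
    other 4xx/5xx codes are discarded, anything else is "unknown".
    """
    if code in HANDLING:
        return HANDLING[code]
    if 400 <= code <= 599:
        return "discard"
    return "unknown"


for code in (200, 301, 404, 503, 101):
    print("%d %s" % (code, handling_for(code)))
```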

Every HTTPError instance has a code attribute holding the status code sent by the server. Because urllib2 handles redirects for you (codes beginning with 3), and codes in the 100-299 range indicate success, the error codes you will usually see are in the 400-599 range.
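
The ranges described above follow the first digit of the code. A small sketch that names each class makes the reasoning explicit; status_class and is_reported_as_error are hypothetical helpers, and the class labels are the standard HTTP ones.

```python
def status_class(code):
    """Name the HTTP status class from the first digit of the code."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",  # urllib2 follows the common redirects itself
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def is_reported_as_error(code):
    """True for the 400-599 range, where HTTPError is usually raised."""
    return 400 <= code <= 599


for code in (200, 302, 403, 503):
    print("%d %s %s" % (code, status_class(code), is_reported_as_error(code)))
```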

Let's write an example to get a feel for it. We catch HTTPError and print its code attribute (the status code), then print its reason attribute, which comes from its parent class URLError.

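import urllib2

# The host part of this URL is elided in the source
req = urllib2.Request('/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.code    # status code sent by the server
    print e.reason  # short text describing the error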

The output is:

403
Forbidden

The error code is 403 and the cause of the error is Forbidden, indicating that the server prohibits access.
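
To see where that output comes from without depending on a live server, the sketch below raises an HTTPError by hand with the same 403 status. fake_open and fetch_status are hypothetical helpers standing in for urllib2.urlopen, the example.com URL is a placeholder, and the Python 2/3 import fallback is an addition.

```python
try:
    from urllib2 import HTTPError  # Python 2
except ImportError:
    from urllib.error import HTTPError  # Python 3


def fake_open(url):
    """Hypothetical stand-in for urlopen that always answers 403 Forbidden."""
    raise HTTPError(url, 403, "Forbidden", None, None)


def fetch_status(url):
    """Return (code, reason) from the HTTPError, mirroring the example above."""
    try:
        fake_open(url)
    except HTTPError as e:
        return e.code, e.reason
    return None


print(fetch_status("http://example.com/forbidden"))  # (403, 'Forbidden')
```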

We know that the parent class of HTTPError is URLError. As a rule, the handler for a parent-class exception should come after the handler for its subclass, so that anything the subclass handler does not catch can still be caught by the parent-class handler. The code above can therefore be rewritten as follows:

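import urllib2

# The host part of this URL is elided in the source
req = urllib2.Request('/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError, e:
    # Subclass first: HTTP errors carry a status code
    print e.code
except urllib2.URLError, e:
    # Any other URLError (DNS failure, refused connection, ...) lands here
    print e.reason
else:
    print "OK"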

If an HTTPError is raised, its code is printed and the URLError handler does not run. If a URLError other than an HTTPError occurs, the URLError handler catches it and prints the reason.
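
Why the order matters can be shown without a network. In this sketch, subclass_first and parent_first are hypothetical helpers that run the two handler orders against the same hand-built exception (the example.com URL is a placeholder); with URLError listed first, the HTTPError branch is never reached.

```python
try:
    from urllib2 import URLError, HTTPError  # Python 2
except ImportError:
    from urllib.error import URLError, HTTPError  # Python 3


def subclass_first(exc):
    """Handlers in the recommended order: HTTPError, then URLError."""
    try:
        raise exc
    except HTTPError as e:
        return "HTTPError %s" % e.code
    except URLError as e:
        return "URLError %s" % e.reason


def parent_first(exc):
    """Wrong order: URLError also matches HTTPError, so it always wins."""
    try:
        raise exc
    except URLError as e:
        return "URLError %s" % e.reason
    except HTTPError as e:  # never reached for HTTPError instances
        return "HTTPError %s" % e.code


forbidden = HTTPError("http://example.com/", 403, "Forbidden", None, None)
print(subclass_first(forbidden))  # HTTPError 403
print(parent_first(forbidden))    # URLError Forbidden
```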

Alternatively, you can catch only URLError and use hasattr to check which attributes the exception has before printing them. The code is rewritten as follows:

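import urllib2

# The host part of this URL is elided in the source
req = urllib2.Request('/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.URLError, e:
    # Only an HTTPError has a code; every URLError has a reason
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"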

Checking for each attribute first avoids an AttributeError when printing an attribute the exception does not have.
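
The same check can be folded into a small helper. describe_error is a hypothetical name; it uses getattr with a default, which works much like the hasattr test above in a single call, and it is exercised here with hand-built exceptions (placeholder URL) rather than real requests.

```python
try:
    from urllib2 import URLError, HTTPError  # Python 2
except ImportError:
    from urllib.error import URLError, HTTPError  # Python 3


def describe_error(e):
    """Build a message from whichever of code/reason the exception has."""
    parts = []
    code = getattr(e, "code", None)  # only HTTPError sets code
    if code is not None:
        parts.append(str(code))
    reason = getattr(e, "reason", None)
    if reason is not None:
        parts.append(str(reason))
    return " ".join(parts)


print(describe_error(HTTPError("http://example.com/", 403, "Forbidden", None, None)))
print(describe_error(URLError("getaddrinfo failed")))
```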

That covers URLError and HTTPError and the corresponding ways to handle them. Keep it up, everyone!