Preface
The National Bureau of Statistics website publishes a fairly complete list of administrative division codes. For some websites this is essential base data, so I wrote a Python program to scrape it.
Note: after scraping, the data still needs some simple manual cleanup.
Sample code:
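```python
# -*- coding: utf-8 -*-
"""Fetch the administrative division codes from the National Bureau of Statistics site."""
import re

import requests

# Page path as given; the scheme and host must be prepended
# before requests.get() can fetch it.
base_url = '/tjsj/tjbz/xzqhdm/201504/t20150415_712722.html'

# Each entry is a <p> with the numeric code in one <span> and the name in the next.
pattern = re.compile(
    r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)<span>.*?</span></span>'
    r'<span style=".*?">(.*?)</span></p>'
)


def get_xzqh():
    # Decode the raw bytes as UTF-8 so the regex and string methods work on text.
    html_data = requests.get(base_url).content.decode('utf-8')
    areas = pattern.findall(html_data)
    print("code,name,level")
    for code, name in areas:
        # Leading spaces indent the name; their count indicates the level.
        print(code, name.replace(' ', ''), name.count(' '), sep=',')


if __name__ == '__main__':
    get_xzqh()
```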
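The level printed above depends on how the page happens to indent each name. For the manual cleanup step, the level and parent of an entry can usually be derived from the code itself. The sketch below assumes the six-digit layout of GB/T 2260 (two digits each for province, prefecture, and county). The function names `level_from_code` and `parent_code` are hypothetical helpers, not part of the original program.

```python
def level_from_code(code: str) -> int:
    """Return 1 for province, 2 for prefecture, 3 for county-level codes.

    Assumes the six-digit GB/T 2260 layout: PP (province), CC (prefecture),
    DD (county), with unused trailing pairs set to "00".
    """
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit code, got {code!r}")
    if code.endswith("0000"):
        return 1
    if code.endswith("00"):
        return 2
    return 3


def parent_code(code: str):
    """Return the code of the enclosing division, or None for a province."""
    level = level_from_code(code)
    if level == 1:
        return None
    if level == 2:
        return code[:2] + "0000"
    return code[:4] + "00"


if __name__ == "__main__":
    for c in ("110000", "110100", "110101"):
        print(c, level_from_code(c), parent_code(c))
```

With a parent column, the flat list can be turned into a tree or loaded into a table with a self-referencing key, which is often easier than relying on indentation.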
Additional notes:
Country and region information can also be obtained another way: the QQ client ships with its own country/region information table (the file name is …). It is usually stored in:
C:\Program Files\Tencent\QQ\I18N\2052
For the Chinese version, install the Chinese edition of QQ; for English, install the English (international) edition, which stores the file in the 1033 directory.
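The sketch below builds the locale directory path. It assumes the default install location given above and uses the Windows locale IDs 2052 (Simplified Chinese) and 1033 (US English). The helper names are hypothetical. It only lists the directory's contents and does not parse any particular file.

```python
import ntpath
import os

# Default install location from the text; adjust if QQ is installed elsewhere.
QQ_I18N_ROOT = r"C:\Program Files\Tencent\QQ\I18N"

# Windows locale IDs: 2052 = Simplified Chinese, 1033 = US English.
LOCALE_DIRS = {"zh-CN": "2052", "en-US": "1033"}


def qq_locale_dir(lang: str, root: str = QQ_I18N_ROOT) -> str:
    """Return the locale directory for a language, joined with Windows separators."""
    return ntpath.join(root, LOCALE_DIRS[lang])


def list_locale_files(lang: str) -> list:
    """List files in the locale directory, or [] if it does not exist."""
    path = qq_locale_dir(lang)
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


if __name__ == "__main__":
    print(qq_locale_dir("zh-CN"))
    print(list_locale_files("zh-CN"))
```

`ntpath` is used so the path is built with backslashes even when the script is run on a non-Windows machine.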
The codes follow the ISO 3166 standard, so the table is easy to import into a database.
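The following sketch shows how such an import might look. It loads (code, name) pairs into SQLite with Python's standard `sqlite3` module. It assumes the rows have already been extracted from the file. The table and column names are hypothetical, and the sample rows are ISO 3166-1 alpha-2 entries.

```python
import sqlite3


def import_codes(conn: sqlite3.Connection, rows) -> None:
    """Create a region table if needed and upsert (code, name) rows into it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS region ("
            "code TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        # INSERT OR REPLACE lets the import be re-run without duplicate-key errors.
        conn.executemany(
            "INSERT OR REPLACE INTO region (code, name) VALUES (?, ?)", rows
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_codes(conn, [("CN", "China"), ("FR", "France")])
    print(conn.execute("SELECT name FROM region WHERE code = ?", ("CN",)).fetchone()[0])
```

Using the code as the primary key works because each ISO 3166 code identifies exactly one country or region.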
Summary
That is all for using Python to obtain administrative division codes. I hope this article helps with learning or using Python. If you have questions, feel free to leave a comment.