Preamble
Everyone learning to write Python crawlers runs into the same problem. You read through the syntax, which is explained in great detail, and study it carefully, yet when it comes to writing a crawler you still can't do it, or you have no idea where to start. So all my articles work from concrete examples to explain common problems and errors. Without further ado, let's look at the details.
What is an Element?
Back to the topic. After wading through all that complicated syntax, everyone is eager to write something, and some of you may then have run into output like this:
<Element a at 0x39a9a80>
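Or something similar, like Element a at 0x???????. You search for it, find nothing but English pages and a whole lot of confusion, and if your English isn't great you simply give up. So here I'll focus on explaining what this value means.
What you are looking at is an Element object. The xpath() call returns a list, and each item in that list is an Element, that is, one node of the parsed HTML tree. Printing an Element directly shows only its tag name and memory address. Its contents have to be read through its properties, and its attributes in particular come back as a dictionary.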
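To see where the <Element a at 0x...> value comes from, here is a minimal sketch that parses a small hand-written page instead of a downloaded one. The markup is hypothetical, modeled on the download link shown later in this article, and lxml is assumed to be installed.

```python
from lxml import etree

# Hypothetical stand-in for a downloaded search-result page
page = """
<html><body>
  <a class="download" href="magnet:?xt=urn:btih:aaa">magnet link</a>
  <a class="download" href="magnet:?xt=urn:btih:bbb">magnet link</a>
</body></html>
"""

dom_tree = etree.HTML(page)                         # parse the HTML into an element tree
links = dom_tree.xpath("//a[@class='download']")   # XPath returns a plain Python list

print(type(links))     # <class 'list'>
print(len(links))      # 2
print(links[0])        # <Element a at 0x...>  (the address differs on every run)
print(type(links[0]))  # <class 'lxml.etree._Element'>
```

The list is an ordinary Python list, so indexing, len() and for loops all work on it. Only the items inside are Element objects.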
Take a look at the half-finished, slightly cheeky example below to see how it is used. It proves that I'm very good at combining learning with fun and at solving everyday needs in a practical way (just kidding).
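```python
from lxml import etree
import requests

gjc = 'SHKD-700'  # search keyword
# Build the search URL. The site's domain is omitted in the original
# article, so set base_url to it before running.
base_url = ""
url = base_url + "/search/" + gjc + "-hot-desc-1"
# Download the page and decode the response body as UTF-8
html = requests.get(url).content.decode('utf-8')
# Parse the HTML into an element tree
dom_tree = etree.HTML(html)
# Locate the nodes with XPath; this returns a list of Element objects
links = dom_tree.xpath("//a[@class='download']")
for index in range(len(links)):
    # links[index] is an Element (not a dictionary)
    if index % 2 == 0:  # only every other link, as in the original
        print(links[index].tag)
        print(links[index].attrib)
        print(links[index].text)
```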
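Because the site's domain is omitted here, the script above cannot be run as-is. The sketch below runs the same parsing steps against a saved HTML string instead of a live request. The four links are hypothetical. It also swaps the index % 2 == 0 check for the slice links[::2], which picks out the same items (indexes 0, 2, 4, ...) without managing an index by hand.

```python
from lxml import etree

# Hypothetical saved search page with four download links
page = """
<html><body>
  <a class="download" href="magnet:?xt=urn:btih:aaa">magnet link</a>
  <a class="download" href="magnet:?xt=urn:btih:bbb">magnet link</a>
  <a class="download" href="magnet:?xt=urn:btih:ccc">magnet link</a>
  <a class="download" href="magnet:?xt=urn:btih:ddd">magnet link</a>
</body></html>
"""

dom_tree = etree.HTML(page)
links = dom_tree.xpath("//a[@class='download']")

# links[::2] keeps the items at index 0, 2, 4, ..., the same selection
# as the index % 2 == 0 check in the original loop
results = []
for link in links[::2]:
    results.append((link.tag, dict(link.attrib), link.text))

for tag, attrib, text in results:
    print(tag, attrib, text)
```

Collecting the values into a list first, rather than printing inside the loop, makes it easy to save them to a file or hand them to a download step later.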
An illustrative analysis
Let's focus on the following lines of that code:
```python
print(links[index])         # the Element object itself
print(type(links[index]))   # its type
print(links[index].tag)     # tag name of the <a> element: a
print(links[index].attrib)  # attributes of the <a> element: href and class
print(links[index].text)    # the text inside the <a> element
```
The output is:
```
<Element a at 0x3866a58>
<class 'lxml.etree._Element'>
a
{'href': 'magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca', 'class': 'download'}
magnet link
```
The HTML for this node is:
```html
<a href="magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca" class="download">magnet link</a>
```
Comparing the two, you should now have a clear idea of how the three properties are used.
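You can check the mapping between the HTML and the three properties without any network access. This minimal sketch parses the single <a> node quoted above, with the hash copied from the printed output, and reads tag, attrib and text from it. It assumes lxml is installed.

```python
from lxml import etree

# The <a> node quoted above
snippet = ('<a href="magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca" '
           'class="download">magnet link</a>')

# etree.HTML builds a full document around the fragment,
# so look the <a> node up with XPath
node = etree.HTML(snippet).xpath("//a")[0]

print(node.tag)           # a
print(dict(node.attrib))  # the href and class attributes as a dictionary
print(node.text)          # magnet link
```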
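Summary
- xpath() returns a list, and each item in it is of type lxml.etree._Element, which prints as <Element a at 0x...>. An Element itself also behaves like a list of its child elements.
- To get what we need out of an Element, use its three properties: tag, attrib and text.
- variable.tag gives the tag name, as a string.
- variable.attrib gives the attributes of the node (here the <a> tag), as a dictionary.
- variable.text gives the text inside the tag, as a string. It holds only the text before the first child element, and is None if there is no such text.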
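Two of the points above are easy to trip over in practice: reading a single attribute, and text that is split by a child tag. This sketch uses hypothetical markup in which the link text contains a <b> element. It shows .get() for one attribute, how .text stops at the first child, the list-like behavior of an Element, and itertext() for collecting all the text. It assumes lxml is installed.

```python
from lxml import etree

# Hypothetical markup: the link text is split by a child <b> element
html = '<a class="download" href="magnet:?xt=urn:btih:aaa">magnet <b>link</b> here</a>'
a = etree.HTML(html).xpath("//a")[0]

print(a.get("href"))          # magnet:?xt=urn:btih:aaa
print(a.get("title"))         # None, because the attribute is not present
print(repr(a.text))           # 'magnet ', since .text stops at the first child
print(len(a), a[0].tag)       # 1 b, as an Element acts as a list of its children
print("".join(a.itertext()))  # magnet link here
```

Using .get() is safer than attrib["title"], which raises KeyError when the attribute is missing.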
Feel free to bookmark this article, but please don't repost it. I'm still teaching myself and feeling my way forward, and these notes reflect my current understanding, so there are bound to be inaccuracies, and I don't want to mislead anyone.