SoFunction
Updated on 2024-11-12

What exactly is the <Element ...> you get when using XPath in Python?

Preamble

Anyone learning to write web crawlers in Python runs into the same problem. You have read through the syntax, the explanations were detailed, and you studied them carefully, yet you still cannot write a crawler or do not know where to start. That is why all my articles work from concrete examples to explain common problems and error messages. Without further ado, let's look at the details.

What is an Element?

Back to the topic. After wading through all that complicated syntax, everyone is eager to write something, and some of you may have run into this:

<Element a at 0x39a9a80>

or some other value of the form <Element a at 0x???????>. Searching for it turns up pages of English that are hard going if your English is not strong, so here I will focus on explaining what it is.

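The value you printed is one item from the list that xpath() returns. xpath() returns a list, and each item in that list is an Element object. Printing an element shows only its tag name (here a) and its memory address. An Element is not a dictionary, but its attributes are exposed as a dictionary-like object through .attrib.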

The half-finished example below, a scraper that collects magnet links from a search page, shows how to use it. It also proves that I am good at mixing learning with fun and at solving everyday needs in a practical way. funny face.jpg

from lxml import etree
import requests

gjc = 'SHKD-700'  # search keyword
# Build the search URL (the site's host was elided in the original; prepend it)
url = "/search/" + gjc + "-hot-desc-1"
# Download the page and decode the response body as UTF-8
html = requests.get(url, timeout=10).content.decode('utf-8')
# Parse the HTML into an element tree
dom_tree = etree.HTML(html)
# Locate the nodes with XPath; this returns a list of Element objects
links = dom_tree.xpath("//a[@class='download']")
for index in range(len(links)):
    # links[index] is an lxml Element (not a dictionary)
    if index % 2 == 0:
        print(links[index].tag)
        print(links[index].attrib)
        print(links[index].text)
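To experiment with the loop above without hitting the site, you can run it offline. This is a sketch that assumes hypothetical markup with three download links. It keeps the original "every other link" filter, written as the slice links[::2]. It also shows that XPath can select the href attribute or the text directly. In that case the result is still a list, but its items are strings instead of Elements.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded search page
SAMPLE_HTML = """
<html><body>
  <a href="magnet:?xt=urn:btih:aaa" class="download">magnet link</a>
  <a href="magnet:?xt=urn:btih:bbb" class="download">magnet link</a>
  <a href="magnet:?xt=urn:btih:ccc" class="download">magnet link</a>
</body></html>
"""

dom_tree = etree.HTML(SAMPLE_HTML)
links = dom_tree.xpath("//a[@class='download']")

# links[::2] keeps indexes 0, 2, 4, ... just like the `index % 2 == 0` test
for link in links[::2]:
    print(link.tag, link.attrib, link.text)

# XPath can also return attribute values or text nodes directly;
# the result is still a list, but its items are strings, not Elements
hrefs = dom_tree.xpath("//a[@class='download']/@href")
texts = dom_tree.xpath("//a[@class='download']/text()")
print(hrefs)
print(texts)
```

Slicing avoids the manual index bookkeeping, but both forms select the same elements.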

Analysis

Now focus on the following lines:

    print(links[index])
    print(type(links[index]))
    print(links[index].tag)     # tag name of the <a> element: a
    print(links[index].attrib)  # attributes of the <a> element: href and class
    print(links[index].text)    # text inside the <a> element

The output is:

<Element a at 0x3866a58>
<class 'lxml.etree._Element'>
a
{'href': 'magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca', 'class': 'download'}
magnet link

The HTML for this node is:

<a href="magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca" class="download">magnet link</a>

This should give you a clear picture of how the three attributes are used.
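To reproduce that output without the network, you can parse the node on its own. This is a minimal sketch that uses the href value from the printed attrib above. It also shows .get(), the usual way to read a single attribute. .get() returns None instead of raising when the attribute is missing.

```python
from lxml import etree

# The <a> node on its own, using the href from the printed attrib
NODE_HTML = ('<a href="magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca" '
             'class="download">magnet link</a>')

root = etree.HTML(NODE_HTML)      # the HTML parser wraps the fragment in <html><body>
link = root.xpath("//a[@class='download']")[0]

print(link.tag)                   # tag name, a string
print(link.attrib)                # attributes, a dict-like object
print(link.text)                  # text inside the tag, a string
print(link.get("href"))           # a single attribute by name
print(link.get("title"))          # missing attribute -> None, no exception
```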

Summary

  • An element's type is lxml.etree._Element. xpath() returns a list of them, and an Element itself also behaves like a list of its child elements
  • Use the three attributes tag, attrib and text to get what you need from each element
  • element.tag gives the tag name, a string
  • element.attrib gives the element's attributes (for this <a> tag, href and class), a dictionary-like object
  • element.text gives the text inside the tag, a string (None if the tag has no text)
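You can check directly the claim that an Element is "in a sense" a list. This short sketch uses hypothetical markup. It shows that an Element supports len(), indexing and iteration over its direct child elements. .attrib accepts dictionary-style lookups and can be copied into a real dict.

```python
from lxml import etree

# Hypothetical markup: a <ul> with three <li> children
SAMPLE_HTML = '<ul class="results"><li>one</li><li>two</li><li>three</li></ul>'

ul = etree.HTML(SAMPLE_HTML).xpath("//ul")[0]

print(len(ul))                  # number of direct child elements
print(ul[0].tag, ul[0].text)    # index into the children like a list
print([li.text for li in ul])   # iterate over the children
print(ul.attrib["class"])       # dict-style lookup on the attributes
attrs = dict(ul.attrib)         # copy into a plain dict if needed
print(type(attrs))
```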

Feel free to bookmark this, but please do not repost it. I am still teaching myself and feeling my way forward, and this reflects my current understanding. There are bound to be inaccuracies, and I do not want to mislead others!
