SoFunction
Updated on 2024-11-15

Processing XML files in Python: operation details

1. Introduction to the ways of working with XML in Python

The Python standard library includes three ways of working with XML:

  • The first is the xml.dom.* modules, an implementation of the W3C DOM API; they are suitable if you need to work with the DOM API;
  • The second is the xml.sax.* modules, an implementation of the SAX API that sacrifices convenience for speed and lower memory usage. SAX is an event-based API, which means it can process very large documents "on the fly" without loading them completely into memory;
  • The third is the xml.etree.ElementTree module (ET for short), which provides a lightweight, Pythonic API. It is much faster than DOM and pleasant to use, and, like SAX, it also offers "on-the-fly" processing that does not need to load the entire file into memory. The average performance of ET is about the same as SAX, but its API is higher-level and easier to use.
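
The "on-the-fly" processing mentioned for ET can be sketched with ET.iterparse(), which hands over elements while the input is still being read. This is a minimal sketch, not the article's own code: the sample data and the variable names are made up for illustration.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical sample data, held in memory so the sketch is self-contained.
xml_bytes = b"<items><item id='1'/><item id='2'/><item id='3'/></items>"

seen_ids = []
# iterparse yields (event, element) pairs while the input is still being read.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "item":
        seen_ids.append(elem.get("id"))
        elem.clear()  # drop the element's contents once handled to limit memory use

print(seen_ids)
```

With a real file, the BytesIO object would be replaced by a file name or an open file object.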

2. ElementTree module

Parse the XML file and get the root node:

from xml.etree import ElementTree as ET

# 1. Open the XML file (the file name after the last backslash is missing in the source; append your own)
tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
# Get the root tag of the XML file
root = tree.getroot()
print(root)
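
Since the file name in the listing above is not available, here is a self-contained sketch of the same two calls, ET.parse() and getroot(). It assumes a small made-up document written to a temporary file; the file name sample.xml and its content are hypothetical.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical sample document written to a temporary file.
sample = "<Document><Events version='3.0'/></Document>"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    tree = ET.parse(path)   # returns an ElementTree object
    root = tree.getroot()   # returns the root Element

print(root.tag)
print(root[0].tag, root[0].attrib)
```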

3. Parse an XML-format string and get the root node

Note that the string below starts with a line break, so it must not contain the XML declaration <?xml version="1.0" encoding="UTF-8"?>. A declaration is only valid at the very start of the document.

If it appears after leading whitespace, parsing fails with an error:

content = """
<Document xmlns:xsi="http:///2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Association Relationship XML Schema-3." License="">
  <Events version="3.0">
    <Event name="RelationCreate">
      <Relation productCode="06970593810109" subTypeNo="06970593810109" cascade="1" packageSpec="50 servings/box." comment="" linkProductCode="" assCorpCode="">
		<Batch batchNo="N0530001" madeDate="2022-05-30" validateDate="2023-05-29" workshop="None." lineName="None." lineManager="None.">
			<Code curCode="010697059381010910N053000117230527" packLayer="1" parentCode="" flag="0" />
		 </Batch>
      </Relation>
    </Event>
  </Events>
</Document>
"""
root2 = ET.fromstring(content)
print(root2)
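
A small check of the note about the XML declaration. With the standard-library parser, the failure appears to come from the declaration not being the very first thing in the string (a triple-quoted string written as above starts with a line break). In this sketch, stripping the leading whitespace is enough to make the same text parse. The sample strings are made up for illustration.

```python
from xml.etree import ElementTree as ET

declaration = '<?xml version="1.0" encoding="UTF-8"?>'

# Declaration preceded by a line break: the parser rejects it.
bad = "\n" + declaration + "<Document/>"
try:
    ET.fromstring(bad)
    failed = False
except ET.ParseError as exc:
    failed = True
    print("ParseError:", exc)

# The same text with the leading whitespace removed parses normally.
root_ok = ET.fromstring(bad.strip())
print(root_ok.tag)
```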

4. Read the contents of a node (starting from getroot())

Get tag name, get tag attributes and get tag text:

from xml.etree import ElementTree as ET

# 1. Open the XML file (file name elided in the source)
tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
# Get the root tag of the XML file
root = tree.getroot()
# 2. Read node contents
# 2.1 Get the sub-tags under the root tag
for child in root:  # Iterate over the sub-tags directly under the root node
    print(child.tag)      # *.tag is the tag name (string type)
    print(child.attrib)   # *.attrib holds the tag attributes (dictionary type)
    for node in child:    # Iterate over the sub-tags of each sub-tag
        print(node.tag)
        print(node.attrib)
        print(node.text)   # *.text is the tag text
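
To make the three accessors concrete, here is a minimal sketch on a made-up two-level document (the sample XML is an assumption, loosely modelled on the Events/Event tags used earlier).

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    '<Events version="3.0">'
    '<Event name="RelationCreate">some text</Event>'
    '</Events>'
)

for child in root:
    print(child.tag)     # tag name, a str
    print(child.attrib)  # attributes, a dict
    print(child.text)    # text content, a str (None when the tag has no text)

first = root[0]                 # children can also be reached by index
name = first.get("name")        # get() reads one attribute
missing = first.get("other")    # and returns None when it is absent
```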

5. Access tags directly by tag name (find, findall)

find(): with a plain tag name, searches only the direct sub-tags of the element it is called on (for the root tag, the first level below it); deeper levels are not searched, and the first matching tag is returned.
findall(): with a plain tag name, also searches only the direct sub-tags of the element it is called on, but returns all tags at that level whose name matches.

from xml.etree import ElementTree as ET

# 1. Open the XML file (file name elided in the source)
tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
# Get the root tag of the XML file
root = tree.getroot()
print(root)
# 2.2 Get a tag by tag name with find()
events_object = root.find("Events")  # Searches only the first level of sub-tags under the root tag and returns the first match
print(events_object.tag, events_object.attrib)

event_object = events_object.find("Event")   # Go one level further: a sub-tag of the sub-tag under the root tag
print(event_object.tag, event_object.attrib)

# 2.3 Get tags by tag name with findall()
events_objects = root.findall("Events")  # Also searches only the first level of sub-tags under the root tag, but returns every tag at that level with a matching name
for event_clee in events_objects:
    print(event_clee.tag, event_clee.attrib)
    event_object = event_clee.findall("Event")   # Go one level further: the sub-tags of the sub-tag under the root tag
    for relation_cell in event_object:
        print(relation_cell.tag, relation_cell.attrib)

# 2.4 findall(xpath)
Events_object = root.findall('.//Code')
Events_object1 = root.findall('.//Code[@curCode="010697059381010910N053000117230527"]')
Events_object2 = root.findall('.//*[@curCode="010697059381010910N053000117230527"]')
# Note: the leading "." in these paths cannot be omitted.
print(Events_object)
print(Events_object1)
print(Events_object2[0])
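
The difference between a plain tag name and a path expression can be checked on a small made-up document. This is a sketch only; the sample XML and the attribute values A1 and B2 are hypothetical.

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    "<Document><Events><Event>"
    '<Code curCode="A1"/><Code curCode="B2"/>'
    "</Event></Events></Document>"
)

# A plain tag name only looks at direct children, so this finds nothing.
direct = root.find("Code")

# A path can walk down level by level...
nested = root.find("Events/Event/Code")

# ...or search all descendants with ".//".
all_codes = root.findall(".//Code")
matched = root.findall('.//Code[@curCode="B2"]')

# Compare against None explicitly rather than relying on truthiness.
print(direct is None)
print(nested.get("curCode"))
print([c.get("curCode") for c in all_codes])
print([c.get("curCode") for c in matched])
```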

6. Full-text search by tag name (similar to looking up tags with an XPath path)

from xml.etree import ElementTree as ET

# 1. Open the XML file (file name elided in the source)
tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
# Get the root tag of the XML file
root = tree.getroot()
print(root)
# 2.2 Full-text search for tags by tag name
Code_object = root.iter("Code")   # Search the whole document for tags named "Code"
print(Code_object)
for code in Code_object:
    print(code.tag, code.attrib)
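
A minimal sketch of how iter() walks the tree, using a made-up document in which the same tag appears at two different depths. The tag names a, b and c are hypothetical.

```python
from xml.etree import ElementTree as ET

root = ET.fromstring("<a><b><c n='1'/></b><c n='2'/></a>")

# iter("c") visits every <c> in the subtree, at any depth, in document order.
c_values = [c.get("n") for c in root.iter("c")]

# Without an argument, iter() visits every element, starting with root itself.
all_tags = [elem.tag for elem in root.iter()]

print(c_values)
print(all_tags)
```

Note that iter() returns an iterator, so printing it directly shows an iterator object rather than the matching tags.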

7. Modify a node

from xml.etree import ElementTree as ET
# Open the XML file (file name elided in the source)
tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
# Get the root tag of the XML file
root = tree.getroot()

relation_object = root.find("Events").find("Event").iter("Relation")  # Get the Relation tags under the first Event of the first Events tag
for relation_cell in relation_object:
    relation_cell.set("productCode", "Product code")    # Modifies the attribute value if the attribute exists, adds the attribute if it does not
    relation_cell.set("productCode2", "Product code 2")
    relation_cell.find("Batch").find("Code").text = "Traceability Code"  # Note: a short (self-closing) tag automatically becomes a long tag once it is given text
tree = ET.ElementTree(root)
tree.write("", encoding="utf-8", short_empty_elements=True)    # Output file name elided in the source. If the file does not exist it is created; if it already exists it is overwritten
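
The effect of set() and of assigning .text can be inspected without writing a file by serializing to a string with ET.tostring(). A minimal sketch on a made-up fragment (the attribute values are hypothetical):

```python
from xml.etree import ElementTree as ET

root = ET.fromstring('<Relation productCode="old"><Code/></Relation>')

root.set("productCode", "new")     # existing attribute: value is overwritten
root.set("comment", "added")       # missing attribute: it is created
root.find("Code").text = "trace"   # <Code/> is now written as <Code>trace</Code>

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```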

8. Delete nodes

from xml.etree import ElementTree as ET
# Open the XML file (file name elided in the source)
tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
# Get the root tag of the XML file
root = tree.getroot()
# #################### Wrong way to delete ########################
# # Get the tag to delete
# Event_object = root.find("Events").find("Event")
# # Delete the tag
# root.remove(Event_object)  # remove() can only delete a direct sub-tag of the element it is called on; Event_object is a sub-tag of a sub-tag of root, so the deletion fails here

################## Right way to delete #############################
Events_object = root.find("Events")
Event_object = Events_object.find("Event")
Events_object.remove(Event_object)
tree = ET.ElementTree(root)
tree.write("", encoding="utf-8")    # Output file name elided in the source. If the file does not exist it is created; if it already exists it is overwritten
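
The two cases above can be reproduced on a made-up document. The sketch also shows one possible way to locate the parent of a tag when it is not known in advance, by building a child-to-parent dictionary; the name parent_map is hypothetical, not part of the ElementTree API.

```python
from xml.etree import ElementTree as ET

root = ET.fromstring("<Document><Events><Event/></Events></Document>")
events = root.find("Events")
event = events.find("Event")

# remove() only accepts a direct child, so calling it on root fails.
try:
    root.remove(event)
    error = None
except ValueError as exc:
    error = exc
print("root.remove failed:", error is not None)

# Elements do not store a reference to their parent;
# a child-to-parent map is one way to look it up.
parent_map = {child: parent for parent in root.iter() for child in parent}
parent = parent_map[event]
parent.remove(event)

print(ET.tostring(root, encoding="unicode"))
```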

9. Build files

Mode 1 (Element)

The various types of tags are created first, and then the relationships between the tags are established:

from xml.etree import ElementTree as ET
# Create the root tag
root = ET.Element("root")
# Create a tag tagName1
tagName1 = ET.Element("tagName1", {"tag1Attribute": "AttributeValue1"})
# Create a tag tagName2
tagName2 = ET.Element("tagName2", {"tag2Attribute": "AttributeValue2"})
# Create a tag tagName11
tagName11 = ET.Element("tagName11", {"tag11Attribute": "AttributeValue11"})
# Create a tag tagName12
tagName12 = ET.Element("tagName12", {"tag12Attribute": "AttributeValue12"})
# Add tagName11 and tagName12 to tagName1 as its sub-tags
tagName1.append(tagName11)
tagName1.append(tagName12)
# Add tagName1 and tagName2 to root as its sub-tags
root.append(tagName1)
root.append(tagName2)
# Save (output file name elided in the source)
tree = ET.ElementTree(root)
tree.write("", xml_declaration=True, encoding="utf-8", short_empty_elements=True)
# xml_declaration: whether to write the XML declaration; encoding: the output encoding; short_empty_elements: whether empty tags are written as a short tag (<tag/>) or as a pair of tags (<tag></tag>)
# Resulting file (indented here for readability):
<?xml version='1.0' encoding='utf-8'?>
<root>
    <tagName1 tag1Attribute="AttributeValue1">
        <tagName11 tag11Attribute="AttributeValue11"/>
        <tagName12 tag12Attribute="AttributeValue12"/>
    </tagName1>
    <tagName2 tag2Attribute="AttributeValue2"/>
</root>
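
The listing above is shown indented, but ElementTree writes everything on one line by default. A minimal sketch of one way to get indented output, assuming Python 3.9 or later, where ET.indent() is available; the variable names flat and pretty are made up for illustration.

```python
from xml.etree import ElementTree as ET

root = ET.Element("root")
tag1 = ET.Element("tagName1", {"tag1Attribute": "AttributeValue1"})
tag1.append(ET.Element("tagName11", {"tag11Attribute": "AttributeValue11"}))
root.append(tag1)

# Default serialization: no line breaks or indentation are added.
flat = ET.tostring(root, encoding="unicode")

# ET.indent() (Python 3.9+) inserts whitespace into the tree in place.
ET.indent(root, space="    ")
pretty = ET.tostring(root, encoding="unicode")

print(flat)
print(pretty)
```

Because ET.indent() modifies the tree itself, it is best called once, right before writing.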

Mode 2 (makeelement)

from xml.etree import ElementTree as ET
# Create the root node
root = ET.Element("family")

# Create first-level sub-tags
son1 = root.makeelement("son", {"name": "son1"})
son2 = root.makeelement("son", {"name": "son2"})

# Create second-level sub-tags
grandson1 = son1.makeelement("grandson1", {"name": "grandson1"})
grandson2 = son1.makeelement("grandson1", {"name": "grandson2"})

# Attach the second-level sub-tags to a first-level sub-tag
son1.append(grandson1)
son1.append(grandson2)
# Attach the first-level sub-tags to the root tag
root.append(son2)
root.append(son1)
tree = ET.ElementTree(root)
tree.write("../testData/", xml_declaration=True, encoding="utf-8")  # output file name elided in the source
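
A short sketch of what makeelement() does and does not do: it creates a new element but does not attach it to the element it was called on, which is why the append() calls are still needed. The sample names are made up. Note that the Python documentation recommends the SubElement() factory function over calling makeelement() directly.

```python
from xml.etree import ElementTree as ET

root = ET.Element("family")
son = root.makeelement("son", {"name": "son1"})

children_before = len(root)   # makeelement() has not attached anything yet
root.append(son)              # attaching is a separate, explicit step
children_after = len(root)

print(children_before, children_after)
print(ET.tostring(root, encoding="unicode"))
```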

Mode 3

This approach establishes the correlation directly when the element is created:

from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("family")
# Create first-level sub-tags
son1 = ET.SubElement(root, "son", {"name": "son1"})
son2 = ET.SubElement(root, "son", {"name": "son2"})

# Create second-level sub-tags
grandson1 = ET.SubElement(son1, "grandson1", {"name": "grandson1"})
grandson1.text = "Great-grandson."
grandson2 = ET.SubElement(son1, "grandson1", {"name": "grandson2"})
grandson2.text = "Little grandson."
tree = ET.ElementTree(root)
tree.write("../testData/", xml_declaration=True, encoding="utf-8")  # output file name elided in the source
# Resulting file (indented here for readability):
<?xml version='1.0' encoding='utf-8'?>
<family>
    <son name="son1">
        <grandson1 name="grandson1">Great-grandson.</grandson1>
        <grandson1 name="grandson2">Little grandson.</grandson1>
    </son>
    <son name="son2"/>
</family>
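
A self-contained variant of this approach that writes to an in-memory buffer instead of a file, so the result can be inspected directly. It is a sketch under stated assumptions: the tag names, the text "hello" and the variable names buffer and data are made up for illustration.

```python
import io
from xml.etree import ElementTree as ET

root = ET.Element("family")
son1 = ET.SubElement(root, "son", {"name": "son1"})
ET.SubElement(root, "son", name="son2")   # attributes may also be passed as keyword arguments
grandson = ET.SubElement(son1, "grandson", {"name": "grandson1"})
grandson.text = "hello"

# write() accepts a file name or a file object; a BytesIO buffer stands in for the file here.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, xml_declaration=True, encoding="utf-8")
data = buffer.getvalue()

print(data.decode("utf-8"))
```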
