1. Ways to work with XML in Python
The Python standard library offers three ways to process XML:
- The first is the xml.dom.* modules, an implementation of the W3C DOM API, suitable if you need to work with the DOM API;
- The second is the xml.sax.* modules, an implementation of the SAX API that sacrifices convenience for speed and memory usage. SAX is an event-based API, which means it can process very large documents "on the fly" without loading them completely into memory;
- The third is xml.etree.ElementTree (ET for short), which provides a lightweight, Pythonic API. It is much faster than DOM and pleasant to use, and like SAX it can also process a document on the fly without loading the entire file into memory. Its average performance is about the same as SAX, but its API is more convenient.
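The "on the fly" processing mentioned for ET corresponds to ET.iterparse in the standard library. The following is a minimal sketch, assuming a hypothetical in-memory document of 1000 item tags in place of a large file on disk; the tag names are made up for illustration.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical in-memory document standing in for a large file on disk.
xml_bytes = b"<items>" + b"".join(
    b'<item id="%d"/>' % i for i in range(1000)
) + b"</items>"

count = 0
# iterparse yields (event, element) pairs while the input is still being read.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "item":
        count += 1
        elem.clear()  # drop the element's attributes, text and children once handled

print(count)
```

Calling clear() on each handled element is what keeps memory use low in this sketch; without it the whole tree would still be built up in memory.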
2. ElementTree module
Parse an XML file and get the root node:

    from xml.etree import ElementTree as ET

    # 1. Open the XML file (append the XML file name to this directory path)
    tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
    # Get the root tag of the parsed document
    root = tree.getroot()
    print(root)
3. Parse an XML string and get the root node
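Note that if the string contains the XML declaration "<?xml version="1.0" encoding="UTF-8"?>", the declaration must be the very first thing in the string.
If anything comes before it, such as the line break that opens a triple-quoted string, parsing fails with a ParseError. The string in this example therefore leaves the declaration out: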
    from xml.etree import ElementTree as ET

    content = """
    <Document xmlns:xsi="http:///2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Association Relationship XML Schema-3." License="">
        <Events version="3.0">
            <Event name="RelationCreate">
                <Relation productCode="06970593810109" subTypeNo="06970593810109" cascade="1" packageSpec="50 servings/box." comment="" linkProductCode="" assCorpCode="">
                    <Batch batchNo="N0530001" madeDate="2022-05-30" validateDate="2023-05-29" workshop="None." lineName="None." lineManager="None.">
                        <Code curCode="010697059381010910N053000117230527" packLayer="1" parentCode="" flag="0" />
                    </Batch>
                </Relation>
            </Event>
        </Events>
    </Document>
    """

    # ET.fromstring(content) is equivalent
    root2 = ET.XML(content)
    print(root2)
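The behaviour around the XML declaration can be checked directly. This is a small sketch against the standard-library parser, using a shortened, made-up document: the declaration parses when it is the first thing in the string and fails when a line break precedes it.

```python
from xml.etree import ElementTree as ET

declaration = '<?xml version="1.0" encoding="UTF-8"?>'
body = "<Document><Events version='3.0'/></Document>"  # shortened sample

# Declaration at the very start of the string: parses normally.
root_ok = ET.fromstring(declaration + body)
print(root_ok.tag)

# A leading newline (as in a triple-quoted literal) puts text before the declaration.
try:
    ET.fromstring("\n" + declaration + body)
    failed = False
except ET.ParseError as exc:
    failed = True
    print("ParseError:", exc)

# Stripping leading whitespace is one way to make the second form parse.
root_stripped = ET.fromstring(("\n" + declaration + body).lstrip())
print(root_stripped.tag)
```

Stripping leading whitespace with lstrip() is only one option; omitting the declaration, as the article's example does, works as well.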
4. Read the contents of a node
Get the tag name, the tag attributes and the tag text:

    from xml.etree import ElementTree as ET

    # 1. Open the XML file (append the XML file name to this directory path)
    tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
    # Get the root tag of the parsed document
    root = tree.getroot()

    # 2. Read node contents
    # 2.1 Iterate over the sub-tags directly under the root tag
    for child in root:
        print(child.tag)     # .tag is the tag name (a string)
        print(child.attrib)  # .attrib holds the tag attributes (a dictionary)
        # Iterate over the sub-tags of each sub-tag
        for node in child:
            print(node.tag)
            print(node.attrib)
            print(node.text)  # .text is the tag text
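To try the same loop without a file on disk, the sketch below parses a short inline document instead. The sample XML is a made-up, reduced version of the structure used in this article.

```python
from xml.etree import ElementTree as ET

# Made-up sample, reduced from the document structure used in the article.
sample = """<Document>
  <Events version="3.0">
    <Event name="RelationCreate">text inside Event</Event>
  </Events>
</Document>"""

root = ET.fromstring(sample)
seen = []
for child in root:
    # Direct children of the root: tag name and attribute dictionary
    seen.append((child.tag, child.attrib))
    for node in child:
        # Grandchildren: tag name, attributes and text content
        seen.append((node.tag, node.attrib, node.text))

print(seen)
```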
5. Access tags directly by tag name (find, findall)
find(): given a plain tag name, it searches only the direct children of the element it is called on (one level down, not the level below that) and returns the first matching tag, or None if nothing matches.
findall(): given a plain tag name, it also searches only the direct children, but returns a list of all tags with that name. Both methods also accept an XPath-style path such as './/Code' to search deeper levels.
    from xml.etree import ElementTree as ET

    # 1. Open the XML file (append the XML file name to this directory path)
    tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
    # Get the root tag of the parsed document
    root = tree.getroot()
    print(root)

    # 2.2 Get a tag by name with find()
    # Only direct children of root are searched; the first match is returned
    events_object = root.find("Events")
    print(events_object.tag, events_object.attrib)
    # Go one level further: the sub-tag of the sub-tag under the root tag
    event_object = events_object.find("Event")
    print(event_object.tag, event_object.attrib)

    # 2.3 Get tags by name with findall()
    # Only direct children of root are searched; all matches are returned
    events_objects = root.findall("Events")
    for event_clee in events_objects:
        print(event_clee.tag, event_clee.attrib)
        # Go one level further: all Event tags under each Events tag
        event_object = event_clee.findall("Event")
        for relation_cell in event_object:
            print(relation_cell.tag, relation_cell.attrib)

    # 2.4 findall(xpath)
    # Note: the leading "." cannot be omitted
    Events_object = root.findall('.//Code')
    Events_object1 = root.findall('.//Code[@curCode="010697059381010910N053000117230527"]')
    Events_object2 = root.findall('.//*[@curCode="010697059381010910N053000117230527"]')
    print(Events_object)
    print(Events_object1)
    print(Events_object2[0])
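The difference between a plain tag name and an XPath-style path is easiest to see on a small document. The sketch below uses a made-up sample with shortened curCode values; it also shows that a path starting with "//" is rejected when called on an element.

```python
from xml.etree import ElementTree as ET

# Made-up sample with shortened attribute values.
sample = """<Document>
  <Events version="3.0">
    <Event name="RelationCreate">
      <Code curCode="A1"/>
      <Code curCode="B2"/>
    </Event>
  </Events>
</Document>"""
root = ET.fromstring(sample)

direct = root.find("Code")                    # Code is not a direct child of root
first_events = root.find("Events")            # direct child, first match
by_path = root.findall("Events/Event/Code")   # explicit path through each level
all_codes = root.findall(".//Code")           # any depth below root
matched = root.findall('.//Code[@curCode="B2"]')  # filter on an attribute value

try:
    root.findall("//Code")  # no leading "."
    absolute_path_ok = True
except SyntaxError:
    absolute_path_ok = False

print(direct, first_events.attrib, len(by_path), len(all_codes), absolute_path_ok)
```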
6. Search the whole tree by tag name (similar to finding tags with an XPath path)

    from xml.etree import ElementTree as ET

    # 1. Open the XML file (append the XML file name to this directory path)
    tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
    # Get the root tag of the parsed document
    root = tree.getroot()
    print(root)

    # 2.2 Search the whole tree for tags named "Code"
    Code_object = root.iter("Code")
    print(Code_object)
    for code in Code_object:
        print(code.tag, code.attrib)
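As a quick check that iter() reaches tags at any depth, here is a sketch on a made-up document where the two Code tags sit at different levels. Matches come back in document order.

```python
from xml.etree import ElementTree as ET

# Made-up sample: the two Code tags are at different depths.
sample = """<Document>
  <Events>
    <Event>
      <Code curCode="A1"/>
      <Batch><Code curCode="B2"/></Batch>
    </Event>
  </Events>
</Document>"""
root = ET.fromstring(sample)

# iter() walks the whole subtree and yields every tag with the given name.
codes = [code.get("curCode") for code in root.iter("Code")]
print(codes)
```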
7. Modify nodes

    from xml.etree import ElementTree as ET

    # Append the XML file name to this directory path
    tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
    # Get the root tag of the parsed document
    root = tree.getroot()

    # Iterate over the Relation tags under the first Event of the first Events tag
    relation_object = root.find("Events").find("Event").iter("Relation")
    for relation_cell in relation_object:
        # set() changes the attribute if it exists and adds it if it does not
        relation_cell.set("productCode", "Product code")
        relation_cell.set("productCode2", "Product code 2")
        # Note: a short tag automatically becomes an open/close pair once it has text
        relation_cell.find("Batch").find("Code").text = "Traceability Code"

    tree = ET.ElementTree(root)
    # Put the output file name in the empty string.
    # The file is created if it does not exist and overwritten if it does.
    tree.write("", encoding="utf-8", short_empty_elements=True)
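The effect of set(), of assigning text, and of short_empty_elements can be inspected without writing a file by serializing with ET.tostring. This is a sketch on a made-up one-line document, not the article's data.

```python
from xml.etree import ElementTree as ET

# Made-up sample: one attribute to change and one empty (short) tag.
root = ET.fromstring('<Relation productCode="old"><Code/></Relation>')

root.set("productCode", "new")     # existing attribute: value is replaced
root.set("productCode2", "added")  # missing attribute: it is created
root.find("Code").text = "Traceability Code"  # the short tag gains text

result = ET.tostring(root, encoding="unicode")
print(result)

# short_empty_elements only affects tags that have no content at all.
empty = ET.Element("Code")
short_form = ET.tostring(empty, encoding="unicode", short_empty_elements=True)
long_form = ET.tostring(empty, encoding="unicode", short_empty_elements=False)
print(short_form, long_form)
```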
8. Delete nodes
    from xml.etree import ElementTree as ET

    # Append the XML file name to this directory path
    tree = ET.parse(r"E:\Acctrue2.0Test\testData\")
    # Get the root tag of the parsed document
    root = tree.getroot()

    # ---------------- Wrong way to delete ----------------
    # Get the tag to delete
    # Event_object = root.find("Events").find("Event")
    # Delete it from the root
    # root.remove(Event_object)
    # remove() can only delete a direct sub-tag. Event_object is a sub-tag
    # of a sub-tag of root, so this call raises ValueError.

    # ---------------- Right way to delete ----------------
    Events_object = root.find("Events")
    Event_object = Events_object.find("Event")
    Events_object.remove(Event_object)

    tree = ET.ElementTree(root)
    # Put the output file name in the empty string.
    # The file is created if it does not exist and overwritten if it does.
    tree.write("", encoding="utf-8")
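Both the failing and the working removal can be reproduced on a tiny document. The sketch below uses a made-up three-level sample and catches the ValueError raised when remove() is called on an element that is not the direct parent.

```python
from xml.etree import ElementTree as ET

# Made-up sample: Event is a grandchild of the root.
root = ET.fromstring("<Document><Events><Event/></Events></Document>")
events = root.find("Events")
event = events.find("Event")

# Wrong: root is not the direct parent of Event.
try:
    root.remove(event)
    removed_from_root = True
except ValueError:
    removed_from_root = False

# Right: remove the tag through its direct parent.
events.remove(event)
print(removed_from_root, len(events))
```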
9. Build XML files
Mode 1 (Element)
Create all the tags first, then establish the relationships between them:

    from xml.etree import ElementTree as ET

    # Create the root tag
    root = ET.Element("root")

    # Create tag tagName1
    tagName1 = ET.Element("tagName1", {"tag1Attribute": "AttributeValue1"})
    # Create tag tagName2
    tagName2 = ET.Element("tagName2", {"tag2Attribute": "AttributeValue2"})
    # Create tag tagName11
    tagName11 = ET.Element("tagName11", {"tag11Attribute": "AttributeValue11"})
    # Create tag tagName12
    tagName12 = ET.Element("tagName12", {"tag12Attribute": "AttributeValue12"})

    # Add tagName11 and tagName12 as sub-tags of tagName1
    tagName1.append(tagName11)
    tagName1.append(tagName12)

    # Add tagName1 and tagName2 as sub-tags of root
    root.append(tagName1)
    root.append(tagName2)

    # Save (put the output file name in the empty string)
    tree = ET.ElementTree(root)
    # xml_declaration: whether to write the XML declaration
    # encoding: the output encoding
    # short_empty_elements: write empty tags as a single short tag (True)
    # or as an open/close pair (False)
    tree.write("", xml_declaration=True, encoding="utf-8", short_empty_elements=True)
Resulting file (indented here for readability):

    <?xml version='1.0' encoding='utf-8'?>
    <root>
        <tagName1 tag1Attribute="AttributeValue1">
            <tagName11 tag11Attribute="AttributeValue11"/>
            <tagName12 tag12Attribute="AttributeValue12"/>
        </tagName1>
        <tagName2 tag2Attribute="AttributeValue2"/>
    </root>
Mode 2 (makeelement)
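    from xml.etree import ElementTree as ET

    # Create the root node
    root = ET.Element("family")

    # Create first-level sub-tags (makeelement creates a tag but does not attach it)
    son1 = root.makeelement("son", {"name": "son1"})
    son2 = root.makeelement("son", {"name": "son2"})

    # Create second-level sub-tags
    grandson1 = son1.makeelement("grandson1", {"name": "grandson1"})
    grandson2 = son1.makeelement("grandson1", {"name": "grandson2"})

    # Attach the second-level sub-tags to a first-level sub-tag
    son1.append(grandson1)
    son1.append(grandson2)

    # Attach the first-level sub-tags to the root tag
    root.append(son2)
    root.append(son1)

    tree = ET.ElementTree(root)
    # Append the output file name to this directory path
    tree.write("../testData/", xml_declaration=True, encoding="utf-8")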
Mode 3 (SubElement)
This approach attaches each tag to its parent at the moment the tag is created:

    from xml.etree import ElementTree as ET

    # Create the root node
    root = ET.Element("family")

    # Create first-level sub-tags, attached to root
    son1 = ET.SubElement(root, "son", {"name": "son1"})
    son2 = ET.SubElement(root, "son", {"name": "son2"})

    # Create second-level sub-tags, attached to son1
    grandson1 = ET.SubElement(son1, "grandson1", {"name": "grandson1"})
    grandson1.text = "Great-grandson."
    grandson2 = ET.SubElement(son1, "grandson1", {"name": "grandson2"})
    grandson2.text = "Little grandson."

    tree = ET.ElementTree(root)
    # Append the output file name to this directory path
    tree.write("../testData/", xml_declaration=True, encoding="utf-8")
Resulting file (indented here for readability):

    <?xml version='1.0' encoding='utf-8'?>
    <family>
        <son name="son1">
            <grandson1 name="grandson1">Great-grandson.</grandson1>
            <grandson1 name="grandson2">Little grandson.</grandson1>
        </son>
        <son name="son2"/>
    </family>
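By default ElementTree writes the whole tree without line breaks or indentation. If an indented file is wanted, ET.indent (available from Python 3.9) can be applied before writing. The sketch below assumes the same family structure as above and serializes to a string instead of a file.

```python
from xml.etree import ElementTree as ET

root = ET.Element("family")
son1 = ET.SubElement(root, "son", {"name": "son1"})
ET.SubElement(root, "son", {"name": "son2"})
ET.SubElement(son1, "grandson1", {"name": "grandson1"}).text = "Great-grandson."

# ET.indent was added in Python 3.9; on older versions the tree stays on one line.
if hasattr(ET, "indent"):
    ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

ET.indent changes the tree in place by inserting whitespace into text and tail, so it is best called just before serializing.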