SoFunction
Updated on 2024-11-17

A guide to working with XML files in Python

What is XML

XML (Extensible Markup Language) looks similar to HTML, but XML is used to describe and carry data, whereas HTML is used to define how data is displayed. XML is commonly used to send data back and forth between a client and a server. Take a look at the following example:

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>
   Two idly's with chutney
   </description>
    <calories>553</calories>
</food>
<food>
    <item name="breakfast">Paper Dosa</item>
    <price>$2.7</price>
    <description>
    Plain paper dosa with chutney
    </description>
    <calories>700</calories>
</food>
<food>
    <item name="breakfast">Upma</item>
    <price>$3.65</price>
    <description>
    Rava upma with bajji
    </description>
    <calories>600</calories>
</food>
<food>
    <item name="breakfast">Bisi Bele Bath</item>
    <price>$4.50</price>
    <description>
   Bisi Bele Bath with sev
    </description>
    <calories>400</calories>
</food>
<food>
    <item name="breakfast">Kesari Bath</item>
    <price>$1.95</price>
    <description>
    Sweet rava with saffron
    </description>
    <calories>950</calories>
</food>
</metadata>

The above example shows the contents of the XML file used throughout this guide, and the code examples that follow are based on it.

Python XML Parsing Module

Python can parse these XML documents using two modules: xml.etree.ElementTree and Minidom (xml.dom.minidom, the minimal DOM implementation). Parsing means reading information from a file and breaking it into pieces by recognizing the various parts of the XML document. Let's learn more about how to parse XML data using these modules.

The xml.etree.ElementTree module

This module helps us represent XML data as a tree structure, which is the most natural representation of hierarchical data. The Element type lets you store hierarchical data structures in memory and has the following properties:

Property        Description
Tag             A string indicating the type of data being stored
Attributes      A number of attributes stored as a dictionary
Text string     A text string containing the information to be displayed
Tail string     An optional tail string, if necessary
Child elements  A number of sub-elements stored as a sequence
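As a minimal sketch of these properties, the snippet below parses a tiny illustrative fragment (not the full menu file) and reads each property from one element. The variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# A tiny illustrative fragment; the trailing " served hot" becomes the item's tail.
snippet = '<food><item name="breakfast">Idly</item> served hot<price>$2.5</price></food>'
food = ET.fromstring(snippet)
item = food[0]

print(item.tag)         # tag name of the element
print(item.attrib)      # attributes as a dictionary
print(item.text)        # text inside the element
print(repr(item.tail))  # text that follows the element's closing tag
print(len(food))        # number of child elements of <food>
```

The tail is rarely needed in data-oriented XML, but it explains where text placed between two sibling tags ends up.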

ElementTree is a class that wraps the element structure and allows it to be converted to and from XML. Now let's try to parse the above XML file using Python's ElementTree module.

There are two ways to parse XML with the ElementTree module.

The first is the parse() function, and the second is the fromstring() function. parse() parses an XML document supplied as a file, while fromstring() parses XML supplied as a string, for example within triple quotes.

Using the parse() function

As mentioned earlier, this function parses XML supplied as a file. Take a look at the following example:

import xml.etree.ElementTree as ET
mytree = ET.parse('')  # supply the path to the XML file shown above
myroot = mytree.getroot()

The first thing we need to do is import the xml.etree.ElementTree module, and then use the parse() function to parse the XML file. The getroot() method returns the root element of the parsed document.

When the above code is executed, no output is shown, but as long as there are no errors the code has run successfully. To check the root element, you can simply use the print() function as shown below:

import xml.etree.ElementTree as ET
mytree = ET.parse('')  # supply the path to the XML file shown above
myroot = mytree.getroot()
print(myroot)

Output:

<Element 'metadata' at 0x033589F0>

The output above indicates that the root element in our XML document is "metadata".
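The following is a self-contained sketch of the same steps. Because it cannot rely on any file already being on disk, it first writes a shortened version of the menu to a temporary file; the temporary file and the variable names are assumptions made for this example only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = '<metadata><food><item name="breakfast">Idly</item></food></metadata>'

# Write the XML to a temporary file so the sketch does not depend on an existing file.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False, encoding='utf-8') as handle:
    handle.write(xml_text)
    path = handle.name

try:
    tree = ET.parse(path)   # returns an ElementTree object
    root = tree.getroot()   # returns the root Element
    root_tag = root.tag
    print(root_tag)
finally:
    os.remove(path)         # clean up the temporary file
```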

Using the fromstring() Function

We can also use the fromstring() function to parse string data. To do so, pass the XML as a string within triple quotes, as shown below:

import xml.etree.ElementTree as ET
data='''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>
   Two idly's with chutney
   </description>
    <calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)
#print(myroot)
print(myroot.tag)

The above code parses the string into the same kind of root element as the previous example. The XML string used here is only a part of the full document, shortened for readability; you can also use the full XML document.
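When the string comes from an untrusted or hand-edited source, it may not be well-formed. The sketch below shows one possible way to handle that case by catching ET.ParseError; the helper name parse_or_none is hypothetical and not part of the module.

```python
import xml.etree.ElementTree as ET

def parse_or_none(xml_text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        print('Could not parse XML:', err)
        return None

good = parse_or_none('<metadata><food/></metadata>')
bad = parse_or_none('<metadata><food></metadata>')  # <food> is never closed

print(good.tag)
print(bad)
```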

You can also retrieve the root tag using the tag attribute, as shown below:

print(myroot.tag)

Output:

metadata

The tag string can also be sliced so that only part of it appears in the output.

print(myroot.tag[0:4])

Output:

meta

As mentioned earlier, tags can also have attributes, stored as a dictionary. To check whether the root tag has any attributes, you can use the attrib attribute as shown below:

print(myroot.attrib)

Output:

{}

As you can see, the output is an empty dictionary because our root tag has no attributes.

Finding elements of interest

The root also contains child tags. To retrieve the first child tag of the root, use the following command:

print(myroot[0].tag)

Output:

food

Now, if you want to retrieve all the child tags of the root's first child, you can iterate over them using a for loop as follows:

for x in myroot[0]:
    print(x.tag, x.attrib)

Output:

item {'name': 'breakfast'}
price {}
description {}
calories {}

All the items returned are the child tags of the first food tag, together with their attributes.

To extract the text from XML using ElementTree, you can use the text attribute. For example, if you want to retrieve all the information about the first food item, you can use the following code:

for x in myroot[0]:
    print(x.text)

Output:

Idly
$2.5
Two idly's with chutney
553

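As you can see, the text of the first food item has been returned as output. (The description text also carries the surrounding whitespace from the file; call strip() on it if you want it removed.) Now, if you want to display all the items with their prices, you can use the findall() and find() methods: findall() returns all matching direct children of an element, and find() returns the first matching child. The get() method, by contrast, reads the value of an element's attribute.

for x in myroot.findall('food'):
    item = x.find('item').text
    price = x.find('price').text
    print(item, price)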

Output:

Idly $2.5
Paper Dosa $2.7
Upma $3.65
Bisi Bele Bath $4.50
Kesari Bath $1.95

The output above shows every item together with its price. ElementTree can also be used to modify XML files.
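As a further sketch of searching, the example below combines findall(), find() and get() to list only the items below a price limit. The shortened menu, the limit value and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

menu = ET.fromstring('''<metadata>
<food><item name="breakfast">Idly</item><price>$2.5</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
<food><item name="breakfast">Kesari Bath</item><price>$1.95</price></food>
</metadata>''')

limit = 3.0  # hypothetical price threshold
cheap = []
for food in menu.findall('food'):
    item = food.find('item')
    # Prices are stored as text such as "$2.5", so strip the dollar sign first.
    price = float(food.find('price').text.lstrip('$'))
    if price < limit:
        cheap.append((item.text, item.get('name'), price))

print(cheap)
```

Here get('name') reads the name attribute of each item, while find() and findall() locate child elements by tag.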

Modifying XML files

The elements in our XML file can be modified. The set() method, for example, sets an attribute on an element. Let's start by looking at how to add something to XML.

Add to XML:

The following example shows how to add content to the description of each food item.

for description in myroot.iter('description'):
    new_desc = str(description.text) + ' will be served'
    description.text = str(new_desc)
    description.set('updated', 'yes')

mytree.write('')  # supply the name of the output file

The write() method creates a new XML file and writes the updated tree to it, but the same method can also overwrite the original file. After executing the above code, you will see that a new file has been created containing the updated results.

In the resulting file, every description carries the added text and the new updated attribute. To add a new sub-tag, use the SubElement() function. For example, if you want to add a new speciality tag to the first item, Idly, you can do the following:

ET.SubElement(myroot[0], 'speciality')
for x in myroot.iter('speciality'):
    new_desc = 'South Indian Special'
    x.text = str(new_desc)

mytree.write('')  # supply the name of the output file

A new tag has now been added under the first food tag. You can add the tag under any other element by changing the index given in the square brackets [].
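The sketch below illustrates the difference between appending a child with SubElement() and placing one at a chosen position with the insert() method, and prints the result with ET.tostring() instead of writing a file. The rating tag is invented for this example and is not part of the menu file.

```python
import xml.etree.ElementTree as ET

food = ET.fromstring('<food><item name="breakfast">Idly</item><price>$2.5</price></food>')

# SubElement() appends the new tag as the last child of its parent.
speciality = ET.SubElement(food, 'speciality')
speciality.text = 'South Indian Special'

# insert() places an element at a chosen position among the children.
rating = ET.Element('rating')  # 'rating' is an invented tag for this sketch
rating.text = '5'
food.insert(1, rating)

order = [child.tag for child in food]
print(order)
print(ET.tostring(food, encoding='unicode'))
```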

Let's see how to delete items using this module.

Remove from XML:

To delete an attribute with ElementTree, you can call the pop() method on the element's attrib dictionary. It removes the named attribute if it is present.

myroot[0][0].attrib.pop('name', None)

# create a new XML file with the results
mytree.write('')  # supply the name of the output file

After this, the name attribute is no longer present on the item tag. To remove a complete tag, use the remove() method on its parent, as shown below:

myroot[0].remove(myroot[0][0])
mytree.write('')  # supply the name of the output file

The first child element of the food tag has now been deleted. To remove everything inside a tag, you can use the clear() method, as shown below:

myroot[0].clear()
mytree.write('')  # supply the name of the output file

When the above code is executed, the first food tag is emptied: all of its child tags, its text and its attributes are removed, although the empty food tag itself remains in the tree.
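To compare the three removal operations side by side, here is a small self-contained sketch that works on an in-memory string rather than a file. The shortened menu and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<metadata>'
    '<food><item name="breakfast">Idly</item><price>$2.5</price></food>'
    '<food><item name="breakfast">Upma</item><price>$3.65</price></food>'
    '</metadata>'
)
first, second = root[0], root[1]

first[0].attrib.pop('name', None)   # drop the name attribute from the first item
first.remove(first.find('price'))   # remove one child element of the first food
second.clear()                      # empty the second food: children, text, attributes

remaining = [child.tag for child in first]
print(remaining)
print(ET.tostring(root, encoding='unicode'))
```

Note that remove() must be called on the direct parent of the element being removed, and that clear() leaves the emptied element in place.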

So far, we've been using Python's xml.etree.ElementTree module. Now let's see how to parse XML using Minidom.

The xml.dom.minidom module

This module is mainly used by people who are familiar with the DOM (Document Object Model). DOM applications usually start by parsing XML into a DOM. In xml.dom.minidom, this can be done in the following ways:

Use the parse() function:

The first way is to pass the XML file to be parsed as an argument to the parse() function. Example:

from xml.dom import minidom
p1 = minidom.parse("")  # supply the path to the XML file

After performing this operation, you will be able to traverse the XML document and get the required data. You can also use this function to parse an open file.

dat = open('')  # supply the path to the XML file
p2 = minidom.parse(dat)

In this case, the variable holding the open file is supplied as an argument to the parse() function.
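A file opened this way stays open until it is closed explicitly. One possible variation, sketched below, wraps the call in a with block so the file is closed as soon as parsing finishes. The temporary file stands in for the XML file on disk and is an assumption of this example.

```python
import os
import tempfile
from xml.dom import minidom

# Create a small temporary file to stand in for the XML file on disk.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False, encoding='utf-8') as handle:
    handle.write('<metadata><food><item name="breakfast">Idly</item></food></metadata>')
    path = handle.name

try:
    # The with block closes the file once parsing is finished.
    with open(path, 'rb') as source:
        document = minidom.parse(source)
    root_name = document.documentElement.tagName
    print(root_name)
finally:
    os.remove(path)  # clean up the temporary file
```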

Use the parseString() function:

Use this function when you want to provide the XML to be parsed as a string.

p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')

The XML can be parsed using any of the above methods. Now let's try to get the data using this module.

Finding elements of interest

After the file is parsed, if we try to print the result, the output shows that the variable holding the parsed data is a DOM Document object.

dat = minidom.parse('')  # supply the path to the XML file
print(dat)

Output:

<xml.dom.minidom.Document object at 0x03B5A308>

Accessing Elements with getElementsByTagName

tagname = dat.getElementsByTagName('item')[0]
print(tagname)

If we use the getElementsByTagName() method and take the first element, we see the following output:

<DOM Element: item at 0xc6bd00>

Note that only one element is returned, because for convenience this example uses the [0] subscript, which is dropped in the examples that follow.

To access the value of an attribute, use the value attribute, as shown below:

dat = minidom.parse('')  # supply the path to the XML file
tagname = dat.getElementsByTagName('item')
print(tagname[0].attributes['name'].value)

Output:

breakfast

To retrieve the data inside these tags, you can use the data attribute of the tag's first child node, as shown below:

print(tagname[1].firstChild.data)

Output:

Paper Dosa

The value attribute can likewise be used to retrieve the attribute value of any other item in the list:

items = dat.getElementsByTagName('item')
print(items[1].attributes['name'].value)

Output:

breakfast

To print all the items available on our menu, you can iterate over them and print each one.

for x in items:
    print(x.firstChild.data)

Output:

Idly
Paper Dosa
Upma
Bisi Bele Bath
Kesari Bath

To count the number of items on our menu, use the len() function, as shown below:

print(len(items))

Output:

5

The output specifies that our menu contains 5 items.
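The Minidom steps above can be put together in one self-contained sketch. It parses a shortened two-item menu from a string, so the counts differ from the full file; getAttribute() is used here as an alternative to attributes['name'].value, and the variable names are chosen only for this example.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<metadata>'
    '<food><item name="breakfast">Idly</item></food>'
    '<food><item name="breakfast">Paper Dosa</item></food>'
    '</metadata>'
)

items = document.getElementsByTagName('item')
names = [node.firstChild.data for node in items]       # text inside each item
kinds = [node.getAttribute('name') for node in items]  # value of the name attribute

print(names)
print(kinds)
print(len(items))
```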
