Consider a sample XML string such as the following:
```xml
<?xml version="1.0" encoding="utf-8"?>
<YourRoot>
    <Group Found="1000">
        <Item ID="1">
            <Title>Something silly</Title>
            <Summary><![CDATA[Blah blah blah...]]></Summary>
            <Location>
                <Country>Australia</Country>
                <State>NSW</State>
                <City>Sydney</City>
                <PostalCode>2002</PostalCode>
            </Location>
        </Item>
        <Item ID="2">
            ...
        </Item>
        <Item ID="3">
            ...
        </Item>
    </Group>
</YourRoot>
```
Reading the data
With ElementTree, you can either parse the file directly or load its contents into a string first. Include the following imports:
```python
# ElementTree raises ParseError for malformed XML (Python 2.7+ and 3.x)
from xml.etree.ElementTree import ElementTree, ParseError
```
If you are using a string:
```python
from xml.etree.ElementTree import fromstring, ParseError

try:
    # xml_data holds the XML document as a string; fromstring() returns the root element
    tree = fromstring(xml_data)
except ParseError:
    print("Unable to parse XML data from string")
```
Otherwise, to load it directly from a file:
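```python
from xml.etree.ElementTree import ElementTree, ParseError

try:
    # "filename" is a placeholder for the path to your XML file
    tree = ElementTree(file="filename")
except ParseError:
    print("Unable to parse XML from file")
```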
Once you have the tree initialised, you can begin parsing the information.
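The two loading styles can be tried side by side with a minimal sketch like the one below, assuming Python 3. The `SAMPLE` string and the `load_from_string`/`load_from_file` helpers are hypothetical names used only for illustration. Note that fromstring() hands back the root Element, whereas ElementTree(file=...) returns an ElementTree wrapper around it (call getroot() to reach the root); both support find() and iter().

```python
import io
from xml.etree.ElementTree import ElementTree, ParseError, fromstring

# Hypothetical, trimmed-down version of the sample document
SAMPLE = '<YourRoot><Group Found="1000"><Item ID="1"><Title>Something silly</Title></Item></Group></YourRoot>'


def load_from_string(xml_data):
    """Return the root Element, or None if the string is not well-formed XML."""
    try:
        return fromstring(xml_data)
    except ParseError:
        print("Unable to parse XML data from string")
        return None


def load_from_file(source):
    """Return an ElementTree; source may be a path or a binary file object."""
    try:
        return ElementTree(file=source)
    except ParseError:
        print("Unable to parse XML from file")
        return None


root = load_from_string(SAMPLE)
tree = load_from_file(io.BytesIO(SAMPLE.encode("utf-8")))

print(root.tag)                          # YourRoot
print(tree.getroot().tag)                # YourRoot
print(tree.find("Group").get("Found"))   # 1000
print(load_from_string("<YourRoot>"))    # prints the error message, then None
```

Because the unclosed `<YourRoot>` is not well-formed, fromstring() raises ParseError and the helper falls back to None instead of crashing.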
As ElementTree is quite versatile, there are a few options for parsing the data. You can choose between being lazy and being specific.
Being lazy and getting all "Item" elements from the XML
For simple XML data like RSS feeds, this is usually enough to get by.
```python
def parse_results(tree):
    results = []

    # iter() yields every <Item> element in the tree, at any depth
    # (it replaces getiterator(), which was removed in Python 3.9)
    for item in tree.iter('Item'):
        location = item.find('Location')

        results.append({
            'id': item.get('ID'),                  # attribute value
            'title': item.find('Title').text,      # element text
            'summary': item.find('Summary').text,  # CDATA content arrives as plain text
            'location': {
                # find() returns None for a missing child, so check before reading .text
                'country': location.find('Country').text if location.find('Country') is not None else '',
                'state': location.find('State').text if location.find('State') is not None else '',
                'city': location.find('City').text if location.find('City') is not None else '',
                'postcode': location.find('PostalCode').text if location.find('PostalCode') is not None else '',
            },
        })

    return results
```
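In this example, calling get('ID') on an Item element reads its ID attribute, and find('Title').text returns the text inside its Title child element.
For the Location fields, the code checks that each child element exists before reading it (find() returns None when there is no match) and defaults to an empty string otherwise. Title, Summary and Location itself are read without such a check, so they are assumed to be present in every Item.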
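The repeated `find(...).text if find(...) is not None else ''` pattern can be shortened with Element.findtext(), which returns the text of the first matching child, or a default when nothing matches. Below is a minimal sketch, assuming Python 3; the `parse_item` function and the `SAMPLE` document are hypothetical. One small difference from the code above: findtext() returns '' for an element that exists but is empty, where `.text` would give None. The sketch also substitutes an empty placeholder element when Location is missing, a case the original check does not cover.

```python
from xml.etree.ElementTree import Element, fromstring

# Hypothetical document: Item 1 has no <State>/<PostalCode>, Item 2 has no <Location> at all
SAMPLE = """<YourRoot><Group Found="2">
  <Item ID="1">
    <Title>Something silly</Title>
    <Summary><![CDATA[Blah blah blah...]]></Summary>
    <Location><Country>Australia</Country><City>Sydney</City></Location>
  </Item>
  <Item ID="2"><Title>Another one</Title></Item>
</Group></YourRoot>"""


def parse_item(item):
    """Turn one <Item> element into a dict, defaulting missing values to ''."""
    location = item.find('Location')
    if location is None:
        # An empty placeholder makes every findtext() below fall back to ''
        location = Element('Location')
    return {
        'id': item.get('ID'),
        'title': item.findtext('Title', default=''),
        'summary': item.findtext('Summary', default=''),  # CDATA comes back as plain text
        'location': {
            'country': location.findtext('Country', default=''),
            'state': location.findtext('State', default=''),
            'city': location.findtext('City', default=''),
            'postcode': location.findtext('PostalCode', default=''),
        },
    }


root = fromstring(SAMPLE)
results = [parse_item(item) for item in root.iter('Item')]
print(results[0]['location'])  # {'country': 'Australia', 'state': '', 'city': 'Sydney', 'postcode': ''}
print(results[1]['title'], repr(results[1]['location']['city']))  # Another one ''
```

Pulling the per-item logic into its own function also lets both the lazy and the picky approaches share it.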
Being picky and navigating the XML paths manually
Depending on how complex the XML structure is, you may have to navigate some of it manually.
```python
def parse_results(tree):
    results = []
    # Paths are relative to the root element (<YourRoot>), so ask for "Group" directly
    group = tree.find("Group")

    for item in group.iter('Item'):
        location = item.find('Location')

        results.append({
            # ... exactly the same keys and values as above ...
        })

    return results
```
This time we navigate the tree a little, using tree.find("Group") to pick out that specific element. Because paths are evaluated relative to the root element, the root's own name is left out: tree.find("YourRoot/Group") would look for a YourRoot child inside YourRoot and return None.
Then we iterate over all "Item" elements in "Group" as usual.
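To see how the different lookups behave, here is a small sketch on a hypothetical document with two Group elements, assuming Python 3 (an ElementTree object's find() and findall() behave the same way, starting from its root). find() returns the first match or None, findall() follows the path exactly, and iter() visits every descendant regardless of depth. The document and variable names are invented for illustration.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document with two groups, to show how the lookups differ
root = fromstring(
    '<YourRoot>'
    '<Group Found="2"><Item ID="1"/><Item ID="2"/></Group>'
    '<Group Found="1"><Item ID="3"/></Group>'
    '</YourRoot>'
)

# Paths are evaluated from the element you call them on (here, <YourRoot>)
print(root.find("YourRoot/Group"))  # None: <YourRoot> has no <YourRoot> child

first_group = root.find("Group")  # only the first <Group>
print([i.get("ID") for i in first_group.iter("Item")])  # ['1', '2']

# findall() with a path collects matches under every <Group>
print([i.get("ID") for i in root.findall("Group/Item")])  # ['1', '2', '3']

# iter() ignores structure and visits every <Item> at any depth
print([i.get("ID") for i in root.iter("Item")])  # ['1', '2', '3']

# An attribute predicate picks out one specific group by path
second = root.find("Group[@Found='1']")
print([i.get("ID") for i in second.iter("Item")])  # ['3']
```

The picky approach pays off when several groups share the same element names and only one of them should be read.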