Python: How to parse XML/RSS feeds with namespaces using lxml.etree

Alright, this one had me stumped for a good hour or two.

Take for example a Flickr RSS feed.


Those namespaces are a pain, but it's not too bad if you can sort them out before you use them.

# Some basic setup
# (urllib2 is Python 2; on Python 3 use "from urllib.request import urlopen")
from urllib2 import urlopen
from lxml import etree

# Namespaces copied straight out of the feed source
namespaces = {
    'media': "",
    'dc': "",
    'creativeCommons': "",
}

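# Untested fetching code, just for understanding
# (feed_url is the address of the feed you want to read)
response = urlopen(feed_url)
xml = etree.parse(response)
# Grab the first <item> inside <channel>
item = xml.getroot().find('channel').find('item')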
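
If you want to poke at this without hitting the network, here is a rough sketch that parses a tiny made-up feed from a string. The sample XML and the names SAMPLE_FEED, first_item and item_titles are my own placeholders, not anything Flickr sends. Picking the item out by tag name rather than by position avoids accidentally grabbing the channel's own <title>, which usually comes first. The import falls back to the standard library's ElementTree, which shares these find()/findall() calls, in case lxml isn't installed.

```python
try:
    from lxml import etree  # the library this post is about
except ImportError:
    # Stdlib fallback; find(), findall() and findtext() behave the same here
    import xml.etree.ElementTree as etree

# Made-up sample feed for illustration only, not real Flickr output
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First photo</title></item>
    <item><title>Second photo</title></item>
  </channel>
</rss>"""

root = etree.fromstring(SAMPLE_FEED)

# Ask for <item> by name; the channel's first child is its own <title>
first_item = root.find('channel/item')

# findall() returns every matching <item>, in document order
item_titles = [item.findtext('title') for item in root.findall('channel/item')]

print(first_item.findtext('title'))
print(item_titles)
```

For a real feed you would swap etree.fromstring(SAMPLE_FEED) for etree.parse(...).getroot() on the downloaded file.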

Now here's the basic structure of an RSS feed item.

<title>For the lazy</title>
<description>blah blah blah</description>
<pubDate>Sat, 23 Jun 2012 21:59:07 -0700</pubDate>
<author flickr:profile="">Handles</author>
<guid isPermaLink="false">,2004:/photo/7429952158</guid>
<media:content url="" type="image/jpeg" height="1024" width="768"/>
<media:title>For the lazy</media:title>
<media:thumbnail url="" height="75" width="75" />
<media:credit role="photographer">Handles</media:credit>

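To get information off the prefixed elements (the media: ones), you'll need some slightly different syntax. lxml doesn't look them up as prefix:tag by default; it wants the full namespace URI in curly braces in front of the tag name, like {namespace-uri}tag.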

# Now to fetch the data from the namespaced elements
media_title = item.find("{%s}title" % namespaces['media']).text

# Keep the element itself so we can read its attributes
media_thumbnail = item.find("{%s}thumbnail" % namespaces['media'])
thumbnail = {
    'url': media_thumbnail.get('url'),
    'width': media_thumbnail.get('width'),
    'height': media_thumbnail.get('height'),
}
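
Here's a self-contained sketch of the same lookups that you can run as-is. Everything in it is a stand-in: MEDIA_NS is a placeholder URI I made up (a real feed declares its own in xmlns:media), and SAMPLE_ITEM is a cut-down item with example.com URLs. It also shows a second style that, as far as I know, both lxml and the standard library's ElementTree accept: pass the prefix-to-URI dict as the namespaces argument and write the tag as prefix:tag.

```python
try:
    from lxml import etree  # the library this post is about
except ImportError:
    # Stdlib fallback; find() takes the same namespaces argument
    import xml.etree.ElementTree as etree

# Placeholder URI (an assumption): use the value from your feed's xmlns:media
MEDIA_NS = "http://example.com/placeholder-media-namespace"
namespaces = {'media': MEDIA_NS}

# Made-up sample item for illustration only
SAMPLE_ITEM = """<item xmlns:media="%s">
  <title>For the lazy</title>
  <media:title>For the lazy</media:title>
  <media:thumbnail url="http://example.com/thumb.jpg" height="75" width="75" />
</item>""" % MEDIA_NS

item = etree.fromstring(SAMPLE_ITEM)

# Style 1: spell out the namespace URI in curly braces
title_a = item.find("{%s}title" % namespaces['media']).text

# Style 2: use the prefix and hand find() the prefix-to-URI mapping
title_b = item.find("media:title", namespaces=namespaces).text

# Attributes are read with get() on the element itself
thumb = item.find("media:thumbnail", namespaces=namespaces)
thumbnail = dict((key, thumb.get(key)) for key in ('url', 'width', 'height'))

# find() returns None when the element is missing, so check before using .text
credit_element = item.find("media:credit", namespaces=namespaces)
credit = credit_element.text if credit_element is not None else None

print(title_a, title_b, thumbnail, credit)
```

The None check is worth keeping in real code, since not every item in a feed carries every media element.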

Problem solved, like a boss!


Copyright © Twig's Tech Tips