Alright, this one had me stumped for a good hour or two.
Take a Flickr RSS feed, for example.
Its namespaced elements are a pain, but they're manageable once you map each prefix to its namespace URI up front.
    # Some basic setup
    # (urllib2 is Python 2; on Python 3 use: from urllib.request import urlopen)
    from urllib2 import urlopen
    from lxml import etree

    # Namespaces copied straight out of the feed source
    namespaces = {
        'media': "http://search.yahoo.com/mrss/",
        'dc': "http://purl.org/dc/elements/1.1/",
        'creativeCommons': "http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html",
    }

    # Untested fetching code, just for understanding
    # (feed_url is the address of the feed you want to read)
    response = urlopen(feed_url)
    xml = etree.parse(response)
    # getroot() returns the <rss> element; 'channel/item' selects the first <item>
    item = xml.getroot().find('channel/item')
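The fetching code above needs a live feed, so here is a minimal offline sketch of the same "find the first item" step. It assumes a made-up two-item feed (`SAMPLE_FEED` is hypothetical, not real Flickr output) and uses the standard library's `xml.etree.ElementTree` in place of lxml so it runs without extra packages; the `find` and `findall` calls shown here work the same way on lxml elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, trimmed to the parts needed for this sketch
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <link>http://www.flickr.com/photos/handles/</link>
    <item>
      <title>For the lazy</title>
      <link>http://www.flickr.com/photos/handles/7429952158/</link>
    </item>
    <item>
      <title>Second photo</title>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_FEED)    # root is the <rss> element
channel = root.find('channel')

# Indexing the channel returns its first child, which is feed metadata
first_child = channel[0]

# Searching by tag name skips the metadata and returns the first <item>
first_item = channel.find('item')
all_items = channel.findall('item')

print(first_child.tag)                   # title
print(first_item.find('title').text)     # For the lazy
print(len(all_items))                    # 2
```

Plain RSS elements such as `title` and `link` have no namespace, so their bare tag names are enough. Only the prefixed elements need the extra handling.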
Now here's the basic structure of an RSS feed item.
    <item>
        <title>For the lazy</title>
        <link>http://www.flickr.com/photos/handles/7429952158/</link>
        <description>blah blah blah</description>
        <pubDate>Sat, 23 Jun 2012 21:59:07 -0700</pubDate>
        <dc:date.Taken>2012-06-24T14:58:54-08:00</dc:date.Taken>
        <author flickr:profile="http://www.flickr.com/people/handles/">Handles</author>
        <guid isPermaLink="false">tag:flickr.com,2004:/photo/7429952158</guid>
        <media:content url="http://farm6.staticflickr.com/5347/7429952158_962a849b30_b.jpg" type="image/jpeg" height="1024" width="768" />
        <media:title>For the lazy</media:title>
        <media:thumbnail url="http://farm6.staticflickr.com/5347/7429952158_962a849b30_s.jpg" height="75" width="75" />
        <media:credit role="photographer">Handles</media:credit>
    </item>
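To get data out of the namespaced elements, you'll need slightly different syntax: lxml expects each tag written as {namespace-uri}tagname, with the full namespace URI in curly braces in place of the prefix.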
    # Now to fetch the data from the namespaced elements
    media_title = item.find("{%s}title" % namespaces['media']).text

    media_thumbnail = item.find("{%s}thumbnail" % namespaces['media'])
    thumbnail = {
        'url': media_thumbnail.get('url'),
        'width': media_thumbnail.get('width'),
        'height': media_thumbnail.get('height'),
    }
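If building the `{uri}tag` strings by hand gets tedious, `find` also accepts the prefix map as a second argument, so you can keep writing `media:title`. Below is a minimal sketch of both forms side by side. It assumes a cut-down copy of the item above (`SAMPLE_ITEM`, with the namespace declarations moved onto the item so it parses on its own) and again uses the standard library's `xml.etree.ElementTree` rather than lxml so it runs without extra packages; lxml's `find` takes the same `namespaces` mapping.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    'media': 'http://search.yahoo.com/mrss/',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

# Cut-down item for illustration; the xmlns declarations normally sit on <rss>
SAMPLE_ITEM = """<item xmlns:media="http://search.yahoo.com/mrss/"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>For the lazy</title>
  <dc:date.Taken>2012-06-24T14:58:54-08:00</dc:date.Taken>
  <media:title>For the lazy</media:title>
  <media:thumbnail url="http://farm6.staticflickr.com/5347/7429952158_962a849b30_s.jpg" height="75" width="75" />
</item>"""

item = ET.fromstring(SAMPLE_ITEM)

# Form 1: spell out the namespace URI in curly braces
clark_title = item.find('{%s}title' % NAMESPACES['media']).text

# Form 2: keep the prefix and pass the prefix-to-URI map to find()
prefixed_title = item.find('media:title', NAMESPACES).text
date_taken = item.find('dc:date.Taken', NAMESPACES).text

# Attributes on a namespaced element are read with plain names
media_thumbnail = item.find('media:thumbnail', NAMESPACES)
thumbnail = {key: media_thumbnail.get(key) for key in ('url', 'width', 'height')}

print(clark_title == prefixed_title)   # True
print(date_taken)                      # 2012-06-24T14:58:54-08:00
print(thumbnail['width'])              # 75
```

The prefixes in the map are your own labels. What has to match the feed is the URI, so `item.find('m:title', {'m': 'http://search.yahoo.com/mrss/'})` would find the same element. Attribute values always come back as strings, so convert `width` and `height` with `int()` if you need numbers.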