Twig's Tech Tips: Python: Scrape pages and extract information

Posted by twig at 4:43 PM Wednesday, April 21, 2010

I was amazed at how incredibly easy it was to scrape pages using Python.

To download the page markup, use:

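 from urllib.request import urlopen
 # Python 3: urlopen() lives in urllib.request and read() returns bytes, so decode to text
 content = urlopen("http://finance.google.com/finance?q=IBM").read().decode("utf-8", errors="replace")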
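If you want something a bit more defensive, here is a minimal sketch of the same download step wrapped in a function. The name fetch_page, the 10 second timeout and the UTF-8 fallback are my own assumptions, not part of the original tip, and the data: URL is only there so the example runs without a network connection.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper, not from the original post)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the response declares; falling back to UTF-8 is an assumption.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A data: URL keeps the example runnable offline; swap in a real page URL.
sample = fetch_page("data:text/html;charset=utf-8,%3Cspan%20class=%22pr%22%3E123.45%3C/span%3E")
print(sample)
```

Using "with" closes the connection as soon as the body has been read, and the timeout stops the script from hanging on a slow server.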

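Once you have the content, use a regular expression to pull out the bit you want. The pattern below finds the first tag containing class="pr" and captures the text between that tag's closing ">" and the next "<".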

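 import re
 # Non-greedy match: skip to the end of the tag with class="pr", then capture up to the next "<"
 m = re.search(r'class="pr".*?>(.*?)<', content)
 
 # re.search() returns None when nothing matches, so check before reading the group
 if m:
   quote = m.group(1)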
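To see the regex step on its own, here is a small sketch that runs the same pattern against a made-up snippet of markup. The function name extract_quote and the sample HTML are hypothetical, chosen only for illustration; the real page markup may look different.

```python
import re

# Same pattern as above, compiled once so it can be reused.
PRICE_PATTERN = re.compile(r'class="pr".*?>(.*?)<')


def extract_quote(html):
    """Return the text captured after the first class="pr" tag, or None if absent."""
    m = PRICE_PATTERN.search(html)
    if m is None:
        return None
    # Strip surrounding whitespace from the captured text.
    return m.group(1).strip()


# Hypothetical markup for illustration only.
sample_html = '<div><span class="pr"> 123.45 </span></div>'
print(extract_quote(sample_html))
print(extract_quote("<p>no price here</p>"))
```

Returning None when the pattern is missing makes it easy to tell "page layout changed" apart from a real value. Note that "." does not match newlines by default, so this only works when the tag and its text sit on one line.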
