Python: Simple URL manipulation to add/remove GET args, change URL domain or path

With all the helpful libraries out there, you'd think Python would have an easier way of doing this!

urlparse() returns a ParseResult object whose attributes are read-only, which makes it awkward to work with if you want to set the query field.
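To make the limitation concrete, here is a minimal sketch written for Python 3, where the module lives at urllib.parse (in Python 2 it is urlparse). The example URL is hypothetical and not taken from this post. It shows that direct assignment fails, and that the standard library's only built-in route is _replace(), which returns a new object and still leaves you to build the query string by hand:

```python
from urllib.parse import urlparse

# Hypothetical example URL, chosen only for illustration.
parts = urlparse("https://example.com/search?q=url&page=2#top")

try:
    parts.query = "q=python"  # ParseResult is a named tuple: fields cannot be assigned
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False

# _replace() does not modify parts; it returns a new ParseResult
# with the named fields swapped out.
changed = parts._replace(netloc="example.org", path="/find", query="q=python")
new_url = changed.geturl()

print(assignment_failed)
print(new_url)
```

This works, but every change to a single GET argument means re-encoding the whole query string yourself, which is the chore a helper class is meant to hide.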

This solution comes from Ned Batchelder on StackOverflow. I found it much more useful than what the urlparse and urllib libraries offer.

Create a helper class:
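A minimal sketch of what such a helper could look like, written for Python 3. It is an illustration, not the original class: it assumes the attribute names used in the examples in this post (scheme, domain, path, params, query, fragment, url, query_string), and it assumes query is a plain dict, so a repeated GET argument keeps only its last value. The example URL is hypothetical.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


class Url(object):
    """Illustrative stand-in for the helper class; not the original code."""

    def __init__(self, url):
        # str() lets another Url instance be passed in as well as a string.
        parts = urlparse(str(url))
        self.scheme = parts.scheme
        self.domain = parts.netloc
        self.path = parts.path
        self.params = parts.params
        # Assumption: a plain dict, so repeated keys collapse to the last value.
        self.query = dict(parse_qsl(parts.query))
        self.fragment = parts.fragment

    @property
    def query_string(self):
        # Encoded GET arguments with a leading "?", or "" when there are none.
        encoded = urlencode(self.query)
        return "?" + encoded if encoded else ""

    @property
    def url(self):
        # Rebuild the full URL from the current attribute values.
        return urlunparse((self.scheme, self.domain, self.path,
                           self.params, urlencode(self.query), self.fragment))

    def __str__(self):
        return self.url


# Hypothetical URL for illustration.
u = Url("https://example.com/list?page=3&q=url#top")
del u.query["page"]          # remove a GET argument
u.query["l"] = "python"      # add a GET argument
u.domain = "example.org"     # change the domain
print(u.url)
```

Because every attribute is an ordinary writable field and the URL is rebuilt on demand, changing the domain, path or GET arguments is a plain assignment.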

To get an idea of what's what:

txt = ""
url = Url(txt)
print "original", txt
print "scheme", url.scheme
print "domain", url.domain
print "path", url.path
print "args", url.args
print "params", url.params
print "query", url.query
print "query_string", url.query_string
print "fragment", url.fragment
print "URL", url.url
print "str", str(url)

Prints out:

scheme: https
path:   /twig/d42c4ec9ecc5eccf3614
args:   {'q': 'url', 'l': 'python'}
query:  q=url&l=python
query_string:   ?q=url&l=python
fragment:       file-python-pil-exif-reader-py-L12


To make changes, use it like this:

u = Url(url)
del u.query['page']
print u.url
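For comparison, the same removal can be sketched with the standard library alone. This is a Python 3 illustration under stated assumptions: remove_query_arg is a hypothetical helper name and the URL is made up. It works on a list of pairs rather than a dict, so repeated arguments such as tag=a&tag=b survive:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def remove_query_arg(url, name):
    """Return url with every occurrence of the GET argument `name` removed."""
    parts = urlsplit(url)
    # Keep all (key, value) pairs except the ones being removed.
    pairs = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != name]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical URL for illustration.
result = remove_query_arg(
    "https://example.com/list?page=3&q=url&tag=a&tag=b#top", "page")
print(result)
```

It does the job, but it takes noticeably more ceremony than a single del on the helper's query attribute.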

*update 23/08/2011*

  • Removed the use of urllib.urlencode() because it's deprecated. Now uses urllib.
  • Added __str__() to accompany __unicode__(). Added build() method.
  • Renamed args to be query.

*update 11/11/2014*

  • Cleaned up the code a bit
  • Added Url.url and Url.query_string attributes
  • Updated comments
  • Renamed Url.netloc to Url.domain
  • Now supports Url and Django's HttpRequest objects in __init__()


Copyright © Twig's Tech Tips