Python: Reading EXIF and IPTC tags from JPG/TIFF image files

Wow, this was a bit of an "antigravity" moment for me (XKCD #353).

All I really needed to do was choose my poison:

  • Pillow (an actively maintained PIL fork which unofficially supports basic EXIF tag reading)
  • ExifRead (EXIF reader) v1.4.2 at time of writing
  • IPTCInfo (IPTC reader) v1.9.5-6 at time of writing

Notes

I'd just like to point out some things I learned the hard way.

  • If something isn't showing up in the EXIF tag data, then it's most likely saved as IPTC metadata (like title, subject and tags/comments in the screenshot below).
  • The fields XPTitle, XPComment, XPAuthor, XPKeywords and XPSubject are encoded in UCS-2 (little-endian UTF-16). They need to be decoded into a regular string, unless you like dealing with hex strings or byte arrays.
  • Combining an EXIF and IPTC method together will be your best way of extracting metadata from the file.
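
To make the last two points a bit more concrete, here is a minimal sketch of decoding an XP* value and falling back from EXIF to IPTC. The names decode_xp_field() and first_available() are my own for illustration and not part of any of the libraries, and the sample bytes are made up.

```python
def decode_xp_field(raw):
    """Decode a Windows XP* tag value (UCS-2 / UTF-16LE) into text."""
    # Accept a byte string or a sequence of ints, since libraries differ
    # in what they hand back for these fields.
    text = bytearray(raw).decode('utf-16-le')
    # The stored value is usually null-terminated.
    return text.rstrip(u'\x00')


def first_available(*candidates):
    """Return the first non-empty value, e.g. EXIF first and IPTC as fallback."""
    for value in candidates:
        if value:
            return value
    return u''


raw_title = u'Sunset'.encode('utf-16-le') + b'\x00\x00'  # made-up sample bytes
exif_title = decode_xp_field(raw_title)
iptc_title = u''  # pretend the IPTC block had no title
title = first_available(exif_title, iptc_title)
```

In real code the candidates would come from whichever EXIF and IPTC readers you end up choosing.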

 

[Screenshot: file properties showing title, subject and tags/comments]

Using Pillow

This one reads some of the basic EXIF data but tends to be a bit clunky to use.

Helper function:

from PIL import Image, ExifTags

# Map numeric EXIF tag IDs to readable names, starting from Pillow's own table
tag_id_to_name = dict(ExifTags.TAGS)
# These I got from reading in files and matching to http://www.exiv2.org/tags.html
# You'll have to map your own if something isn't recognised
tag_id_to_name[270] = 'ImageDescription'
tag_id_to_name[306] = 'DateTime'
tag_id_to_name[256] = 'ImageWidth'
tag_id_to_name[257] = 'ImageLength'
tag_id_to_name[258] = 'BitsPerSample'
tag_id_to_name[40962] = 'PixelXDimension'
tag_id_to_name[40963] = 'PixelYDimension'
tag_id_to_name[305] = 'Software'
tag_id_to_name[37510] = 'UserComment'
tag_id_to_name[40091] = 'XPTitle'
tag_id_to_name[40092] = 'XPComment'
tag_id_to_name[40093] = 'XPAuthor'
tag_id_to_name[40094] = 'XPKeywords'
tag_id_to_name[40095] = 'XPSubject'
tag_id_to_name[40961] = 'ColorSpace'  # Colour space (e.g. sRGB), not bit depth
tag_id_to_name[315] = 'Artist'
tag_id_to_name[33432] = 'Copyright'


def convert_exif_to_dict(exif):
    """
    This helper function converts the dictionary keys from
    IDs to strings so your code is easier to read.
    """
    data = {}
    if exif is None:
        return data
    for k, v in exif.items():
        # Fall back to the numeric ID when there is no known name
        data[tag_id_to_name.get(k, k)] = v
    # These fields are in UCS-2/UTF-16LE, convert to something usable within Python
    for k in ['XPTitle', 'XPComment', 'XPAuthor', 'XPKeywords', 'XPSubject']:
        if k in data:
            # bytearray() accepts both byte strings and tuples of ints
            data[k] = bytearray(data[k]).decode('utf-16-le').rstrip(u'\x00')
    return data

And to use it, simply pass in the values from Image._getexif(). The helper function convert_exif_to_dict() will make it easier for you to read information off, rather than referring to data by ID.

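im = Image.open(filename)
im.verify()
# Pillow reports JPEG files as 'JPEG', not 'JPG'
if im.format in ['JPEG', 'TIFF']:
    # _getexif() is a private method that not every image plugin provides
    get_exif = getattr(im, '_getexif', None)
    exif = convert_exif_to_dict(get_exif() if get_exif else None)
    # .get() avoids a KeyError when the tag is missing
    print(exif.get('XPTitle'))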

Would I recommend it? Probably not. ExifRead below is a much better candidate.

Using ExifRead

This alternative to PIL extracts much more information and makes it easier to fetch, but it does require you to install another library.

# pip install exifread
import exifread

# Open in binary mode, image files are not text
with open(filename, 'rb') as f:
    exif = exifread.process_file(f)

# Convert byte array to unicode
for k in ['Image XPTitle', 'Image XPComment', 'Image XPAuthor', 'Image XPKeywords', 'Image XPSubject']:
    if k in exif:
        exif[k].values = bytearray(exif[k].values).decode('utf-16-le').rstrip(u'\x00')

if 'Image XPTitle' in exif:
    print("Title: %s" % exif['Image XPTitle'].values)
if 'Image ImageDescription' in exif:
    print("Description: %s" % exif['Image ImageDescription'].values)

As you can see, this one is much cleaner to use, except for the weird problem where you have to access exif['something'].values instead of exif['something'].

It's not gonna ruin your day, but I guess you could write a helper function for it if it really bothered you.

There's also another bit in the snippet where I use bytearray() and decode(). That's because the values for those fields need to be converted from an array of bytes to a string which Python can understand.
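
For what it's worth, the helper function idea mentioned above could look something like this. It's only a sketch: tags_to_dict() is a name I made up, and FakeTag is a stand-in for the tag objects ExifRead returns so the example runs without the library.

```python
def tags_to_dict(tags):
    """Map each tag name to its .values so callers can skip the extra attribute."""
    # Objects without a .values attribute are passed through unchanged.
    return dict((name, getattr(tag, 'values', tag)) for name, tag in tags.items())


class FakeTag(object):
    """Stand-in for an ExifRead tag object, for illustration only."""

    def __init__(self, values):
        self.values = values


sample = {
    'Image ImageDescription': FakeTag('A beach'),
    'Image Orientation': FakeTag([1]),
}
flat = tags_to_dict(sample)
```

After that you can read flat['Image ImageDescription'] directly instead of going through .values every time.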

Using IPTCInfo

Lastly there's IPTC, which I'd never heard of until I started trying to pull metadata from JPG/TIFF files. This was my antigravity moment where I just looked for a library, installed it and "it just worked".

# pip install iptcinfo
import iptcinfo
from PIL import Image

im = Image.open(filename)
im.verify()
# Not sure what other formats are supported, I never looked into it.
# Pillow reports JPEG files as 'JPEG', not 'JPG'
if im.format in ['JPEG', 'TIFF']:
    try:
        iptc = iptcinfo.IPTCInfo(filename)
        image_title = iptc.data.get('object name', '') or iptc.data.get('headline', '')
        image_description = iptc.data.get('caption/abstract', '')
        image_tags = iptc.keywords
    except Exception as e:
        # Ignore files without IPTC data, re-raise anything else
        if str(e) != "No IPTC data found.":
            raise

Sadly, it feels a little bit like a Java developer wrote it, mainly because of little things like iptc.getData() where you could just use iptc.data.

You'll also feel a little lost without knowing which data keys to use, so you can get a list of all the possible data key names by examining:

from iptcinfo import c_datasets, c_datasets_r

Those two dictionary objects will tell you what's available so it shouldn't be too hard to write a nice little wrapper for it. These kinda feel like Java enums =\
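
A rough sketch of what such a wrapper might look like is below. It assumes c_datasets maps numeric dataset IDs to readable names, which is how I read it, so check that against your installed version. iptc_to_dict() is a made-up name, and the sample dictionaries are stand-ins so the example runs without the library.

```python
def iptc_to_dict(data, id_to_name):
    """Return a plain dict keyed by readable names, skipping empty values."""
    result = {}
    for dataset_id, name in id_to_name.items():
        try:
            value = data[dataset_id]
        except KeyError:
            # This dataset is not present in the file
            continue
        if value:
            result[name] = value
    return result


# Made-up stand-ins so the sketch runs without the library installed
sample_names = {5: 'object name', 25: 'keywords', 105: 'headline', 120: 'caption/abstract'}
sample_data = {5: 'Sunset', 25: ['beach', 'sky'], 120: ''}
readable = iptc_to_dict(sample_data, sample_names)
```

With the real library this would presumably be called as iptc_to_dict(iptc.data, c_datasets), if that assumption about c_datasets holds.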

[Image: Are you ready to handle all this big data?]

Copyright © Twig's Tech Tips