Wow, this was a bit of an "antigravity" moment for me (XKCD #353).
All I really needed to do was choose my poison:
- Pillow (an actively maintained PIL fork which unofficially supports basic EXIF tag reading)
- ExifRead (EXIF reader) v1.4.2 at time of writing
- IPTCInfo (IPTC reader) v1.9.5-6 at time of writing
I'd just like to point out some things I learned the hard way.
- If something isn't showing up in the EXIF tag data, then it's most likely saved as IPTC metadata (like title, subject and tags/comments in the screenshot below).
- The fields XPTitle, XPComment, XPAuthor, XPKeywords and XPSubject are encoded in UCS-2 (in practice, little-endian UTF-16). They need to be decoded, unless you like dealing with hex strings or byte arrays.
- Combining an EXIF reader with an IPTC reader will be your best way of extracting metadata from the file.
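To make the UCS-2 point concrete, here's a minimal sketch of decoding one of those XP* fields. It assumes Python 3 and that the raw value arrives either as bytes or as a list of integer byte values; `decode_xp_field` is a name I made up for illustration, not something from any of the libraries.

```python
def decode_xp_field(raw):
    """Decode a Windows XP* EXIF field (UTF-16LE, NUL-terminated) to str.

    `raw` may be a bytes object or a sequence of integer byte values (0-255).
    """
    data = bytes(raw)  # accepts bytes, bytearray, or a list/tuple of ints
    # errors="replace" keeps going if the field ends on an odd trailing byte
    text = data.decode("utf-16-le", errors="replace")
    # Drop the NUL terminator(s) that pad the end of the field
    return text.rstrip("\x00")


# Sample bytes as they might appear in an XPTitle field: "Hi" plus a terminator
sample = b"H\x00i\x00\x00\x00"
title = decode_xp_field(sample)
```

The same function should work for all five fields, since they share the encoding.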
Pillow reads some of the basic EXIF data, but it tends to be a bit clunky to use.
To use it, simply pass in the values from Image._getexif() (a private Pillow method, so it may change between releases). The helper function convert_exif_to_dict() makes the information easier to read off, rather than referring to data by numeric tag ID.
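A helper along these lines could look roughly like the sketch below. This is my reconstruction under assumptions, not necessarily the exact original: the `tag_names` parameter and the `read_exif` wrapper are hypothetical additions, and the name lookup relies on Pillow's `PIL.ExifTags.TAGS` table.

```python
def convert_exif_to_dict(exif, tag_names=None):
    """Map numeric EXIF tag IDs to readable names.

    `exif` is the dict returned by Image._getexif(), or None if there is no EXIF.
    `tag_names` is an optional ID -> name mapping (hypothetical parameter).
    """
    if not exif:
        return {}
    if tag_names is None:
        # Pillow's own ID -> name table, imported lazily so the helper can be
        # tried out without Pillow installed.
        from PIL.ExifTags import TAGS
        tag_names = TAGS
    # Unknown IDs are kept under their numeric key rather than dropped
    return {tag_names.get(tag_id, tag_id): value for tag_id, value in exif.items()}


def read_exif(path):
    """Open an image with Pillow and return its EXIF data keyed by tag name."""
    from PIL import Image
    with Image.open(path) as img:
        # _getexif() is private and only present on some image types (e.g. JPEG)
        getter = getattr(img, "_getexif", None)
        return convert_exif_to_dict(getter() if getter else None)
```

With that in place, something like `read_exif("photo.jpg").get("Make")` reads a lot nicer than looking up tag 271 by hand (the filename is just a placeholder).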
Would I recommend it? Probably not. ExifRead below is a much better candidate.
ExifRead, the alternative to using PIL, can extract much more information and also makes it easier to fetch, but it does require you to install another library.
As you can see, this one is much cleaner to use, except for the weird problem where you have to access exif['something'].values instead of exif['something'].
It's not gonna ruin your day, but I guess you could write a helper function for it if it really bothered you.
There's also another bit in the snippet where I use join/map/unichr (unichr is Python 2 only; on Python 3 it's just chr). That's because the values for those fields need to be converted from an array of bytes to a string which Python can understand.
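If the `.values` thing does bother you, the helper function could be as small as this sketch. `tag_value` and `read_tags` are hypothetical names of mine; the only ExifRead call used is `exifread.process_file()`, which returns a dict of tag objects.

```python
def tag_value(tags, key, default=None):
    """Return tags[key].values, or `default` if the tag is missing."""
    tag = tags.get(key)
    if tag is None:
        return default
    # ExifRead wraps each entry in an object; fall back to the object itself
    # if it happens to have no .values attribute.
    return getattr(tag, "values", tag)


def read_tags(path):
    """Read all tags from a file with ExifRead (pip install exifread)."""
    import exifread
    with open(path, "rb") as fh:
        return exifread.process_file(fh)
```

Usage would then be along the lines of `tag_value(read_tags("photo.jpg"), "Image Make")`, with the filename as a placeholder.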
Lastly there's IPTC, which I'd NEVER heard of until I started trying to pull metadata from JPG/TIFF files. This was my antigravity moment where I just looked for a library, installed it and "it just works".
Sadly, it feels a little bit like a Java developer wrote it, mainly because of little things like iptc.getData() where you could just use iptc.data.
You'll also feel a little lost without knowing which data keys to use, so you can get a list of all the possible data key names by examining:
from iptcinfo import c_datasets, c_datasets_r
Those two dictionary objects will tell you what's available so it shouldn't be too hard to write a nice little wrapper for it. These kinda feel like Java enums =\
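Here's a rough sketch of what that little wrapper might look like. It assumes that `c_datasets` maps dataset numbers to names and that `iptc.data` behaves like a dict keyed by dataset number; check both against the version you have installed. `name_iptc_fields` and `read_iptc` are hypothetical names.

```python
def name_iptc_fields(data, names):
    """Return IPTC data keyed by readable field name instead of dataset number.

    `data` maps dataset numbers to values; `names` maps dataset numbers to
    names (the role c_datasets is assumed to play). Empty values are skipped.
    """
    return {names.get(key, key): value for key, value in data.items() if value}


def read_iptc(path):
    """Load IPTC data from a file with IPTCInfo and key it by field name."""
    from iptcinfo import IPTCInfo, c_datasets
    info = IPTCInfo(path)
    # Assumption: info.data is dict-like and keyed by dataset number
    return name_iptc_fields(info.data, c_datasets)
```

Keeping the renaming separate from the file loading means you can try it on a plain dict first, before pointing it at real images.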
Are you ready to handle all this big data?