Python & Google Analytics v3: Using google-api-python-client to access Analytics via the API

First up, I don't know what the fuck Google was doing with the docs when this was rolled out. Anything involving OAuth2 is horribly documented, requiring you to trawl through hundreds of pages for a few key lines of information.

Python's gData library is very nice, don't get me wrong. However, it seems to use Analytics v2, and my main issue with v2 was the rate limit, which I kept hitting. With v3 the limit was dramatically increased, but you're constantly being pushed towards an unnecessarily complicated way of implementing OAuth2.

Well, after sledging my way through Google's crazy maze of doc pages, a mixture of obsolete StackOverflow posts and people giving advice for the "user authenticated" way of doing things, I finally came up with a working example to share.

Yes Google, I really enjoyed reading all those pages of writing just to get it working...

API Setup - Google Cloud Console

Google has gone to great lengths to centralise the configuration for all their APIs in one hub, the Google Cloud Console.

In order to get Analytics API working, you'll need a "project".

  • Create a project (with any name and Project ID) and click on it.
  • Click on "APIs & Auth"
  • Enable "Analytics API". Make sure the status is green and says "ON"
  • Click on "Analytics API"
  • "Quota" is where you set your rate limit (I set mine to 10 requests per second, since the 128-character limit on regex filters means I have to split my filters across a lot of queries).
  • "Reports" is where you can check your usage for the day. You can have up to 50,000 requests a day.
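Because of that 128-character regex limit, a long list of pages has to be split across several filtered queries, which is where all the extra requests come from. Here's a minimal sketch of that splitting, assuming the limit applies to the regular expression after the =~ operator. build_path_filters is my own helper, not part of the API. It escapes paths with Python's re.escape, which should be close enough for GA's regex flavour but is worth checking against your own paths, and it does NOT escape commas or semicolons, which GA treats as OR/AND operators in filter expressions.

```python
import re

# Assumed limit on the regex part of a GA v3 filter expression
MAX_REGEX_LEN = 128


def build_path_filters(paths, max_len=MAX_REGEX_LEN):
    """Group page paths into 'ga:pagePath=~' filters, each regex <= max_len chars.

    Every path becomes an anchored, escaped alternative (^path$) and the
    alternatives are joined with '|' until the next one would overflow.
    """
    filters = []
    current = []       # alternatives for the filter being built
    current_len = 0    # length of '|'.join(current)
    for path in paths:
        piece = '^%s$' % re.escape(path)
        if len(piece) > max_len:
            raise ValueError('path too long for a single regex: %r' % path)
        # Joining onto a non-empty group costs one extra '|' character
        extra = len(piece) + (1 if current else 0)
        if current and current_len + extra > max_len:
            filters.append('ga:pagePath=~' + '|'.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        filters.append('ga:pagePath=~' + '|'.join(current))
    return filters
```

Each returned string can go straight into the 'filters' argument of a query, one query per filter.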

Now click on "Registered apps" under "APIs & auth".

  • Register a new app (or use an existing one)
  • The name can be whatever you want, but it should be a "Web Application"
  • When you see a screen with 4 options, click on "Certificate"


  • Click "Generate certificate"
  • Download and save the private key file to a safe place. You'll need this to access the API from within your code.
  • Copy the generated "Email Address" to your code somewhere. It should end with "@developer.gserviceaccount.com". This is the important bit.

The email address is your "Service Account". This is not made clear to you in many of the doc pages!

Now we should be done with the Google Cloud Console.

API Setup - Analytics & how to get a table ID

Log into your Analytics admin and edit your tracker. Add your service account email to the list of users who can view & analyse your data. You need to do this for every tracker the account needs to access.

While you're still in Google Analytics, you will need to find the unique "table ID". This is simply the ID of the view (profile) for each site you have in the account.

This is how the docs describe the table ID...

The unique table ID of the form ga:XXXX, where XXXX is the Analytics view (profile) ID for which the query will retrieve the data.

The unique ID used to retrieve the Analytics data. This ID is the concatenation of the namespace ga: with the Analytics view (profile) ID. You can retrieve the view (profile) ID by using the analytics.management.profiles.list method, which provides the id in the View (Profile) resource in the Google Analytics Management API.

Not very useful! The link is useless too.

Getting the table ID is actually fairly simple. There's no need to write code or put your data into a 3rd party site either.

Take for example my (now defunct) "DCX GoogleCode page" tracker.


The table ID is NOT the field starting with "UA-..."! Instead, right-click on the last item in the list and copy the link location.

The current link has a format like this: https://www.google.com/analytics/web/?hl=en#report/visitors-overview/a11111111w22222222p33333333/

Your table ID is the number after the character "p" (the "33333333" part). Prefix it with "ga:" (e.g. "ga:33333333") and copy that to your code somewhere.
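You don't need code for this, but if you've got a pile of trackers you can pull the ID out of each copied link programmatically. A quick sketch, assuming the link keeps the a…w…p… layout shown above (table_id_from_url is just a name I made up, not part of any Google library):

```python
import re


def table_id_from_url(url):
    """Return 'ga:<view id>' from a GA report link containing a<acct>w<prop>p<view>.

    The view (profile) ID is the run of digits after the 'p'; the 'ga:'
    namespace prefix is what the API expects in the 'ids' argument.
    """
    match = re.search(r'a\d+w\d+p(\d+)', url)
    if match is None:
        raise ValueError('no a...w...p... segment found in %r' % url)
    return 'ga:' + match.group(1)
```

For the example link above this gives 'ga:33333333'.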

Setting up Python

You'll need to install these libraries. Use pip, easy_install or a manual install, whichever you prefer; just make sure they install properly. I've listed the versions I used at the time of writing in case anybody needs them.

  • google-api-python-client (v1.2)
  • pyOpenSSL (OAuth installs an older version v0.1, but I had to upgrade to v0.13.1)

Code Snippet

This code will help you initialise the connection to the Analytics API v3 service.

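import httplib2

from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

# Service account details from the Google Cloud Console
SERVICE_ACCOUNT_EMAIL = 'your@developer.gserviceaccount.com'
KEY_FILE = 'googleanalytics/your-privatekey.p12'


def connect_to_analytics():
    # Read the private key downloaded when generating the certificate
    with open(KEY_FILE, 'rb') as f:
        key = f.read()

    # Sign requests as the service account; read-only access is all we need
    credentials = SignedJwtAssertionCredentials(
        SERVICE_ACCOUNT_EMAIL,
        key,
        scope='https://www.googleapis.com/auth/analytics.readonly')

    # Wrap an HTTP client so every request carries the OAuth2 token
    http = credentials.authorize(httplib2.Http())

    return build('analytics', 'v3', http=http)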

As you can see, there's no:

  • secret client tokens
  • "flows"
  • Getting an authorisation URL and temporarily storing credentials
  • "Storage" methods
  • web authentication back and forth bullcrap to deal with

This example can be tweaked however you want; just read through it to find the bits that matter to you. I've included an exponential backoff for your convenience in case you hit the rate limit.

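import json
import random
import time

from apiclient.errors import HttpError

# Error reasons worth retrying with backoff
RETRYABLE_REASONS = ('userRateLimitExceeded', 'quotaExceeded')
MAX_RETRIES = 5


def get_error_reason(error):
    # The reason code (e.g. 'userRateLimitExceeded') is in the JSON body of
    # the error response, not in the HTTP status line (error.resp.reason)
    try:
        body = json.loads(error.content.decode('utf-8'))
        return body['error']['errors'][0]['reason']
    except (ValueError, KeyError, IndexError, AttributeError):
        return None


def fetch_data(path_pattern, start_date, end_date):
    # path_pattern is a regex string; start_date/end_date are date objects
    service = connect_to_analytics()

    # See this URL for a full list of possible dimensions and metrics
    # https://developers.google.com/analytics/devguides/reporting/core/dimsmets
    arguments = {
        'ids': 'ga:123456',  # Your Google Analytics table ID goes here

        'dimensions': 'ga:pagePath',
        'metrics': 'ga:pageviews',
        'sort': '-ga:pageviews',

        'filters': 'ga:pagePath=~%s' % path_pattern,  # Regex filter
        'start_date': start_date.strftime('%Y-%m-%d'),
        'end_date': end_date.strftime('%Y-%m-%d'),
        'max_results': 1000,  # Max of 10,000
    }

    # Exponential backoff: wait 2**n seconds plus random jitter between retries
    # https://developers.google.com/analytics/devguides/reporting/core/v3/coreErrors#backoff
    for n in range(MAX_RETRIES + 1):  # Retry loop
        try:
            data_query = service.data().ga().get(**arguments)
            feed = data_query.execute()
            break  # Success, stop retrying

        except HttpError as error:
            reason = get_error_reason(error)
            print('HttpError %s: %s' % (error.resp.status, reason))

            if reason in RETRYABLE_REASONS and n < MAX_RETRIES:
                sec = (2 ** n) + random.random()
                print('Rate limit exceeded, retrying in %.1fs' % sec)
                time.sleep(sec)
            else:
                raise  # Not a rate limit error, or out of retries

    if 'rows' not in feed:
        print('No results found')
        return {}

    data = {}

    for row in feed['rows']:
        page_path, pageviews = row

        # TODO: Do your stuff here. Example: total pageviews per path
        data[page_path] = data.get(page_path, 0) + int(pageviews)

    return data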
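If you'd rather keep the retry logic out of your query code, the backoff can be pulled into a reusable helper. This is a sketch of my own, not something from google-api-python-client: call_with_backoff and its is_retryable, sleep and rng parameters are names I've made up. It follows the same pattern as Google's backoff advice, waiting 2**n seconds plus up to one second of random jitter, and gives up after max_retries retries.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_retries=5, sleep=time.sleep, rng=random):
    """Call func(); on a retryable error wait 2**n + jitter seconds and try again.

    Errors rejected by is_retryable are re-raised immediately; after
    max_retries retries the last error is re-raised as well.
    """
    for n in range(max_retries + 1):
        try:
            return func()
        except Exception as error:
            if n == max_retries or not is_retryable(error):
                raise
            # rng.random() adds 0 to 1 second of jitter
            sleep((2 ** n) + rng.random())
```

With the API you could call call_with_backoff(data_query.execute, is_retryable), where is_retryable is a (hypothetical) function of yours that inspects the HttpError and returns True only for rate limit errors. Passing sleep in as a parameter also makes the helper easy to test without actually waiting.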

The important bits are setting the right arguments and these two lines:

data_query = service.data().ga().get(**arguments)
feed = data_query.execute()
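One query only returns up to max_results rows (10,000 at most). If a report has more rows than that, one way to get them all is to page through with the start_index argument (start-index in the API docs, counted from 1) until you've collected the response's totalResults rows. A minimal sketch, where fetch_page is a function you supply that runs one query and returns the response dict; fetch_all_rows is my own name:

```python
def fetch_all_rows(fetch_page, page_size=1000):
    """Collect every row of a report by stepping start_index through the results.

    fetch_page(start_index, max_results) must return a GA v3 style response
    dict with an optional 'rows' list and a 'totalResults' count.
    """
    rows = []
    start_index = 1  # the API counts rows from 1
    while True:
        feed = fetch_page(start_index, page_size)
        page = feed.get('rows', [])
        rows.extend(page)
        start_index += len(page)
        # Stop on an empty page or once every reported row has been fetched
        if not page or start_index > feed.get('totalResults', 0):
            return rows
```

Inside fetch_data, fetch_page could be something like lambda s, m: service.data().ga().get(**dict(arguments, start_index=s, max_results=m)).execute(), with the retry handling wrapped around it as before.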

Just in case

If at any time you see this error, you'll need to upgrade pyOpenSSL:

  File "/usr/local/lib/python2.7/dist-packages/google_api_python_client-1.2.egg/oauth2client/crypt.py", line 106, in sign
    return crypto.sign(self._key, message, 'sha256')
AttributeError: 'module' object has no attribute 'sign'
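To see whether your installed pyOpenSSL has the function the traceback is complaining about, here's a quick check. It's only a sketch (pyopenssl_has_sign is a name I made up); it simply looks for OpenSSL.crypto.sign, which oauth2client's crypt.py calls.

```python
def pyopenssl_has_sign():
    """Return True if the installed pyOpenSSL provides OpenSSL.crypto.sign.

    Returns False when pyOpenSSL isn't installed or when crypto.sign is
    missing, which is the case behind the AttributeError above.
    """
    try:
        from OpenSSL import crypto
    except ImportError:
        return False
    return hasattr(crypto, 'sign')


if __name__ == '__main__':
    print('crypto.sign available: %s' % pyopenssl_has_sign())
```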

A little post-coding activity

Please take the time to give Google a big kick in its ass. Not everybody is drinking the Google-aid, so tell them which parts of the docs need more explaining. There's a lot of assumed knowledge in there, which makes it difficult for people to pick things up.

Take THIS shitty Google documentation!


 
Copyright © Twig's Tech Tips