Veda Shankar
Veda Shankar

Exploring Session-Level Google Analytics Data - Part 1

Have you felt limited by the capabilities of Google Analytics standard user interface? Would you like to slice & dice your analytics big data with multiple filters to intuitively explore and gain more insights?

If yes, then this multi-part blog series, discussing how to explore Google Analytics with the MapD platform, has the answers for you. MapD is a GPU-accelerated SQL engine (MapD Core) and a visual analytics platform (MapD Immerse). The application that we will discuss in this blog downloads raw, session-level Google Analytics data in CSV format for a specified date range. The downloaded data is then formatted, compressed (gzip), and uploaded to your MapD server. You can also choose not to automatically upload the data to MapD and instead import the gzipped CSV file using the Immerse user interface. The application is written in Python using the Google API Client Libraries and is based on the original work from Ryan Praski and Google Developer’s Hello Analytics API: Python quickstart for service accounts sample program.

Setup

There are a few steps that must be done in order to use this Python application.

Prerequisite 1: Google Analytics Service Account

The Python application uses a service account to call the Google API, so that the users aren't directly involved. You can skip this step if you have already enabled Analytics API and created a service account.

Step 1: Enable the use of API for your Google account by going to Register your application for Analytics API in Google API Console


Enable the use of API for your Google account

Click Continue to create a project.

Step 2:
This will enable the API access, now press Go to credentials.


Go to credentials

Step 3: Then fill the Add credentials to your project as shown in the caption below and press What credentials do I need?


Add credentials to your project

Step 4: Fill the Add credentials to your project as shown in the caption below and press Continue. Make sure to select JSON for the key type. Make sure you only use alphanumeric characters in the service account name, no spaces allowed.


Credentials - role

Step 5: The service account and key will get created, and the private key in JSON format will be automatically downloaded to your laptop. Rename this file to client_secrets.json and keep it safely. We will need this file to run the Python application.


client_secrets.json

Step 6: Close the above dialog box and on the Credentials screen click the Manage Service Accounts in the far right hand corner as shown below.


Manage Service Accounts

Step 7: This will bring up the details of the service account you just created. Note the Service account ID (@xxx.gserviceaccount.com) for the newly created service account.

Service Account ID

Prerequisite 2: Service Account Connected to Google Analytics Account

Add the Service account ID from the previous step to the Google Analytics account.

  • Sign in to Google Analytics.
  • Click Admin, and navigate to the desired account/property/view. /li>
  • In the ACCOUNT, PROPERTY, or VIEW column (depending upon whether you want to add the user at the account, property, or view level), click User Management. Parent permissions are inherited by default (account > property > view). For example, when you set permissions for a user at the account level, that user then has those same permissions for all the properties and views in that account. As you progress down the hierarchy, you can give more permissions, but not fewer, e.g., if a user has Read & Analyze permission at the account level, you can then also grant Edit permission at the property or view level; but if a user has Edit permission at the account level, you can’t limit permission to just Read & Analyze at the property level.
  • This will bring up the Account users page, press the + sign at the upper right-hand corner to show the add new users/groups menu. Press the Add new users option.


Add new users

  • Under Add permissions, enter the service account ID for the user's Google Account. Check the box for Read & Analyze.


Read and analyze

  • To confirm that the service account has been added correctly, Click Admin, and navigate to the desired account/property/view. You should see the newly added service account with the required privilege.

account/property/view

Prerequisite 3: Software Packages

The application has been written in Python using APIs from Google Analytics Client Library, MapD APIs, and Pandas data analysis library. The application was tested on a Linux system with Python version 2.7.12 and with the following packages installed:


# pip install --upgrade google-api-python-client
# pip install matplotlib pandas python-gflags
# pip install pymapd
# git clone https://github.com/mapd/google-analytics.git

Make sure to have a passwordless SSH login account on the MapD server in case you want to automatically load the analytics data into the MapD Core database. The alternative is to use the import feature in MapD Immerse user interface.

Workflow of the Google Analytics Importer Application

Here is what the application does:

  • Create a service object to Google Analytics using the client_secrets.json file corresponding to the service account that was created in the prerequisite step.
  • Traverse the Google Analytics management hierarchy and construct the mapping of website profile views and IDs.
  • Prompt the user to select the profile view they would like to collect the data.
$ python mapd_ga_data.py 
Item#           View ID               View Name
   1            151405570       MapD Community edition AWS_All Web Site Data
   2            151400370       MapD Community edition DL_All Web Site Data
   3            160834939       MapD Community edition DL_Non-Dev
   4            161106277       MapD Community edition DL_Non-MapD
   5            152688532       MapD Enterprise AWS_All Web Site Data
   6            108540717       MapD Enterprise installed_All Web Site Data
   7             94937405       MapD MIT Twitter Demo_All Web Site Data
   8            134050322       MapD Meetup - Enterprise GPU Computing_All W
   9            152769575       MapD OS_All Web Site Data
  10            168795059       MapD Test Drive_All Web Site Data
  11            122232033       MapD Testing_All Web Site Data
  12             93521025       MapD Website_Live Site
  13            123255922       MapD Website_No Filters
  14            160571176       MapD Website_Staging
  15            160593821       MapD Website_Staging (testing)
  16             94956153       MapD YouTube_All Web Site Data
Enter the item# of the view you would like to ingest into MapD:  2
Item # 2 selected
  • Prompt the user to enter the begin and end date range for downloading the data. If you just hit enter without any dates, the application will by default fetch the last 30 days worth of data.
Enter the begin date and end date in the following format: YYYY-MM-DD YYYY-MM-DD
Or hit enter to proceed with the default which is last 30 days data
Date Range:  2017-08-01 2018-02-20
Extract data from 2017-08-01 to 2018-02-20
  • Prompt the user to enter the MapD server specific information which includes the name of the server, database login, database password, database name and server ssh login name. The ssh login name will be used to copy the CSV file to a temporary location on the MapD server prior to loading it into the database. If you don’t want to automatically upload the analytics data to MapD and would like to upload it using the Immerse UI, simply hit enter without any information.
Enter the MapD server information if you want to upload data,
 otherwise simply hit enter to use the manual procedure to upload the data
  Information needed -     
MapD Server Info:  mapd-my-azure-server mapd HyperInteractive mapd john
The data from MapD Community edition DL_All Web Site Data (151400370)  will be automatically uploaded to  the mapd database in the mapd-my-azure-server server
 
Going to download data for MapD Community edition DL_All Web Site Data (151400370) ...
ga:date,ga:hour,ga:minute,ga:longitude,ga:latitude,ga:landingPagePath,ga:networkLocation
Found 48560 records
Profile Name: All Web Site Data
Now pulling data from 2017-08-01 to 2018-02-20.
  < … >
  • The application then starts downloading the records for the selected profile view and saves it to the data directory under the names with the following pattern _.csv. After all the records are saved to the CSV files the function merge_tables() is called to merge all the dimensions from the various tables into a single CSV file .csv. The merge function eliminates records that have empty location (latitude, longitude) fields. The geographic latitude and longitude fields are required for the Pointmap which is used for visualizing the location of data on a world map. The date and time dimensions are combined into a single DATETIME column which is compatible with MapD. If the user entered the MapD credentials while launching the application then the program will invoke the functions in mapd_utils.py file to load the table into MapD. The CSV file is compressed using gzip before calling the MapD load function.
Connect to MapD Server
# Connect to the DB
def connect_to_mapd(str_user, str_password, str_host, str_dbname):
  global connection
  connection = connect(user=str_user, password=str_password, host=str_host, dbname=str_dbname)
  print connection
  • Create table and load data from the CSV file
# Load CSV to Table
def load_to_mapd(table_name, csv_file, mapd_host, mapd_user):
  global connection
  create_table_str = 'CREATE TABLE IF NOT EXISTS %s (ga_date TIMESTAMP, ga_longitude FLOAT, ga_latitude FLOAT, ga_landingPagePath TEXT ENCODING DICT(8), ga_networkLocation TEXT ENCODING DICT(8), ga_pageviews BIGINT, ga_country TEXT ENCODING DICT(8), ga_city TEXT ENCODING DICT(8), ga_source TEXT ENCODING DICT(8), ga_sessionDurationBucket BIGINT, ga_sessionCount BIGINT, ga_deviceCategory TEXT ENCODING DICT(8), ga_campaign TEXT ENCODING DICT(8), ga_adContent TEXT ENCODING DICT(8), ga_keyword TEXT ENCODING DICT(8))' % (table_name)
  print create_table_str
  connection.execute(create_table_str)
  server_csv_file = '/tmp/%s' % (os.path.basename(csv_file))
  command = 'scp %s %s@%s:%s' % (csv_file, mapd_user, mapd_host, server_csv_file)
  print command
  os.system(command)
 
  query = 'COPY %s from \'%s\' WITH (nulls = \'None\')' % (table_name, server_csv_file)
  print query
  connection.execute(query)
  print connection.get_table_details(table_name)

After a successful execution the application exits with the following message:

=======================================================================
Goto MapD Immerse UI @ http://mapd-my-azure-server:9092/
=======================================================================

MapD Immerse Import Wizard

If you want to manually load the table into MapD Core Database then you can use MapD Immerse’s table import feature. Open MapD Immerse -> Click Data Manager -> Click Import Data -> Click the + sign or drag-and-drop the CSV file for upload. Detailed documentation.

+ sign or drag-and-drop

In the next blog we will walk through how to construct a dashboard with various types of charts from the Google Analytics data that we just uploaded to MapD.


Google Analytics data

Read part 2 of the blog to learn more on exploring Google Analytics data with MapD’s powerful crossfilter feature. We welcome your suggestions for improvement and any questions on this topic at our community feedback forum.

Veda Shankar

About the Author

Veda Shankar is a Developer Advocate at OmniSci working actively to assist the user community to take advantage of OmniSci's open source analytics platform. He is a customer oriented IT specialist with a unique combination of experience in product development, marketing and sales engineering. Prior to OmniSci, Veda worked on various open source software defined data center products at Red Hat.