Snowplow vs. Google Analytics

I have been asked many times to compare Google Analytics with other tracking tools. One of the tools I have had a chance to work with recently is Snowplow. The technology impressed me so much that I decided to write a...

Country Table on Amazon Redshift

If you would like to quickly create a countries dimension on Amazon Redshift, here is some handy code. Redshift, like PostgreSQL, does not accept MySQL-style display widths such as int(11), so the columns use plain INTEGER and SMALLINT types.

CREATE TABLE IF NOT EXISTS dw.dim_country (
    id INTEGER,
    iso CHAR(2),
    name VARCHAR(80),
    nicename VARCHAR(80),
    iso3 CHAR(3),
    numcode SMALLINT,
    phonecode INTEGER
);

INSERT INTO dw.dim_country (id, iso, name, nicename, iso3, numcode, phonecode)...
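
Country names such as Côte d'Ivoire contain apostrophes, which break hand-written INSERT statements. The minimal Python sketch below renders rows as a single multi-row INSERT for dw.dim_country, escaping single quotes by doubling them as Redshift string literals expect. The helper names `sql_literal` and `build_insert` are hypothetical, and the two sample rows are illustrative only (their ids are arbitrary).

```python
from typing import Iterable, Optional, Sequence, Union

# A column value: integer, string, or None (rendered as NULL).
Value = Optional[Union[int, str]]


def sql_literal(value: Value) -> str:
    """Render one Python value as a SQL literal for Redshift."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    # Escape an embedded single quote by doubling it: D'Ivoire -> D''Ivoire
    return "'" + value.replace("'", "''") + "'"


def build_insert(table: str, columns: Sequence[str],
                 rows: Iterable[Sequence[Value]]) -> str:
    """Build one multi-row INSERT statement (hypothetical helper)."""
    values = ",\n".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{values};"


if __name__ == "__main__":
    columns = ["id", "iso", "name", "nicename", "iso3", "numcode", "phonecode"]
    # Illustrative rows only; the ids are arbitrary.
    rows = [
        (1, "AF", "AFGHANISTAN", "Afghanistan", "AFG", 4, 93),
        (2, "CI", "COTE D'IVOIRE", "Cote d'Ivoire", "CIV", 384, 225),
    ]
    print(build_insert("dw.dim_country", columns, rows))
```

For a small lookup table like this a single multi-row INSERT is convenient; larger loads are better served by COPY.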

How To: Get reports via API from iTunes Connect

You can use the new Apple Autoingestion tool to download reports from the command line or through a custom script that you create. The steps below outline how to use the Autoingestion tool. Note that Java 1.6 or later is required.

Requirements: Vendor ID, Apple ID, and the password for the account. It is...
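
If you drive the tool from a custom script, a small wrapper keeps the arguments in one place. This minimal Python sketch assumes the invocation form `java Autoingestion <properties file> <vendor ID> <report type> <date type> <report subtype> <date>`, with the Apple ID and password kept in the properties file; check the argument order against Apple's Autoingestion guide for your version of the tool. The function names and the vendor ID in the test values are hypothetical.

```python
import subprocess
from datetime import date
from typing import List


def build_autoingestion_command(properties_file: str, vendor_id: str,
                                report_date: date,
                                report_type: str = "Sales",
                                date_type: str = "Daily",
                                report_subtype: str = "Summary") -> List[str]:
    """Assemble the Autoingestion command line (assumed argument order)."""
    return [
        "java", "Autoingestion",
        properties_file,                  # assumed to hold Apple ID and password
        vendor_id,
        report_type,
        date_type,
        report_subtype,
        report_date.strftime("%Y%m%d"),   # report date as YYYYMMDD
    ]


def run_autoingestion(command: List[str], tool_dir: str) -> None:
    """Run the tool from the directory containing Autoingestion.class.

    check=True only catches non-zero exit codes; any other failure has to be
    detected from the tool's console output.
    """
    subprocess.run(command, cwd=tool_dir, check=True)
```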

How To: Get reports via API from Google Adwords

These instructions are for the PHP library.

Access requirements: store the Developer Token (which needs to be approved by the Google team), the User Agent (this is just an account name), the Client ID, and the Client Secret in a secure configuration file, such as google-adwords/auth.ini.

Files: download the Google AdWords PHP SDK. Create the...
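
Before running any reports, it can help to fail fast when auth.ini is incomplete. The minimal Python sketch below does that check. The key names (developerToken and userAgent at the top of the file, client_id and client_secret under an [OAUTH2] section) are assumptions modelled on the requirements above, so match them to the auth.ini template that ships with your version of the library. The sketch also treats placeholder values beginning with INSERT_ as missing; that is a choice of this example, not library behaviour.

```python
import configparser
from typing import Dict, List

# Assumed layout: top-level keys plus an [OAUTH2] section.
# Adjust to the auth.ini template of your library version.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "top": ["developerToken", "userAgent"],
    "OAUTH2": ["client_id", "client_secret"],
}


def missing_auth_keys(ini_text: str) -> List[str]:
    """Return 'section.key' for every required value that is absent or a placeholder."""
    parser = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    parser.optionxform = str  # keep key case; configparser lower-cases by default
    # Keys before the first section header are wrapped in a synthetic "top" section.
    parser.read_string("[top]\n" + ini_text)
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            value = parser.get(section, key, fallback="").strip().strip('"')
            if not value or value.startswith("INSERT_"):
                missing.append(f"{section}.{key}")
    return missing
```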

How To: Get reports via API from Google Analytics

These instructions are for the Google Analytics PHP library. Detailed instructions on setting up Google Analytics access are located here: https://developers.google.com/analytics/devguides/reporting/core/v3/quickstart/service-php

Access requirements: to get API access to Google Analytics, you need the Service Account Email and the P12 key (downloaded from Google).

General instructions: download the PHP library for Google Analytics. Detailed...
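
The quickstart linked above queries the Core Reporting API v3. As a hedged illustration of what such a report request contains, this Python sketch only builds the query URL for the v3 data endpoint; it does not handle authentication, which means exchanging the service account's P12 key for an OAuth 2.0 access token (the PHP library does this for you). The function name, view ID and metric names in the test values are placeholders. Note that the v3 Core Reporting API belongs to Universal Analytics, which Google has since retired.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Core Reporting API v3 data endpoint (Universal Analytics).
GA_V3_DATA_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def build_ga_query(view_id: str, start_date: str, end_date: str,
                   metrics: Sequence[str],
                   dimensions: Optional[Sequence[str]] = None) -> str:
    """Build a Core Reporting API v3 query URL (authentication not included)."""
    params = {
        "ids": f"ga:{view_id}",        # view (profile) ID prefixed with "ga:"
        "start-date": start_date,      # YYYY-MM-DD or relative, e.g. 7daysAgo
        "end-date": end_date,
        "metrics": ",".join(metrics),  # e.g. ga:sessions,ga:pageviews
    }
    if dimensions:
        params["dimensions"] = ",".join(dimensions)
    # urlencode percent-encodes ':' and ',' in the values.
    return f"{GA_V3_DATA_URL}?{urlencode(params)}"
```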

Restoring Redshift cluster from latest snapshot

While some companies implement a 24/7 BI environment, quite often not all of your Redshift clusters need to run around the clock. Consider a scenario where your data warehouse serves only a defined geographical location. Or maybe you have an analytical Redshift cluster just for...
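
The core of a morning restore is picking the most recent usable snapshot and restoring from it. A minimal Python sketch, assuming a boto3 Redshift client (describe_cluster_snapshots and restore_from_cluster_snapshot are real boto3 methods); the function names are hypothetical, and the restore keeps every setting it inherits from the snapshot. Restoring under the original identifier only works once the original cluster no longer exists, for example after it was deleted with a final snapshot the evening before.

```python
from typing import Any, Dict, List, Optional


def latest_available_snapshot(snapshots: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the newest snapshot whose Status is 'available', or None."""
    ready = [s for s in snapshots if s.get("Status") == "available"]
    return max(ready, key=lambda s: s["SnapshotCreateTime"], default=None)


def restore_from_latest(client: Any, source_cluster: str, target_cluster: str) -> str:
    """Restore target_cluster from the newest snapshot of source_cluster.

    `client` is expected to behave like boto3.client("redshift").
    Returns the identifier of the snapshot used.
    """
    snapshots: List[Dict[str, Any]] = []
    request: Dict[str, Any] = {"ClusterIdentifier": source_cluster}
    # describe_cluster_snapshots is paginated via Marker.
    while True:
        page = client.describe_cluster_snapshots(**request)
        snapshots.extend(page.get("Snapshots", []))
        marker = page.get("Marker")
        if not marker:
            break
        request["Marker"] = marker
    latest = latest_available_snapshot(snapshots)
    if latest is None:
        raise RuntimeError(f"No available snapshot found for {source_cluster}")
    client.restore_from_cluster_snapshot(
        ClusterIdentifier=target_cluster,
        SnapshotIdentifier=latest["SnapshotIdentifier"],
    )
    return latest["SnapshotIdentifier"]
```

With boto3 installed and AWS credentials configured, `restore_from_latest(boto3.client("redshift"), "dw-cluster", "dw-cluster")` (the cluster name is hypothetical) starts the restore. The call returns before the cluster is ready, so wait for it, for example with boto3's cluster_restored waiter, before loading data.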

Redshift Error when restoring from snapshot

Problem: I recently tried to implement an automation in Redshift that restores the cluster from a snapshot by triggering the restore from EC2. Accomplishing this task would yield considerable savings on your infrastructure costs, since you would not need to pay for the Redshift cluster to run at night, when it (presumably)...

Improving vacuum on Amazon Redshift

VACUUM is a very memory- and CPU-intensive maintenance operation that can take an extended period of time, depending on several factors. How often are you loading data into the table(s)? In which WLM query queue are you running the VACUUM, and with how much cluster memory allocated? If...
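
One common lever for the memory question is the wlm_query_slot_count session parameter, which lets a single session claim several slots of its WLM queue, and therefore more of that queue's memory, while the VACUUM runs. The minimal Python sketch below assumes you have already fetched (schema, table, unsorted) rows from the SVV_TABLE_INFO system view, where unsorted is the percentage of unsorted rows. The function name, the 10 percent threshold and the 3 slots are illustrative choices, not recommendations; the slot count must not exceed the queue's concurrency.

```python
from typing import Iterable, List, Optional, Tuple

# Query to run first (with any SQL client) to obtain the input rows.
TABLE_INFO_SQL = 'SELECT "schema", "table", unsorted FROM svv_table_info;'


def vacuum_statements(tables: Iterable[Tuple[str, str, Optional[float]]],
                      unsorted_threshold: float = 10.0,
                      slot_count: int = 3) -> List[str]:
    """Return statements that vacuum every table at or above the unsorted
    threshold, with extra WLM slots claimed for the session.

    Run them in one session (the slot count is a session setting) with
    autocommit on, since VACUUM cannot run inside a transaction block.
    """
    vacuums = [
        f"VACUUM FULL {schema}.{table};"
        for schema, table, unsorted in tables
        # unsorted can be NULL (e.g. tables without a sort key): skip those.
        if unsorted is not None and unsorted >= unsorted_threshold
    ]
    if not vacuums:
        return []
    return ([f"SET wlm_query_slot_count TO {slot_count};"]
            + vacuums
            + ["SET wlm_query_slot_count TO 1;"])  # back to the default
```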

Data Warehouse design on Amazon Redshift

Amazon provides a considerable amount of documentation on its website with best practices for table design on Amazon Redshift. I would like to summarize some key ones here and explain why they are important. Design the data warehouse on paper first: rather than diving head first into creating tables in...
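
A paper design can be captured as data and only then turned into DDL, so that choices such as the distribution key and sort key are made deliberately. The minimal Python sketch below does that; the TableDesign structure and render_ddl function are hypothetical, while DISTSTYLE, DISTKEY and COMPOUND SORTKEY are standard Redshift CREATE TABLE attributes. When neither a distribution key nor DISTSTYLE ALL is chosen, the sketch emits no distribution clause and Redshift's default applies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TableDesign:
    """A table as designed on paper before any DDL is written (hypothetical)."""
    name: str
    columns: List[Tuple[str, str]]      # (column name, Redshift type)
    dist_key: Optional[str] = None      # frequent join column -> DISTSTYLE KEY
    small_dimension: bool = False       # copy to every node -> DISTSTYLE ALL
    sort_keys: List[str] = field(default_factory=list)  # e.g. date columns


def render_ddl(design: TableDesign) -> str:
    """Turn a paper design into Redshift CREATE TABLE DDL."""
    if design.dist_key and design.small_dimension:
        raise ValueError("choose either a distribution key or DISTSTYLE ALL")
    column_names = {name for name, _ in design.columns}
    referenced = ([design.dist_key] if design.dist_key else []) + design.sort_keys
    for key in referenced:
        if key not in column_names:
            raise ValueError(f"{key} is not a column of {design.name}")
    body = ",\n".join(f"    {name} {sql_type}" for name, sql_type in design.columns)
    ddl = f"CREATE TABLE {design.name} (\n{body}\n)"
    if design.small_dimension:
        ddl += "\nDISTSTYLE ALL"
    elif design.dist_key:
        ddl += f"\nDISTSTYLE KEY DISTKEY ({design.dist_key})"
    if design.sort_keys:
        ddl += f"\nCOMPOUND SORTKEY ({', '.join(design.sort_keys)})"
    return ddl + ";"
```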
