Wednesday, July 16, 2014

Stop changing my data!!

Dear Google,

Every day I do an incremental import of your Analytics data by pulling in the data from the previous full day.
Why is it that when I come back the next day and run a new query for that same day, the data is sometimes (all too often) different? Sometimes it even differs when I query the same completed date range twice in the same day.

Your documentation is confusing. I know you do some mysterious processing. I know there are some things related to data sampling that I need to worry about. I know if I give you $150,000 per account per year then I can upgrade to Premium Analytics and not have to worry (as much) about sampling.

Could you please make it clear, for other users who haven't already figured this out the hard way, that anything we query from your API may be an approximation, and is liable to change if we run the same query an hour or a day later?

I have changed my daily import from 4 a.m. Eastern to 7 a.m. Eastern to ensure that any processing you may be doing on the previous day has had three additional hours to finish.

I have changed all my queries to use the highest-precision sampling level.

I am even going so far as to delete, on each day I do an import, any data that was queried with a range including the previous two days, and then re-import those days along with the latest day, because I am not confident that the data has stopped changing until at least 48 hours after the day is done.
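
For anyone who wants to do the same, here is a minimal sketch of that rolling re-import in Python. It is an illustration under assumptions, not a definitive implementation: `UNSETTLED_DAYS`, `days_to_import`, `run_import`, the `store` dictionary, and the `fetch_day` callable are all made-up names, and `fetch_day` stands in for whatever code actually queries the Analytics API for a single day.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# Assumption: the two days before the latest full day may still change.
UNSETTLED_DAYS = 2


def days_to_import(today: date, unsettled_days: int = UNSETTLED_DAYS) -> List[date]:
    """Return the latest full day plus the unsettled days before it, oldest first."""
    latest_full_day = today - timedelta(days=1)
    return [
        latest_full_day - timedelta(days=offset)
        for offset in range(unsettled_days, -1, -1)
    ]


def run_import(
    today: date,
    store: Dict[date, int],
    fetch_day: Callable[[date], int],
) -> None:
    """Delete and re-fetch every day that might still be changing."""
    for day in days_to_import(today):
        store.pop(day, None)          # throw away whatever was imported before
        store[day] = fetch_day(day)   # hypothetical single-day query
```

Run on July 16, this would re-import July 13, 14, and 15, so every stored day gets overwritten until it is more than 48 hours old.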

Just please be a little more up-front about this stuff next time, Google, and it will save your users a lot of pain. It's really not cool to think you're ready to go live with brand-new reports that use imported Googly data, only to discover that a report is internally inconsistent. One part of it aggregates daily visits over the last 30 days, being the result of 30 separate incremental imports. Another part shows data from the last 30 days aggregated together, being the result of a single query done afresh on each import, using the last 30 days as the date range.

Annoyed,
Samer

_________________________________

The pictures below show the kind of stuff you get when you do the same query for the same date range twice in Google Analytics. The numbers at the top came from one query, done yesterday, for the past 30 days. The numbers at the bottom came from 30 queries of 1 day each over the past 30 days, added together. Before you think I'm doing something wrong: when I delete the 30 daily records and do all those 30 queries again in one shot, the aggregated numbers start to match all of a sudden. I can literally run the same query over the same (finished) date range twice and get two different results. After a while it does stop changing -- probably 48 hours to be safe.
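
The comparison described above can be automated before a report goes live. The following is only a sketch with hypothetical names (`summed_daily`, `mismatches`) and made-up metric names. It assumes each daily import is a dictionary of metric totals and that the metrics are additive across days, as visits are; unique-visitor counts are not, so they would not be expected to match.

```python
from typing import Dict, List, Tuple


def summed_daily(daily_rows: List[Dict[str, int]]) -> Dict[str, int]:
    """Add up each metric across the separately imported days."""
    totals: Dict[str, int] = {}
    for row in daily_rows:
        for metric, value in row.items():
            totals[metric] = totals.get(metric, 0) + value
    return totals


def mismatches(
    daily_rows: List[Dict[str, int]],
    range_totals: Dict[str, int],
) -> Dict[str, Tuple[int, int]]:
    """Return {metric: (sum of daily imports, single range query)} where they differ."""
    summed = summed_daily(daily_rows)
    return {
        metric: (summed.get(metric, 0), total)
        for metric, total in range_totals.items()
        if summed.get(metric, 0) != total
    }
```

An empty result means the two halves of the report agree; anything else lists the metrics that need a re-import.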


Google Analytics is just giving us trouble heaped on top of trouble!
