google analytics datasets for attribution

Why your MCF report doesn’t match anything else in GA

Tyler Mitchell Blog

The Problem

While working on our data-driven multi-touch attribution solution, we noticed that we couldn't reproduce the attribution results provided in Google Analytics, even using a simple last-touch model. To add to the confusion, our rule-based attribution models could all reproduce the results in GA.

This was baffling to us because data-driven and rule-based attribution share the same implementation of the last-touch model. So what was the difference between the two solutions? The answer: the underlying data used to power the models.

We power our data-driven models using sessions from GA. With ChannelMix ID, we can associate unique IDs with users and rebuild GA's Multi-Channel Funnel (MCF) dataset from those sessions. The benefit of this approach is that we have access to customer paths that have not converted yet, or never will, which is crucial for our data-driven model. We run both data-driven and rule-based attribution on this rebuilt dataset.
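To make the idea of rebuilding paths from sessions concrete, here is a minimal sketch of grouping sessions by user ID into time-ordered channel paths, keeping both converting paths and the trailing paths that have not converted. The `Session` record, its field names, and `build_paths` are hypothetical illustrations; they are not ChannelMix ID's implementation or a GA export schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    # Hypothetical session record; the field names are illustrative only.
    user_id: str      # stable per-user ID (stand-in for whatever ID service is used)
    start: datetime   # session start time
    channel: str      # channel grouping, e.g. "Paid Search" or "Email"
    converted: bool   # whether the session contained a conversion


def build_paths(sessions):
    """Group sessions by user into time-ordered channel paths.

    Returns (converting, open_paths): channel paths that ended in a conversion,
    and each user's trailing path that has not converted (yet).
    """
    by_user = defaultdict(list)
    for session in sessions:
        by_user[session.user_id].append(session)

    converting, open_paths = [], {}
    for user_id, user_sessions in by_user.items():
        path = []
        for session in sorted(user_sessions, key=lambda x: x.start):
            path.append(session.channel)
            if session.converted:
                converting.append(path)
                path = []  # the next session starts a new path
        if path:
            open_paths[user_id] = path
    return converting, open_paths


sessions = [
    Session("u1", datetime(2020, 1, 1), "Paid Search", False),
    Session("u1", datetime(2020, 1, 2), "Email", True),
    Session("u2", datetime(2020, 1, 3), "Organic Search", False),
]
converting, open_paths = build_paths(sessions)
print(converting)  # [['Paid Search', 'Email']]
print(open_paths)  # {'u2': ['Organic Search']}
```

The non-converting path for `u2` is exactly the kind of data the standard MCF dataset does not expose.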

But when we're only using rule-based attribution, the models are powered by the standard Multi-Channel Funnel dataset, the same dataset GA uses for its own attribution.

Even so, this doesn't explain why attribution powered by our custom MCF dataset fails to reproduce GA's results while models powered by GA's own MCF dataset succeed. This is especially puzzling for last-touch attribution, where the only session that matters is the one that converts.

So we began to dig into the datasets and compare individual paths from each one, and that's when we noticed something completely unexpected: the two datasets show different interactions for the same user at the same times.

The Reason

At this point we began searching through GA documentation and blog posts, trying to figure out why the data would differ between two GA reports. Eventually we found a blurb in the GA documentation and an extremely helpful (and frankly validating) article.

The main takeaway from these resources is that GA treats direct interactions differently in the Multi-Channel Funnel dataset than in every other dataset. When someone comes directly to your website, the Multi-Channel Funnel dataset records this as a direct interaction.

In every other dataset, however, GA looks for a campaign cookie. If one exists, GA relabels the direct interaction as the same type of interaction as the previous one. Only if it cannot find an existing campaign cookie is the visit registered as a direct interaction.

As an example, let’s imagine a person clicks through a Paid Search ad to your website and browses for a while before leaving. The next day, the same person comes directly to your website and converts. In the MCF report, you will see a Paid Search interaction followed by a Direct interaction. In every other report, this will appear as two Paid Search interactions.
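The relabeling rule and the Paid Search example can be sketched as follows. This is a simplified model, not GA's actual logic: it assumes a Direct touch inherits the channel of the most recent non-Direct touch when that touch is within a campaign timeout (90 days here, the default cited below), and it uses a plain last-touch model to show why the two datasets credit different channels. `apply_direct_override` and `last_touch` are hypothetical helper names.

```python
from datetime import datetime, timedelta

DIRECT = "Direct"


def apply_direct_override(touches, campaign_timeout=timedelta(days=90)):
    """Approximate how non-MCF reports relabel Direct sessions.

    `touches` is a time-ordered list of (timestamp, channel) pairs as the MCF
    report records them. A Direct touch inherits the channel of the most recent
    non-Direct touch if that touch is within `campaign_timeout`; otherwise it
    stays Direct. Simplified model under stated assumptions.
    """
    result = []
    last_campaign = None  # (timestamp, channel) of the latest non-Direct touch
    for ts, channel in touches:
        if channel != DIRECT:
            last_campaign = (ts, channel)
            result.append((ts, channel))
        elif last_campaign and ts - last_campaign[0] <= campaign_timeout:
            result.append((ts, last_campaign[1]))
        else:
            result.append((ts, DIRECT))
    return result


def last_touch(touches):
    """Last-touch model: all conversion credit goes to the final touch."""
    return touches[-1][1]


# The example above: a Paid Search visit, then a direct return visit that converts.
mcf_path = [
    (datetime(2020, 3, 1), "Paid Search"),
    (datetime(2020, 3, 2), DIRECT),
]
standard_path = apply_direct_override(mcf_path)
print(last_touch(mcf_path))       # Direct
print(last_touch(standard_path))  # Paid Search
```

Even though the last-touch model is identical, feeding it the relabeled session data instead of the MCF path moves the credit from Direct to Paid Search.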

And if this isn't confusing enough, we found a confounding effect that makes it even harder to reproduce GA's attribution results. The global default campaign cookie expiration time is 90 days, whereas the default MCF lookback window is only 30 days. This means that when building customer paths for the MCF report, GA only looks at the most recent 30 days of interactions, but in other reports it looks back up to 90 days for an earlier interaction to replace a direct one with.
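A small sketch shows how the two windows interact, using the 30-day lookback and 90-day expiration from the paragraph above. It assumes, as a simplification, that the expiration is measured from the most recent non-Direct interaction to the converting visit; `within_lookback` and `converting_channel_standard` are hypothetical helpers, not GA functions.

```python
from datetime import datetime, timedelta

DIRECT = "Direct"


def within_lookback(touches, conversion_time, lookback=timedelta(days=30)):
    """Keep only the touches that fall inside the MCF lookback window."""
    return [(ts, ch) for ts, ch in touches if conversion_time - ts <= lookback]


def converting_channel_standard(touches, campaign_timeout=timedelta(days=90)):
    """Channel a non-MCF report would show for the converting (last) touch.

    Assumes a Direct touch inherits the most recent non-Direct channel, provided
    that touch is no older than `campaign_timeout`.
    """
    *earlier, (conv_ts, conv_channel) = touches
    if conv_channel != DIRECT:
        return conv_channel
    for ts, ch in reversed(earlier):
        if ch != DIRECT:
            # Only the most recent campaign counts; if it has expired, stay Direct.
            return ch if conv_ts - ts <= campaign_timeout else DIRECT
    return DIRECT


conversion = datetime(2020, 6, 30)
touches = [
    (conversion - timedelta(days=60), "Paid Search"),  # outside 30 days, inside 90
    (conversion, DIRECT),                              # converting visit
]
print([ch for _, ch in within_lookback(touches, conversion)])    # ['Direct']
print(converting_channel_standard(touches))                      # Paid Search
print(converting_channel_standard(touches, timedelta(days=30)))  # Direct
```

Here the Paid Search visit is missing from the MCF path entirely, yet it still claims the conversion in other reports. Shortening the timeout to 30 days makes the two views agree on this particular path, though not on paths where the earlier campaign falls inside both windows.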

Summary

The conclusion is that there is no way to exactly reconstruct the Multi-Channel Funnel dataset, including paths that have not converted, from the session data in other reports. That data is irreversibly altered in such a way that the number of direct interactions in the MCF report will always be equal to or greater than the number in all other reports.
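Both claims can be checked on a toy model. The sketch below uses a deliberately simplified override (timestamps and the campaign timeout are ignored) and enumerates every three-touch path over three example channels: the override never increases the number of Direct touches, and several distinct MCF paths collapse into the same standard-report path, so the original cannot be recovered. The channel list and the `override` helper are illustrative assumptions.

```python
from itertools import product

DIRECT = "Direct"


def override(path):
    """Relabel each Direct touch with the latest earlier non-Direct channel.

    Simplification: timestamps and the campaign timeout are ignored.
    """
    out, last = [], None
    for ch in path:
        if ch != DIRECT:
            last = ch
            out.append(ch)
        else:
            out.append(last if last is not None else DIRECT)
    return out


channels = ["Paid Search", "Email", DIRECT]
preimages = {}
for combo in product(channels, repeat=3):
    mcf = list(combo)
    std = override(mcf)
    # The override can only turn Direct into something else, never the reverse.
    assert std.count(DIRECT) <= mcf.count(DIRECT)
    preimages.setdefault(tuple(std), []).append(mcf)

# Four different MCF paths collapse into the same standard-report path,
# so the original path cannot be recovered from the standard one.
print(len(preimages[("Paid Search", "Paid Search", "Paid Search")]))  # 4
```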

The good news is that if someone comes directly to your website, it's generally because they've interacted with your brand before. So while we wish we had access to the unaltered data, the alteration GA makes is a reasonable one, especially if you go into your global GA settings and reduce the campaign cookie expiration time to something much shorter.

For attribution purposes, removing direct interactions makes particularly good sense. In fact, it makes such good sense that we provide a filter to remove even more direct interactions, because we want to focus on the marketing interactions that push people to your website. In this way, the altered dataset is better for running attribution. We just have to accept that we cannot match our numbers against attribution coming directly from GA.
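The post doesn't describe how that filter works, so the following is only a hypothetical sketch of one possible rule: drop every Direct touch from a path before attribution, but leave an all-Direct path unchanged so the conversion is still credited somewhere. `drop_direct` and `last_touch` are illustrative names, not part of any product.

```python
DIRECT = "Direct"


def drop_direct(path):
    """Remove Direct touches from a channel path before attribution.

    Hypothetical rule: if the path contains nothing but Direct touches, it is
    returned unchanged so the conversion still receives credit.
    """
    filtered = [ch for ch in path if ch != DIRECT]
    return filtered or list(path)


def last_touch(path):
    """Last-touch model on a plain list of channels."""
    return path[-1]


print(last_touch(drop_direct(["Email", "Direct", "Direct"])))  # Email
print(drop_direct(["Direct", "Direct"]))                       # ['Direct', 'Direct']
```

Keeping all-Direct paths is a design choice: discarding them instead would silently drop conversions from the attribution totals.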