Bot Traffic: Filter Out the Bots Google Analytics Misses

Whether you’re a passive or power Google Analytics user, you’ve more than likely run across bots hitting your site. At best, these irksome scripts are harmlessly sending traffic to your webpage, skewing your data. At worst, they’re maliciously installing viruses and worms that affect those individuals genuinely interacting with your site. Regardless of the intent, bots are an immense irritation.

Based on a 2014 Incapsula study, bots account for 56% of all website traffic. What makes them even more vexing is that they disproportionately affect small- and mid-sized sites, where estimates suggest they may constitute 63-80% of all sessions. That’s absolutely staggering! The reason for this, based on both anecdotal and empirical evidence observed by Alight’s Data Scientists, is that these scripts hit sites in a regressive manner. That is to say, a bot may fire ten times on a small website and ten times on a large website; however, because the small site has fewer legitimate sessions, the same ten hits make up a far larger share of its traffic, and its reported numbers can drift markedly from their true values. For small- and mid-sized sites that depend heavily on the reliability of the data coming into Google Analytics for their marketing strategies, data cooked by bot traffic can be devastating.
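To put rough numbers on that regressive effect, here is a small illustrative calculation. The session counts are hypothetical, chosen only to show the arithmetic; they are not taken from the Incapsula study or from any client data, and `bot_share` is just a helper name made up for this sketch.

```python
def bot_share(legit_sessions: int, bot_sessions: int) -> float:
    """Return the fraction of all recorded sessions that came from bots."""
    total = legit_sessions + bot_sessions
    return bot_sessions / total if total else 0.0


# Hypothetical figures: the same bot fires ten times on each site.
BOT_SESSIONS = 10
sites = {"small site": 100, "large site": 10_000}  # legitimate sessions

for name, legit in sites.items():
    share = bot_share(legit, BOT_SESSIONS)
    print(f"{name}: {share:.1%} of sessions are bots")
# small site: 9.1% of sessions are bots
# large site: 0.1% of sessions are bots
```

The same ten hits are a rounding error on the large site but roughly one session in eleven on the small one.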

Fear not, though. There are a few solutions.

In mid-2014, Google introduced its own bot and spider filter, available through the Google Analytics “View Settings” tab. It works pretty well, allowing you to strain out most known sources. That said, it does come with its own set of faults. Specifically, Google relies solely on the IAB/ABC International Bots and Spiders List, which is only updated monthly (an eternity in the digital age), and thus isn’t necessarily comprehensive. What’s more, because access to the list’s contents is very expensive (costing between $4,000 and $14,000 a year depending on your membership level), most people have no clue what the IAB considers a bot or spider and what it doesn’t.

At Alight, we’ve built our own system we call ChannelMix Monitor. It’s a pretty robust little tool that, in a nutshell, searches for anomalies in our clients’ data using a highly advanced statistical algorithm that learns over time. (It’s not quite HAL 9000, but it’s pretty darn smart, and luckily nowhere near as malevolent!) When a bot spikes traffic outside a property’s statistically crafted boundaries, our Data Science team is alerted to the instance, allowing for immediate response. This means that we can segment or filter out that bot-contaminated traffic on the same day it happens, as opposed to the month it could take for Google’s list to update with the newly discovered bot. With ChannelMix Monitor, we essentially eliminate the game of catch-up associated with bot traffic filtering, ensuring clean and clear data. And because we manage quite literally tens of thousands of properties for our clients, we’ve seen just about every bot imaginable. So, our lists are always up to date.
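For a feel of what a statistical boundary on traffic can look like, here is a minimal sketch. To be clear, this is not ChannelMix Monitor’s algorithm, which this post does not describe; it swaps in a simple, well-known technique (a robust z-score built from the trailing median and the median absolute deviation). The function name `flag_spikes`, the 14-day window, the 3.5 threshold, and the session counts are all assumptions made up for illustration.

```python
from statistics import median


def flag_spikes(daily_sessions, window=14, threshold=3.5):
    """Return indices of days whose sessions sit far above the trailing baseline.

    The baseline is the median of the previous `window` days, and the spread
    is their median absolute deviation (MAD). Both are illustrative choices.
    """
    flagged = []
    for i in range(window, len(daily_sessions)):
        history = daily_sessions[i - window:i]
        center = median(history)
        mad = median(abs(x - center) for x in history)
        # 1.4826 makes the MAD comparable to a standard deviation for
        # normally distributed data; use 1.0 if the history is perfectly flat.
        scale = 1.4826 * mad if mad > 0 else 1.0
        if (daily_sessions[i] - center) / scale > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily session counts with one bot-like spike on day 14.
sessions = [120, 131, 118, 125, 140, 122, 119, 127,
            133, 121, 126, 138, 124, 129, 410, 123]
print(flag_spikes(sessions))  # [14]
```

A median-based baseline is used here because a single earlier spike barely moves it, so one bad day does not mask the next. A production system would also need to account for seasonality, campaigns, and other legitimate surges, none of which this sketch attempts.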

If you want to learn more about bot traffic or the solutions we offer to help remove it from your reporting, just tool around our Resources page or simply ask an expert. We’re here to help; it’s what we do!