Web Analytics Blogs

Judah Phillips is an experienced web analytics practitioner and Internet expert currently working as a Senior Director at a large, global Internet company. His blog is full of useful, unbiased, actionable insights learned from the real-world practice of a process-oriented, integrated approach to strategic Web Analytics for improving business performance.


AVG LinkScanner Bot Executes JavaScript?!?

The well-researched answer is “no.” The AVG LinkScanner Bot appears to prefetch the js and the gif (and pretty much everything else on the page), which for certain tools and their tag configurations generates false page views and visits (and the derivatives thereof), just as if it were “legitimate” traffic.

If your tag configuration is set up with noscript tags, AVG will fetch the content in the tags, including the gif, which means that:

  • The bot may be infesting the data of customers of web analytics vendors who configure page tag-based data collection in this way.
  • The bot may be inflating the data in such products/services offered by various web analytics companies.
  • Customers may be paying for server calls generated by this bot.

Vendors, of course, could easily filter the user agent to protect their customers:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813) 
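To show how little work that filter would be, here is a minimal sketch in Python. It assumes each collected hit is available as a dict with a "user_agent" key, which is a hypothetical record layout and not any particular vendor's format. The function names are also made up for illustration. The only thing taken from this post is the user agent string itself.

```python
# User agent string quoted in this post for the AVG LinkScanner bot.
AVG_LINKSCANNER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"


def is_linkscanner(user_agent):
    """Return True when the user agent exactly matches the quoted string."""
    return user_agent.strip() == AVG_LINKSCANNER_UA


def drop_linkscanner_hits(hits):
    """Keep only hits whose user agent is not the LinkScanner string.

    Assumption: each hit is a dict with a "user_agent" key.
    Hits without that key are kept, since they cannot be matched.
    """
    return [hit for hit in hits if not is_linkscanner(hit.get("user_agent", ""))]
```

An exact match is used on purpose: a looser pattern such as "MSIE 6.0" would also throw away real visitors. The same check could be applied to already collected data or at collection time.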

But I haven’t heard a peep from any SaaS vendors about excluding the user agent, filtering already collected data, or refunding customers the cost of robotically generated server calls (regardless of AVG). Have you?

Think about this: many SaaS page tag vendors don’t provide detailed visitor-level data and user agent reporting. That means that their customers have no ability to investigate this bot or detect it by filtering their reported data by the true user agent.

I’ve been talking about JS executing bots screwing with web data for about a year now. SEOMoz and the folks at SlickSurface confirmed it quite recently (quoting me no less in their fantastic analysis). So they do exist…

Now let me tell you a little story. Once upon a time I was at a conference called eMetrics when the CEO of a company came up to me and said, “Hey, I read your blog about bot detection, and I looked in my web metrics tool for traffic with high page view to visit ratios.” Then he narrated a story to me about how he found a bunch of traffic that had page view to visit ratios of 5,000 to 1. I said, “Do you use page tags?” He said, “That’s all my vendor provides, so yeah.” And I said, “You’ve found a javascript executing bot in your data.” “I know,” he said. “Well, did you call your vendor and let them know?” I said. Now for the punch line: he told me that the vendor (who shall remain nameless) told him, “Well, the traffic executed server calls.” And they wouldn’t give him a refund!
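The check the CEO ran can be sketched in a few lines of Python. This is only an illustration: the (name, page views, visits) tuples stand in for whatever segment export your tool offers, and the default threshold of 100 page views per visit is an arbitrary assumption, not a recommended value.

```python
def suspicious_segments(segments, max_ratio=100.0):
    """Flag traffic segments with an unusually high page-view-to-visit ratio.

    Assumption: each segment is a (name, page_views, visits) tuple.
    Returns a list of (name, ratio) pairs for segments above max_ratio.
    """
    flagged = []
    for name, page_views, visits in segments:
        if visits <= 0:
            # No visits recorded, so no ratio can be computed.
            continue
        ratio = page_views / visits
        if ratio > max_ratio:
            flagged.append((name, ratio))
    return flagged
```

A segment with 5,000 page views and a single visit, like the one in the story, would be flagged, while ordinary traffic at a handful of page views per visit would not. What you segment by (user agent, IP range, referrer) depends on what your tool exposes.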

It’s worth mentioning that this bot definitely affects log file tools and packet sniffer tools.  Both must be configured to filter the AVG LinkScanner user agent.
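For log file tools, a rough sketch of that filter might look like the following. It assumes the logs are in the common "combined" format, where the user agent is the last double-quoted field on each line, and that the user agent contains no escaped quotes. The function names are hypothetical.

```python
import re

# User agent string quoted in this post for the AVG LinkScanner bot.
AVG_LINKSCANNER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

# Assumption: combined log format, so the user agent is the last
# double-quoted field on the line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(log_line):
    """Return the last quoted field of a log line, or None if there is none."""
    match = LAST_QUOTED_FIELD.search(log_line)
    return match.group(1) if match else None


def without_linkscanner(log_lines):
    """Return the log lines whose user agent is not the LinkScanner string."""
    return [line for line in log_lines if user_agent_of(line) != AVG_LINKSCANNER_UA]
```

Lines that cannot be parsed are kept rather than dropped, so a format mismatch shows up as unfiltered data instead of silently missing traffic.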

Now here’s the rub for me.  I use AVG!!!  But I now find it increasingly difficult to support the company or continue using their products.  Why?  Because they are wearing a “bad hat” here:

  • First, they are fully aware of the effect of this bot on web analytics systems. They just don’t seem to care (yet). UPDATE: They have set up a Google Group to discuss this issue. They must understand how companies of all types in all sectors use web analytics data to optimize their sites, set their marketing budgets, determine expected server load, and much more. What do their Internet Marketers think?
  • Second, the Link Scanner tool may have a short shelf life and may offer limited protection.  Malware creators will easily adjust. Check out what my friend Steve McInerney, a very smart security expert, said on the Web Analytics Association’s Yahoo Forum:
What strikes me about this particular solution by AVG is how
incredibly … stupid it is on several fronts.
1. Noticeably impacting a users bandwidth is, technically, a security
breach in the first place, aka Denial of Service Attack.
2. Some of us live in countries that have rather severe bandwidth
charges/limits and the like, whom shall I send my excess bandwidth
bill to?
…(this) method is fundamentally
flawed. ie malware ignores any first request and only infects on a
second request - alternate cloaking. Whatever. This type of “solution”
only provides weak protection for a strictly limited period of time.
…not just “no security” but bad
security. Because folk feel they are being protected when they are
not, and hence will take greater risks and hence inflict greater harm
on themselves. :-( 
Ignoring the balance of positive to harm that this problem inflicts on
the users who use this product.
  • Third, AVG just doesn’t seem to “get it” yet. They are potentially messing with the ability to drive commerce via data driven decision making, e-commerce analytics, site optimization, and online media measurement! To quote The Register: “chief of research Roger Thompson - who designed the AVG LinkScanner - indicated he may do away with that unique user agent. His chief concern is security, and he doesn’t want webmasters or malware writers gaming his scanner. ‘In order to detect the really tricky - and by association, the most important - malicious content, we need to look just like a browser driven by a human being,’ he argues.”

WebMasterWorld has some good stuff to say about it here. Read the Register’s first article here. And check out the blog of the dude who broke the news first, and responses from AVG here and here.

Interesting stuff. So what do you all think? Have you seen evidence of this bot in user agent data from your page tag solutions that use the noscript tag for the image? 

Steve added the following ...

(i) :-) Thought I recognised those quotes. That’ll learn me to skim read.
(ii) Oh Joy. It may execute JS. :-(

I was/am kinda hoping they would have some smarts so it didn’t hit, say, the major WA players’ known image tag sites. But that wouldn’t help the smaller players, or self tagged sites.
And if it did whitelist some sites? If they get cracked…

ie Do some MIME magic on the cracked server and serve malware vs an image tag. Or similar. Bleh.

BTW. Not that I doubt my mate Judah P, :-) but do you have a source for the AVG JS executing? Link elsewhere? or was it your internal tech’s having a squiz? or?
Keen to read more! Not keen enough to trial myself. ;-)

On a totally unrelated topic: Firefox3 is the bee’s knees!!! I thought FF2 was a huge improvement….
/advocacy. :-D

Cheers!
- Steve

Michael C. Cook added the following ...

I have recommended a way of dealing with this issue as it currently exists using server side scripting. Naturally, this solution will only be effective if AVG does not obfuscate the identity of their user agent even further. http://analytics.pdxpcd.org/?p=10

Damien Mulley » Blog Archive » Fluffy Links - Tuesday June 24th 2008 added the following ...

[…] AVG antivirus is inflating website stats due it scanning your browser cache. Also bad for bandwidth […]

Judah added the following ...

Steve: Heh. As you mentioned, AVG LinkScanner may be instilling a false sense of security. I’ve personally disabled it.

The jury is still out on whether the bot actually executes javascript. It probably does not. It’s definitely prefetching the js and the image in the noscript tag, which causes certain tools to count the traffic.

Yeah, FireFox is taking mad share from IE. Bee’s knees? I didn’t know bees had knees! Just kidding! FF3 does “rock.” :)

Michael: Thanks for sharing a link to your blog, and posting that well-commented code. I hope our readers will find it useful. Good stuff! :)

Wayne Kurtzman added the following ...

There has been a fair amount of discussion about this since The Register published their encounters with “fake traffic” (http://www.theregister.co.uk/2008/06/13/avg_scanner_skews_web_traffic_numbers/).

Log file analytics, while time consuming and perhaps an extra expense, can still provide a “head’s up” to issues like this and other anomalies. I’m not suggesting it as your only analytics source, but rather an element of “QA”.

This “QA check” becomes very helpful if you are fortunate enough to be involved in fact-based marketing. (Oh yes, there are other types!).

Thank you all for sharing your opinions and findings in this community.

Wayne

Judah added the following ...

Wayne: The big indicator for this bot is a bandwidth surge beyond “normal” levels, which you can detect from logs. While I do agree that logs are still useful, if AVG obfuscates the user agent to look exactly like a legitimate browser, it will be impossible to differentiate the traffic from the AVG bot from legit traffic in log file based systems. Currently it’s pretty easy to detect/filter based on the user agent in both tag and log file based systems.

If the bot is requesting the gif from the noscript tag, as I believe it is, you (or your vendor) can still filter the agent in page tag systems, but again, if the agent is obfuscated, that would be impossible (or very difficult using heuristics).

Thanks for your comment. :)

