Why Don’t the Numbers Match?!?
A question any practitioner of Internet-based analytics will be asked by many different stakeholders is “why don’t the numbers match?” Counts of the identically named metrics from ad servers don’t match the web analytics tool, which don’t match the for-pay third party audience measurement tools, which don’t match the free audience measurement tools, which never match any of the homegrown internal measurement tools. And none of them ever match each other.
So it’s a good question, and certainly a valid one to ask. The answers are fairly easy to understand, but the root causes are often difficult to pinpoint and even harder, if possible at all, to remedy. The fact of the matter is that data discrepancies in analytics arise for a multitude of reasons, such as:
- Different data collection methods. We have a bunch of tools and services that collect web data using various, non-standardized, proprietary data collection methods. Ad servers use JavaScript page tags. Many web analytics tools use page tags too, but it’s not uncommon in web analytics to use additional methods, such as log files or packet sniffers, or a combination of these methods, called hybrid data collection. And all the tools have different algorithms for processing the data collected.
On the audience measurement side, data is collected from self-selecting panels who install proprietary software (e.g. toolbars and so on) on their computers, perhaps at work or at their university, but most likely at home. Then, the collected data from different panels is rolled up and combined, and the limited subset of the Internet population that chooses to be monitored, in exchange for some incentive, is inflated and projected to the entire Internet audience using proprietary statistical methods. We also have data collected from a limited set of geographically specific ISPs. And regardless of whether we’re talking about audience measurement or web analytics, the different data collection methods often, but not always, involve cookies and all their inherent issues, such as cookie deletion.
- Unique data models. Ad servers aren’t focused on counting page views and the other dimensions of web analytics (visits, time, and so on). Rather, ad servers focus on serving and counting impressions served (and loads of related derivative calculations, like CTR, CPC, and view–thru). Metrics are based on an ad request and an ad code. Ads may not be targeted to a page at all, but instead to various constructs, like a “zone” or “keyword.” What that means is that the “page” dimension may not even exist in your ad server’s data model. In other words, you aren’t looking at impressions measured on a page, but rather at the number of impressions served in a different conceptual construct. That’s one of the reasons why people say metrics and ad-serving systems “don’t measure the same thing.”
- Untagged pages. Specific to technologies that collect data or serve ads using javascript page tags, there are challenges to ensuring and verifying complete coverage of page tags across every page on a site. When the pages aren’t all tagged with the different tags for the assorted technologies, guess what? The numbers won’t come close to falling within tolerable variances. And questions and skepticism will ensue.
- Non-JS executing clients and ad blocking software. Let’s imagine for the moment that your site is perfectly tagged for all technologies, so the numbers from your ad server will be close to those from your web analytics system, right? Nope. Regardless of data model issues, not all browsers execute JavaScript, and many Firefox users have installed Adblock Plus.
- Cookie issues. When you’re counting based on cookies, third-party cookies get blocked (often by privacy software). Many ad servers and web analytics tools still serve third party cookies, and many corporations have not tricked out their DNS to accommodate this issue. And we all know how cookie deletion affects unique visitor counts, even if you use first-party cookies.
- Many other issues. Latency from visitors moving off the page prior to the tag executing to latency in the call to pick up an ad from a third party while your ad server counts the traffic (so your ad count differs from the agency’s count), to refresh rates making it hard to correlate page views and impressions, to no rich media installed and no fallback, to robotic traffic not being filtered from logs or tags, to certain types of user agents (such as mobile devices) not executing javascript… there’s a whole host of other factors that cause data discrepancies.
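Of the causes listed above, untagged pages is one of the few you can audit mechanically. What follows is only a minimal sketch, assuming you already have the HTML of each page in hand and that each vendor's tag can be recognized by a substring; the marker strings, page paths, and the `missing_tags` helper are all hypothetical placeholders, not any vendor's real snippet or API.

```python
# Hypothetical tag markers: substitute the script URLs your own vendors supply.
EXPECTED_TAGS = {
    "web_analytics": "analytics.example.com/tag.js",
    "ad_server": "ads.example.com/serve.js",
}


def missing_tags(pages, expected_tags=EXPECTED_TAGS):
    """Return {page_path: [tag names whose marker is absent from that page's HTML]}.

    Pages carrying every expected tag are left out of the report.
    """
    report = {}
    for path, html in pages.items():
        absent = [name for name, marker in expected_tags.items() if marker not in html]
        if absent:
            report[path] = absent
    return report


# Made-up sample pages, for illustration only.
pages = {
    "/home": (
        '<script src="//analytics.example.com/tag.js"></script>'
        '<script src="//ads.example.com/serve.js"></script>'
    ),
    "/pricing": '<script src="//analytics.example.com/tag.js"></script>',
    "/legacy": "<html><body>No tags here</body></html>",
}

for path, absent in missing_tags(pages).items():
    print(path, "is missing:", ", ".join(absent))
```

A substring check like this only tells you the tag is present in the markup. It says nothing about whether the tag actually fires in the visitor's browser, which is where the JavaScript, ad blocking, and latency issues above come in.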
And of course, there’s always the nebulous issue around the complete lack of consensus-based, enforceable standards for online measurement. No industry organization can say what vendors or companies “must” do, only what they “should” do… And no industry body is going to get successful companies to change their secret sauce just because they said so…
So what’s a practitioner to do? Understand the potential sources of discrepancies. Work with your team (from IT to vendors) to prevent and minimize the root causes when possible. Educate your team when discrepancies are not remediable. Ensure you use the different sources of metrics judiciously in the context of your business goals. Finally, realize that none of the tools are more “correct” than any other. All of our analytics tools serve different, and sometimes overlapping, business purposes - from counting ads, to influencing media buying, to sizing audiences, to measuring business performance, and to optimizing the site.
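One way to make "understand the discrepancies" concrete is to track the percentage variance between sources over time and only investigate when it drifts outside a band you have agreed on with your stakeholders. The sketch below assumes hypothetical source names, made-up counts, and an arbitrary 10% tolerance; none of these figures come from a standard. The baseline is just a reference point for comparison, not a claim that one tool is the "correct" one.

```python
def percent_variance(baseline, other):
    """Signed difference of `other` from `baseline`, as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline count must be non-zero")
    return (other - baseline) / baseline * 100.0


def flag_discrepancies(counts, baseline_name, tolerance_pct=10.0):
    """Return {source: variance %} for sources outside the tolerance band.

    `tolerance_pct` is an assumed threshold; pick one that fits your business.
    """
    baseline = counts[baseline_name]
    flagged = {}
    for name, value in counts.items():
        if name == baseline_name:
            continue
        variance = percent_variance(baseline, value)
        if abs(variance) > tolerance_pct:
            flagged[name] = round(variance, 1)
    return flagged


# Made-up monthly counts of a similarly named metric from three sources.
counts = {
    "web_analytics": 100_000,
    "ad_server": 108_000,
    "panel_estimate": 70_000,
}

print(flag_discrepancies(counts, baseline_name="web_analytics"))
```

With these sample numbers the ad server sits 8% above the baseline and stays inside the band, while the panel estimate sits 30% below and gets flagged. A flag is a prompt to go looking for one of the root causes above, not proof that either number is wrong.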
June Dershewitz added the following ...
Hello Judah P. Nice article. I hate to be one of those people who writes a comment just to say, “I wrote about this, too!” but … I wrote about this, too!
http://june.typepad.com/june/2008/04/web-analytics-d.html
Actually I think we covered the same topic in different but complementary ways: I described the process an analyst goes through in order to match up data from separate sources, and you wrote good juicy detail about *why* it’s so hard to ever hope to get that data to match up.
I appreciate your call for enforceable standards for online measurement. Standards will take some of the guesswork out of the inevitable data reconciliation projects we face in the future.
Darren Shafae added the following ...
Hello, Judah,
I am very excited to know that I am not the only one who experiences the agony of verifying data. We have used several third-party tools over the years and have finally made the move to an in-house analytics tool. I find that our own system provides the most value, but obviously it is more time-consuming and expensive to build and update.
However, the chance to interact with the raw data in order to build custom reports and to integrate YSM and Google APIs is invaluable. I found a disturbing trend in which some of our clients habitually depend on Google AdWords to locate our Web site. Clients use a keyword on Google to locate our adtext and then enter our site through the paid link. This would be okay if it was their first visit, but these are registered clients, who have made purchases in the past. I am not sure if this is widespread across other industries, but it is something to keep in mind when you see keywords with insane conversion rates.
I have also verified that these clients are not bookmarking the CPC destination URLs. I recently switched the destination URLs, yet the same clients are repeat offenders.
I guess in addition to trying to get your numbers to match, you will also need to scrub your conversions for “re-acquired” clients. If we had solely relied on third-party tools, we would have never caught this anomaly. It is one thing to understand the numbers, metrics, and KPIs, but I guess nothing replaces interacting with the raw data.
Best regards,
Darren Shafae
http://www.papercheck.com
Judah added the following ...
Steve: Thanks! Good comment. I agree that the psychology and demographics of an audience will certainly influence the numbers and reconciliation across systems.
For example, I have a friend who runs a site that caters to smart technologists where he claims that like 85% of the folks block ads and probably don’t execute javascript.
And then you have the sites that cater to a way less technically savvy audience, and they think cookies are good for eating and blocking is what you do in football. Hah!
Apparently geography has an impact too. I’m told Germans do a lot of ad blocking, but that’s just anecdotal.
How to detect this? That’s a tough one, and a time consuming analysis. But I may take your advice and give my $.02 in another post.
June D: I agree. Complementary as usual! And I’m glad you linked to your piece.
While I’m a strong believer in standards, and the work at the WAA and IAB, it’s going to be a long slog before we have anything enforceable or standardized in either WA or AM.
The Online Ad business is much further along in defining/mapping deeper workflows, guidelines, and “standards.” That’s probably because the ad business is more evolved than analytics and more easily tied/attributable to revenue. In analytics most companies still struggle with the ROI! Then again so do a lot of advertisers! LOL!
Darren: Agony! LOL! Data verification is also complex to do, and requires a company invest time/resources to figure it out. You make an excellent point about an in-house analytics tool and having all the raw data.
The challenge with running “in-house” is that it takes resources, time, money, process, people, savvy to execute on the in-house vision. It’s not easy, but when you can prove that there’s an ROI for doing so, much easier to get all that.
And part of that ROI may be in correcting that problem of “re-acquisition” you highlighted. How frustrating that these customers are eating your PPC budget and bookmarking the URL!
Have you determined a solution for correcting this issue? What search term are they clicking, are you organically optimizing for that term? Perhaps a campaign that gives them some incentive to bookmark a new URL?
Darren Shafae added the following ...
Hi, Judah,
Thank you for your feedback! The reason we started the in-house initiative was to confirm our conversion numbers with AdWords and YSM tracking. The actual conversions we logged in our database, client registrations, were not matching the numbers reported by AdWords and YSM. The paid search platforms do include the re-acquired clients, which obviously is not good for us if we are trying to optimize our budget/keywords using this data.
We have tried using different third-party tools, and will try Omniture later this year (still negotiating master service terms). We have developed two solutions for this problem. The first solution we tried was to track registrations instead of sales on AdWords and YSM, but this caused other difficulties. This solved the re-acquisition problem, but then we could not tie the final sale price to the keyword. We were able to determine only that a registration had occurred. That made optimization difficult, hence another reason for the in-house solution; we can now track multipoint conversions (registration and sale) using detailed click stream analysis.
The second solution, which has not yet been implemented, is to incentivize our clients to use the normal login link from our home page. We are also working on a toolbar from http://www.conduit.com/ (custom toolbars from our neighbors on 71 Stevenson Street) that will give our clients a discounted price if they continue to use our Papercheck toolbar that is installed in either IE or Firefox. The solutions have not been tested for a long period of time, so the jury is still out on whether this will eliminate the problem. I will keep you posted.
Regards,
Darren Shafae
http://www.papercheck.com
Yes, I did say Agony. Why is it so hard to get cost and revenue?
Robbin Steif added the following ...
You have *got* to stop composing in MS Word.
Judah added the following ...
Darren: Thanks for sharing! Curious, why are you picking Omniture? You mentioned liking raw data, and Omniture hosted is anything but. It’s aggregates and visitor IDs based on cookies, different data models, etc. Or are you talking about Omniture Discover On Premise (i.e. Visual Sciences Platform 5)? Ask about proprietary data models and how you get data out of the system. If you’re looking for visitor-level data from a hosted solution, did you look at Coremetrics? Or for in-house based on open standards, did you check out Unica (cross platform) or Webtrends (very MS specific)? Any thoughts? I always find it interesting to learn why people pick the tools they do…
I’d love to hear more about what you end up doing to solve your issue. It’s pretty fascinating. Personally, I’ve always had a bad taste for toolbars. Blame folks like the company formerly known as Viewpoint for biasing me in the 1990s, when I equated toolbars with spam/spyware/adware, and I still haven’t been able to recover. Another challenge is that if you can carve out the segment of reacquirers, how do you gently get them to do what you want, without freaking them out about what you know about their behavior (i.e. we’ve been tracking you and know what you do type of thing… stop clicking on our paid links!).
Robbin: I try to write in Wordpress, but when I do use Word, I paste it to Notepad, then paste to Wordpress. Probably should employ alternative methods, like upgrading to Office 2007, or using Open Office or Google Docs… I know, I know, but old habits die hard.
Darren Shafae added the following ...
The reason we are looking at Omniture is simple. They have API agreements for not only the big three (Yahoo!, Google and MSN), but also for several secondary paid search providers. It will make it easier to interact with our accounts on a daily basis. It is also difficult to integrate paid search APIs into our administration interface.
Omniture does have an API to get information into their system without using JavaScript, but not one for getting data out (as far as I know or have been told). The other providers do not work with companies of our size. We spend a fair amount of money on paid search, but if you are not spending in the $40K range, their system doesn’t work. It isn’t that we do not have the funds in our paid search budget, but there is not a whole lot of traffic looking for editing services, or at least not at that level.
From my brief interaction with Webtrends and Marin Software, I have determined that you need more traffic, data, and conversions for their systems to perform at an optimal level.
I can understand your frustration with toolbars. This is just the first step in addressing this problem, and yes, I do agree that this approach may freak our clients out. We have overcome much more significant ecommerce problems, and survived. If anyone is facing problems with credit card fraud, let me know.
-Darren
Judah added the following ...
Darren: Thanks for sharing your assessment!
Omniture’s Search Center is indeed a differentiator when compared to the offerings (or lack thereof) of several of the other vendors I mentioned. It’s no wonder OMTR has such market share. That’s an interesting assessment of WebTrends too. I’m sure some of my good readers will appreciate and take to heart your comments… especially those from WebTrends! I may be in the minority when it comes to toolbars, as everyone and their brother seems to have been desensitized by the Google toolbar. I have a family member who runs a small brick and click Internet business, and I’ve heard some horror stories about credit card fraud. Way to go, overcoming that!
web analytics and internet india post article reads | Web Analytics India Blog added the following ...
[…] Why dont Numbers match? - Here’s a post which helps you understand why you see the numbers not matching from your adwords account and your web analytics tool. […]
jip added the following ...
every single day I have to explain to our customers why numbers don’t match … now I will read this post to them
WT added the following ...
WebTrends’ Dynamic Search Tool is different from Omniture’s Search Center. Dynamic Search automates and optimizes, whereas Omniture’s Search Center is a bid management tool. WebTrends has API agreements with all the networks and will work with companies that have monthly budgets lower than what was mentioned above.
I obviously agree with Judah in regard to getting at the raw data from a web analytics perspective. Omniture is the least flexible compared to the other vendors in the marketplace.

Steve added the following ...
Yeah! What Judah said!
Love the closing paragraph. Yup. Full on agree. All the tools ARE different. Specifically: They’re optimised for different tasks; which leads me to:
The one addition I’d make (and you just *knew* I’d have to make at least one!), is to be aware of your website’s audience’s preferences.
i.e. If you have a site that is heavily visited by paranoi… errr …. privacy aware techos, they will go out of their way to ensure any of the cookie or page tagging techniques won’t work. So you have little choice but to fall back to logs, and if they’re suitably paranoid? Even that won’t work too well for certain types of information, e.g. IP Addy or Agent.
Whereas if your site caters to the standard Mum, Dad and 2.4 kids, then Page Tagging/Cookies are, probably, more likely to be successful.
How to detect this? Possible topic for another blog posting Judah?
Cheers!
- Steve