Online Metrics need an XML Standard
I’m contributing a monthly article to MediaPost’s Metrics Insider column. My first contribution was published last week, and I’m reposting it here to get your thoughts. Soon I hope to describe what I mean in more detail… The article was called “The Most Measurable Medium needs an XML Standard.” In case you missed it, here it is:
OVER THE LAST TWO WEEKS, my fellow Metrics Insider columnists have correctly pointed out that online metrics are neither standardized nor easily integrated across systems. Vocabulary is muddled. Numbers do not match. Data exists in silos and is isolated from related data. Systems do not adequately or easily talk to each other. Research services, ad servers, and Web analytics tools report similarly named, overlapping and often conflicting metrics. Unfortunately, these problems will not disappear anytime soon, even with emerging “standards” and continued attention paid by the industry to these important issues.
Current industry standards for Web metrics are limited, basic, and come from independent entities. Most recently, the Web Analytics Association released a set of “standards.” The WAA’s standards are elementary definitions of concepts from various periods of Internet measurement. Web 2.0 concepts like “events” are mingled with dated measurements like “hits.” Regardless, these definitions provide a very useful starting point for framing a discussion about metrics. Recently, I’ve learned that the IAB and MRC are developing a set of IAB Reach Measurement Guidelines. Let’s hope the IAB and WAA align their work efforts.
The IAB and MRC are also currently auditing “audience measurement” firms, like Comscore and Nielsen. It’s rather unclear to practitioners what standards the IAB/MRC are applying to the audit. But the hope is that auditing will expose issues of coverage error and selection bias in the black box methodologies used to create the panels and generate the audience measurement data.
It is important to note that the IAB’s audit has two parts. The first is certification, which indicates the company being audited is applying the “standards,” and the second is accreditation, which demonstrates adherence to the IAB standards.
Only time will tell if companies like Hitwise, Compete, and Quantcast will be asked to submit to auditing. It’s worth mentioning that legacy metrics “standards” (and audits) from historic organizations like ABCe still occur and carry weight with publishers and advertisers (especially outside of the United States). It’s entirely possible that newly formed organizations, like the Association for Downloadable Media, will offer their perspective on “standards” for online metrics.
The idea of “standards for the standards,” however absurd it sounds on the surface, starts to seem like a good idea when considering that all these parallel efforts aren’t intersecting. Honestly though, I question whether “standards” that are purely “definitional,” even if agreed upon, will solve many of the measurement challenges companies have when trying to understand Web data and take action from it.
Standard definitions are helpful for promoting understanding and creating a controlled vocabulary for discussing online metrics, but they don’t help with what I see as a huge challenge in today’s metrics technologies. The problem is this: currently available online metrics systems do not adequately separate data from presentation. That’s a huge limitation preventing Web data from being easily integrated with other systems.
Detailed-level Web data (the raw data) is often costly to extract, if available at all. It is nearly impossible to deliver detailed data in real time from Web analytics, ad serving, and research-based technologies in order to feed other systems. The majority of hosted (ASP) metrics systems are closed and do not allow access to key interfaces using open software standards. For the most part, today’s metrics technologies are black boxes where data goes in, but can only be extracted in various file formats after creating a report. Common export formats include CSV, PDF, and DOC. While XML exports are available from many vendors, there is no standard XML schema for describing the same type of Web data across different sources!
The industry must begin collaborating and creating a standard XML schema for describing Web data. Creating a widely used, consensus-based, published, and maintained XML standard for online metrics would make it possible to more easily share, transform, and use Web data in other systems.
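To make the idea concrete, here is a minimal sketch in Python of what a shared format could look like and how a receiving system might read it. No such standard exists yet, so the element and attribute names (`metrics`, `source`, `metric`, `vendor`, `name`, `period`), the vendor labels, and the sample numbers are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element name, attribute, and number below
# is an assumption made for illustration, not part of any real standard.
SAMPLE = """\
<metrics period="2007-10">
  <source vendor="analytics-tool">
    <metric name="visits">1200</metric>
    <metric name="page_views">5400</metric>
  </source>
  <source vendor="ad-server">
    <metric name="impressions">90000</metric>
    <metric name="clicks">450</metric>
  </source>
</metrics>
"""


def load_metrics(xml_text):
    """Return {vendor: {metric_name: value}} from the hypothetical format."""
    root = ET.fromstring(xml_text)
    result = {}
    for source in root.findall("source"):
        vendor = source.get("vendor")
        # Each <metric> carries its name as an attribute and its value as text.
        result[vendor] = {
            metric.get("name"): int(metric.text)
            for metric in source.findall("metric")
        }
    return result


metrics = load_metrics(SAMPLE)
print(metrics["ad-server"]["clicks"])
```

Because the data carries no presentation, the same file could feed a dashboard, a data warehouse, or another vendor’s tool without first being rendered as a report.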
I firmly believe that current metrics standards must go beyond simple definitions and tackle issues pertaining to data portability and system interoperability. Then we’ll all be in a better position to reuse Web data across the enterprise value chain. Once we all agree on “standard” definitions, I encourage us to start working together to develop a standard Online Metrics Markup Language.
S.Hamel added the following ...
Right on the spot! While working on WASP, it has become evident that the “entities” that can be collected (such as page name, referrer, screen width/height, even mouse location or click coordinates, etc.) could easily be standardized. Every tagging solution uses a similar method, which is passing arguments on a query string (of an image, a script, or whatever). Wouldn’t it be great if those “soooooo web 1.0” query string parameters became AJAX queries? Or at least, if the query string was formatted as a standardized JSON string with clearly defined elements.
I’ll watch this work very closely!
S.Hamel
http://blog.immeria.net
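As a rough illustration of the JSON idea in the comment above, the Python sketch below packs the collected entities into one JSON-valued query string parameter and reads them back on the collecting side. The parameter name `data`, the field names, and the `collector.example` endpoint are assumptions, not any vendor’s actual tag format.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_beacon_url(endpoint, hit):
    """Serialize the hit as compact JSON inside a single query parameter."""
    payload = json.dumps(hit, sort_keys=True, separators=(",", ":"))
    # "data" is a hypothetical parameter name chosen for this sketch.
    return endpoint + "?" + urlencode({"data": payload})


def parse_beacon_url(url):
    """Recover the hit dictionary from a URL built by build_beacon_url."""
    query = parse_qs(urlparse(url).query)
    return json.loads(query["data"][0])


# Hypothetical field names for the entities a page tag might collect.
hit = {
    "page": "/home",
    "referrer": "http://example.com/",
    "screen": {"width": 1280, "height": 800},
}

url = build_beacon_url("http://collector.example/hit", hit)
print(url)
print(parse_beacon_url(url) == hit)
```

With clearly defined element names, any collector could parse any tag’s request the same way.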
Judah added the following ...
Thanks Stephane. I am a WASP user. It’s a cool tool. Thanks for creating it.
I completely agree that these web 1.0 parameters could be better enabled as AJAX queries or clearly-defined JSON elements. I’d even like to see more emphasis on web services in web analytics. I even envision an ontology for analytics (Semantic web stuff) and supporting RDF too. Thanks for reading the blog and your comments are always appreciated!
Eric T. Peterson added the following ...
Judah,
I agree with Stephane, an excellent post and an excellent article. So what do you think the next steps are for getting an XML standard defined?
Eric T. Peterson
http://www.webanalyticsdemystified.com
Steve added the following ...
Sorry for the delay in replying Judah! You’re forcing me to think too much lately!
Yes! Yes! Yes! and urm.. Yes!
First off, a starting point does exist:
http://www.cs.rpi.edu/%7Epuninj/LOGML/draft-logml.html
“LOGML (Log Markup Language)”
Despite what certain people would have you believe,
even page tagging is still logging - just a funkier format.
Trouble is the problems with any XML at the data layer. Logs are big as is. XML them? Ginormous!!!
I sincerely doubt we’d ever XML at that layer - no ROI. And the data processing would be horrendous.
XML the reports generated? Sure - we’re seriously looking at doing just this for awffull (and use SVG for the graphs! Wheee!)
While I do agree there is scope for XML in much of the WA world - the raw data probably isn’t it. I also don’t see any vendors stepping up to make something happen - not in their interest. Far better to have walled in silos - from their perspective.
We do live in interesting times…
Cheers!
- Steve
Judah added the following ...
Eric: Good question. I think one dude doing it solo won’t make much of an impact (unless it was you or Sterne). What I’d like to see is the WAA collaborate with representatives from major web analytics vendors and perhaps a representative from the IAB. There would be a chair (like me), vendor representatives (one each from Google, Unica, WebTrends, Omniture, Visual Sciences, Coremetrics, IndexTools, ClickTracks), and a very small list of well-qualified volunteers from the WAA. Define the scope, start putting together the standard, create a reviewable text draft, then roll it out for WAA member and vendor commentary, revise, release for public commentary, revise based on public feedback, release. This was a lot easier to write than I imagine it would be to do… Model it after what other technology standards-setting bodies, like the W3C, do… I may be being too idealistic…. My concern is that most ASP vendors like to control the data so they can “nickel and dime” unsuspecting customers who failed to ask the right questions before they bought a simple reporting tool they thought could do segmentation, so it may not be in the monetary interests of vendors to do so… What do you think?
Steve: I hear you about the data layer. I’m running into that issue with some pseudo XML used to describe robots… Does it make sense to put 30,000 bots in an XML file? Not really, so point taken… Such things should be in a db…
XML standard at an aggregate level or at the reporting level would be more manageable like you mentioned. I think the LOGML standard is interesting, especially its focus on SVG and graph theory, but needs to be ripped apart to deal with modern web analytics… Thanks for commenting as always… I hope you’d volunteer to get involved if the WAA starts talking more about data portability standards… You’d have a lot of value to add to the discussions!
Good comment about page tags. Most people don’t know they write logs in the backend, which are then parsed, just like any old log file… Go figure!
Thanks as always for your valuable comments! Both of you!
Aaron added the following ...
Judah,
I’m glad to see you writing about this. Eric and I began a discussion on this at X Change, which we never returned to. We didn’t see eye-to-eye on the topic. I believe that, in order for WA data to be truly interoperable across vendors and solution types, any standard has to go one step further and include definitions of industry-specific high-value events. Otherwise, all you’re left with is the core dimensions and measures that are the same across all sites. This may work for big media sites, where the primary interest is in content consumption and segmentation thereof (feel free to correct me if I’ve mis-characterized). But work deeply in any other industry (retail banking or travel, for instance) and you’ll find that there are very specific high-value business event models that form the core of the analytics data, and provide all the value. Interoperability means accommodating these industry-specific data models. And that means getting those industries to agree on data definitions and models. Ouch. This just got harder.
http://greaterreturns.blogspot.com/2007/10/more-on-wa-standards.html
Judah added the following ...
Hi Aaron,
Thanks.
I had Eric review that draft at Xchange prior to me submitting it to MediaPost. His only comment (and I paraphrase) was “right on.”
In addition to content consumption and segmentation, which I agree with, Media sites should also focus on conversion events (generated a lead, subscribed to the magazine, visited a microsite, and so on).
You bring up an excellent point about industry-specific data models, and I agree. Obviously, the models that work for Media aren’t identical to those for a banking site or a seller of laptops. Remember at Xchange when I mentioned “baseline” then “granular”? It’s the “baseline” that gives you the core dimensions and measures, and the “granular” that gives you the industry-specific data model. You’ve said it well, and I agree that we have a good deal of work ahead of us, and it won’t be easy either.
Thank you so much for commenting and reading my blog. It’s heartening to know a practitioner of your caliber and experience finds my stuff thought-provoking.
Warm regards,
Judah
Eric T. Peterson added the following ...
Aaron (and Judah, since it’s Judah’s blog): Agreed we should have come back to this in Napa but hey, we live about a mile apart so perhaps some evening over beers … I guess my point is that industry-level stuff shouldn’t really matter if we have good XML-based data interoperability standards.
You say “any standard has to go one step further and include definitions of industry-specific high-value events. Otherwise, all you’re left with is the core dimensions and measures that are the same across all sites” but my read on Judah’s point is that there **are no** core dimensions and measures that are the same across all sites. Because the technology vendors (who I don’t particularly fault) each have different storage strategies, different schema, different core definitions for their dimensions, different calculations for certain metrics, etc., there is no opportunity for data portability.
Not that a standard XML schema necessarily resolves this issue, but it would be an important first step to determining where fundamental inconsistencies exist between say, WebTrends and Unica. Imagine if I could easily “unload” my WebTrends data and move it entirely into Unica … then I could make an apples-to-apples comparison of the respective ETL and processing rules. Would that not be cool?
This is the essence behind my post on the recent WAA Standards document — that standards alone are not enough and that if the WAA wants to be effective they’ll need to either A) highlight which vendors are “standards compliant” or B) translate the standards into something technically useful (like Judah’s XML Schema proposal.)
This is a really long way of saying “give me the standard for data collection, storage, and movement” and I’ll figure out the industry-specific high-value events as necessary. Hopefully that makes sense.
Again props to Judah for bringing up the idea. If you want to run for WAA and lead the standards effort, you’ll certainly have my vote.
E.
Judah added the following ...
Eric,
That would be cool. In web analytics, your mantra is the equivalent of “Give me life, liberty, and the pursuit of happiness”: “give me the standard for data collection, storage, and movement.”
Businesses are not democracies, and the WAA is no Martin Luther, so the 26 tenets are only a nice read right now, because the vendors aren’t doing shite about them. Why? No revenue is on the line. Who’s demanding compliance? And what do the vendors even say to potential customers when asked about the standards? I don’t recall any vendor even messaging anything about the standards.
Highlighting which vendors are compliant and codifying it all is the only way an industry body can start pointing the finger at vendors and say “You comply. Good for your customers because by using your technology they can do X faster, Y more profitably, and Z at lower cost” (the three core actions). Or “You don’t comply, that sucks for your poor customers.” Then we start talking about negative messaging that erodes brand equity, and thus more potential for revenue impact to get vendors moving.
I think Aaron is talking taxonomy, whereas you point out we don’t even have an ontology yet. It’s like you are saying XML and Aaron is saying XBRL. Both serve business purposes, and I agree that you need to have XML before the XBRL. So yeah, that’s my long way of saying what you’re saying makes sense.
Thanks for commenting on my blog, Eric. I appreciate the thoughtfulness.
Judah
