Web Analytics Blogs

Judah Phillips is an experienced web analytics practitioner and Internet expert currently working as a Senior Director at a large, global Internet company. His blog is full of useful, unbiased, actionable insights learned from the real-world practice of a process-oriented, integrated approach to strategic Web Analytics for improving business performance.

Archive for 'Data Quality'

Questions to Ask When Assessing Web Analytics and some Random Thoughts…

At some point in the career of a web analyst, you will be asked to investigate, assess, and possibly judge the current state of how a company “does” web analytics.  What are some of the areas you should ask about?  Here are some thoughts and a few questions to ask to help inform your analysis (and grease your mental gears):

  • Business strategy.  Why does the organization do web analytics?  What’s the goal of having a web analytics team?  Who defines the strategy?  What is the strategy?
  • Analytics organization and team structure.  Who is the chief owner of web analytics?  What does the analytics team look like?  How has the team structure been formalized in the organization?  Is the web analytics team effectively staffed, and does it have enough control over resources to do the job?
  • Process.  What analytics processes have been defined?  How does a site or site feature progress from not being measured to being effectively measured?
  • Data collection. What methods for data collection are being used?  How much data is being collected, and for how long is it stored, and at what level (i.e. detail, aggregate)?
  • Reporting.  What data is reported?  What do the reports look like?  Who creates them?  How are they distributed, and in what format?  To whom?  When?  How?
  • Analysis.  What’s the difference in this company between reporting and analysis?  How is analysis communicated to stakeholders?  When?  How?
  • KPIs.  What Key Performance Indicators are you measuring?  How are they relevant to the business?  What actions have people taken from KPI analysis that improved business performance?
  • Segmentation.  What audience and customer segments exist?  What audience and customer dimensions and attributes are segmented?  Why are they meaningful to the business?  What has the business learned and what action has been taken from the current segmentation analysis strategy?
  • Technology.  What analytics technologies are being used?  What does the schema for web analytics look like?  What homegrown technologies are used?  What external technologies have you bought or deployed for analytics?
  • Integration.  How is web analytics data integrated with other internal and external data?  Is it integrated with other systems and, if so, how?
  • Site Optimization.  Does the company test landing pages, and/or use AB or Multivariate testing software?  If so, whose software, and what business gains have been realized?
  • Advertising/Advertisers. How is analytics used to inform or enable advertisers and advertising?
  • Privacy.  What safeguards does the company take in protecting analytics data? 
  • Qualitative Data.  Is qualitative data contextualized with web analytics data? Do you capture voice-of-customer data?  Use Net Promoter Scores?  Have a research department?  Does web analytics collaborate with research? 

Those are just a few questions to ask.  Many others can be asked.  What would you want to know, and what would you ask?  Please leave a comment.  I’d love to hear your thoughts.

Now for some random thoughts:

  • News from Orem.  API / Fusion / Video Tracking… cool.  I’m pretty psyched that Omniture announced a web services API.  That’s fantastic, and confirms how truly important integration is now and will be in the future for analytics data (as I’ve been saying for years… Google will be next). 

Omniture has announced a new methodology, Fusion, and improved capabilities for tracking video.  All sounds very exciting.  But, like Eric, I’m wondering what revolutionary new methodology Fusion really is?  Or is it just what Eric’s been saying for the last 4 years, branded by Omniture and delivered by the Great Belkin?

Regarding the video capabilities, I haven’t seen a real demo yet, but I wasn’t immediately impressed with what I saw on my friend Marshall’s blog.  Instead of quartile tracking, it seems like you track the playhead (the part of the video playing) across audience aggregates in increments of one-twelfth, and you get some bubbly visualization (what would that look like with 10,000 videos on your site?), and better access to forums.

I’m hoping I haven’t seen the whole ball of wax, and I look forward to Omniture giving me the grand tour. 

But for a playhead visualization, I was much more impressed with what I saw from Visible Measures and their engagement curve.  And what the heck are those folks at Divinity Metrics up to for measuring video? 

  • News from Novato.  One of my favorite gangs of web analytics folks resides in Northern California.  My colleagues at Semphonic have just released a rather impressive “Omniture Implementation Toolkit.”

I was able to procure a copy, and I’m totally impressed.  It’s full of hard-learned and hard-earned real world practitioner knowledge.  If you are trying to implement Omniture, it is well worth the money. 

Now I’m not sure if this document competes with or acts as a companion to Fusion.  All I can say is that I know the folks at Semphonic are smart, savvy, and very experienced, and there are thousands of Omniture customers out there who could benefit from this document.

  • X Change Conference.  I am totally excited for X Change brought to us this year by Semphonic and Web Analytics Demystified.  The last X Change in Napa at COPIA was one of the most intimate, educational, stimulating, and enjoyable conferences that I’ve been to (and did I mention the wine?).  It was pure “class” all the way (in both the sense of style and learning, and did I mention the wine? ;-).

This year attendance is limited to 100 folks (99 if you count me ;).  Last year, I huddled on “Deploying Measurement Systems in Globally Distributed Enterprises.”  

If you aren’t familiar with X Change or Semphonic, check them out, and make sure to read a few of my favorite bloggers - the prolific deep thinker and expert Gary Angel, the always impressive (and fun) June D(ershewitz), and bright author and web analytics veteran, Phil Kemelor.

Web Analytics Data Collection for Beginners

I’ll get back to talking about the web analytics team soon, but I’ve been getting a few emails from folks just starting out who are a bit confused about data collection.  So I figured I’d blog about it…

When web analysts talk about data collection, they are referring to the method by which counts and measures of things, like page views and durations, are captured by a web analytics tool.  If you’re new to web analytics, data collection can be slightly confusing.  There are three “generally-accepted” methods for data collection in the web analytics industry: 

  • Page tags.  Client-side data collection involves using little snippets of HTML code that reference a JS file and communicate via a beacon to a “page tag server” - the machine that collects the data so it can be sessionized by the web analytics tool (it may not be called that by your vendor).  As a web analyst, if you are using page tags you will have lots of fun tagging every page on your web site and instrumenting the tags with custom variables and campaign codes.  Reasons why people like page tags are numerous, and include the fact that they are fairly efficient in filtering out non-human traffic (as long as the robot doesn’t execute javascript) and can count proxy cached pages (improving accuracy). Page tags are probably the most ubiquitous method for collecting web data today.
  • Log files. Server-side data collection involves parsing text-based log files generated by Web servers.  The server, when instructed to do so, logs every request received from clients in a file called the “log file.”  There are many formats for log files.   Each line in a log file is called a “hit” and contains lots of different stuff - the IP address, a request date/time stamp, the item requested, user agent, referrer, and more.  Many “hits” make up a single page view - that’s why it’s incorrect to use the term “hits” to refer to “page views.”  As a web analyst you will be defining the format of the log file within your tool and moving and synchronizing log files so that they can be processed by your tool.  Some people will claim log file analysis is dated (historic may be more appropriate), or less accurate than page tags (due to caching issues).  Other people like logs because they can reprocess their data.
  • Packet sniffers.  Network data collection involves deploying either software or hardware that intercepts and logs traffic coming over a network.  Every packet is captured and decoded according to a configuration you define.  Your web analytics tool can be configured to process the data captured and decoded by the sniffer.  Packet sniffers are a less common approach for data collection by web analytics vendors.  
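
To make the “hits” versus “page views” distinction concrete, here is a minimal sketch of log file parsing.  It assumes the Apache/NCSA Combined Log Format and assumes that requests for images, stylesheets, and scripts are page components rather than pages; the sample lines, the extension list, and the function name are made up for illustration, and real tools apply their own (often configurable) rules.

```python
import re

# Apache/NCSA Combined Log Format; other servers and formats will differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: requests for these file types are page components, not pages.
ASSET_EXTENSIONS = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def count_hits_and_page_views(lines):
    """Return (hits, page_views) for an iterable of log lines."""
    hits = 0
    page_views = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        hits += 1
        path = match.group("path").split("?", 1)[0].lower()
        if match.group("status") == "200" and not path.endswith(ASSET_EXTENSIONS):
            page_views += 1
    return hits, page_views


# Made-up sample: one page request plus the image and stylesheet it pulls in.
sample_log = [
    '10.0.0.1 - - [10/Jan/2008:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Jan/2008:10:00:01 -0500] "GET /logo.gif HTTP/1.1" 200 2048 "http://example.com/index.html" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Jan/2008:10:00:01 -0500] "GET /site.css HTTP/1.1" 200 1024 "http://example.com/index.html" "Mozilla/5.0"',
]

hits, page_views = count_hits_and_page_views(sample_log)
print(f"{hits} hits, {page_views} page view(s)")
```

Three lines in the log, one page viewed: that gap is why “hits” and “page views” are not interchangeable.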

Interestingly some vendors offer “hybrid” data collection, which combines multiple data collection methods.  This mode could be considered a “fourth type” of data collection.  Most commonly hybrid data collection means using logs and page tags to collect different data elements, but other combinations are possible as well. 

As you investigate the best data collection method for your implementation ensure you deeply consider the pros and cons of each method.   For example page tags capture information about the browser (like screen resolution) that logs just can’t.  But what about if you need to measure non-javascript executing clients, like some mobile devices?  Log files capture information about crawlers (i.e. robotic traffic) that page tags just can’t.  But can you adequately filter robotic traffic and maintain host exclusions?  Packet sniffers capture pretty much everything, but can be challenging to customize to your exact data needs (and you’ll need a fair amount of IT support). 

Which one is correct for your implementation?  It depends on your business goals defining what you need to measure…  

Anil Batra needs your help with a Bounce Rate Survey!

My friend Anil “Batman” Batra, over at ZeroDash1, created a new survey on “bounce rates.”  He’d really like you, good reader, to take the survey, and so would I.  It doesn’t take very long to complete.  I’m looking forward to him sharing the results, for free, with the entire industry. 

His survey can be found here: 

http://www.surveymonkey.com/s.aspx?sm=IFDf5Jtenl_2fsq_2fuwemHmJA_3d_3d

If you’ve never met Anil, he’s a wonderful dude who keeps up an awesome blog over at: http://webanalysis.blogspot.com.  I’m a regular reader of all his thought-provoking blogviations.  He batted 5 for 5 on his Web Analytics 2007 predictions, and has posted his predictions for 2008 too.  Check it out.  And thanks in advance for taking his survey!

The Yin and Yang of Online Metrics: Audience Measurement and Web Analytics

I write a monthly column for Mediapost’s Metrics Insider.  This month I wanted to talk about the different schools of thought in online metrics because at the end of the day we are all in Internet measurement together. Hope you enjoy the read:

Audience measurement and Web analytics systems are like the yin and yang of online metrics. Yin and yang are different, opposing forces, but they also complement each other. Think of Web analytics and audience measurement data in the same way: different, sometimes in opposition, but complementary.

The major difference between these systems is data collection:

  • Audience measurement companies don’t collect data directly from the sites being measured. They all rely on proprietary methods. Hitwise gets data from ISPs. Compete uses a toolbar that you can download as well as ISP and panel information. Nielsen and comScore use data collected from panels to create online metrics that they believe accurately represent overall Internet usage. Due to all these different data collection methods and no shared standards across companies, metrics from audience measurement firms are never identical with each other.
  • In Web analytics, data is collected directly from actual site activity. Methods include client-side data collection via javascript page tagging, server-side data collection via log file processing, or network data collection via packet sniffing. Sometimes methods such as page tagging and log file processing are combined in what’s called “hybrid data collection.” Vendors include Coremetrics, Webtrends, Unica, Visual Sciences, Omniture, Google, and others. The challenge with Web analytics tools is that each tool will calculate different numbers from the same source for identical metrics. In other words, Omniture numbers won’t match Google’s. That’s because each tool has its own “secret sauce” for “sessionization” — the fancy term for the way metrics are counted and measured by analytics technology. For example, certain tools may be configured to include or exclude certain filetypes or server responses. Robotic traffic may or may not be filtered.
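
As a rough illustration of how “sessionization” settings change the numbers, here is a minimal sketch that groups page views into visits using an inactivity timeout.  The 30-minute default, the visitor IDs, and the function name are assumptions for the example, not any vendor’s actual method.

```python
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes of inactivity. Tools differ here.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(page_views, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp) page views into visits per visitor."""
    visits = {}
    last_seen = {}
    for visitor_id, timestamp in sorted(page_views, key=lambda pv: pv[1]):
        previous = last_seen.get(visitor_id)
        if previous is None or timestamp - previous > timeout:
            visits.setdefault(visitor_id, []).append([])  # start a new visit
        visits[visitor_id][-1].append(timestamp)
        last_seen[visitor_id] = timestamp
    return visits


page_views = [
    ("visitor-a", datetime(2008, 1, 10, 9, 0)),
    ("visitor-a", datetime(2008, 1, 10, 9, 10)),
    ("visitor-a", datetime(2008, 1, 10, 10, 0)),  # 50 minutes later: new visit
    ("visitor-b", datetime(2008, 1, 10, 9, 5)),
]

visits = sessionize(page_views)
visit_counts = {visitor: len(v) for visitor, v in visits.items()}
print(visit_counts)
```

Rerun the same data with `timeout=timedelta(minutes=60)` and visitor-a drops from two visits to one.  Same source data, different metric.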

It’s worth noting that a company named Quantcast uses panel data and also enables a site to add page tags to collect actual site data, which are then merged together in a completely different type of “hybrid” model.

All these different approaches to data collection lead to opposition when these systems are used for the same purpose. For example, conflict arises between the yin and yang when identifying reach using unique visitor metrics. Audience measurement firms may cry “cookie deletion” when analytics tools are used to count unique visitors, and Web analytics firms may shout back “coverage error” and “selection bias” at the unique visitor numbers from panel-based firms. Another area of opposition is demographics. I’ve been told that only audience measurement firms provide demographic data, and that you can’t get demographic data from Web analytics systems. That’s not true at all.

All enterprise-level Web analytics systems provide demographic location information at the country, city, state, and MSA levels. This information will be different than that provided by audience measurement companies.

Demographics that are harder to elicit from a Web analytics system, but are easily provided by audience measurement, include attributes like a visitor’s age, gender, occupation, income, and education.

But it is possible to integrate very detailed demographic attributes per visitor into a Web analytics system! Once demographic information is captured in a registration database, it can be joined with behavioral data in the Web analytics system and reported on. For a real-world example of analytics/demographic integration, take a look at what Microsoft is doing with Gatineau, the company’s free Web analytics offering currently in beta. Microsoft is joining Web site behavioral data with rich demographic data from MS Live profiles.
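
The join itself is conceptually simple.  Here is a minimal sketch, assuming a registration database keyed by a visitor ID that the analytics data also carries; the IDs, attributes, and numbers are invented, and this is not a description of how Gatineau does it.

```python
from collections import defaultdict

# Hypothetical registration database: visitor ID -> demographic attributes.
registrations = {
    "u1": {"age_band": "25-34", "gender": "F"},
    "u2": {"age_band": "35-44", "gender": "M"},
}

# Hypothetical behavioral data from the analytics tool: (visitor ID, page views).
behavior = [("u1", 12), ("u2", 3), ("u3", 7)]


def page_views_by_attribute(behavior, registrations, attribute):
    """Sum page views per demographic value; unregistered visitors are 'unknown'."""
    totals = defaultdict(int)
    for visitor_id, page_views in behavior:
        profile = registrations.get(visitor_id)
        value = profile[attribute] if profile else "unknown"
        totals[value] += page_views
    return dict(totals)


by_age = page_views_by_attribute(behavior, registrations, "age_band")
print(by_age)
```

Note the “unknown” bucket: visitors who never registered have behavior but no demographics, which is one reason these numbers differ from panel-based ones.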

Even with differences and oppositions between these online metrics systems, companies find ways to use the data in complementary ways:

  • Audience measurement data is useful for competitive intelligence. All the paid and free services provide data for comparing the performance of a site to other sites, for understanding audience behavior across one or more sites by demographics, and for understanding generalized Internet traffic trends and search terms.
  • Web analytics data is useful for understanding site effectiveness, for defining key performance indicators, for determining conversion rates for marketing campaigns by channel (such as search, email, rss), for understanding what sites and keywords are driving traffic to your site, and for segmenting and reporting online metrics.

You can even use both data sources as part of the same site optimization activity. For example, you could use audience measurement data to determine that a competitor is gaining ground on a particular product or search term. Then you could look at your Web analytics tool to see how you’re doing for the same term and how visitors who searched for that keyword behave on your site. You may find a high bounce rate and low conversion rate for the keyword, so you segment that data perhaps by demographics! Next you suggest a hypothesis to minimize bounce and maximize conversion for each segment. Then you test your hypothesis, and reexamine the data. Based on the results, you then continuously improve your online performance through controlled experimentation. At the end of the day, you will drive more online revenue by understanding how the yin of audience measurement and the yang of Web analytics complement each other, than by worrying about how they differ and oppose.
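
The “segment, then look at bounce and conversion” step can be sketched in a few lines.  This assumes a bounce is a single-page-view visit and that each visit already carries a segment label; the visits below are made up.

```python
def segment_rates(visits):
    """Compute bounce and conversion rates per segment.

    Each visit is a dict with 'segment', 'page_views', and 'converted'.
    Assumption: a bounce is a visit with exactly one page view.
    """
    totals = {}
    for visit in visits:
        stats = totals.setdefault(
            visit["segment"], {"visits": 0, "bounces": 0, "conversions": 0}
        )
        stats["visits"] += 1
        if visit["page_views"] == 1:
            stats["bounces"] += 1
        if visit["converted"]:
            stats["conversions"] += 1
    return {
        segment: {
            "bounce_rate": s["bounces"] / s["visits"],
            "conversion_rate": s["conversions"] / s["visits"],
        }
        for segment, s in totals.items()
    }


# Made-up visits that arrived on the same search keyword.
keyword_visits = [
    {"segment": "18-24", "page_views": 1, "converted": False},
    {"segment": "18-24", "page_views": 1, "converted": False},
    {"segment": "18-24", "page_views": 4, "converted": True},
    {"segment": "18-24", "page_views": 2, "converted": False},
    {"segment": "35-44", "page_views": 3, "converted": True},
    {"segment": "35-44", "page_views": 1, "converted": False},
]

rates = segment_rates(keyword_visits)
for segment, r in rates.items():
    print(segment, f"bounce {r['bounce_rate']:.0%}", f"conversion {r['conversion_rate']:.0%}")
```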

Web Analytics and Targeting: A Quick Blogviation

Targeting refers to the process of identifying characteristics of a segment so that relevant content may be matched to it and delivered at a time when the segment is most open to the message. The idea is the right content to the right visitor at the right time (optimally in real time). 

For example, you may visit a site, and see some type of ad unit calling out at you to “meet singles in <insert_your_city>.” When browsing real estate you may see ad units for realtors and mortgage companies.  After entering a keyword such as “car prices” and clicking through the SERP, you may see an ad for a local car dealer.   That’s targeting in a nutshell.  It’s simple:

  1. Visitor X has these attributes. 
  2. We have content that we think will appeal to Visitor X’s attributes.
  3. Let’s show that content. 

While targeting has helped to increase ad clickthrough rates, it’s far from an ideal science.  Current methods for targeting have inefficiencies.  What if Visitor X just bought a new car after his recent marriage?  Unless the targeting engine is made aware of the visitor’s current state, the targeting may be off and not yield desired results. 

Even with limitations around “current awareness” targeting is perceived in the Internet industry as a crucial activity for maximizing the effectiveness of advertising and content.  Targeting is the next stage after A/B and multivariate testing.  Once you determine the preference of segments based on testing, you identify content to target. 

In new media, targeting is something associated with paid search campaigning, ad serving, and content optimization.  It’s not uncommon for targeting activities to be based on:

  • Category and sub-category.  Conceptual constructs like “categories” of topics on a media web site or products on an ecommerce site can be targeted to include certain types of ads or messages.  The notion of a “zone” fits in here as well.  The idea is that if visitors are browsing in your category for “hardwood floors” you could offer them an ad or content specific to “flooring installation services.”
  • Geography.  Country, region, city, state, DMA are all targetable constructs.  You may choose to target people surfing from 02141 (Cambridge, MA) with an ad for pre-sale Red Sox tix or content about Mike Lowell’s recent contract.
  • Browsing environment such as the connection speed, type of browser, operating system, user software, domain, and ISP.  An ad network serves an ad for Verizon DSL to a modem-based surfer by detecting the visitor’s browsing environment.
  • Time.  The idea of only showing content during specific periods of time is called “parting.”  Common types include day-parting and season-parting.  For example, a B2B site only choosing to show ads for a particular manufacturer’s product during business hours – the site’s busiest time of day – would be an example of day parting.
  • Keyword.  There are many different types of keyword targeting.  Google does fantastic things with targeting ads based on the keywords in queries.  Content Management Systems can target content based on on-site search keywords or referring keywords.  “Keywords” may be associated as metadata with site sections or pages, similar to a zone or a category targeting on an ad server.  Once a page is associated with “keyword” metadata, you can tell your server to target that keyword (and all pages where it exists as metadata).  If two categories each with different content share a targetable keyword, I can target ads across both categories to pages tagged with that specific keyword.
  • Language.  When a language is set, you can target ads to visitors with that setting. Think Google.  Keep in mind that when you target by language, the creative copy is not translated. 
  • Demographics. If the ad server is aware of a segment’s demographics, such as age, gender, income, title, purchasing power, and so on, an ad can be targeted on that basis.  Sometimes this is called “profile targeting.”
  • Context.  Think of Google AdSense and how it matches ads based on the semantics in site content.  Now you understand content targeting based on context.
  • Profile.  Targeting is possible based on conclusions drawn and rules created from the known attributes (such as purchasing propensity) about an individual or segment.
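
Most of the targeting types above boil down to rules evaluated against what is known about the request.  Here is a minimal sketch combining geography and day-parting; the rule format, ad names, and business hours are assumptions for illustration, not how any particular ad server is configured.

```python
from datetime import time

# Hypothetical ad rules: every condition present in a rule must match.
AD_RULES = [
    {"ad": "red-sox-presale", "zip_codes": {"02141"}},
    {"ad": "b2b-manufacturer", "start": time(9, 0), "end": time(17, 0)},
]
DEFAULT_AD = "house-ad"


def choose_ad(visitor_zip, local_time, rules=AD_RULES):
    """Return the first ad whose geography and day-part conditions match."""
    for rule in rules:
        if "zip_codes" in rule and visitor_zip not in rule["zip_codes"]:
            continue  # geography does not match
        if "start" in rule and not (rule["start"] <= local_time < rule["end"]):
            continue  # outside the day-part window
        return rule["ad"]
    return DEFAULT_AD


print(choose_ad("02141", time(20, 0)))   # Cambridge visitor, evening
print(choose_ad("10001", time(10, 30)))  # elsewhere, business hours
print(choose_ad("10001", time(22, 0)))   # elsewhere, late night
```

Rule order matters in this sketch: the first match wins, so a Cambridge visitor sees the Red Sox ad even during business hours.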

Enter one of the holy grails of online advertising and new media: “behavioral targeting” – an advanced form of targeting. Behavioral targeting refers to the process in which content is shown to a visitor based on the web sites they visit (or have visited) and the actions they take on those sites.  

Behavioral targeting involves:

  1. Knowing where a visitor “comes from” and what they’ve done in the past. 
  2. Determining the context of the visitor on the site. 
  3. Detecting the visitor’s current behavior.
  4. Serving relevant content and/or ads matched to the behavior.
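
Those four steps can be sketched roughly as follows.  This is a toy under stated assumptions: behavior is reduced to content categories, the current page counts double, and a “recent purchases” set stands in for awareness of the visitor’s current state.  The category names, ad names, and weighting are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from a content category to the ad shown for it.
ADS_BY_CATEGORY = {"autos": "local-car-dealer", "real-estate": "mortgage-offer"}


def behavioral_ad(history, current_category, recent_purchases=frozenset()):
    """Pick an ad from past and current browsing categories.

    history: categories viewed on earlier visits (step 1)
    current_category: the section being viewed now (steps 2 and 3)
    recent_purchases: categories the visitor just bought in, which are
        skipped so the ad reflects the visitor's current state
    """
    interest = Counter(history)
    interest[current_category] += 2  # assumption: current behavior counts double
    for category, _ in interest.most_common():
        if category in ADS_BY_CATEGORY and category not in recent_purchases:
            return ADS_BY_CATEGORY[category]  # step 4: serve the matched ad
    return None  # nothing relevant to target


past = ["autos", "autos", "real-estate"]
print(behavioral_ad(past, "autos"))
print(behavioral_ad(past, "autos", recent_purchases={"autos"}))
```

The second call is Visitor X after he bought the car: without the `recent_purchases` signal, the engine would keep showing him car dealer ads.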

By understanding the visitor’s past history, current state, and most recent behavior the marketer can target content in order to influence some point in the customer buying cycle, often at the stages of awareness and consideration.

So where does web analytics come in?  You would think web analytics data from “web analytics” technology would provide the seed data for enabling “targeting.”  It can, but in most cases, targeting is a function provided by the ad server or network or another technology called the “behavioral targeting platform,” not the analytics tool… the data does not come directly from the web analytics tool.  I’d love to hear how well (or if at all) Omniture TouchClarity is integrated with Omniture Discover or other offerings.

In order to make web analytics data useful for targeting (if you can at all), you will need to use your web analytics data to:

  1. Define segments to target (hard to export from web analytics tools)
  2. Feed those segments and associated behavioral data to another tool (achievable if you own your data and run a tool in-house.  Harder and more costly if not).
  3. Report on segment performance after targeting (that requires employing the right people and enabling them with the right tools).
  4. Analyze segment performance after targeting (again employ the right people and enable them with the appropriate tools and resources).

While I’ve only covered a very little bit about “targeting” and even less about “behavioral targeting” in the context of web analytics, I hope that my simple description of current methods for targeting and some thinking about “what is BT” will help you understand the emerging ecosystem in which analytics tools are interoperating now and will interoperate in the future.

Web Analytics Data is Free. Where are the Web Services?

Web analytics data is the raw material from which companies will realize new online products and deliver differentiated services that generate future value.

Right now as I type I can get web analytics data from so many sources.  Google Analytics and Open Web Analytics provide the data for free (once I spend the overhead to set it up).  So do Compete and Quantcast.  Many other companies are willing to broker this potential commodity to me at various price points - from the low four to seven figures.

The price from web analytics firms for what is essentially the same data is all over the map!  Why? Perhaps because enterprise vendors know that you the customer will no longer pay for just data (thanks Google!). 

It’s the features and services provided on top of the core data that are valuable to a practitioner.  To the company employing the practitioner, it’s the insights generated from the data that are valuable.

We’re seeing a lot of web analytics data operationalization via features for:

  • Extension into business intelligence (and in the future leading to business analytics).  The best web analytics firms are providing open relational databases and creating methods for joining data from other systems to “extend the data model” or feed the enterprise data warehouse.  
  • Automated testing and optimization.  The notion that “you aren’t doing web analytics, if you aren’t testing” provides evidence that siloed, lonely data won’t do much for your business.  In that light, automated testing is only as useful for prediction as the people setting up the tests.
  • Targeting.  Using analytics data during the session or after it to automatically target content based on key visitor attributes will increase conversion.  While targeting technologies use analytics data, the value derived isn’t from the data, but from the potential conversion lift of the activity we call “targeting.”
  • Proxy scoring.  Assigning a value to an event, interaction, page view, or visit can identify high-value segments and customers.  Scoring abstractions operate on the data to indicate value. 
  • Profiling.  Building a picture of your online audience by aggregating data from various sources including web behavior, customer transactions, and demographic data enables one-to-one marketing.  Web data is part of the profile.  The value is in the profile.
  • Integration.  Joining analytics data with data from other systems in a unified data model, or enabling machine-to-machine communication of analytics data will yield value.  Again the data is important, but the value is in the outcome.
  • Alerting.  Indicating when data exceeds pre-defined upper and lower bounds and where those thresholds have been exceeded is valuable.  Once again, the data is crucial, but the alert is the catalyst for value creation.
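
Alerting is the easiest of these to picture in code.  A minimal sketch, assuming fixed upper and lower bounds on a daily metric; the bounds and the visit counts are invented, and real tools may derive thresholds statistically instead.

```python
def find_alerts(daily_values, lower, upper):
    """Return (day, value, reason) for each value outside [lower, upper]."""
    alerts = []
    for day, value in daily_values:
        if value < lower:
            alerts.append((day, value, "below lower bound"))
        elif value > upper:
            alerts.append((day, value, "above upper bound"))
    return alerts


# Made-up daily visit counts and hand-picked bounds.
daily_visits = [("Mon", 10200), ("Tue", 9800), ("Wed", 4100), ("Thu", 15900)]
alerts = find_alerts(daily_visits, lower=8000, upper=12000)

for day, value, reason in alerts:
    print(f"{day}: {value} visits is {reason}")
```

The data makes the alert possible, but it is the alert (Wednesday’s drop, Thursday’s spike) that prompts someone to go find out what happened.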

Data SEPARATED from the application, from the presentation, is extremely valuable.  When “unsiloed” and described or made available using open standards, it can be reused by other applications.  It’s the insights realized from moving/sharing/synching data that drive enterprise value.

Yet functionality on top of the data layer doesn’t make web analytics easy and instantly drive value.  Few corporations have the slightest clue about how to take advantage of all this functionality. 

To create value, the modern web analytics practice requires:

  • Dedicated in-house professionals. No duh, here.  You need people who understand the data, use “features” to help analyze it, and who can then test hypotheses to optimize and measure outcomes.
  • Vendor and third-party professional services.  Consultants must go beyond “repeating back what you say, then claiming they can solve the problem” and deliver quantifiable, measurable value that improves business process.
  • Web services. Web analytics tools need to use web services.  The “web analytics” tool of the future will take advantage of technologies that provide platform-independent protocols and standards used for exchanging data between applications. 

We’re going to continue to see highly-specialized web analytics “experts” at companies work with services firms (or professional services staff from vendors) to combine the “off the shelf” web analytics products with web services technologies to create the automated marketing architectures of tomorrow.

These “marketectures” will:

  • Bridge quantitative analytics data with qualitative data to identify the 360-degree view of the customer experience.  Ahh buzzwords…
  • Use web services to operate on analytics data to solve highly-specific and specialized technology and business problems.  For example, using a WS to pass custom interaction events in Adobe AIR/FLEX to understand behavior in your RIA. 
  • Enable HUMANS to realize new *predictive* insights from data.  Yeah, automated testing will help, but think about it… who sets up the tests? 

A Note on Web Analytics and Ad Server Metrics…

In the wild world of online metrics, it’s a well known fact that metrics from web analytics tools and ad servers never match. Variances can be substantial.

What I mean is that, given no “refresh rate,” the total impressions for a single ad unit, which should be served on every page request, never match the number of total page views on the site during the same period of time.  Sigh.
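
To put a number on the gap, a simple percent variance works.  This is a minimal sketch with made-up counts; using page views as the baseline is an assumption, and you could just as reasonably use impressions.

```python
def percent_variance(ad_impressions, page_views):
    """Difference between impressions and page views, as a % of page views."""
    if page_views == 0:
        raise ValueError("page_views must be greater than zero")
    return (ad_impressions - page_views) * 100 / page_views


# Made-up totals for one ad unit and the whole site over the same period.
variance = percent_variance(ad_impressions=92000, page_views=100000)
print(f"Ad server vs. web analytics: {variance:+.1f}%")
```

Tracking this figure over time is more useful than any single reading: a stable gap is just a difference in method, while a sudden jump suggests something broke.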

Reasons why identically-named metrics from these two tools (like page views and unique visitors) don’t add up are numerous:

  • Different data collection methods.  Ad servers use page tags.  Many web analytics tools use page tags, but it’s not uncommon in web analytics to use additional methods, such as logs or packet sniffers.  The methods have no shared standards for collection or storage of the same data (like visit-level data).  Thus you get apples to strawberries comparisons when attempting to correlate the dimensions from different systems.
  • Unique data models.  Ad servers aren’t focused on counting page views and the other dimensions of web analytics (visits, time, and so on).  Rather ad servers focus on serving and counting impressions served (and loads of related derivative calculations, like CTR, CPC, and the coolness of view–thru).   Metrics are based on an ad request and an ad code.  Ads aren’t targeted to a page (though that’s possible), but rather to a “zone” or “keyword.” What that means is that the “page” dimension may not even exist in your ad server’s schema.  In other words, you aren’t looking at impressions measured on a page, but rather at the number of impressions served in a different conceptual construct.  That’s one of the reasons why people say metrics and ad-serving systems “don’t measure the same thing.”
  • Untagged pages.  Just like analytics implementations suffer from challenges related to complete code coverage of page tags, so do ad serving implementations.  Companies need to determine how to centrally manage the deployment and orchestration of page tags *of all types* and verify all the pages have tags!  Don’t just expect it to work because tagging sounds so easy!  Suspect it won’t work, and determine what you’re going to do *before* you deploy.  Too late?  Time to reengineer. 
  • Non-JS executing clients.  Ad servers use page tags.  Not everyone and not all user agents execute javascript.  Everyone needs to realize that page tagging misses traffic as efficiently as it excludes it.  Period.  What percentage of the traffic you miss, you’ll never know… running and filtering your logs may provide an indication…
  • Ad blocking software.  Firefox’s Adblock Plus software is a big problem for sites that have a big techie audience, and it affects all sites.  Check your browser reporting and realize a good majority of those Mozilla users may be blocking your ads.  Look at the attitudinal data you have about visitors to gauge whether that’s a big issue for your online audience.
  • Cookie issues.  Third-party cookies get blocked (often by privacy software).  Many ad servers still serve third party cookies, and many corporations have not tricked their DNS to accommodate this issue (ahem, CNAME).  We all know how cookie deletion affects unique visitor counts.
  • Refresh rates. One page rendered in the browser and many banner “refreshes” makes it really hard to correlate page views and impressions served.
  • No rich media installed, and no fallback.  If the client doesn’t have certain plug-ins, and you have no fallback, you miss ad revenue.  Meanwhile the tag executes and you count the traffic.
  • Robots, spiders, and crawlers, oh my.  The web is so robotic.  The problem is amazingly understated, especially by companies who want to bill you on page views.  Different data collection methods allow some level of bots to dirty the data.  Logs are harder to efficiently filter.  When the ad server uses tags, and the analytics tool uses logs, you may get some wildly different numbers. 
  • Mobile, Mobile, Mobile, Mobile.  Not all Internet-connected mobile devices will display ads, but web analytics tools will track the behavior of mobile visitors.
  • Latency.  Visitors who move through the site too quickly may not execute the tag, thus no data is sent back to the server(s).  Ever wonder why vendors tell you to put the tag “high” on the page?

The influence these issues have on your site varies depending on audience.  Investigate factors causing variance and deviation between metrics systems, and educate your audience on why the numbers differ.
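
One way to start that investigation is to line up daily counts from the two systems and flag the days where they drift apart the most.  Here is a minimal Python sketch; the function names, the 10% threshold, and the sample numbers are all made up for illustration and don't come from any particular ad server or analytics tool.

```python
def percent_difference(ad_impressions, page_views):
    """Deviation of the ad server count from the analytics count, in percent."""
    if page_views == 0:
        raise ValueError("page_views must be non-zero")
    return (ad_impressions - page_views) * 100.0 / page_views


def flag_days(daily_counts, threshold_pct=10.0):
    """Return (day, deviation) pairs whose absolute deviation exceeds the threshold.

    daily_counts maps a day to (ad_impressions, page_views).
    The 10% default is an arbitrary assumption, not an industry rule.
    """
    flagged = []
    for day, (impressions, views) in sorted(daily_counts.items()):
        diff = percent_difference(impressions, views)
        if abs(diff) > threshold_pct:
            flagged.append((day, round(diff, 1)))
    return flagged


# Made-up numbers, for illustration only.
sample = {
    "2007-10-01": (9500, 10000),
    "2007-10-02": (8200, 10000),
    "2007-10-03": (13000, 10000),
}

for day, diff in flag_days(sample):
    print(day, diff)
```

The flagged days are only a starting point.  The list above (refreshes, blocked tags, bots, and so on) is where you go looking for the cause.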


Whence the Metric!? Riffing on the Basics of Web Analytics Data Sources

The concept of web analytics data sources isn’t discussed nearly as often as web analytics “data collection.”  With that in mind, I’m often asked by people just beginning to explore the wild and wonderful world of web analytics  “where does this metric come from?” 

When people ask me that question I often think back to someone whom I might have seen walking in the local village a few miles away if I had lived about 170 years ago: Ralph Waldo Emerson.  The father of American Transcendentalism once asked “whence is the flower?”  Where did it come from?  Ralphie-boy was questioning the accuracy of long-standing 19th century beliefs.  Keep in mind Mr. Emerson parted ways with Harvard for questioning the Trinity (not Avinash’s ;)!

In that spirit, people are really asking me “whence is the online metric?”  Where does the metric come from?  Advertisers and your colleagues want to know because they may be questioning the origin and accuracy of our 21st century metrics!

While folks who have been “doing web analytics” for some time know how to answer that question, I thought it would be useful to blogivate on data origination for folks who are new or aren’t dealing with web analytics in a full-time, day-to-day role.

The answer is that online metrics originate from one of three data sources:  internal data sources, external data sources, and hybrid data sources.  Each source has its own particular challenges to “accuracy.”

  • Internal data sources.  Web analytics technology that collects data from websites, mobile phones, or other Internet-connected devices via JavaScript page tagging, log file processing, or packet sniffing falls into this category.  Major vendors include Unica, Visual Sciences, Omniture, CoreMetrics, Webtrends, Google Analytics, and others.  The issue with Web analytics tools is that two tools will yield different numbers for the same data source.  That’s because each tool has its own “secret sauce” for “sessionization.”  That is, each tool counts traffic in slightly different ways.  For example, tools may be configured to include or exclude certain filetypes or server responses.  Robotic traffic may or may not be filtered.  Web Analytics tools also depend on cookies for attributing “uniqueness” to visitors; thus, cookie deletion can inflate unique audience numbers.
  • External data sources.  Data collected from panels, toolbars, and ISPs (not from the actual site) are examples of external data.  Companies like Comscore, Nielsen NetRatings, Compete, and Hitwise provide metrics generated from external data.  These companies provide some sort of incentive (e.g. gifts) or perceived value to entice people to participate in panels and download software that observes their Internet usage.  Self-selecting panelists are monitored, and metrics related to their behavior are projected to the entire online universe using statistical methods.  External metrics are never identical with each other because of differences in the composition of their panels.  As you’ve probably noticed, Comscore data does not match Nielsen.  Significant divergences across panel-based measurement systems, when compared to each other and to web analytics tools, have led to an audit by the IAB and MRC.  The hope is that auditing will vet methods and identify any potential coverage error or selection bias inherent in sources of external data.  Personally, I’m really curious to see what all the hubba bubba about auditing without guiding standards will really accomplish.
  • Hybrid data sources.  Sources of metrics that use some type of internal data collection and some form of external data collection are considered hybrid data sources.  Microsoft Gatineau and Quantcast offer free services that fall into this category.  Reports in the highly anticipated web analytics offering from Microsoft include data collected from Javascript page tags combined with anonymous demographic data from Microsoft Live profiles.  Quantcast’s panel-based measurement system (i.e. external) may be augmented by adding a javascript page tag (i.e. internal) to every web site page.  Behavioral data is collected via javascript and combined with demographic data from the Quantcast panel, then reported to end-users.
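
To make the “sessionization” point concrete, here is a small Python sketch of how the same hits can yield different visit counts.  It assumes the simplest possible rule: a new visit starts whenever a visitor has been inactive longer than a timeout.  The cookie IDs, timestamps, and the 20- and 30-minute timeouts are invented for illustration; no vendor’s actual “secret sauce” is being described here.

```python
def count_visits(hit_times_by_visitor, timeout_minutes):
    """Count visits under a simple inactivity-timeout rule.

    hit_times_by_visitor maps a visitor ID to hit timestamps in seconds.
    A new visit starts on a visitor's first hit, or whenever the gap since
    that visitor's previous hit is longer than the timeout.
    """
    timeout = timeout_minutes * 60
    visits = 0
    for times in hit_times_by_visitor.values():
        previous = None
        for t in sorted(times):
            if previous is None or t - previous > timeout:
                visits += 1
            previous = t
    return visits


# Hypothetical hits, in seconds since midnight, keyed by a made-up cookie ID.
hits = {
    "cookie-a": [0, 600, 2100],  # gaps of 10 and 25 minutes
    "cookie-b": [100, 4000],     # gap of 65 minutes
}

print(count_visits(hits, timeout_minutes=30))  # 3 visits
print(count_visits(hits, timeout_minutes=20))  # 4 visits
```

Same data, two “tools,” two answers.  Add differences in bot filtering and filetype exclusions and the gap only widens.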

In real world practice, companies use many of these different, overlapping data sources to understand their online presence.  Given the free nature of companies like Quantcast and Compete, and the pervasiveness of firms like Nielsen and ComScore in sales and agency cultures, expect that different sources regardless of type will never ever absolutely match site-specific Web analytics tools.  And that’s okay because data from all these data sources have different utility:

  • Metrics from internal data sources derived from web analytics tools are very useful to site owners for identifying the behavior of their online audience.  The data can be used for site optimization, to understand what sites are referring traffic to the site, to identify conversion rates for particular marketing campaigns, to understand the broad content themes and particular search keywords driving traffic on your site, to segment and give context for other metrics and attitudinal data and more.  Metrics from site-centric sources should be provided to advertisers for comparison with external data.  Be prepared to discuss why the numbers differ!
  • Data generated from external sources are useful to advertisers and agencies.  These metrics can be used for comparing the performance of a site to its competitors and for understanding audience behavior across one or more sites by demographics.  Media buyers and web strategists desiring to understand generalized Internet traffic trends and measures of site popularity use external sources.  Metrics from “audience measurement” sources should be used for comparison with internal data.  But the data won’t match, and that’s okay because you should be looking at the site performance of your competitors, not using that data to optimize your site.  Use the external data for insights into demographic makeup of your audience.  Then compare that information to data from your own internal research teams (who don’t report to web analytics). 
  • Numbers from hybrid data sources blend both external and internal metrics together for both site owners and advertisers.  No duh, ay?  New insights about the online audience can be realized from segmenting visitors based on demographic and behavioral data within the same source.  We’re just entering the “early adopter” phase of this market, so I’m curious to see how it all plays out.  How will Microsoft Gatineau differentiate its hybrid analytics service and communicate the value proposition?  Will publishers adopt Quantcast’s hybrid service (and how will they make money)?  One barrier to adoption is that some companies have already combined web analytics behavioral data with audience demographic data using business intelligence and data warehousing technologies (or the more flexible and open web analytics tools that support open software standards).  Companies bridging data internally hope to enable a grand orchestration of automatic site optimization and content targeting (with an eye toward behavioral targeting).  Microsoft already seems to be doing this targeting to some degree, based on what I learned about “personamous” content targeting on MSN.com (at Emetrics).  So I’m curious what other online products (for example advertising offerings and site optimization technologies) the gentlemen at MSFT have up their strategic sleeves based on this hybrid source data.

As you immerse yourself in the wondrous world of overlapping, hardly-standardized, online metrics, it’s critical to consider the source of the metric, bias from the source, and the audience for the metric.  Most critically, understand how a metric, regardless of source, relates to business goals and site objectives before using it as a measure for identifying your online performance to internal and external stakeholders and taking action from your insights.

GO RED SOX! 


Online Metrics need an XML Standard

I’m contributing a monthly article to MediaPost’s Metrics Insider Column.  My first contribution was published last week, and I’m reposting it here to get your thoughts.  Soon I hope to describe what I mean in more detail… The article was called “The Most Measurable Medium needs an XML Standard.”  In case you missed it, here it is:

OVER THE LAST TWO WEEKS, my fellow Metrics Insider columnists have correctly pointed out that online metrics are neither standardized nor easily integrated across systems. Vocabulary is muddled. Numbers do not match. Data exists in silos and is isolated from related data. Systems do not adequately or easily talk to each other. Research services, ad servers, and Web analytics tools report similarly named, overlapping and often conflicting metrics. Unfortunately, these problems will not disappear anytime soon, even with emerging “standards” and continued attention paid by the industry to these important issues.

Current industry standards for Web metrics are limited, basic, and come from independent entities. Most recently, the Web Analytics Association released a set of “standards.” The WAA’s standards are elementary definitions of concepts from various periods of Internet measurement. Web 2.0 concepts like “events” are mingled with dated measurements like “hits.” Regardless, these definitions provide a very useful starting point for framing a discussion about metrics. Recently, I’ve learned that the IAB and MRC are developing a set of IAB Reach Measurement Guidelines. Let’s hope the IAB and WAA align their work efforts.

The IAB and MRC are also currently auditing “audience measurement” firms, like Comscore and Nielsen. It’s rather unclear to practitioners what standards the IAB/MRC are applying to the audit. But the hope is that auditing will expose issues of coverage error and selection bias in the black box methodologies used to create the panels and generate the audience measurement data.

It is important to note that the IAB’s audit has two parts. The first is certification, which indicates the company being audited is applying the “standards,” and the second is accreditation, which demonstrates adherence to the IAB standards.

Only time will tell if companies like Hitwise, Compete, and Quantcast will be asked to submit to auditing. It’s worth mentioning that legacy metrics “standards” (and audits) from historic organizations like ABCe still occur and carry weight with publishers and advertisers (especially outside of the United States). It’s entirely possible that newly formed organizations, like the Association for Downloadable Media, will offer their perspective on “standards” for online metrics.

The idea of “standards for the standards,” however absurd it sounds on the surface, starts to seem like a good idea when considering that all these parallel efforts aren’t intersecting. Honestly though, I question whether “standards” that are purely “definitional,” even if agreed upon, will solve many of the measurement challenges companies have when trying to understand Web data and take action from it.

Standard definitions are helpful for promoting understanding and creating a controlled vocabulary for discussing online metrics, but they don’t help with what I see as a huge challenge in today’s metrics technologies. The problem is this: currently available online metrics systems do not adequately separate data from presentation. That’s a huge limitation preventing Web data from being easily integrated with other systems.

Detailed-level Web data (the raw data) is often costly to extract, if available at all. It is nearly impossible to deliver detailed data in real time from Web analytics, ad serving, and research-based technologies in order to feed other systems. The majority of hosted (ASP) metrics systems are closed and do not allow access to key interfaces using open software standards. For the most part, today’s metrics technologies are black boxes where data goes in, but can only be extracted in various file formats after creating a report. Common export formats include csv, pdf, and doc. While XML exports are often available from many vendors, there is no standard XML schema for describing the same type of Web data across different sources!

The industry must begin collaborating and creating a standard XML schema for describing Web data. Creating a widely used, consensus-based, published, and maintained XML standard for online metrics would make it possible to more easily share, transform, and use Web data in other systems.
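
To give a feel for what a shared format could buy us, here is a rough Python sketch that writes metrics from any source into one common XML shape and reads them back. No such standard exists today, so every element and attribute name below (`onlineMetrics`, `source`, `site`, `period`, `metric`) is purely hypothetical, as are the tool name and the numbers.

```python
import xml.etree.ElementTree as ET


def metrics_to_xml(source, source_type, site, period, metrics):
    """Serialize metrics into a hypothetical common XML shape.

    None of these element or attribute names comes from a real standard;
    they only illustrate separating the data from its presentation.
    """
    root = ET.Element("onlineMetrics")
    ET.SubElement(root, "source", {"type": source_type}).text = source
    ET.SubElement(root, "site").text = site
    ET.SubElement(root, "period", {"start": period[0], "end": period[1]})
    for name, value in metrics.items():
        ET.SubElement(root, "metric", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_metrics(xml_text):
    """Read the metric values back out, whichever system wrote the document."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): int(m.text) for m in root.findall("metric")}


# Made-up example data.
doc = metrics_to_xml(
    source="ExampleTool",
    source_type="internal",
    site="example.com",
    period=("2007-10-01", "2007-10-31"),
    metrics={"pageViews": 120000, "visits": 40000},
)
print(doc)
print(read_metrics(doc))
```

If a Web analytics tool, an ad server, and a research service all emitted this same shape, the consuming system would need only one reader instead of one parser per vendor export.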

I firmly believe that current metrics standards must go beyond simple definitions and tackle issues pertaining to data portability and system interoperability. Then we’ll all be in a better position to reuse Web data across the enterprise value chain. Once we all agree on “standard” definitions, I encourage us to start working together to develop a standard Online Metrics Markup Language.

Running Multiple Web Analytics Tools has Risks and Rewards…

Running more than one web analytics tool on a site or across a portfolio of sites is an increasingly common practice these days.  The majority of the companies that run multiple tools probably run one “for pay” tool and at least one “for free” tool.  Based on my experience, the cost of running two “for pay” solutions would be prohibitive for companies still trying to realize the “ROI” from web analytics (but it’s not unheard of in large, solvent, multinational companies).  Not surprisingly, the most common “free tool” run next to “enterprise-level” :-) tools like Omniture, Visual Sciences, and Unica NetInsight is Google Analytics.  Data from my pal Eric Peterson’s Vendor Discovery Tool shows that GA and Visual Sciences code were found on 6% of tracked URLs, GA and Omniture code on 4% of tracked URLs, and GA and WebTrends Hosted code on 4% of tracked URLs.  That’s great for Google and quite a validation of their excellent product!

I’m sure that there are companies running multiple “for free” tools, and/or running multiple big ticket tools (like HBX Analytics and Visual Sciences), and/or multiple homegrown tools built from Business Intelligence technologies and databases (Oracle/Cognos).  Yet running multiple tools has risks and rewards.

Some of the risks of running more than one web analytics tool include:

  • Lack of control over data.  If you are trying to foment a data-driven culture, nothing could be more frustrating than someone outside of the web analytics team downloading a tag, linking it to their personal account, then questioning why X number in Y tool doesn’t match X number in Z tool.  When you are trying to promote adoption of new technology, running a competing tool has the potential to compromise data believability.
  • Numbers not matching across tools.  Different vendors “sessionize” differently so numbers will never be identical between tools.  Dynamic sites, different underlying site technologies, and unique tool configurations mean numbers won’t match.  Never ever.  Check out Eric Enge’s highly-recommended 2007 Vendor Shootout. Run two tools and be prepared to answer questions about data discrepancies from those who consume reports on the same site from both tools.
  • Conflicting vocabulary.  Different tools use different terminology.  One tool may use “sessions,” while another may use “visits.”  Some tools talk about “views,” while others reference “page views.”  Some tools use the term “unique visitors,” and other tools just talk about “visitors.”  When you are rolling out “web analytics” to people who need to speak the same language, having multiple vocabularies for expressing the same or similar concepts confuses discussion and muddles actionability.
  • Apples and Mangoes Comparisons.  Some tools provide only snapshots of aggregated data, while other tools let you drill down, drill up, and slice and dice on detailed level data.   Some tools enable you to add metrics on the fly to any report, and then filter and cross dimensions until your heart is content.  Two people looking at two tools on the same data may conclude different actions are warranted based on the depth of their analysis.  While a good manager can sort that out, it’s a bother.
  • Potential to misallocate resources leading to needless redundancy.  Companies have limited resources.  If I need to apply tags to all my sites so that I can get to the real business of analysis, then why spend valuable time applying multiple sets of tags to enable tools that serve the same purpose?
  • Licensing issues.  A Google Analytics or Quantcast account on a corporate site may be associated with a personal account whose owner would prefer not to give up the password.
  • Training issues.  When rolling out systems, training is necessary.  Tools take time to learn.  Why have resources learn multiple tools that do more or less the same thing when you’ve got real business to take care of?

Some of the rewards from running more than one web analytics tool include:

  • Comparative data.  If you’re being charged by page views, it’s nice to have an alternate reference point to validate the charges.
  • Differentiated reporting.  Some tools are just better at custom and ad hoc reporting than others.  If you have an inflexible tool not fully loaded with features then maybe it makes sense to get a tool that can do all that stuff a lot cheaper than paying for additional incremental features.  Hello GA!
  • Potential to enable a different level of integration.  Lots of people tell me they download Google Analytics so that they can track their AdWords campaigns. 
  • Ability to leverage different features.  Several major tools are technically and functionally challenged when it comes to simple things like showing the keywords used to drive traffic to the particular page, the number of unique visitors per page, or a bounce rate.  Instead of dealing with complexities, sometimes it’s just easier to download and install a free solution like GA that does all of these things at no cost.
  • Ability to leverage different data collection methods.  Time-based metrics and file downloads are inordinately easier to measure and count using log file tools than using page tag tools, imho.  Why fiddle with some esotericisms in tagging when you can just run the logs?  Or better yet, use a hybrid approach in one tool and get the best of both data collection methods.
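
As a quick illustration of the “just run the logs” point, here is a short Python sketch that counts file downloads straight from web server log lines.  It assumes lines in Common Log Format, treats only `200` responses as downloads, and uses an arbitrary list of file extensions; the sample lines are invented.  A real log tool would also handle partial (`206`) responses, bots, and more.

```python
import re
from collections import Counter

# Pulls the request path and status code out of a Common Log Format line.
LOG_PATTERN = re.compile(r'"(?:GET|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Which extensions count as a "download" is an assumption for this sketch.
DOWNLOAD_EXTENSIONS = (".pdf", ".zip", ".doc")


def count_downloads(log_lines):
    """Count successful requests for downloadable files, keyed by path."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        # Drop any query string before checking the extension.
        path = match.group("path").split("?", 1)[0].lower()
        if match.group("status") == "200" and path.endswith(DOWNLOAD_EXTENSIONS):
            counts[path] += 1
    return counts


# Invented log lines, for illustration only.
sample_log = [
    '10.0.0.1 - - [10/Oct/2007:13:55:36 -0400] "GET /files/report.pdf HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2007:13:56:01 -0400] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.3 - - [10/Oct/2007:13:57:12 -0400] "GET /files/report.pdf HTTP/1.1" 404 512',
    '10.0.0.4 - - [10/Oct/2007:13:58:45 -0400] "GET /files/tool.zip?src=home HTTP/1.1" 200 90210',
]

for path, total in sorted(count_downloads(sample_log).items()):
    print(path, total)
```

No tag changes and no link rewriting: the server already wrote down every request.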

Is it a good idea, as a site owner or manager of a web analytics team, to run more than one tool?  The answer is it depends on your organization’s capability maturity for web analytics and how you balance risks and rewards.

The most mature companies have a centralized web analytics function.  That means the company has one “master user” and “strategic owner” for web analytics and related technologies.  The centralized web analytics function has its own resources dedicated to “doing web analytics.”  Resources may come from other groups within a company, but, regardless, the company executives have identified and placed positional power around a “web analytics champion.”  Since you’re reading my blog, you may be this person!  Cool!

When you have centralization, you control key elements of doing web analytics:

  1. Measuring
  2. Reporting
  3. Analyzing
  4. Testing
  5. Evaluating outcomes

If you’ve centralized your web analytics team, you should select ONE web analytics tool as your Primary Web Analytics Tool.

Then I think it’s safe to use more than one web analytics tool along these guidelines:

  • Standardize on one tool as your primary tool.  This tool should become the “bible” for web analytics data at your company.
  • Give people outside of the web analytics department access to the primary tool and ONLY the primary tool.
  • Use the secondary tool within your web analytics team as a supplemental tool for comparing measurements, data reconciliation/verification, or analysis that you can’t accomplish with your primary tool (such as detailed roll-up reporting).
  • Keep the overall enterprise standardized on the interface and numbers from your primary web analytics tool.  That way you prevent confusion when reporting to stakeholders outside of the web analytics team. 
  • Do not provide access to the secondary tool to people outside of the web analytics team (unless the numbers match 100%! ;- ) ).

If you have a decentralized web analytics organization, I recommend that you:

  • Standardize on a primary tool (whether free or paid).  Remember Google Analytics is an awesome place to start!!  And so it seems is Microsoft Gatineau!
  • Work toward centralization before introducing another tool that has the potential to undermine current measures, reports, analysis, tests, and outcome evaluations.

On a final note, you may have multiple tools for reporting web analytics data (perhaps from companies like Business Objects, Cognos, Microsoft and so on).  As long as the data is synchronized with the metrics from your primary web analytics tool, that’s fantastic! 

Am I off-base?  Absolutely right?  Do you run two tools and love it, hate it, don’t care?  What do you think?
