Web Analytics and Data Collection: The Page Tag

Many methods exist for collecting different types of web analytics data, some much more accurate and useful than others: page tags, log file analyzers, packet sniffers, audience panels, and toolbars.  At this point in time, the page tag certainly seems to be the preferred method adopted by the web analytics industry for data collection.  It’s also core to other technologies, like multivariate testing.

Various sources will tell you a multitude of “things” over time about page tags, depending on whether they think you are a noob or an advanced practitioner.  There’s lots of useful information from many sources about page tags.  Still, I find a lot of the average discussion about the utility of page tags to be somewhat true and somewhat false, depending on context.  For example, if you’ve never tagged pages, but you have processed log files, does it make sense to entirely throw away legacy processes?  Or if you’ve mastered change management across all pages on your site but you’ve never parsed log files, do you want to have to deal with syncing, moving, parsing, and filtering log files?  These questions yield fantastic answers for fruitful discussion as you plan or extend your web analytics implementation.

I’ve heard the following “things” about page tags:

  • It’s easy to page tag your sites.  That really depends on the technologies used to build the site.  A static site with a few simple pages is a different animal than a beast of a database-driven site with millions of complicated pages glued together using different technologies.
  • Page tags are more reliable.  Reliable in what way, I ask?  Page tags in a hosted environment are processed by a data center many miles away.  I have no idea whether the servers can support the load they are receiving from all the vendor’s customers, nor do I have any insight into the raw data collected by the page tags, errors, and so on.
  • Robots and spiders are always removed from the metrics.  Page tags do a better job than log file analyzers out-of-the-gate and especially if you don’t maintain your filtering, but I don’t really think page tags are infallible at all.  Bots and spiders get through.
  • You don’t have to deal with IT when page tagging.  It’s just a global include, right?  You just give IT the code, and they include it.  For the most part that’s true, but IT has processes and procedures, and your tagging needs to be QA’ed and perhaps even put through a “versioning” process.  Regardless, you’ll need to prep the corporation and your friends in IT for the big page tagging effort.
  • Change management is easy with page tags.  If you only use the vendor’s out-of-the-box page tag and have a global include, it’s somewhat easy to manage change.  But in my experience, a web analytics implementation requires using the tag in different ways across the site.  You’ll need to add attributes to the tag or script values into the tag.   In order to manage change, you’ll need to follow corporate processes.  In the case of web analytics page tagging, you may need to create those processes before you manage change… and process creation in corporate environments is sometimes not easy.  You may even need to follow a process for doing so.  :)
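To make the change management point a little more concrete, here is a minimal sketch in Python of describing tag variations centrally as rules, rather than hardcoding them page by page.  Everything in it is hypothetical: the rule format, the names TAG_RULES and resolve_tag_attributes, the attribute keys, and the example.com hosts are illustrations only, not any vendor’s API.

```python
from urllib.parse import urlsplit

# Hypothetical central rule table. Rules are applied in order, so a later,
# more specific rule overrides attributes set by an earlier, general one.
# A host of None means "any host".
TAG_RULES = [
    {"host": None, "path_prefix": "/",
     "attrs": {"tag_version": "1.0", "section": "general"}},
    {"host": None, "path_prefix": "/products/",
     "attrs": {"section": "products"}},
    {"host": "blog.example.com", "path_prefix": "/",
     "attrs": {"section": "blog", "campaign": "spring"}},
]


def resolve_tag_attributes(url, rules=TAG_RULES):
    """Return the tag attributes that apply to a page URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    attrs = {}
    for rule in rules:
        host_matches = rule["host"] is None or rule["host"] == host
        if host_matches and path.startswith(rule["path_prefix"]):
            attrs.update(rule["attrs"])
    return attrs


# Usage: the same lookup serves every page template or global include.
print(resolve_tag_attributes("https://www.example.com/products/widget.html"))
print(resolve_tag_attributes("https://blog.example.com/2007/page-tags.html"))
```

With a table like this, changing a tag on subdomain A but not subdomain B becomes an edit to one rule.  The corporate process for approving that edit is still up to you.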

The biggest challenges I’ve had with page tagging include:

  • Ensuring complete code coverage across all pages.  The number of pages in your site, the way the site is built, and the technology used to build your site all need to be carefully assessed prior to beginning the page tagging effort.  The larger the portfolio of sites, the more difficult it will be to tag all of your pages.  Other challenges include the number of domains and subdomains, the technology standards used across your portfolio of sites, and whether, in more controlled environments, the web analytics team actually has the positional power to influence the change management process.
  • Determining a method for centralized tag management.  If you are running an internal solution, you’ll need page tag server(s) to collect the page tag data.  If you’re depending on your vendor’s servers, you may need to modify or update tags for various reasons (new campaigns and so on).  You may find new sites that need to be tagged.  New stakeholders may want special tags added to the site to support other purposes (such as multivariate testing).   How are you going to centrally manage all that?   
  • Orchestrating changes to tags across different site sections.  You may want to pass a new value in the tag or change something in the script on page X, but not page Y, or subdomain A, but not subdomain B.  A new campaign may need to be enabled.  You may have to update your tag to take advantage of new vendor functionality.  You’ll need a technology solution and process for centrally orchestrating and controlling tag changes.  Wielding a mighty CMS helps.
  • Reconciling tag metrics with log metrics and determining correct filtering.  It’s great fun to collect tag data then compare it to your log files to determine if all pages are being counted and how effectively you are filtering bots.  While looking at different data sources is a time-consuming activity, certain business cases may demand it.
  • Integrating tagging with a Content Management System.  Major sites use expensive CMSs to create web sites.  The web analyst should work with the CMS team to build page tags into site pages.  Integration will help you with two points raised above: centralizing tag management and change management/orchestration.
  • Challenges with decodes and lookups using tags.  I’ve learned that it is difficult and in many cases impossible to decode a value or use lookup tables with a page tag.  For example, if I had a page that had a URI “/er45rw/e42f45erfwrq3r.html,” I can’t decode it to read “Web Analytics Blog” in my reports.  Or if it is possible to do simple decodes, I have to hardcode the decode on each page.  Hardcoding on an evolving web site is never manageable over time.
  • Latency.  If the page partially loads without executing the JavaScript, or if the user clicks through the page before the JavaScript has fired, the page view won’t be counted.  You must test to make sure your tag is firing properly.
  • JavaScript turned off.  If the browser doesn’t execute JavaScript, the tag won’t fire and the visitor won’t be measured.  What effect will that have on your numbers?
  • Cookie issues.  Tags may set third-party cookies.  Privacy policies don’t like the third party cookie.  As Justin Cutroni points out in the comments his tool sets first party cookies by default.  That’s good.
  • DNS changes.  To prevent third party cookie issues, you can trick out the DNS with a CNAME entry.  Say hello to someone in IT called the DNS admin!
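The reconciliation challenge above can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions, not a production method: it assumes you have already reduced your log files to (path, user agent) pairs for page requests and your tag data to page view counts per path.  The names BOT_MARKERS, is_probable_bot, count_log_page_views, and reconcile are hypothetical, and the substring bot check is deliberately naive.

```python
from collections import Counter

# Assumed, illustrative markers only. Real bot filtering needs a maintained
# list and will still miss robots that disguise their user agent.
BOT_MARKERS = ("bot", "spider", "crawler")


def is_probable_bot(user_agent):
    """Naive check: does the user agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_log_page_views(log_hits):
    """Count page views per path from (path, user_agent) pairs, skipping bots."""
    counts = Counter()
    for path, user_agent in log_hits:
        if not is_probable_bot(user_agent):
            counts[path] += 1
    return counts


def reconcile(tag_counts, log_counts):
    """Return {path: (tag_views, log_views, log_minus_tag)} where counts differ."""
    report = {}
    for path in sorted(set(tag_counts) | set(log_counts)):
        tag_n = tag_counts.get(path, 0)
        log_n = log_counts.get(path, 0)
        if tag_n != log_n:
            report[path] = (tag_n, log_n, log_n - tag_n)
    return report


# Usage with made-up data: one page was never tagged, one hit was a bot.
log_hits = [
    ("/index.html", "Mozilla/5.0"),
    ("/index.html", "Googlebot/2.1"),
    ("/about.html", "Mozilla/5.0"),
    ("/untagged.html", "Mozilla/5.0"),
]
tag_counts = {"/index.html": 1, "/about.html": 1}
print(reconcile(tag_counts, count_log_page_views(log_hits)))
# prints {'/untagged.html': (0, 1, 1)}
```

A path that shows up in the logs but never in the tag data is a candidate for missing code coverage.  A path where the log count runs well above the tag count may point to bots slipping through your filters, or to latency and JavaScript issues like the ones listed above.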

In discussing the page tag in such a manner, you may think I don’t like the page tag.  Not true at all!  The page tag is a very useful data collection method in context.  I like it very much, especially in hybrid data collection.  Like any technology, just make sure you understand the implications of your data collection method on your implementation.   

page_tags.gif
Courtesy of Zeus.com.

Eric Hansen added the following ...

Hey Judah,

Great, educational post. I’d like to add some info about SiteSpect WATTS, which you’ve mentioned in previous posts.

While there’s no doubt a self-interest component to what I’m about to write, I think it’s helpful for web analysts to understand that certain solutions exist to make easier work of implementing, integrating and maintaining the myriad data collection technologies that exist.

In a nutshell, SiteSpect WATTS alleviates a lot of the headaches that are common with tags by doing the following:

(a) dynamically injecting WA vendor tags into a site (think global-include, but smart enough to know which flavor of tag goes on which page/section/url).

(b) injecting 3rd party segment identifiers directly into the WA tag (mainly for multivariate test groups, but we also pull category or quantitative data from the page itself, parse it, then drop it back into the tag).

(c) morphing tags from vendor A to vendor B (e.g. turnkey pilots or migrations - mapping hc* vars to eVar*, etc.)

(d) detecting and fixing tagging errors - we recently had a customer who had implemented the wrong version of their WA vendor’s tag within a certain section of their site. Whereas switching the tag would have taken several weeks waiting in the IT queue, the customer made the correction in 5 minutes through SiteSpect.

We see a lot of site operators who are frustrated with tag maintenance. It doesn’t have to be that way. :)

Eric

Judah added the following ...

Eric: No worries about sharing information about your company, SiteSpect, on the blog. WATTS sounds like impressive technology very applicable to the main challenges of page tags: centralized management and change orchestration. I’d love to read a business case about how you applied WATTS beyond MVT and to WA. Thanks for commenting! :)

admin added the following ...

Dude, I gotta say I am LOVING your posts lately … great content and well written! I think this post can serve as the definitive “what about page tags” document out there.

Thanks again for contributing at Web Analytics Demystified.

Eric T. Peterson

Judah added the following ...

Thanks Eric! I appreciate the kind words. :) I’m glad to blog on WAD. I’m hoping my blog readers take away a few good ideas and become better and smarter web analysts from their “total time spent” here. ;)

Very Silly Steve added the following ...

To echo and ditto Eric’s plug… err flattery… err praise! ;-)

I wrote a Q&D perl script to track the amount of time I spend reading your posts. Cross index with a few spare tags lying around. Pivot through a few logs. Matrix, and Mean it. Even the odd factorial, integral and differential to expose any odd trends exposed or engaged.

And I too can conclude that there is a real and quite special ROI in reading your postings Judah.

+/- 23.9756% Give or take.

Cheers!

Justin Cutroni added the following ...

Hey Judah,

As always, a thorough and well written piece. I think one statement needs some clarification:

Cookie Issues: Tags set third-party cookies unless programmed otherwise. Privacy policies don’t like the third party cookie.

I’m not sure that _every_ tool out there uses third party cookies by default. I know that the tool that I use sets first party cookies by default.

Also, I think you’re spot on with many of the process related statements that you make. I can completely relate to many of your experiences. The two biggest issues I’ve seen are tag management (many enterprise sites consist of multiple web applications utilizing different templating engines) and dealing with IT workflow issues (TESTING).

Justin

Judah added the following ...

Steve: Thanks. That’s an excellent methodology! Those figures closely correlate with my ROI as well! :)

Justin: Thanks, and excellent points. While it’s not always easy to get complete code coverage, tag maintenance and extension are the biggest challenges I’ve had.

GA uses first party cookies, right? I have revised that bullet to say: “Tags may set third-party cookies. Privacy policies don’t like the third party cookie. As Justin Cutroni points out in the comments his tool sets first party cookies by default. That’s good.” :)

Thanks for commenting!

Judah Phillips at Web Analytics Demystified » Blog Archive » Web Analytics Data Collection for Beginners added the following ...

[...] Page tags.  Client-side data collection involves using little snippets of HTML code that reference a JS file and communicate via a beacon to a “page tag server” - the machine that collects the data so it can be sessionized by the web analytics tool (it may not be called that by your vendor).  As a web analyst, if you are using page tags you will have lots of fun tagging every page on your web site and instrumenting the tags with custom variables and campaign codes.  Reasons why people like page tags are numerous, and include the fact that they are fairly efficient in filtering out non-human traffic (as long as the robot doesn’t execute javascript) and can count proxy cached pages (improving accuracy). Page tags are probably the most ubiquitous method for collecting web data today. [...]