Five Rules for, and Some Thoughts on, Deep Packet Inspection
One of the many things on my mind in the online world these days is “deep packet inspection.”
First, let me digress: packet sniffing isn’t new to web analytics. Vendors from Accrue to Omniture (Visual Discover Sensor?) to AuriQ to Metronome Labs have used packet sniffers to “do web analytics,” though it’s an uncommon method compared to JavaScript page tags.
Web analytics packet sniffers write logs that are used to sessionize (and thus measure) traffic on behalf of site owners who don’t want to use page tags or logs. Once you’ve logged and sessionized the traffic, you know what content people have looked at or downloaded on your site.
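To make “sessionize” concrete, here is a minimal Python sketch of what might happen after a sniffer has written its hit log. It assumes each hit already carries a visitor key (a cookie value, say) and uses a 30-minute inactivity timeout, which is a common convention but an assumption here. `Hit` and `sessionize` are hypothetical names, not any vendor’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

@dataclass
class Hit:
    visitor_key: str    # hypothetical: cookie value or similar identifier
    timestamp: datetime
    url: str

def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group hits into sessions: a new session starts when the visitor
    changes or when the gap since that visitor's last hit exceeds timeout."""
    sessions = []
    current = []
    for hit in sorted(hits, key=lambda h: (h.visitor_key, h.timestamp)):
        if current and (hit.visitor_key != current[-1].visitor_key
                        or hit.timestamp - current[-1].timestamp > timeout):
            sessions.append(current)
            current = []
        current.append(hit)
    if current:
        sessions.append(current)
    return sessions

t0 = datetime(2008, 5, 20, 9, 0)
hits = [
    Hit("a", t0, "/home"),
    Hit("a", t0 + timedelta(minutes=5), "/products"),
    Hit("a", t0 + timedelta(hours=2), "/home"),  # long gap: starts a new session
    Hit("b", t0, "/download/whitepaper.pdf"),
]
for session in sessionize(hits):
    print(session[0].visitor_key, [h.url for h in session])
```

Counting these sessions and the URLs inside them is what turns raw sniffed requests into “what content people have looked at or downloaded.”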
“Deep packet inspection,” like WA sniffing, looks at the entire payload of packets in real time across a huge number of simultaneous sessions. Like regular packet sniffing, it examines the files downloaded and the content of the pages viewed: the whole ball of wax.
Deep packet inspection is being offered as a hardware/software technology by companies like FrontPorch and Sandvine (in the US) and Phorm (in the UK). These companies are selling the technology to ISPs (like Charter, Comcast, and Virgin Media) so that they can monitor the sites visited and the keywords used by customers, and then use the collected data for behavioral targeting. The ISPs want a slice of the juicy, lucrative online ad business.
What’s the difference? Site owners collect data about what you do on ONE site (or a portfolio of their sites). ISPs collect data about what you do on EVERY site you visit. As I understand it, some of these companies create an anonymous profile of your surfing activity by assigning a unique key to your browser. Then they monitor the sites visited by your browser and use that data so that the ISP, or the companies to which it sells your data, can serve you what they conclude to be relevant, behaviorally targeted ads.
Get it? Packet sniffing by site owners = knowing about one site you visit. Deep packet inspection by ISPs = knowing about every site you visit.
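Here is a toy Python model of that difference. The `observations` list and both functions are hypothetical, and for simplicity both views share one browser key. The host filter in `site_owner_view` stands in for the fact that a site owner’s sniffer only sees traffic to its own servers, while an ISP sees every request its subscribers send.

```python
from collections import defaultdict

# Hypothetical observations: (browser_key, host, path) for each HTTP request.
observations = [
    ("key-123", "shop.example.com", "/cart"),
    ("key-123", "news.example.org", "/politics"),
    ("key-123", "health.example.net", "/symptoms"),
    ("key-456", "shop.example.com", "/home"),
]

def site_owner_view(obs, my_host):
    """A site owner only ever sees requests made to its own host."""
    profiles = defaultdict(list)
    for key, host, path in obs:
        if host == my_host:
            profiles[key].append(path)
    return dict(profiles)

def isp_view(obs):
    """An ISP inspecting every packet sees every host each browser visits."""
    profiles = defaultdict(list)
    for key, host, path in obs:
        profiles[key].append(host + path)
    return dict(profiles)

print(site_owner_view(observations, "shop.example.com"))
print(isp_view(observations))
```

The shop learns that key-123 looked at a cart; the ISP learns that the same browser also read about politics and health symptoms.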
Now to digress again… In web analytics, we know that data is collected anonymously. Unless there’s a login, you don’t know exactly who is coming from a given IP address. And in many cases, companies’ data warehouses contain only purchase information, not the entire clickstream. Once the data is collected, though, if you have the right architecture you can map cookie values to people and make that data non-anonymous (i.e., PII). It’s not difficult to do with some smart BI folks on your side.
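A minimal sketch of that mapping, assuming a hypothetical clickstream keyed by cookie ID and a hypothetical login table that recorded which cookie was present when each customer logged in. It’s a plain dictionary lookup, which is exactly why disclosure matters: the “anonymous” rows stop being anonymous the moment a match exists.

```python
# Hypothetical "anonymous" clickstream, keyed by cookie ID.
clickstream = {
    "c-001": ["/home", "/pricing", "/checkout"],
    "c-002": ["/blog/deep-packets"],
}

# Hypothetical login table: the cookie ID seen when each customer logged in.
logins = {
    "c-001": {"customer_id": 42, "email": "pat@example.com"},
}

def attach_identity(clickstream, logins):
    """Join clickstream rows to login records on cookie ID.
    Cookies with no matching login keep identity=None (still anonymous)."""
    return {
        cookie: {"identity": logins.get(cookie), "pages": pages}
        for cookie, pages in clickstream.items()
    }

for cookie, row in attach_identity(clickstream, logins).items():
    print(cookie, row["identity"], row["pages"])
```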
An ISP already knows who you are and can already identify the sites you visit, though probably not that easily at the individual level; it would have to dig through the logs, etc…
So what’s the big deal and all the hoo-hah about the “deep packet inspection” Phorm and FrontPorch are doing? It’s the data they are collecting and the repository they are building containing data about every site you visit and all the content you view and download… Of course, these companies say that it’s all done anonymously and that your “privacy” is preserved “to the greatest extent possible.”
Now let me quote Sir Tim Berners-Lee about the data collected from Phorm’s ISP tracking: “It’s mine - you can’t have it. If you want to use it for something, then you have to negotiate with me. I have to agree, I have to understand what I’m getting in return.”
And that’s the point of this blogviation: Tim is correct. In web analytics, we do this; we try to operate within Tim’s constraints. We enable opt-in with P3P statements and disclosures when you register/login. Privacy policies disclose what we are doing with the data. It’s just ethical and smart business practice to do so.
Thus, I think FrontPorch, Phorm, and all the ISPs who want a piece of online advertising should adhere to the following five rules for their services.
- Move to an obvious “opt-in” model with full disclosure. Tracking via “deep packet inspection” should be entirely opt-in. If you want anonymous data from your browser collected so that you can be behaviorally targeted, then you should have to opt in. Right now, it seems to be all opt-out. You probably don’t know if it’s being done to you. It’s buried in fine print you’ve probably never read. Is it your fault you didn’t read the fine print? Yeah, but the point is it shouldn’t be buried in the fine print…
- Provide me with access to the data collected. If I opt in, I should be able to see the data collected from my browser. It’s very simple. I demand to see what you are collecting about my browser. If you are building a profile, then I demand to see the data collected in the profile. If it’s all anonymous, then explain in detail how it’s anonymous, and then follow rule #1.
- Enable me to edit or prevent the data from being collected. If I opt in, I want to be able to edit the data or prevent certain types of it from being collected. If you’re tracking my browser, alert me before the data is transmitted, so I can decide whether I want to share it. If a profile is built, I want to be able to edit it!
- Let me opt out at any time EASILY. If I’ve opted in and I’m unhappy with the service, let me opt out simply. Having to set an opt-out cookie on my browser is absolutely and completely absurd. I want to be able to fully opt out at the ISP level, just once and forever, not at the browser level every time cookies are deleted. Make it easy and permanent, not easily deletable.
- Disclose who you sell my data to. Like online list rentals, the next step in all this ISP profiling is selling the data to third parties. Let me know what you’re doing with my data before you do it, so I can opt out or prevent it from being sold to parties I don’t want it sold to.
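The complaint in rule 4 about opt-out cookies is easy to see in a toy model. In this Python sketch, all class and field names are hypothetical; it is not how Charter, Phorm, or anyone else actually implements opt-out. A cookie-based opt-out disappears when cookies are cleared and never applies to a second browser, while an opt-out stored once against the subscriber account at the ISP survives both.

```python
class Browser:
    """One browser on one computer, with its own cookie jar."""
    def __init__(self):
        self.cookies = {}

    def clear_cookies(self):
        self.cookies.clear()

class CookieOptOut:
    """Hypothetical: opt-out remembered only as a cookie in one browser."""
    def opt_out(self, subscriber_id, browser):
        browser.cookies["optout"] = "1"

    def is_tracked(self, subscriber_id, browser):
        return browser.cookies.get("optout") != "1"

class AccountOptOut:
    """Hypothetical: opt-out recorded once against the subscriber account."""
    def __init__(self):
        self.opted_out = set()

    def opt_out(self, subscriber_id, browser):
        self.opted_out.add(subscriber_id)

    def is_tracked(self, subscriber_id, browser):
        return subscriber_id not in self.opted_out

laptop, desktop = Browser(), Browser()
cookie_model, account_model = CookieOptOut(), AccountOptOut()
cookie_model.opt_out("sub-1", laptop)
account_model.opt_out("sub-1", laptop)
laptop.clear_cookies()

for model in (cookie_model, account_model):
    print(type(model).__name__,
          model.is_tracked("sub-1", laptop),   # after clearing cookies
          model.is_tracked("sub-1", desktop))  # a different browser
```

The account-level design is the “just once and forever” behavior rule 4 asks for; the cookie design is the one described in Charter’s opt-out confirmation quoted in the comments.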
Consumers must be given a choice for preserving their privacy. Anonymity to the “greatest extent possible” is not enough and neither are short-sighted opt-out cookies. Companies like Phorm and Front Porch would be wise to apply these rules to regulate themselves. Otherwise freedom-loving governments will almost certainly regulate them.
And I haven’t even mentioned the issues with net neutrality and deep packet inspection (i.e., traffic shaping and access restrictions, called “throttling,” as Clint points out in the comments), have I?
Judah added the following ...
Hi Clint,
Yes, same technology, different purpose. To quote the Wikipedia:
“Because ISPs route all of their customers’ traffic, they are able to monitor web-browsing habits in a very detailed way allowing them to gain information about their customers’ interests, which can be used by companies specializing in targeted advertising. At least 100,000 US customers are tracked this way, and as many of 10% of US customers have been tracked in this way. Technology providers include NebuAd, Front Porch and Phorm. US ISPs monitoring their customers include Knology, Charter Communications[6], and Wide Open West, and probably also Embarq. In addition, the UK ISP BT has admitted testing technology from Phorm without their customers’ knowledge or consent. [7]”
Judah
Clint added the following ...
yep. I wonder, would the ISPs have had this idea if HitWise hadn’t approached them to get the data for their service?
Judah added the following ...
LOL Clint! And as you correctly pointed out, “throttling” is another big issue with all this stuff. I mean, how are people going to see the new “Indiana Jones” movie if they can’t download it??? Just kidding…
Daniel Waisberg added the following ...
Interesting post Judah. Very informative.
However, you say, “We enable opt-in with P3P statements and disclosures when you register/login.” But when we collect data about anonymous users, we don’t ask them whether we can use their data either. Isn’t it the same?
It also made me think about big companies buying social websites (Yahoo buying Flickr is a good example) and gaining access to all the users’ info. Isn’t that similar? I remember that Flickr users were really mad when Yahoo bought the company. Shouldn’t Flickr have asked users whether they wanted to be excluded from the database before selling? Although in this case it is probably written somewhere in the T&C…
Eric Hansen added the following ...
Hey Judah,
Good thought-provoking post. I read something related to this recently, an article by Esther Dyson: http://www.huffingtonpost.com/esther-dyson/release-09-dont_b_85822.html
Also, add to your list of suspects companies like Neuralytics, which have a similar play: providing subscribers’ behavioral data, but for cell phone usage. This type of technology resides “on-deck,” where the carrier also gets access to your cell phone number and call history.
Eric
Steve added the following ...
I’ll grovel-apologise up front for being horribly pedantic, but it might help explain the technical (and thus some of the “political”) differences between “deep packet” inspection and normal WA sniffing.
Quick Internet networking tech overview: it’s like Matryoshka dolls (nested Russian dolls). A bigger doll wraps a smaller one.
So “IP” (Internet Protocol) wraps “TCP” (Transmission Control Protocol). To put it in house terms: IP will give you the street address; TCP will then tell you if you should interact with this particular house via:
* The mailbox for Letters
* The Garage Door for Cars
* The Window for Cricket Balls
* Chimney for Santa; and so on.
But even then you don’t know what is being… “said”. Is Santa telling you that you’ve been naughty or nice? Is that letter a bill or a birthday gift voucher?
You have to open the letter (the data) and see what’s inside.
The request for any given web page (the HTTP part of the HTTP/TCP/IP transaction) sits in the data portion of the TCP segment, which in turn sits in the data portion of the IP packet. It’s like the contents of the letter itself: you have to read it to discover whether it’s a bill or a gift.
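Steve’s nesting can be shown by peeling a raw packet by hand. This is a minimal Python sketch, assuming an IPv4 packet carrying TCP; it skips checksums, fragmentation, and IPv6, and `build_packet` exists only to fabricate a test packet. The header offsets follow the IPv4 and TCP specifications (RFC 791 and RFC 793).

```python
import socket
import struct

def peel_layers(packet: bytes) -> dict:
    """Open the dolls: IPv4 header -> TCP header -> application data.
    Assumes IPv4 carrying TCP; ignores checksums and fragmentation."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (version_ihl & 0x0F) * 4          # IHL counts 32-bit words
    total_len = struct.unpack("!H", packet[2:4])[0]   # IP header plus its contents
    if packet[9] != 6:                                 # protocol 6 = TCP
        raise ValueError("not a TCP segment")
    src_ip = socket.inet_ntoa(packet[12:16])           # the "street address"
    dst_ip = socket.inet_ntoa(packet[16:20])

    tcp = packet[ip_header_len:total_len]
    src_port, dst_port = struct.unpack("!HH", tcp[0:4])  # the "door" (80 = web)
    tcp_header_len = (tcp[12] >> 4) * 4                  # data offset, 32-bit words
    data = tcp[tcp_header_len:]                          # the letter's contents
    return {"src": (src_ip, src_port), "dst": (dst_ip, dst_port), "data": data}

def build_packet(src_ip, dst_ip, src_port, dst_port, data: bytes) -> bytes:
    """Fabricate a minimal IPv4+TCP packet for testing (checksums left at 0)."""
    tcp_header = struct.pack("!HHIIBBHHH", src_port, dst_port,
                             0, 0,          # sequence and ack numbers
                             5 << 4, 0x18,  # 20-byte header; PSH+ACK flags
                             65535, 0, 0)   # window, checksum, urgent pointer
    total_len = 20 + len(tcp_header) + len(data)
    ip_header = struct.pack("!BBHHHBBH4s4s", (4 << 4) | 5, 0, total_len,
                            0, 0, 64, 6, 0,  # id, flags/frag, TTL, TCP, checksum
                            socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    return ip_header + tcp_header + data

pkt = build_packet("10.0.0.1", "192.0.2.80", 51000, 80,
                   b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
layers = peel_layers(pkt)
print(layers["dst"], layers["data"].split(b"\r\n")[0])
```

The address and port alone say “someone talked to a web server”; only the `data` layer says which page was requested.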
So any WA sniffing tool is looking inside the DATA as well. Because modern browsers and servers reduce latency (and thus appear faster) by using keep-alives or pipelining requests, you can no longer look at just the first packet or two in the data stream. You have to look at ALL the data packets… if you wish to see all the requests made.
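To illustrate why the first packet isn’t enough, here is a sketch that pulls every request line out of one client-to-server stream carrying several requests over a single keep-alive connection. It assumes the TCP stream has already been reassembled in order and that any request body is sized by Content-Length (chunked bodies aren’t handled). `extract_requests` is a hypothetical helper, not part of any WA product.

```python
def extract_requests(stream: bytes) -> list:
    """Return (method, target) for every HTTP/1.1 request in one reassembled
    client-to-server TCP stream. Assumes complete, in-order requests whose
    bodies (if any) are sized by Content-Length; chunked bodies aren't handled."""
    requests = []
    pos = 0
    while pos < len(stream):
        header_end = stream.find(b"\r\n\r\n", pos)
        if header_end == -1:
            break  # capture ended mid-request
        lines = stream[pos:header_end].decode("iso-8859-1").split("\r\n")
        method, target, _version = lines[0].split(" ", 2)
        body_len = 0
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                body_len = int(value.strip())
        requests.append((method, target))
        pos = header_end + 4 + body_len  # skip blank line and body to the next request

    return requests

# Four requests sent back-to-back over one keep-alive connection.
stream = (
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    b"GET /style.css HTTP/1.1\r\nHost: example.com\r\n\r\n"
    b"POST /search HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nq=dpi"
    b"GET /logo.png HTTP/1.1\r\nHost: example.com\r\n\r\n"
)
print(extract_requests(stream))
```

A tool that stopped after the first request would have recorded only `/index.html` and missed the search entirely.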
So the tools used (in either case) differ little from each other. The general technical concepts and problems are much the same; the differences are more in the fine details. Other products, like network-based intrusion detection systems or network usage accounting tools, are also similar, just not specific to pure web traffic.
—
So what’s the beef?
As you state, Judah, it’s the many-to-one (users-to-website) model that we all know and love being changed into a many-to-many model. DoubleClick and co. have such a bad rap because they try to become a many-to-many model.
One of the biggest problems we used to come across in my full-on IT security days was aggregation. A given system may only hold data up to “xxx-in-confidence,” but if you have access to the entire data set, you can know (for example) the logistics capabilities of Australia’s entire Defence Forces, i.e., “Top Secret” (hyperbole for clarity). Soldiers can get *killed* from exposing that much information. So how do you protect these two conflicting states? Thus the problem…
And this many-to-many model raises the same issue, made worse than with DoubleClick and co., which only have access to a subset of all the sites the generic “I” visits.
If “you” know a bit about me, that’s one thing; if “you” know everything about me, that’s a FAR greater value proposition. Hence the beef.
—
Now, ISPs have *always* had the ability to aggregate customer data for any given person(s) in near real time. They’re able to track your Internet volume usage, so extrapolating from there isn’t hard.
And as Clint oh so rightly points out, Hitwise has been doing this since forever.
Personally? I think where most people get steamed up on this topic is the attitude of “You’re guilty till proven innocent,” i.e., we take your data unless you opt out.
And don’t get me started on the HTML stream being modified by the ISPs or their agents to display adverts…
/pedantic mode
Cheers!
- Steve
Judah added the following ...
Daniel: You do bring up a good point, but I don’t think it’s the same. The volume of data an ISP collects about users is much more similar to the volume collected by an eBay, Amazon, or Yahoo than by a little website that runs GA. Take, for example, Amazon, which discloses that it collects information such as:
“the Internet protocol (IP) address used to connect your computer to the Internet; login; e-mail address; password; computer and connection information such as browser type, version, and timezone setting, browser plug-in types and versions, operating system, and platform; purchase history, which we sometimes aggregate with similar information from other customers to create features such as Purchase Circles and Top Sellers; the full Uniform Resource Locator (URL) clickstream to, through, and from our Web site, including date and time; cookie number; products you viewed or searched for; your Auction history; and the phone number you used to call our 800 number.”
Which they disclose that they use for:
“such purposes as responding to your requests, customizing future shopping for you, improving our stores, and communicating with you.” Among other things…
Or eBay:
http://pages.ebay.com/help/confidence/cookies-web-beacons.html
That said, every website should disclose what it collects. And the whole industry, imho, should be better at self-regulating, with every site that uses analytics disclosing what it uses analytics for, so we don’t get regulated by law. Wasn’t there a call for a WAA Privacy and Ethics committee at one time? Whatever happened to that?
Thanks for reading and commenting, Daniel!
Eric: I hadn’t heard of Neuralytics, but they need a better name and a new website. It sounds like something from that ’80s movie “Scanners.” Your description of what the company does kind of “bugs me out.” If I opt in to the service, great, but if some carrier wants incremental revenue from profiling my calling habits via an opt-out, I’ll switch carriers. I need an iPhone anyway.
Thanks for the Esther Dyson link. She’s awesome. I met her once, and it was cool. Like meeting a legend. She was very nice.
Thanks for reading and commenting on the blog!
Steve: Love the comment, and thanks for reading and for being pedantic.
I am sure the readers dig it too.
Here’s to Ubuntu!
Brad Warthan added the following ...
Good read, Judah. Clear to see the ISPs are taking advantage of Porter’s 5 forces. The points you outlined really are in the best interest of consumers.
Judah added the following ...
Brad: Thanks! Agreed about Porter’s model, but have they correctly estimated the forces at play, such as the bargaining power of buyers, barriers to entry (i.e. legality), and the threat of substitutes…
Take for example, this interesting article about this very topic referencing potential violations of the Electronic Communications Privacy Act:
http://news.cnet.com/8301-13578_3-9947499-38.html?part=rss&tag=feed&subj=TheIconoclast
According to Charter (from the comments):
We are monitoring the site that you go to in order to have the advertising you see on website become customized to the website’s you are looking at. if you wish to not take part in this you can got to http://www.charter.com/onlineprivacy to stop it. We are currently only doing this in 4 cities, so this may not affect you. the cities are Newtown, CT, Fort Worth, Texas, San Luis Obisop, CA, and Oxford, Massachusetts. If you wish to discuss this as it regards your particular account, you can chat in at charter.com or call us a 888-438-2427. Due to FCC regulations, we can not discuss individual accounts by email. Thanks, Don
When you go there, you enter your name and address. They return the following:
Opt-Out is Complete Your opt-out request has been received and processed. Please note that the opt-out cookie is specific to the browser and computer you are using right now. Your opt-out choice cannot be honored if you access this site using a different browser on this computer or from a different computer. Additionally, your opt-out choice cannot be honored if the cookies on your computer(s) are deleted. As a result, you should repeat this process with each browser and computer you use to access this site and whenever cookies are deleted from your computer(s).
Thanks for reading and thanks for commenting!
inkhorn » Deep Packets added the following ...
[...] Phillips points to a trend of ISP employing packet sniffers on user traffic: Site owners collect data about what you do on ONE site (or a portfolio of their sites). ISP’s [...]


Clint added the following ...
Judah,
aren’t these “deep packet inspection” tools the ones that ISPs, comcast in particular, are using to throttle and block unwanted traffic? e.g. interfering with P2P and the whole net neutrality thang?