This last weekend, I gave in to my vanity and decided to add a counter to my website. I wanted to know how many people were checking out frantzworkshop.com. (For the benefit of those rare few who are less computer savvy than I, a counter is a device that counts the number of times a website is visited.) Turns out, counters are very easy to acquire. All I had to do was go to one of the many free web counter providers and agree to show the counter company’s logo on my website in exchange for the counter. After I agreed, the counter company sent me HTML code to embed in my website; that code is what makes the counter work. Getting the counter code took all of 5 minutes with the aid of a Google search. The hard part was figuring out how to get the code into my website. Normally, slipping new code into a website is a cinch. However, my website is far more complicated than my tender webmaster experience can handle. Movable Type, the organization that created the program that was used to create this website, is the culprit of my difficulty (man, that was a mouthful…). Or maybe my ignorance is the culprit…
I spent some time fumbling through the administrative options on my website’s control panel in an effort to find the source code (fumbling is definitely the operative word). In the midst of my almost random clicking, I came across a counter that already existed on my website. Only this counter, which only the webmaster can see, is more than a counter. It is a comprehensive accounting firm. It counts, categorizes, and identifies every single visit and visitor to this website. It tells me which IP address clicked on which picture and at what time. Additionally, this embedded counter organizes everything it collects into graphs, pie charts, and tables. Looking at the information made me feel like I was at a WorldCom board meeting… it felt uncomfortable.
I was/am shocked. How can it be that every single click of the mouse is tracked, cataloged, and plotted? For some reason I had it in my head that it was not possible to tell who was visiting a website. Well, so much for anonymity. If my website does this (and I didn’t even know it!), then it is only reasonable to assume that every website can do this. The good news is that, generally speaking, it is not possible to tell which IP address belongs to whom. However, some IP addresses leave many clues. For instance, Movable Type is able to distinguish between various kinds of servers (military, educational, organizational, etc.). It is only fair for me to let everyone know that every click on the internet has the potential to be seen. Seen in detail!
For my part, all I can say is that I will not seek to find out which IP address belongs to whom. I would consider that a violation of your privacy. Furthermore, if I am able, I will turn off all of these statistics-gathering shenanigans.
Anyways, as you can see, I didn’t add the counter…
Here's some clarity!
Unfortunately, most of the information that you see is pretty much required. It's saved in the server log in a format that looks like this:
24.31.245.159 - - [02/Sep/2002:04:16:57 -0400] "GET /birnamlabs_horiz.jpg HTTP/1.1" 304 - "http://www.birnamlabs.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
About a third of this is required for the server to operate. Another third is necessary if an administrator has any hope of figuring out what's happening when something goes wrong, or of finding someone who is abusing the system. It happens. And the last third is information that tells you which browsers and operating systems are viewing your site. This can also be necessary for a web developer to fully accommodate all of the users (since every browser/platform/operating system is so danged different...).
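The sample line above appears to follow Apache's "combined" log format: client IP, two identity fields (usually "-"), timestamp, request line, status code, response size, referring page, and user agent. As a rough sketch (assuming that format; the name `parse_line` is made up for illustration), this is how a stats program might split such a line into named fields:

```python
import re
from typing import Optional

# Combined log format (assumed):
# host ident authuser [time] "request" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Split one combined-format log line into named fields, or return None."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The request is "METHOD PATH PROTOCOL"; pad with None if it is malformed.
    parts = fields["request"].split()
    fields["method"], fields["path"], fields["protocol"] = (parts + [None] * 3)[:3]
    fields["status"] = int(fields["status"])
    # "-" means no response body was sent.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields

line = ('24.31.245.159 - - [02/Sep/2002:04:16:57 -0400] '
        '"GET /birnamlabs_horiz.jpg HTTP/1.1" 304 - '
        '"http://www.birnamlabs.com/" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"')
rec = parse_line(line)
print(rec["host"], rec["path"], rec["status"], rec["agent"])
```

The `304` status with a `-` size means the browser already had a cached copy of the image, so the server sent no image data back.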
What you were looking at, with the graphs and charts and everything, is a third-party piece of software that takes millions of log lines like the one above and parses them into something that makes sense to somebody who wants to see statistics on their site. This is good for many reasons, and my clients love it. A site's overall popularity is just the tip of the iceberg. They can see which sections of their site are the most popular, and which need work (marketing-wise). They can see what errors happen and where, so they can find broken links. They can see what path users take through their site, so they know how well their navigation is working. And perhaps most importantly (since it's all about money), they can use the popularity of certain pages to set the price of advertising on those pages, so advertisers pay more to be shown on a more popular page.
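To make the "parses them into something that makes sense" step concrete, here is a minimal sketch of the kind of tallying such software does: hits per page, 404 errors per page (broken links), and hits per browser. The regular expression assumes the combined log format of the sample line, and the sample lines themselves are made up, using reserved documentation addresses (192.0.2.x). Real stats packages do much more, such as grouping hits into visits to trace navigation paths.

```python
import re
from collections import Counter

# Pull out only the request path, status code, and user agent (combined format assumed).
LOG_RE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count hits per page, 404s per page, and hits per user agent."""
    pages, errors, agents = Counter(), Counter(), Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that don't look like combined-format entries
        path = m.group("path")
        pages[path] += 1
        agents[m.group("agent")] += 1
        if int(m.group("status")) == 404:
            errors[path] += 1  # a likely broken link
    return pages, errors, agents

# Made-up sample log lines for illustration.
sample = [
    '192.0.2.10 - - [02/Sep/2002:04:16:57 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
    '192.0.2.11 - - [02/Sep/2002:04:17:03 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux i686)"',
    '192.0.2.10 - - [02/Sep/2002:04:18:40 -0400] "GET /old-page.html HTTP/1.1" 404 312 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
]
pages, errors, agents = summarize(sample)
print(pages.most_common(1))  # [('/index.html', 2)]
print(dict(errors))          # {'/old-page.html': 1}
```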
And what you were looking at wasn't even a very detailed stats viewer. It's good enough for a free tool that people can put on their sites, but the really beefy ones cost a lot of money and give you WAY more information. The thing is, they all build their information and charts from the same log lines. So when I say "WAY more information," there technically isn't any additional data; it's just a better, more informative way of looking at the same data.
But... and this is a BIG but... none of this is personally identifying. All you get is an IP address. Now, an IP address can tell you a fair amount. IP addresses are 'sold' in blocks, and somebody can use records of who owns which block to identify a person's ISP from their IP address.
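The block lookup described above boils down to checking which registered address range contains a given IP. The sketch below uses Python's standard `ipaddress` module. The ownership table is entirely hypothetical (the ranges are reserved for documentation by RFC 5737); real allocations come from regional registries such as ARIN or RIPE, typically queried through WHOIS. Either way, the answer is an ISP or organization, never a person.

```python
import ipaddress
from typing import Optional

# HYPOTHETICAL ownership table: these ranges are reserved for documentation
# (RFC 5737) and do not belong to any real ISP.
BLOCKS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Cable Co.",
    ipaddress.ip_network("198.51.100.0/24"): "Example Dial-Up Inc.",
    ipaddress.ip_network("203.0.113.0/24"): "Example University",
}

def block_owner(ip: str) -> Optional[str]:
    """Return the organization whose block contains this address, if known."""
    addr = ipaddress.ip_address(ip)
    for network, owner in BLOCKS.items():
        if addr in network:
            return owner
    return None  # not in any block we have records for

print(block_owner("198.51.100.42"))  # Example Dial-Up Inc.
print(block_owner("8.8.8.8"))        # None
```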
In some cases, an IP address can also be turned into something that looks like a domain name, like "host195-36.pool80117.interbusiness.it". When you can do this, you can theoretically determine the user's country by looking at the TLD (top-level domain, like .com or .org) of the domain name. In this case, you can tell that the user was connecting from Italy because the name ends with '.it'.
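Turning an IP address into a name like that is a reverse DNS (PTR) lookup, which Python exposes as `socket.gethostbyaddr`. A rough sketch follows. The small `CC_TLDS` table is an illustrative subset, not the full IANA list, and the hint only says where the ISP's domain is registered, not where the user is sitting. Many addresses have no reverse name at all, and generic TLDs like .com say nothing about country.

```python
import socket
from typing import Optional

# Illustrative subset of country-code TLDs; IANA maintains the full list.
CC_TLDS = {"it": "Italy", "de": "Germany", "fr": "France", "uk": "United Kingdom", "jp": "Japan"}

def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR (reverse DNS) name for an IP address, or None if there isn't one."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None

def country_hint(hostname: str) -> Optional[str]:
    """Guess a country from the hostname's top-level domain ('.it' -> Italy)."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return CC_TLDS.get(tld)  # generic TLDs such as .com or .org give no hint

print(country_hint("host195-36.pool80117.interbusiness.it"))  # Italy
print(country_hint("www.example.com"))                        # None
```

Calling `reverse_lookup` needs network access; when it returns a name, that name is what you would pass to `country_hint`.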
But in order to turn that into truly identifying information, you have to subpoena the ISP that gave out the address. There's no other way to do it. Believe me, if there were, the RIAA would have sued a lot more people by now. But as it is, their lawsuits have to identify people by IP address until they can get the information from the ISP (which, unfortunately, the DMCA allows them to do without much fuss). Until then, their lawsuits look like "164.143.240.33 is sharing 240 MP3s".
Plus, on top of that, in about 90% of cases an IP address changes frequently anyway. All dial-up users, and most broadband users, use what's called DHCP. Basically, whenever a dial-up user gets online, their ISP hands them a dynamically assigned IP address, usually just the first one available out of the ISP's block of addresses. It's a little different for broadband users, but not much. In their case, the ISP refreshes their DHCP lease every three days or so (on average), and they get a new IP address. So not only is there no personal information available from an IP address, but chances are the IP address you're using right now has been used by hundreds or thousands of other people at some point.
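A toy model can show why the same address ends up shared by many people. The `AddressPool` class below is hypothetical and deliberately ignores the real DHCP protocol (lease timers, renewal messages, and so on); it simply hands each new connection the lowest free address in a block and takes it back on disconnect, as described above.

```python
import ipaddress

class AddressPool:
    """Toy stand-in for an ISP's address block (NOT the real DHCP protocol):
    each new connection gets the lowest free address, and the address goes
    back into the pool when the user disconnects."""

    def __init__(self, cidr):
        # hosts() skips the network and broadcast addresses
        self.free = list(ipaddress.ip_network(cidr).hosts())
        self.assigned = {}  # user name -> address currently in use

    def connect(self, user):
        if not self.free:
            raise RuntimeError("address pool exhausted")
        addr = min(self.free)  # "just the first one available"
        self.free.remove(addr)
        self.assigned[user] = addr
        return addr

    def disconnect(self, user):
        self.free.append(self.assigned.pop(user))

pool = AddressPool("198.51.100.0/29")  # hypothetical block with 6 usable addresses
alice = pool.connect("alice")
pool.disconnect("alice")
bob = pool.connect("bob")
print(alice, bob)  # 198.51.100.1 198.51.100.1
```

With this tiny block, Alice's address goes straight to Bob once she hangs up, so a log entry for `198.51.100.1` could belong to either of them depending on the time.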
So anyway, don't get freaked out by the stats for your website. They're a good thing. There is zero potential for abuse. I've actually had clients ask me if they can figure out users' email addresses from their server logs, so if there were a way to abuse it, believe me, it would be done. Not by me, but somebody would do it.
Posted by: David at April 7, 2004 12:34 PM
In other words, I have confused a flashlight with a forest fire. One is good for camping, and one is not... alas, what will you do with me? Thanks for setting matters in the proper light.
~ A