
One thing I didn't mention in my blog yesterday, but which with hindsight I should have, is that we don't actually know that the AOL searches attributed to a particular anonymised user are all by the same person. They may all have been performed on the same AOL account, but of course many people share an account among their family, lodgers and so on, and may even let their PC-less neighbour come round from time to time to do some surfing.
Just over a week ago AOL published the search histories of around 650,000 user accounts over a three-month period, ostensibly in the name of search research. It didn't give users' names, but instead assigned a random number to each AOL account, the aim being to make it harder (or, in AOL's opinion, impossible) to identify users from their search histories.
What that means is that multiple users of a single AOL account are given just one number between them.
So in practice AOL is probably right - it probably is nigh on impossible to prove someone's identity from the search histories. Because unless there is only one person using that AOL account - and you couldn't know that from the data AOL published - it is impossible to conclude that two searches, for 'A' and then 'B', mean that a single person is interested in or even linked to both A and B. One member of the family may be interested in 'A', a neighbour or lodger in 'B'.
There's another problem that this highlights, and it's one of context. A search for "Leonardo DiCaprio pictures" followed by one for "teacher's curriculum coursework materials" does not suggest a teacher with a possibly unhealthy interest in Leo, because we have no way of knowing that the first search was not by his teenage daughter looking for pictures of her favourite star.
Does all this mean that it was OK for AOL to publish all that information? Actually, no. People did not give their express permission for this data to be published, even with their names omitted. There are search histories that could cause embarrassment, especially given the context problem discussed above: a search may appear to link someone with embarrassing or even illegal acts, whether or not the context is misleading, and whether or not it was even them doing the searching.
Plus, although it is difficult to prove that all searches in a history are by one user, it is still possible to build up a mosaic of information that could breach someone's privacy, or their family's privacy. Also, there are social security numbers, names and addresses in the data that could be useful to spammers and other ne'er-do-wells.
So it was right and proper that AOL yesterday made this apology in a statement: "This was a screw-up, and we're angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant… Although there was no personally identifiable data linked to these accounts, we're absolutely not defending this. It was a mistake, and we apologize. We've launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again."
Hear, hear.
Excellent point, Jason.
An interesting question now should surely be, did AOL break any laws, given that they may not have had permission to publish this data?
I expect they'll be sued, whether they did break the law or not, lawyers being lawyers, but sued for what?
Jason, this is a crosspost from accmanpro.com where we started this discussion, but I thought your readers might be interested to know as well. The first person has been correctly identified:
http://www.nytimes.com/2006/08/09/technology/09aol.html?ei=5090&en=f6f61949c6da4d38&ex=1312776000&partner=rssuserland&emc=rss&pagewanted=all
Thanks for the comment Justin… yes I did see that story. I think they didn’t mind giving her search history because they knew there was nothing scandalous in it. I suspect they could find other people with more embarrassing search histories but deliberately chose someone who didn’t, so that they themselves could not be in the firing line of a possible defamation case.
I’d also argue that although it is clear you can find people, proving that it was they that did all of the searches in their search history would be another matter if they can show the computer is shared….
Incidentally did you see that someone has put a web interface on the data? http://www.aolsearchdatabase.com/
hello. i have a question, and i don't think many people ask it. ok, there's one computer. there is a main aol account, and there are other aol sub-accounts with different usernames and passwords than the main account. are all the search histories the same as the main account? or are they each different because they all have different usernames and passwords? i can't find the answer on the internet, so i would really appreciate some information on this subject. thank you, alicia.