Sunday, March 14, 2010

Netflix Fails Data Anonymization

According to this story from Wired's Threat Level blog, Netflix has cancelled the sequel to its original $1,000,000 Netflix Prize as a result of a privacy lawsuit. The problem for Netflix was that a specific law, the Video Privacy Protection Act, prevents disclosure of a person's video rentals, and Netflix provided enough information about individual users in its supposedly anonymized training data that at least some of that data could be de-anonymized.

So, was Netflix wrong to give out the data it included in the second contest? The second contest's data indicated which movies people had watched and what ratings they had given them. The people weren't identified by name, but their ZIP codes, ages and genders were provided. As it happens, there is roughly an 87% chance that someone's birth date, ZIP code and gender, taken together, identify that person uniquely (as related in this article, also from Threat Level). Netflix supplied ages rather than full birth dates, which is less precise, but age combined with ZIP code, gender and a detailed rating history can still narrow a record down to very few people. Does that mean Netflix's second contest ran afoul of the law? The case was settled, so we don't know what a court would have said. However, the risk was significant enough that Netflix decided to cancel the well-publicized sequel to its earlier successful contest, which suggests that Netflix made a bit too much public.
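The re-identification risk comes down to how many records are unique on a combination of quasi-identifiers: fields that are not names but can be matched against an outside source, such as a voter roll, that lists the same fields next to a name. The sketch below counts the share of records whose (ZIP code, age, gender) tuple occurs exactly once. The records and the helper names `quasi_id` and `fraction_unique` are hypothetical, invented for illustration; this is not Netflix's data or any published method.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, age, gender, ratings).
# Invented for illustration; not real Netflix data.
records = [
    ("02139", 34, "F", {"Movie A": 5}),
    ("02139", 34, "F", {"Movie B": 3}),
    ("02139", 35, "M", {"Movie A": 4}),
    ("10001", 52, "F", {"Movie C": 2}),
    ("94110", 27, "M", {"Movie B": 5}),
]


def quasi_id(row):
    """Extract the quasi-identifier fields (ZIP code, age, gender) from a row."""
    zip_code, age, gender, _ratings = row
    return (zip_code, age, gender)


def fraction_unique(rows, key):
    """Return the share of rows whose key tuple appears exactly once in the data."""
    counts = Counter(key(r) for r in rows)
    unique = sum(1 for r in rows if counts[key(r)] == 1)
    return unique / len(rows)


print(fraction_unique(records, quasi_id))  # 0.6
```

Here three of the five toy records are alone in their group, so anyone holding an outside list with the same three fields could attach a name, and therefore a full viewing history, to each of them.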

Now that it's all over, with the benefit of 20/20 hindsight, what should Netflix have done with the second contest? From a conservative standpoint, it could probably have avoided the kind of privacy complaints that came up if, instead of just removing names, it had followed the anonymization guidelines used for medical research on human subjects (a good summary of which can be found here). Those guidelines are the gold standard for data anonymization, and they list specific items that must be removed, including the kind of ZIP code data in Netflix's data set. (Strictly speaking, the HIPAA Safe Harbor rule lets the first three digits of a ZIP code be kept when the area they cover contains more than 20,000 people; full five-digit ZIP codes must go.)
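As a rough illustration of what following those guidelines might have meant for the fields in Netflix's data set, here is a minimal sketch of two HIPAA Safe Harbor generalizations: keep only the first three ZIP digits when that area has more than 20,000 people (otherwise use "000"), and collapse ages over 89 into a single "90+" category. It assumes a hypothetical `PREFIX_POPULATION` table with made-up numbers (a real one would come from current Census data), and the names `Record`, `generalize_zip`, `generalize_age` and `safe_harbor_sketch` are illustrative only. It covers just these fields, not the full Safe Harbor identifier list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical population counts for 3-digit ZIP prefixes (made-up numbers).
# A real implementation would use current Census Bureau data.
PREFIX_POPULATION = {
    "021": 850_000,
    "059": 15_000,
}


@dataclass
class Record:
    name: Optional[str]
    zip_code: str
    age: int
    gender: str


def generalize_zip(zip_code: str) -> str:
    """Keep the 3-digit prefix only if its area has more than 20,000 people."""
    prefix = zip_code[:3]
    # Prefixes missing from the table are treated as small, which is conservative.
    if PREFIX_POPULATION.get(prefix, 0) > 20_000:
        return prefix
    return "000"


def generalize_age(age: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


def safe_harbor_sketch(r: Record) -> dict:
    """Drop the name and generalize ZIP code and age; gender is kept as-is."""
    return {
        "zip3": generalize_zip(r.zip_code),
        "age": generalize_age(r.age),
        "gender": r.gender,
    }


print(safe_harbor_sketch(Record("Jane Doe", "02139", 34, "F")))
# {'zip3': '021', 'age': '34', 'gender': 'F'}
print(safe_harbor_sketch(Record(None, "05901", 93, "M")))
# {'zip3': '000', 'age': '90+', 'gender': 'M'}
```

Generalizing fields this way shrinks the number of records that stand alone on their quasi-identifiers, although, as the Netflix case shows, a detailed rating history can itself act as an identifier.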

5 comments:

Anonymous said...

My reading of the "HIPAA and Common Rule" document you reference is that it permits disclosure of ZIP codes (cf. p. 23).

electronic signature said...

The commenter above is right that it permits disclosure of ZIP codes. But in most cases it is not clear what can and cannot be disclosed while staying in line with security laws, and people need to be made more aware of this.
