Analysis: It’s surprisingly easy to identify individuals from credit-card metadata

MIT News release: “In this week’s issue of the journal Science, MIT researchers report that just four fairly vague pieces of information — the dates and locations of four purchases — are enough to identify 90 percent of the people in a data set recording three months of credit-card transactions by 1.1 million users. When the researchers also considered coarse-grained information about the prices of purchases, just three data points were enough to identify an even larger percentage of people in the data set. That means that someone with copies of just three of your recent receipts — or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought — would have a 94 percent chance of extracting your credit card records from those of a million other people. This is true, the researchers say, even in cases where no one in the data set is identified by name, address, credit card number, or anything else that we typically think of as personal information. The paper comes roughly two years after an earlier analysis of mobile-phone records that yielded very similar results. “If we show it with a couple of data sets, then it’s more likely to be true in general,” says Yves-Alexandre de Montjoye, an MIT graduate student in media arts and sciences who is first author on both papers. “Honestly, I could imagine reasons why credit-card metadata would differ or would be equivalent to mobility data.” De Montjoye is joined on the new paper by his advisor, Alex “Sandy” Pentland, the Toshiba Professor of Media Arts and Science; Vivek Singh, a former postdoc in Pentland’s group who is now an assistant professor at Rutgers University; and Laura Radaelli, a postdoc at Tel Aviv University.”
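The measure behind figures like "four points identify 90 percent" is often called unicity: the share of people whose record is the only one matching a handful of known points. The sketch below is a rough illustration on invented toy data, not the researchers' code. The function name `unicity`, the `(day, shop)` point format, and the sampling scheme are all assumptions made for this example.

```python
import random

def unicity(traces, k, trials=1000, seed=0):
    """Estimate the fraction of sampled users who are the only match
    for k points drawn at random from their own trace.

    traces: dict mapping a user id to a set of (day, shop) tuples.
    This is a hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Only users with at least k recorded points can be sampled.
    users = sorted(u for u, pts in traces.items() if len(pts) >= k)
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        # Pretend an outsider knows k of this user's purchases.
        known = set(rng.sample(sorted(traces[user]), k))
        # Count every trace in the data set containing all known points.
        matches = sum(1 for pts in traces.values() if known <= pts)
        if matches == 1:
            unique += 1
    return unique / trials

# Invented toy data: three users, three purchases each.
traces = {
    "a": {(1, "bakery"), (2, "cafe"), (3, "gym")},
    "b": {(1, "bakery"), (2, "cafe"), (4, "cinema")},
    "c": {(1, "bakery"), (5, "bookshop"), (6, "cafe")},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

On this toy data the estimate rises as `k` grows, which is the same pattern the release describes: each extra known purchase rules out more of the other records.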

See also Scientific American – Shopping Habits Reveal Personal Details in “Anonymized” Data and the introduction to the special issue of Science – The End of Privacy – “At birth, your data trail began. You were given a name, your height and weight were recorded, and probably a few pictures were taken. A few years later, you were enrolled in day care, you received your first birthday party invitation, and you were recorded in a census. Today, you have a Social Security or national ID number, bank accounts and credit cards, and a smart phone that always knows where you are. Perhaps you post family pictures on Facebook; tweet about politics; and reveal your changing interests, worries, and desires in thousands of Google searches. Sometimes you share data intentionally, with friends, strangers, companies, and governments. But vast amounts of information about you are collected with only perfunctory consent—or none at all. Soon, your entire genome may be sequenced and shared by researchers around the world along with your medical records, flying cameras may hover over your neighborhood, and sophisticated software may recognize your face as you enter a store or an airport. For scientists, the vast amounts of data that people shed every day offer great new opportunities but new dilemmas as well. New computational techniques can identify people or trace their behavior by combining just a few snippets of data. There are ways to protect the private information hidden in big data files, but they limit what scientists can learn; a balance must be struck. Some medical researchers acknowledge that keeping patient data private is becoming almost impossible; instead, they’re testing new ways to gain patients’ trust and collaboration. Meanwhile, how we think and feel about privacy isn’t static. 
Already, younger people reveal much more about their lives on the Web than older people do, and our preferences about what we want to keep private can change depending on the context, the moment, or how we’re nudged. Privacy as we have known it is ending, and we’re only beginning to fathom the consequences.”

