Social networks can learn about you through your friends | Science News for Students

True privacy is not possible when social networks can stalk you through friends’ accounts
Oct 12, 2017 — 6:50 am EST

You don’t have to be on a social network for your information to find its way there. Information from your friends can be used to infer information about you, creating a “shadow profile.” 


Some people might think that online privacy is, well, a private matter. If you don’t want personal information getting out online, then you can just not put it out there. Right? Wrong. Keeping your information private isn’t solely your choice anymore. Friends can play a big role in your privacy, new data show. And the more they share on a social network, the more that social network can figure out about you.

Someone who joins a social network (such as Snapchat, Instagram or Facebook) wants to find their friends. Often, the social network can help. Many apps offer to import contact lists from your phone or e-mail. These apps then use that information to find matches with people already in the network, and suggest them to you.
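To make the matching step concrete, here is a minimal sketch in Python. It is not how any real app works under the hood. The names `suggest_friends`, `normalize_email`, `registered_users` and `imported_contacts` are all made up for illustration, and the sketch assumes the network matches people by e-mail address alone.

```python
def normalize_email(email: str) -> str:
    """Trim spaces and lowercase so 'Ana@Example.com ' matches 'ana@example.com'."""
    return email.strip().lower()


def suggest_friends(imported_contacts, registered_users):
    """Return usernames of imported contacts who already have an account.

    imported_contacts: list of dicts with 'name' and 'email' keys (hypothetical format).
    registered_users: dict mapping a normalized e-mail address to a username.
    """
    suggestions = []
    for contact in imported_contacts:
        key = normalize_email(contact["email"])
        if key in registered_users:
            suggestions.append(registered_users[key])
    return suggestions


# Made-up example data: two people already on the network.
registered_users = {
    "ana@example.com": "ana_on_network",
    "raj@example.com": "raj99",
}

# A new user's phone contacts. Lee is not on the network.
imported_contacts = [
    {"name": "Ana", "email": " Ana@Example.com "},
    {"name": "Lee", "email": "lee@example.com"},
]

print(suggest_friends(imported_contacts, registered_users))  # ['ana_on_network']
```

Notice that Lee's address was still uploaded, even though Lee never joined and got no suggestion.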

It’s very convenient. And sharing those contact lists seems harmless, notes David Garcia. He studies how people interact with social networks at the Complexity Science Hub Vienna, in Austria. “People giving contact lists, they’re not doing anything wrong,” he says. “You are their friend. You gave them the e-mail address and phone number.” Most times, you probably want to stay in touch with this person. You might even want to Snapchat them or see their Instagram pics.

But once that person shares a contact list with the social network, some information on everyone in that list is now being shared around. Even if someone on that contact list — you — didn’t want that information shared.

A social network can now use that information to create something called a shadow profile. This is a set of predictions about you. It’s based on all of that information from other people. The concept of a shadow profile first came to light with a Facebook bug in 2013. That bug shared the e-mail addresses and phone numbers of some 6 million users with all of their friends. Unfortunately, that information wasn’t supposed to go public. Oops.

Facebook fixed the bug. But it was too late.

Some users noticed that the phone numbers on their Facebook profiles had been filled in. But the users had never given Facebook their phone numbers, and had never put them on their profiles. The social network merely filled in the missing information for them. Facebook had collected those numbers from the contact lists innocently provided by a user’s friends.

A shadow profile had become reality.
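Here is a rough sketch of how details from several friends' contact lists could pile up into one record about a single person. It is only an illustration under simple assumptions, not Facebook's actual system. The function `collect_shadow_data` and the layout of `uploads` are hypothetical, and people are matched by e-mail address only.

```python
from collections import defaultdict


def collect_shadow_data(uploads):
    """Group everything uploaders know about each contact, keyed by e-mail.

    uploads: dict mapping an uploader's username to a list of contact dicts.
             Each contact has an 'email' and, optionally, a 'phone' (hypothetical format).
    """
    shadow = defaultdict(lambda: {"phones": set(), "known_by": set()})
    for uploader, contacts in uploads.items():
        for contact in contacts:
            entry = shadow[contact["email"].strip().lower()]
            if contact.get("phone"):
                entry["phones"].add(contact["phone"])
            entry["known_by"].add(uploader)
    return dict(shadow)


# Made-up example: Sam never gave the network a phone number,
# but two friends uploaded contact lists that include it.
uploads = {
    "ana": [{"email": "sam@example.com", "phone": "555-0101"}],
    "raj": [
        {"email": "Sam@example.com", "phone": "555-0101"},
        {"email": "lee@example.com"},
    ],
}

shadow = collect_shadow_data(uploads)
print(shadow["sam@example.com"])
```

In this toy example, the record for Sam ends up holding a phone number and a list of two friends, all without Sam typing anything.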

It’s creepy. It is not, however, surprising that a social platform could take names, e-mail addresses and phone numbers and match them with users already on the network. But Garcia wondered if social networks would also be able to build shadow profiles of people who had never been on the network at all. 

Internet archaeology

To find out, he turned to a now-defunct social network called Friendster. Launched in 2002, it was a social-networking site that preceded Facebook. In 2008, the social site boasted more than 115 million users. But the next year, people began jumping ship for other sites. By 2015, Friendster had shut down. Millions of abandoned public profiles vanished.

Or did they?

The Internet Archive is a nonprofit online library. It keeps records of more than 200 billion web pages. Web pages like Friendster. Garcia was able to use this site to retrieve data on 100 million public accounts from Friendster.

Garcia dug through the records in a process he calls internet archaeology. He named it after a satirical video from The Onion. In it, an internet archaeologist announces that he has (ironically) discovered Friendster. But internet archaeology can be a real thing, Garcia explains. “The time scale of online media is very fast. But it’s still studying things in society that don’t exist anymore,” he adds.

Garcia hunted for friend links within Friendster’s data. Most people don’t have a random assortment of friends. Married people tend to be friends with other married people, for example. But people also have connections that complicate the ability to predict who is connected to whom. People who identified as gay men were more likely to be friends with other gay men. But gay men were also likely to be friends with women. Straight women were more likely to be friends with men.

From all of this information, Garcia was able to show that he could predict characteristics of people, even if those people were never on Friendster. He could predict things like whether someone was married, or whether they identified as gay. The more people in the social network who shared their own personal information, the more information the network received about their contacts. And its predictions about people not on the network got better, too.
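The basic idea can be sketched with a simple majority vote. This is not the statistical method Garcia used in his study. It is a much simpler stand-in, and the function `predict_from_friends` and the example data are invented for illustration. It guesses a trait for someone outside the network by looking at what that person's friends on the network have shared.

```python
from collections import Counter


def predict_from_friends(friend_values):
    """Guess a trait from the values shared by a person's friends.

    friend_values: list with one entry per friend; None means that
                   friend did not share this trait.
    Returns (most common value, share of sharing friends who have it).
    """
    known = [value for value in friend_values if value is not None]
    if not known:
        return None, 0.0  # no friend shared anything, so no guess
    value, count = Counter(known).most_common(1)[0]
    return value, count / len(known)


# Made-up example: five network users list Sam as a contact.
# Four of them shared their relationship status.
friends_of_sam = ["married", "married", None, "single", "married"]

guess, share = predict_from_friends(friends_of_sam)
print(guess, share)  # married 0.75
```

The more friends who fill in their own profiles, the fewer `None` entries there are, and the more evidence a guess like this has to work with.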

“You are not in full control of your privacy,” he now concludes. If your friend is on a social platform, so are you. And you don’t have a choice in this matter. Garcia published his findings August 4 in the journal Science Advances.

Rethinking privacy

The new findings do not mean that social platforms are really making shadow profiles, Garcia notes. But with the data that people share with social networks, those platforms (such as Facebook or Snapchat) certainly could.

To prevent the data from his study being used this way, Garcia only used the most basic, public information from Friendster. He never predicted anything about specific people. He only checked to see if he could. Garcia also made his predictions very general. He chose to never create anything that could produce a shadow profile. This was his way of making sure that others cannot misuse his test data.

But his results do show that information from your friends on a social network could accurately predict many things about you. That could include if you were married, where you lived and your political opinions. And that’s information that you may not want anyone to know, let alone strangers in a social network to which you don’t belong.

In the digital world, private information can be spread around as we connect with people, kind of like leaving handprints on our friends and family members.

“It’s a good illustration of an issue we have in society, which is that we no longer have control over what people can infer about us,” says Elena Zheleva. She is a computer scientist at the University of Illinois at Chicago. “If I decide not to participate in a certain social network, that doesn’t mean that people won’t be able to find things about me on that network.”

And that means we might need to think differently about what privacy means. “We’re used to thinking of having a private space,” Garcia says. “We think we’ve got a room with keys and we let some people in.”

But it might be more accurate, he argues, to imagine our personal information as wet paint. We are covered in wet paint of our own personal color. If we touch someone else, we leave a handprint in our unique paint color. “The more you touch other people, the more you leave on them,” he explains. Touch enough people, and anyone who looks at those people and the paint on them will be able to pick out your personal shade of teal or pink or gray.

And because we are no longer in full control of our privacy, Garcia notes, it also means that protecting privacy isn’t something any one person can do. “In some sense it resembles climate change,” he says. “It’s not something you can solve on your own. It’s everyone’s problem or it’s no one’s problem.”

So if we’re going to solve the privacy problem, we can’t just keep our information to ourselves. We’re going to have to change the digital world, too. How we will do it, though, will prove a big challenge.

Power Words

app     Short for application, or a computer program designed for a specific task.

archaeology     (also archeology) The study of human history and prehistory through the excavation of sites and the analysis of artifacts and other physical remains. Those remains can range from housing materials and cooking vessels to clothing and footprints. People who work in this field are known as archaeologists.

archive     (adj. archival) To collect and store materials, including sounds, videos and objects, so that they can be found and used when they are needed. People who perform this task are known as archivists.

climate     The weather conditions that typically exist in one area, in general, or over a long period.

climate change     Long-term, significant change in the climate of Earth. It can happen naturally or in response to human activities, including the burning of fossil fuels and clearing of forests.

data     Facts and/or statistics collected together for analysis but not necessarily organized in a way that gives them meaning. For digital information (the type stored by computers), those data typically are numbers stored in a binary code, portrayed as strings of zeros and ones.

digital     (in computer science and engineering)  An adjective indicating that something has been developed numerically on a computer or on some other electronic device, based on a binary system (where all numbers are displayed using a series of only zeros and ones).

gay     (in biology) A term relating to homosexuals — people who are sexually attracted to members of their own sex.

infer     (n. inference) To conclude or make some deduction based on evidence, data, observations or similar situations.

internet     An electronic communications network. It allows computers anywhere in the world to link into other networks to find information, download files and share data (including pictures).

journal     (in science) A publication in which scientists share their research findings with experts (and sometimes even the public). Some journals publish papers from all fields of science, technology, engineering and math, while others are specific to a single subject. The best journals are peer-reviewed: They send all submitted articles to outside experts to be read and critiqued. The goal, here, is to prevent the publication of mistakes, fraud or sloppy work.

media     (in the social sciences) A term for the ways information is delivered and shared within a society. It encompasses not only the traditional media — newspapers, magazines, radio and television — but also Internet- and smartphone-based outlets, such as blogs, Twitter, Facebook and more. The newer, digital media are sometimes referred to as social media. The singular form of this term is medium.

network     A group of interconnected people or things.

online     (n.) On the internet. (adj.) A term for what can be found or accessed on the internet.

random     Something that occurs haphazardly or without reason, based on no intention or purpose.

social     (adj.) Relating to gatherings of people; a term for animals (or people) that prefer to exist in groups. (noun) A gathering of people, for instance those who belong to a club or other organization, for the purpose of enjoying each other’s company.

social media     Internet-based media, such as Facebook, Twitter and Tumblr, that allow people to connect with each other (often anonymously) and to share information.

social network     Communities of people (or animals) that are interrelated owing to the way they relate to each other. In humans, this can involve sharing details of their life and interests on Twitter or Facebook, or perhaps belonging to the same sports team, religious group or school.

Web     (in computing) An abbreviation of World Wide Web, it is a slang term for the internet.


Journal: D. Garcia. Leaking privacy and shadow profiles in online social networks. Science Advances. Vol. 3, published online August 4, 2017. doi: 10.1126/sciadv.1701172.