Websites often don’t disclose who can have your data | Science News for Students

Analyses suggest most data actually get shared widely with unnamed partners
May 23, 2018 — 6:45 am EST

The idea that people can learn who tracks where they go on the internet just by reading a website’s privacy policies “is pure fiction,” reports the author of a new data-privacy study.

oatawa/iStockphoto

People often feel anonymous on the internet. They believe their browsing behaviors and what they buy or write can be as private as they want. In fact, that’s far from true, a new study finds.

Websites usually offer a statement that describes what they may or may not do with data about a user’s activities. You might be tempted to read through that entire document. But be prepared for disappointment. These documents tend to list only a small share of the sites allowed access to your data.

This new discovery suggests it may be all but impossible for website users to make informed judgments about how private their online activities are.

The new research probed disclosures on data-sharing by more than 200,000 websites. These included, for instance, the Arkansas state government homepage and the Country Music Association site. The study focused on how these sites shared data with so-called third parties. Such recipients of your data could be advertisers or companies that make money selling personal data (such as buying behaviors). The study also examined how those sites had described their policy for protecting the privacy of a user’s data.

Timothy Libert works in England at the University of Oxford. There, he studies data privacy. For this analysis, he used a software tool called webXray. It traced data shared by each of those websites with third-party data collectors. In all, it tracked 1.8 million instances of data sharing. Only 14.8 percent of those data shares went to third parties that were named in the sites’ privacy policies. The rest of the data went to unnamed third parties.

Data transfers to widely familiar third parties — Google, Facebook and Twitter, for instance — were more likely to be disclosed than transfers to obscure entities. Take Google. Libert found that 38.3 percent of data transmissions sent to it had been disclosed. In contrast, the disclosure rate for data shared with data-broker Acxiom was only around 0.3 percent.
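A disclosure rate like the ones above is simply the share of observed data transfers that went to a recipient named in a site’s privacy policy. The short Python sketch below shows that arithmetic. It is only an illustration, not how webXray works: the function name, the sample policy text and the list of transfers (including the made-up "ExampleTracker") are all assumptions invented for this example, and a real audit would need far more careful matching of company names.

```python
def disclosure_rate(transfers, policy_text):
    """Return the fraction of transfers whose recipient is named in the policy.

    transfers: list of recipient names, one entry per observed data transfer.
    policy_text: the full text of a privacy policy.
    Both inputs are hypothetical; this is a simplified stand-in for a real audit.
    """
    if not transfers:
        return 0.0
    policy = policy_text.lower()
    # Count a transfer as disclosed if the recipient's name appears in the policy.
    disclosed = sum(1 for recipient in transfers if recipient.lower() in policy)
    return disclosed / len(transfers)


# Invented sample data, for illustration only.
example_policy = "We share usage information with Google and Facebook."
example_transfers = ["Google", "Google", "Facebook", "Acxiom", "ExampleTracker"]

rate = disclosure_rate(example_transfers, example_policy)
print(f"{rate:.0%} of transfers went to third parties named in the policy")
```

In this invented case, three of the five transfers go to companies the policy names, so the script reports a 60 percent disclosure rate.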

graph on tracked data transfers
Information on tracked data transfers between a few of the more than 200,000 tracked websites and third-party data collectors is depicted here. It shows that the websites rarely disclose in their privacy policies exactly where they are sending your data. The data collector most likely to be disclosed was Google. Fifteen data collectors tracked in the study didn’t even disclose where 1 percent of your data might be shared.
Graph source: T. Libert; T. Tibbetts

Even if a website listed all of the third parties it shared your data with, users still might never learn how widely their data had been shared. The reason? Third parties that receive user data from websites can themselves later share those data again. Think of your data now moving on to anonymous fourth and fifth parties. Getting online is “sort of like tossing confetti in the air,” Libert concludes. “There’s no way to know where your data ends up.”

Web world evolving ever faster

Data sharing between websites and third parties changes so rapidly that it’s almost impossible even for the people who craft a site’s privacy policies to keep up. That’s the assessment of Christo Wilson. He’s a computer scientist at Northeastern University in Boston, Mass., who was not involved in the new work. “The only true disclosure,” he says, “is, ‘We sell your data, and we don’t know where it goes.’”

People still inclined to read privacy policies will have to set aside a lot of time. Reading a website’s privacy statement (along with the policies of its known third-party data collectors) takes nearly 90 minutes, on average, Libert found. “The idea that users can keep track of this, read policies and make decisions is pure fiction,” he concludes.

Internet users can try to keep their data out of advertisers’ hands, says Wilson. Programs offering “hardcore ad-blocking” do exist, he notes. But such software may not ward off all advertisers, he adds. “It just gets more and more clear that we need things like GDPR.” Those initials refer to a new European law known as the General Data Protection Regulation. Beginning this month, it sets rules that restrict how tech companies can collect and use personal data.

Libert says the United States needs an agency to oversee the rapidly evolving data-sharing environment. He likens this to how the U.S. Food and Drug Administration monitors prescription-drug makers. “I can buy medicine at the store and not have to sit down with a chemistry textbook and look up every compound and see its effects,” he says. “Somebody at the FDA does that.”

Libert presented his findings on April 25 at the 2018 World Wide Web Conference in Lyon, France. There, he described these challenges to data privacy on the internet.

Power Words

ad     Short for advertisement. It may appear in any medium (print, online or broadcast) and has been prepared to sell someone on a product, idea or point of view.

average     (in science) A term for the arithmetic mean, which is the sum of a group of numbers that is then divided by the size of the group.

chemistry     The field of science that deals with the composition, structure and properties of substances and how they interact. Scientists use this knowledge to study unfamiliar substances, to reproduce large quantities of useful substances or to design and create new and useful substances.

compound     (often used as a synonym for chemical) A compound is a substance formed when two or more chemical elements unite (bond) in fixed proportions. For example, water is a compound made of two hydrogen atoms bonded to one oxygen atom. Its chemical symbol is H2O.

data     Facts and/or statistics collected together for analysis but not necessarily organized in a way that gives them meaning. For digital information (the type stored by computers), those data typically are numbers stored in a binary code, portrayed as strings of zeros and ones.

ecosystem     A group of interacting living organisms — including microorganisms, plants and animals — and their physical environment within a particular climate. Examples include tropical reefs, rainforests, alpine meadows and polar tundra. The term can also be applied to elements that make up some artificial environment, such as a company, classroom or the internet.

entity     A person or thing that exists and can be defined or characterized on its own (meaning separate and apart from some general group).

fiction     (adj. fictional) An idea or a story that is made-up, not a depiction of real events.

Food and Drug Administration (or FDA)     A part of the U.S. Department of Health and Human Services, FDA is charged with overseeing the safety of many products. For instance, it is responsible for making sure drugs are properly labeled, safe and effective; that cosmetics and food supplements are safe and properly labeled; and that tobacco products are regulated.

General Data Protection Regulation     A law adopted by the European Union in April 2016 that took effect on May 25, 2018. It provides the same data-protection rules for internet users in member countries. It also lets users see their data and move them, as desired, from one site to another. It affects any data related to a person or subject “that can be used to directly or indirectly identify the person. It can be anything from a name, a photo, an email address, bank details, posts on social networking websites, medical information, or a computer IP address.”

internet     An electronic communications network. It allows computers anywhere in the world to link into other networks to find information, download files and share data (including pictures).

monitor     To test, sample or watch something, especially on a regular or ongoing basis.

online     (adv.) On the internet. (adj.) A term for what can be found or accessed on the internet.

policy     A plan, stated guidelines or agreed-upon rules of action to apply in certain specific circumstances. For instance, a school could have a policy on when to permit snow days or how many excused absences it would allow a student in a given year.

software     The mathematical instructions that direct a computer’s hardware, including its processor, to perform certain operations.

third party     An agreement generally takes place between two individuals or groups. Each of these is considered a “party” to the agreement. When an additional person or group becomes involved in some way, but did not sign or verbally take part in the initial agreement, it becomes a so-called third party. In some instances, a third party may not even be known to one or more of the individuals who made the initial agreement.

tool     An object that a person or other animal makes or obtains and then uses to carry out some purpose such as reaching food, defending itself or grooming.

transmission     Something that is conveyed or sent along.

Twitter     An online social network that allows users to post messages containing no more than 280 characters (until November 2017, the limit had been just 140 characters).

Web     (in computing) An abbreviation of World Wide Web, it is a slang term for the internet.

Citation

Journal: T. Libert. An automated approach to auditing disclosure of third-party data collection in website privacy policies. Proceedings of the 2018 World Wide Web Conference. April 25, 2018, p. 207. doi: 10.1145/3178876.3186087.

Website: European Union’s General Data Protection Regulation (GDPR) homepage.