The data flood

The amount of recorded information grows by the split-second — and may be used to improve health care, change education and even boost store sales
Dec 13, 2013 — 7:35 am EST

Modern life generates huge volumes of data. That data can yield detailed information — and provide valuable insights. This image visualizes the volume of Internet data that flows between New York City and cities around the world over a 24-hour period. The larger the glow at any particular location, the larger the volume of data. 

MIT Senseable City Lab

There is a huge amount of information available online. And its volume is growing at lightning speed. Each minute on average, more than 200 million emails move across the Internet (though most are spam). Twitter users post more than 300,000 new tweets. People across the globe share more than 38,000 Instagrams. YouTube users upload another 100 hours of video. Google processes more than 3.6 million web searches. And 2.2 million things on Facebook get a “like” or a comment.
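To get a feel for those per-minute figures, it can help to scale them up to a full day. The short Python sketch below does that arithmetic. It assumes each rate stays constant around the clock, which real traffic does not, and the names `PER_MINUTE` and `per_day` are made up for this illustration.

```python
# Per-minute figures quoted in the article (each is a "more than" estimate).
PER_MINUTE = {
    "emails": 200_000_000,
    "tweets": 300_000,
    "Instagram photos": 38_000,
    "hours of YouTube video": 100,
    "Google searches": 3_600_000,
    "Facebook likes or comments": 2_200_000,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in one day


def per_day(per_minute: int) -> int:
    """Scale a per-minute count to a full day, assuming a constant rate."""
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    for name, rate in PER_MINUTE.items():
        print(f"{name}: more than {per_day(rate):,} per day")
```

Under that constant-rate assumption, 100 hours of video per minute works out to 144,000 hours of new video every day.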

But the Internet isn’t the only numbers-driven environment packed with information. Scientists, too, have more information than ever before. It comes from the study of volumes of raw facts, called data.

For example, biologists collect enormous numbers of measurements on millions of cells and everything inside them. Astronomers fill banks of hard drives with observations of stars, galaxies and energy in deep space. Earth scientists assemble detailed snapshots of weather, including patterns of winds and waves throughout the world.

According to the computer company IBM, 90 percent of all recorded data was created in just the last two years. Most of those data are stored on computer hard drives, phones and other digital devices. What about traditional libraries? Sources such as books and audiocassettes contain less than two percent of all stored information, according to Big Data. It’s a 2013 book by Viktor Mayer-Schönberger and Kenneth Cukier.

If such numbers don't boggle your mind, consider this: One Internet research firm estimates that every three years the volume of digital data nearly doubles. That means people are expected to generate as much new data between January 2014 and December 2016 as exist today.
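The reasoning behind that claim is simple doubling arithmetic: if the total doubles over three years, the amount added in those three years equals the amount that existed at the start. Here is a minimal Python sketch of that idea. It treats "nearly doubles" as exactly doubles and measures volume relative to today's total (today = 1.0); both are simplifying assumptions, and the function names are hypothetical.

```python
DOUBLING_YEARS = 3  # doubling period cited in the article


def volume_after(years: float, start: float = 1.0) -> float:
    """Total data volume after `years`, if it doubles every DOUBLING_YEARS."""
    return start * 2 ** (years / DOUBLING_YEARS)


def new_data(years: float, start: float = 1.0) -> float:
    """Data added during `years`: the later total minus the starting total."""
    return volume_after(years, start) - start


if __name__ == "__main__":
    # Starting total is 1.0, meaning "everything that exists today".
    print("Total after 3 years:", volume_after(3))
    print("New data created in those 3 years:", new_data(3))
    print("Total after 6 years:", volume_after(6))
```

With these assumptions, the new data created over three years comes to 1.0, the same as today's entire stock, and after six years the total is four times today's.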

“The challenge is to take those data and turn them into a useful product,” says DJ Patil. He’s a scientist in Palo Alto, Calif., who has worked on managing data at companies such as LinkedIn, a professional networking website.

Online data offer more than just a record of our time. Researchers build computer programs to analyze, organize and process those data. Then statisticians search for patterns and connections in those data to predict the future.

For instance, companies sift through data about what people spend their money on and when. That helps them find new ways to sell more products. In the 2012 U.S. presidential election, statisticians analyzed polling data to accurately predict the outcome in each state. Earth scientists track and analyze weather data over time to help predict, prepare for and perhaps even prevent catastrophic changes in climate. Information gleaned from big data can even identify disease outbreaks in time to impose quarantines that will prevent an epidemic.

Experts say big data also could completely change education, health care and many other fields. Some studies point to even stranger potential uses, such as identifying a criminal before a crime is committed. That’s an idea explored in movies such as Minority Report.

Yet what we’ve been describing is no movie. It’s life in today’s Information Age.

Stores gain from big data

The big leaps forward in data collection and use have created opportunities for a new kind of researcher. Experts like Patil, who work with big data, often call themselves data scientists. In many ways, the staggering volume of data being collected is less important than what data scientists do with it. It’s their job to find value — useful information — buried within it.

Manufacturers and advertisers have been quick to use such information to make more money.

Almost a decade ago, researchers at Walmart found that sales of strawberry Pop-Tarts increased sevenfold when people learned a hurricane was on its way. Right before the storm hit, beer became the store's top-selling item, a company official told The New York Times. East Coast Walmart stores now stock up on both items before hurricane season.

Similarly, in the early 2000s, statisticians at Target, a department-store company, identified more than 20 lotions and other products that women tend to buy shortly after learning they are pregnant.

The company tracked those purchases among customers who paid with a credit card. It now can predict a woman’s due date within a short range. And it may send her ads and coupons for things that new parents need, such as a baby stroller or crib, according to a 2012 New York Times article.

Police, too, are investigating the value of big data. Some cities with high crime rates and overworked police officers now use past crime data to predict where and when patrols would be most useful in preventing future crime.

While all of those predictions may show what could happen, they don't show why, notes Mayer-Schönberger, the Big Data author. An expert on information law, he works at Oxford University in England. In some fields, knowing why something occurs doesn’t matter much, he says. But in others, such as medicine, it can prove pivotal. Studying data from cells and the whole body might even help doctors one day prevent or treat disease.

“Life will change dramatically for the next generation,” Mayer-Schönberger predicts. “Today’s 9- to 14-year-olds will be full recipients of the developments that come from the use of big data in health care.”

Personal health, personal learning

Data on health can run very deep. Physicians generally start by collecting data about a person’s general health. Scientists may then probe much deeper, gathering more complex data — such as about a person's blood or tissues.

“We love to know what causes a particular complicated disease. How does the body’s system fail when you get cancer? What's gone wrong when you get incurable diabetes?” asks Winston Hide. He’s a researcher at the Harvard School of Public Health in Boston, Mass. Hide turns large amounts of data collected from cells into information that biologists can use.

For instance, biochemists can collect data on the genes and the proteins that those genes instruct a cell to produce. Hide and other computational biologists then analyze vast stores of those data from particular types of cells — and from vast numbers of people. They’re scouting for patterns that pop up again and again.

“We interrogate millions of cells,” Hide says, meaning they collect data on genes and their activity. He calls the resulting mountain of data a “treasure trove.” Researchers process those data. The information this yields can identify which genes, proteins and other things in cells change with disease — or appear to play a role in preventing disease.

A certain protein, for instance, may appear only in people with breast cancer. Does that mean it plays an important role in causing the disease? To test that, biologists might develop a drug to block that protein from being made or used.

Hide imagines a time when sick patients will present their doctor digital data, such as a read-out of all of their genes. When compared against other data — such as how often the patient dines out, where she shops and who she meets up with — her doctor may be able to predict which of the many available treatments will offer her the most help.

“In medicine, we patients are always compared to an average human being,” notes Mayer-Schönberger. When doctors take a patient’s pulse, heart rate or some other measurements, they compare these to values for some average person. But that “average human being doesn’t exist,” he says. Data have the potential to personalize medicine. Patterns that emerge by sifting through huge amounts of data could help doctors design treatments that are suited to each individual — not to some bogus average patient.

Big data could personalize education, too, by making student evaluations an ongoing process. Teaching might even be tailored to the needs of individual students, says Mayer-Schönberger.

Imagine a classroom where teens read materials on a tablet computer or e-reader. Along the way, each student would highlight important ideas or unfamiliar terms. Then the computer could report how long it had taken the students to read the material and what words they highlighted. Using such information, teachers could focus more time, for instance, on aspects of a topic that had given students the most trouble.
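A report like that could come from a very small program. The Python sketch below is one way it might work, not a description of any real classroom product. The reading logs, student labels and the `summarize` function are all invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical reading logs: (student, minutes spent reading, highlighted terms)
READING_LOGS = [
    ("student_a", 12.0, ["mitosis", "chromosome"]),
    ("student_b", 18.0, ["mitosis", "spindle"]),
    ("student_c", 15.0, ["mitosis", "chromosome", "cytokinesis"]),
]


def summarize(logs):
    """Return (average reading minutes, terms ranked by how many students highlighted them)."""
    # set(terms) counts each term once per student, even if highlighted twice.
    counts = Counter(term for _, _, terms in logs for term in set(terms))
    average_minutes = mean(minutes for _, minutes, _ in logs)
    return average_minutes, counts.most_common()


if __name__ == "__main__":
    average_minutes, ranked_terms = summarize(READING_LOGS)
    print(f"Average reading time: {average_minutes} minutes")
    for term, students in ranked_terms:
        print(f"{term}: highlighted by {students} student(s)")
```

In this made-up class, every student highlighted "mitosis," so a teacher might spend extra time on that term.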

Careers in data mining

Data don’t just cover a broad range of topics; they also provide new levels of detail. To make sense out of complex mountains of data, statisticians and data scientists must go prospecting. They become like miners hunting for gems or valuable ore. That’s why Hal Varian, the chief economist for Google, often calls statistics an exciting and important career of the future.

Mayer-Schönberger agrees. “I hope kids will become [data scientists],” he says. “We need people who can make sense of the data. When you’re a data scientist, it’s almost like you’re an alchemist of the Data Age.”

In the Middle Ages, alchemists claimed they could turn lead and other low-value metals into gold. In the Information Age, data scientists can turn large, chaotic sets of data into information. That information isn’t gold, but its value can sometimes exceed that of gold. And future demand for data miners will be huge: One study published in 2012 estimated that four million jobs related to big data will be created by 2015.

The teens and tweens of today have the potential to enter careers that will allow them to shape the future of big data. It’s a field that rewards people who follow their own interests, Mayer-Schönberger says.

“Curiosity,” he argues, “is your strongest asset.”

Power Words

biology  The study of living things.

computational biology  A field in which scientists use mathematics and computer programs to better understand living things.

data  Facts and statistics collected together for analysis but not necessarily organized in a way that gives them meaning.

digital  The recording, storing and retrieving of information as a series of ones and zeros.

e-reader  An electronic device that contains books, reports or other types of text-based materials.

information  (as opposed to data)  Facts provided or trends learned about something or someone, often as a result of studying data.

statistics  The practice or science of collecting and analyzing numerical data in large quantities. A professional who works in this field is called a statistician.
