May 7, 2012
We all know there’s a whole lot more information in our worlds than there used to be. As to how much more, well, most of us are pretty clueless.
Here’s a priceless nugget about all that info, compliments of Dave Turek, the guy in charge of supercomputer development at IBM: From the beginning of human history through the year 2003, we generated, according to IBM’s calculations, five exabytes–that’s five billion gigabytes–of information. By last year, we were cranking out that much data every two days. By next year, predicts Turek, we’ll be doing it every 10 minutes.
But how is this possible? How did data become such digital kudzu? Put simply, every time your cell phone sends out its GPS location, every time you buy something online, every time you click the Like button on Facebook, you’re putting another digital message in a bottle. And now the oceans are pretty much covered with them.
And that’s only part of the story. Text messages, customer records, ATM transactions, security camera images…the list goes on and on. The buzzword to describe this is “Big Data,” though that hardly does justice to the scale of the monster we’ve created.
It’s the latest example of technology outracing our capacity to use it. In this case, we haven’t begun to catch up with our ability to capture information, which is why a favorite trope of management pundits these days is that the future belongs to companies and governments that can make sense of all the data they’re collecting, preferably in real time.
Businesses that can interpret every digital breadcrumb their customers leave behind will have an edge, the thinking goes–not just who bought what where in the past hour, but whether they tweeted about it or posted a photo somewhere in the swirl of social networks. The same goes for the cities that can gather data from the thousands of sensors that now dot urban landscapes and turn the vagaries of city life, such as traffic flow, into a science.
Not surprisingly, political campaigns already are taking the plunge, furiously mining data as part of their focus on “nanotargeting” voters so that they know precisely how to pitch them for their votes and money. Among the conclusions analysts have drawn, according to New York Times columnist Thomas Edsall, is that Republicans show a preference for “The Office” and Cracker Barrel restaurants while Democrats are more likely to watch “Late Night With David Letterman” and eat at Chuck E. Cheese.
This rush to interpret digital flotsam explains why Google last week announced that it will start selling a product it calls BigQuery, software that can scan terabytes of information in seconds. And why a startup named Splunk, which has technology that can analyze huge amounts of customer and transaction data, saw the value of its shares soar almost 90 percent the day it went public last month. This, for a company that lost $11 million last year.
Rise of the data scientist
But even access to the best data deciphering tools is no guarantee of great wisdom. Very few companies have people on staff with the training not only to evaluate mountains of data–including loads of unstructured tidbits from millions of Facebook pages and smart phones–but also to actually do something with it.
Last year the McKinsey Global Institute issued a report describing “Big Data” as the “next frontier for innovation,” but also predicting that by 2018, companies in the U.S. will have a serious shortage of talent when it comes to the necessary analytical skills–as many as 190,000 people. And it contends another 1.5 million managers will need to be trained to make strategic decisions with the torrent of data coming their way.
Not everyone, though, is a believer in the magic of Big Data. Peter Fader, a professor of marketing at Penn’s Wharton School of Business, isn’t convinced that more data is better. Not that he thinks a company shouldn’t try to learn as much as it can about its customers. It’s just that now there’s so much focus on aggregating every bit of data that he thinks volume is valued over true analysis.
Here’s Fader’s take from a recent interview with MIT’s Technology Review: “Even with infinite knowledge of past behavior, we often won’t have enough information to make meaningful predictions about the future. In fact, the more data we have, the more false confidence we will have…The important part is to understand what our limits are and to use the best possible science to fill in the gaps. All the data in the world will never achieve that goal for us.”
Who’s your data?
Here’s a sampling of how Big Data is being used to solve big problems:
- They know when they’ve been bad or good: While most companies are focusing on analyzing their customers, Amazon is scoring points by using Big Data to help theirs.
- The study of studs: You want to know which bulls spawn the most productive milk cows? The dairy industry has devised a way to crunch the numbers.
- Diagnosis by data: Researchers at SUNY Buffalo are analyzing massive sets of data in their effort to determine if there’s a link between multiple sclerosis and environmental factors, such as not enough exposure to sunlight.
- Looking for trouble: A company called Recorded Future is mining info from social networks and government and financial sites to make forecasts about how population growth, water shortages and extreme weather could lead to future political unrest and terrorism.
Video bonus: Capturing data is one thing. Making it look appealing and understandable is a whole other challenge. David McCandless expounds on the power of “information maps” in this TED talk.