Metaphors of Big Data

What do bacon, oil, tsunamis, exhaust, deluges, nuclear waste and teenage sex have in common? They are all things to which “Big Data” has been likened.

Many excellent essays have addressed Big Data metaphors. They include “Data Is the New ‘___’,” by Sara Watson; “Big Data Metaphors We Live By,” by Kailash Awati and Simon Buckingham Shum; “Big Data, Big Questions: Metaphors of Big Data,” by Cornelius Puschmann and Jean Burgess; and “Swimming or Drowning in the Data Ocean? Thoughts on the Metaphors of Big Data,” by Deborah Lupton. Those articles, however, discuss the “metaphors of Big Data” as if they’re all efforts to describe the same thing. But they are not.

The metaphors and similes cited above refer to at least three distinct things. The “tsunami” and “deluge” are attempts to illustrate the challenges of handling vast and ever-changing datasets. The teenage-sex simile is a comment on the hype surrounding the notion of Big Data: In 2013, Dan Ariely said that “Big data is like teenage sex: Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it …” The better-known “oil” and “bacon” metaphors refer to the large datasets themselves, which are being collected by various entities these days, regarded as assets, and mined or chewed up for insights.

It might be more useful to treat those three groups as distinct, and to address separately the metaphors commonly applied to each of them. In particular, I want to focus on the metaphors that are squarely directed at big datasets, at collections of information, as opposed to Big Data-related processes or hype. Even big datasets are not all the same, and our metaphors should reflect that. We can’t discuss even this one subset of the Big Data phenomenon as if we all know what we’re talking about. The kinds of data matter.

Take, for example, the metaphor of Big Data as nuclear waste. This metaphor has been applied as a corrective to the much better-known mantra that “data is the new oil.” The nuclear waste metaphor, however, refers to a particular kind of Big Data: personal data about individual human beings. (Privacy professionals talk a lot about “PII,” or personally identifiable information, which is a broader concept. We have long discussions about what constitutes PII. This is not one of those discussions.)

There are many large data sets, such as data about atmospheric or oceanic conditions, or about production outputs in various companies, or energy consumption by particular vehicles, that would probably not be described, even by Big Data critics, as “radioactive material.” Let’s separate those out. Let’s clarify that there’s a distinct problem when intimate personal data about individual human beings is what’s being described as “the new oil” or “the new bacon” and treated like an ordinary asset.

Technology critic Evgeny Morozov has argued that the commodification of personal details is not a matter of property rights. In a New Republic article titled “Selling Your Bulk Online Data Really Means Selling Your Autonomy,” he writes:

“Our data constitutes our very humanity. To voluntarily treat it as an ‘asset class’ is to agree to the fate of an interactive billboard. We shouldn’t unquestionably accept the argument that personal data is just like any other commodity and that most of our digital problems would disappear if only, instead of gigantic data monopolists like Google and Facebook, we had an army of smaller data entrepreneurs. We don’t let people practice their right to autonomy in order to surrender that very right by selling themselves into slavery. Why make an exception for those who want to sell a slice of their intellect and privacy rather than their bodies?”

Is that true for any personal data, though? Should we draw even finer distinctions? Strangers have long had access to some details about most of us: our names, phone numbers and even addresses were fairly easy to find even before the advent of the Internet. And marketers have long created, bought and sold lists that grouped customers based on various differentiating criteria. But marketers didn’t use to have access to, say, our search topics, back when we were searching in libraries, not Googling. The post office didn’t ask us to agree that it was allowed to open our letters and scan them for keywords that would then be sold to marketers who wanted to reach us with more accurately personalized offers. We would have balked. We should balk now.

Maybe some personal data can be sold without undermining our autonomy, and some can’t. Access to a person’s name and phone number is not the same as access to his or her Social Security number, or search topics, or communications with coworkers, friends, family or lovers. The intimate details of our lives, and in particular our communications (including those on any social media that does not clearly describe itself as “public”), should be differentiated from “the new oil” or “the new bacon.” They should, indeed, be off the market.

At the same time, we should acknowledge that not all Big Data is radioactive. We need to separate our metaphors, and maybe come up with some new ones, too, in order to give clarity to the issues we now face in the new data economy.

Irina Raicu is the Director of the Internet Ethics program at the Markkula Center for Applied Ethics, Santa Clara University. Follow the Internet Ethics program on Twitter at @IEthics.

This article originally appeared on