Facebook made a gesture toward transparency this week with the launch of its Off-Facebook Activity tool. The tool offers a glimpse into the many ways details of your real-world transactions get shared with Facebook, even when you’re not using the Facebook platform. It lets you monitor and (sort of) delete some of the information Facebook collects about this activity. It also serves as a reminder of the frightening amount of information Facebook has about you.
This isn’t a surprising revelation; it’s not a revelation at all. Facebook is a free service that allows you to do fun stuff like set up a profile and send messages to friends, and it uses a variety of tools to track your every move both online and off. The social network then uses everything it knows about you (read: more than you’d like) to sell hyper-targeted ads.
This business model has been around for decades, and it shows how free software (not to be confused with the free software movement) and services are never really free. You’re paying with your data. This week’s Facebook news, along with a number of recent reports about data collection, also helps us better understand the so-called alternative data industry. There’s a good chance that the free app or software you love is funded by the proceeds of your data. All this business, as you might have already guessed, isn’t always great for your online privacy.
What is alternative data?
Alternative data refers to data collected from nontraditional sources, including web browsing activity and social media posts. The industry revolves around companies trading tidbits of information, much of which is gathered through free software and services ranging from antivirus programs to social media. Alternative data companies essentially package this data and resell it to other brokers, or mine it for a wide variety of insights for investment firms. Typically, those insights focus on how we spend our money and time. According to the industry group AlternativeData, there are currently about 450 alternative data providers. That number has quadrupled in the past decade, and Wall Street is expected to spend about $1.7 billion on alternative data this year.
A recent investigation by Motherboard and PCMag showed how a marketing analytics firm called Jumpshot collects browsing data, down to the specific websites visited and YouTube videos watched, then packages and sells it to companies and investors who want to know how certain products are faring or how consumers behave. The catch is that the data was collected through free antivirus software made by Jumpshot’s parent company, Avast. Users have to opt in to data collection, but the investigation suggests many still don’t realize their data is being collected. Avast has long been public about this practice, and it doesn’t collect names, email addresses, contact details, or location data. Following the Motherboard report, Avast stopped collecting the data and is shutting down Jumpshot.
Yet, as big as the alternative data industry is becoming, the public knows relatively little about how powerful the information at play can be. Late last year, the New York Times dug into a gigantic dataset that included 50 billion location pings from more than 12 million American smartphone users. Simply by researching where those phones had been, the newspaper’s reporters were easily able to connect the anonymized data with real people. Those unsuspecting users included a Microsoft engineer going to a job interview at Amazon, as well as a Secret Service agent who spent a lot of time in the West Wing of the White House and at other locations where the president was also present.
The fact that these companies collect and use this data isn’t necessarily the problem. The issue is how aware people are of what’s happening with their data. Most of us don’t read the fine print before accepting, and that’s by design: terms of service can be intentionally long and dense. Real transparency (say, “In exchange for this free software, we are going to log which sites you visit and sell an anonymized version of that data to third parties that use it to see how popular certain products or trends are”) would certainly turn some people off. But I’d bet it wouldn’t bother everyone. Free, however many asterisks come attached, is a big selling point.
These reports reinforce that the existence of such databases still surprises many people. This kind of data collection has been going on for a long time, but companies’ analysis is getting more sophisticated as the number of our digital devices, and the amount of data they can collect, grows.
Types of “free” software
Free software and its trade-offs have been around as long as software itself. A free version will often come with ads, but it can also come with data collection. Free web services and apps collect loads of data about their users. Data collection isn’t exclusive to free software, either. Paid software companies, as well as your internet provider and wireless carrier, can also collect and sell information about you. It’s just that data collection typically fuels the primary business model for free software and web services.
The types of alternative data collection are as varied as the types of free software, but they generally fall into four buckets: web activity, email receipts, credit card transaction data, and geolocation.
These kinds of data tend to reveal a startling amount about you. Your web browser and some free software, like Avast’s antivirus app, can track everything you do on the web: which sites you visit, what you search for, what you buy, and so forth. Meanwhile, some free email providers, as the price of using their service, monitor your inbox for things like receipts to get an idea of how users spend their money. Several alternative data companies buy anonymized credit card data from sources like personal finance apps to get more detailed information about how and where people are spending money. The apps on your phone, from your weather app to your coupon saver, can sell location data to third parties that use it to see which stores people visit and how long they spend there.
Generally, this data doesn’t include any obvious personally identifiable information. But as the Times’s location investigation showed, it can be pretty easy to figure out who someone is if you have just a few pieces of information about them.
When you use Google’s free suite of services, like Gmail, Maps, or YouTube, Google collects information about you in order to target you with ads. You can actually go into your Google settings and see what Google thinks it knows about you, with details ranging from the types of books you like to your personal grooming preferences. But that’s usually not the full extent of the transaction. Google and other data hoarders also collect information about consumer behavior in aggregate, which they can sell to interested parties.
As many have put it before: If you’re not paying for the product, you are the product.
Why divulging all your data could be dangerous
It certainly feels creepy to know that seemingly every detail of your behavior can be tracked in some way. Our sense of privacy, not to mention the definition of privacy itself, is in constant flux, while advances in artificial intelligence and machine learning mean that data mining and analysis are becoming increasingly precise and powerful. That said, we don’t yet know all the ways that divulging so much data could be dangerous.
There is the very real possibility that companies will abuse their access to your data. Think of Cambridge Analytica, the political consulting firm that harvested data from as many as 87 million Facebook profiles without users’ permission in an effort to help Donald Trump’s 2016 presidential campaign. The incident led to billions of dollars in fines for Facebook, but it also served as a jarring wake-up call for internet users everywhere that their personal data could be used in unwanted ways.
And when companies fail to properly steward your data, it leaves you and your information open to hacks. In 2017, the credit reporting agency Equifax exposed the confidential financial information of 143 million people to hackers. A Yahoo data breach disclosed a year earlier affected 500 million accounts, and the incident was later tied to Russian hackers. These are just a couple of examples, and the list keeps growing.
Perhaps what’s most terrifying is that we can’t anticipate what advertisers, hackers, governments, tech companies, or anyone else who gets their hands on our highly sensitive data will ultimately do with it. A growing amount and variety of information about us, including our health, our behavior, and even our emotions, is being tracked, and it’s so far unclear how the many entities that can access this information will use it. Most likely, the rapid growth of artificial intelligence will make seemingly benign information incredibly valuable, and possibly dangerous, when it’s combined with other information or analyzed for new purposes.
How to avoid getting more than what you didn’t pay for
One step that doesn’t involve wading through legalese is to be proactive about monitoring and limiting the amount of personal data being collected. Companies are often somewhat transparent about their data collection, and many, like Google, Facebook, and Amazon, offer tools in your account settings, typically under some sort of “privacy” banner, that help you understand what data they’re collecting about you. Sometimes you can even opt out of certain kinds of data collection.
There’s also a slew of privacy-conscious software, like DuckDuckGo’s suite of tools, that limits tracking. And numerous privacy browser extensions can shield you from ads and tracking, including Privacy Badger, uBlock Origin, Adblock Plus, Ghostery, and NoScript.
Some paid versions of software don’t engage in data collection at all. Of course, paying for otherwise free software turns privacy into a luxury good. Case in point: Apple markets itself as a privacy-centric tech company, one that doesn’t collect and sell data about its users. Apple products also carry a higher price tag than many competitors’. You can buy a cheaper phone or tablet from Google or Amazon, but you might also be giving up some access to your personal data.
The thing to remember is that none of these companies are giving you free stuff out of the goodness of their hearts. At the very least, they are going to try to recoup the money they spent creating the service you’re using and hosting it on their servers. In all likelihood, they’re trying to make a lot of money, because capitalism. Your privacy is simply the currency helping them get there.
Open Sourced is made possible by Omidyar Network. All Open Sourced content is editorially independent and produced by our journalists.