

Facebook's study of news revealed its plans to be the next top search engine

Michael Gottschalk/Photothek via Getty Images

Last week, Facebook posted a study on what Americans see in the News Feed from "ideologically diverse" sources. The company created a bot to categorize news based on headlines and the first few words of an article. However, the bot revealed more about Facebook's future plans to expand its search capabilities than it did about user echo chambers.

Here's how the reading bot works

Social networks are often criticized for creating silos of thought, echo chambers in which people only see and consume ideas they agree with. Online social networks tend to reflect real-life social networks: they grow when early adopters invite their friends, family, and colleagues. Facebook data scientists wanted to know whether an echo chamber effect exists on Facebook, especially pertaining to political news in the United States.

They created a Support Vector Machine (SVM) to correlate the political affiliation of a set of users to shared news sources. The SVM (or bot, as I simplistically call it here) analyzed patterns in the "first few words" of stories that were shared on Facebook by a small percentage of American users:

"...we trained a support vector machine classifier which uses the first few words of articles linked for each URL shared on Facebook. This allowed us to identify more than 226,000 unique hard news articles that had been shared at least 100 times."
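To make the idea concrete, here is a minimal sketch of a linear SVM that looks only at the first few words of a story. It is not Facebook's classifier: the study excerpt doesn't describe its features, training data, or settings, so the headlines, the eight-word cutoff, the hard (+1) versus soft (-1) labels, and every function name below are assumptions made up for illustration.

```python
from collections import Counter

def first_words(text, n=8):
    """Return the first n lowercased words, with edge punctuation stripped."""
    tokens = [t.strip(".,!?:;\"'").lower() for t in text.split()]
    return [t for t in tokens if t][:n]

def featurize(text, n=8):
    """Bag-of-words counts over the first n words only."""
    return Counter(first_words(text, n))

def score(weights, bias, feats):
    """Linear decision value: bias plus the weighted word counts."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())

def train_linear_svm(examples, epochs=50, lam=0.01, lr=0.1):
    """Train with subgradient descent on the L2-regularized hinge loss.

    examples: list of (text, label) pairs, label +1 (hard) or -1 (soft).
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            margin = label * score(weights, bias, feats)
            # Regularization step: shrink every weight slightly.
            for w in weights:
                weights[w] *= (1 - lr * lam)
            # Hinge-loss step: update only if the example is inside the margin.
            if margin < 1:
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + lr * label * c
                bias += lr * label
    return weights, bias

def predict(model, text):
    """Return +1 (hard news) or -1 (soft news)."""
    weights, bias = model
    return 1 if score(weights, bias, featurize(text)) >= 0 else -1

# Hypothetical toy training set (invented headlines, not study data).
TRAINING = [
    ("Senate votes on budget bill after long debate", 1),
    ("President signs new trade agreement with allies", 1),
    ("Court rules on election law challenge today", 1),
    ("Celebrity chef shares favorite summer pasta recipe", -1),
    ("Star athlete announces surprise wedding in Italy", -1),
    ("Ten cute puppy photos to brighten your day", -1),
]

model = train_linear_svm(TRAINING)
print(predict(model, "Senate debate on budget continues"))
print(predict(model, "Cute puppy photos from a celebrity chef"))
```

Note how little the model actually "reads": any word past the cutoff, and any word it never saw in training, contributes nothing to the decision.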

How are people responding to the study?

This isn't the first time Facebook has studied its users despite criticism. Facebook was hit with a $15 billion lawsuit in 2010. And when Facebook revealed it was playing with our minds by changing what we saw in our feed back in 2012, it led to a pointed criticism of the company's ethical standards. Neither of these events, however, led to user drop-off.

The recent study, "Exposure to Diverse Information on Facebook," is the third significant time Facebook has focused on what American users are up to (that we know of). Shortly after its publication, a minority of journalists and academics alike questioned its construction and results. As Mathew Ingram puts it, "Not only does the study not actually prove what it claims to prove, but the argument that the site is making in defense of its algorithm also isn’t supported by the facts — and in fact, can’t actually be proven by the study as it currently exists."

Why the study is so easily criticized

The sample set doesn't reflect the average American. Facebook found that only about a quarter of politically affiliated American users are exposed to ideologically diverse news.

The study sample was limited to users who post a political affiliation, which is an option in your profile that looks like this:

Facebook's political views section.


Only 9 percent of US-based Facebook users post a political affiliation. A study about a self-identifying minority is a study about silos. If they show little to no correlative relationship to their sources of news, the silos would be debunked. But the study shows the opposite. Yes, there is an echo chamber on Facebook, and this study proves it. Three out of every four politically affiliated Facebook users in the United States do not see or share stuff they'd disagree with in their feed. Echo chamber, confirmed.
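A quick back-of-the-envelope calculation shows how narrow the sample is. This sketch uses only the two figures quoted above and treats "about a quarter" as exactly 25 percent, which is an assumption; the variable names are illustrative.

```python
# Figures quoted in the article; 0.25 approximates "about a quarter".
affiliated_share = 0.09   # US users who post a political affiliation
exposed_share = 0.25      # of those, users exposed to ideologically diverse news

# Share of affiliated users who are NOT exposed ("three out of every four").
siloed_share = 1 - exposed_share

# Share of ALL US users that the study can show are exposed to diverse news.
exposed_of_all_users = affiliated_share * exposed_share

# Share of US users the sample says nothing about.
outside_sample = 1 - affiliated_share

print(f"Siloed among affiliated users: {siloed_share:.0%}")
print(f"Shown to be exposed, as a share of all users: {exposed_of_all_users:.2%}")
print(f"Users outside the sample: {outside_sample:.0%}")
```

Under these assumptions, the study speaks directly for fewer than one in ten US users.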

Analysis isn't just a keyword. Context matters more in modern politics than anywhere else in the news. It goes without saying that reading only the first few words of an article saves a ton of time but yields little context. In general, we can assume (or hope) that the more words an article has, the more information and context it includes.

The bot sees everything in black or white. Being a terrible reader isn't the SVM's most fundamental problem. SVMs tend to work best with data sets that have only two categories, so Facebook needed to divide all news into two categories. They chose an interesting duality: hard versus soft news. But what is soft news to one person is hard news to another, and vice versa. If Kim Kardashian announced that she supported Ted Cruz's presidential campaign, we have no idea which category Facebook would assign the story to.

A University of Wisconsin-Madison study summarized the challenge of using SVMs for multi-category data: "The size of the problem is bigger than that of solving a series of binary problems." Translating their conclusion into this context: determining how people form and keep political ideologies, and how they are exposed to views different from their own, is a much bigger challenge than assigning articles to one of two categories.

There's inconclusive data to correlate exposure to news, consuming news, and sharing the news. The longer you read something (which is a form of exposure), the less likely you are to share it, according to Web analytics firm Chartbeat.

Chartbeat found there's little correlation between scrolling down on an article page (and being exposed to more words) and sharing a link, at least on Twitter.

Facebook seems to only define exposure by what you see on the News Feed — a place where the platform readily admits we spend less than a few seconds reviewing headlines or videos before moving along.

What the bot is really telling us

Facebook is building a search engine to rival Google. The bot is a preview of how that search will rank stories. This week, Josh Constine and Kyle Russell at TechCrunch shared screenshots of a newly discovered way to use Facebook search. The "Add a Link" function lets you post search results to your page that originally come from outside of Facebook.

And how do you show search results inside Facebook for webpages that live outside of Facebook unless you first index those pages? And how do you target those search results based on user preference? You use web crawling bots that read quickly and correlate user data and content. Facebook has already crawled 1 trillion link posts inside Facebook. Now it just has to crawl everything else on the World Wide Web. What will it look like? Something like what Google+ was supposed to be, but if the "+" part came first.

Did you read this until the end? Congratulations! You're not Facebook's read-hating bot.
