Use big data to explain politics rather than predict it

There’s a great article by Michael Gaynor in last week’s Washington Post Magazine on using big data to predict legislative and policy outcomes. The article focuses on Tim Hwang’s FiscalNote, a company that uses masses of data to predict the outcomes of bills and other forms of policymaking. The article highlights several other companies that have engaged in similar endeavors. However, all of these are focused on prediction. While accurate prediction is a tempting prize, it may not always be the right goal.

Using big data to glean insights has been feasible since the digital age advanced to the point where everyday people could manage many gigabytes of data on personal computers. This strategy was made famous in the 2003 book Moneyball by Michael Lewis, which Steven Zaillian and Aaron Sorkin adapted into a screenplay that became a popular movie (2011), starring Brad Pitt, Robin Wright, Jonah Hill, and Philip Seymour Hoffman.

In politics, Nate Silver rose to fame by applying the same principles beyond the sports arena and making highly accurate forecasts of presidential elections. The algorithms used by Silver and his team at FiveThirtyEight have been highly transparent and useful, and their work has in many ways transformed the ways pundits and scholars view the horse race of campaigning.

The practice of using big data to forecast election outcomes has also been criticized. Some have questioned whether forecasting itself can affect election outcomes. While there is heavy skepticism about the magnitude of any impact, the idea is that real-time, publicly available information about who is leading at any given point in a campaign may have outsize influence on voters’ strategic choices about whether to vote and on candidates’ strategic choices about whom to target with their campaigning.

But what if making predictions is the wrong goal? It’s very human to want to know who or what will win a contest. Humans can be naturally competitive and even voyeuristic, characteristics that have helped our species thrive. I totally get that predictive knowledge is enticing and exciting. It’s like the dopamine rush from licking the buttercream icing straight off the cake.

I have no problem with the horse race, and I have certainly consumed my share of predictive analyses. But I find that I’m often left with more questions than answers. Sometimes, the predictive information is dissatisfying.

When we find it dissatisfying, it may be because what we really seek is understanding, or information about how to influence outcomes.

When social scientists use big data to engage in analyses of this type, the primary goal is to explain rather than predict. Prediction is fun but may not allow us to understand the underlying causes of a phenomenon or outcome. This is where the dissatisfaction comes in. Using the data to develop a clearer understanding of how the world works, how humans interact in it, and how these interactions produce outcomes can provide enlightenment. Ultimately, this enlightenment can arm us with higher-quality information than prediction alone.

For those who seek to use these tools for influence, or to help achieve particular policy outcomes, wouldn’t it be better to know how and why something works, rather than whether or not it is likely to happen?

I admittedly come at this with the bias of a social scientist, and my scholarly field is charged with providing explanations about how the world works. I certainly don’t expect or want the fun predictive stuff to go away. But if it seems lacking, then it could be because we are asking the wrong questions. Rather than asking, “Who will win?” or “Which will pass?” perhaps we should ask, “Why is one candidate ahead?” and “What are the conditions that make a policy gain traction?”

Big data can help us do both. Finding causal mechanisms and developing plausible explanations are challenging, and may be more work than building predictive models. But if we use our masses of data to help us engage in both of these, we can have our cake and its icing.