What do the pre-election polls tell us? We've had some big swings: After the Democratic convention, Hillary Clinton seemed to be locking the election up, then Trump came back to a near-tie, then came a series of events — three debates, sexual assault revelations, and a war within the Republican party — which seemed to knock Trump out of the race. Then, more recently, a series of FBI leaks brought the polls back to a near-tie. Put this together, and you get the impression of a volatile electorate: Anything can happen, and all may depend on voters' reaction to last-minute news of Chris Christie, Melania Trump, or whatever Julian Assange may have up his sleeve.
Actually, though, not many voters change their opinions during the general election campaign. This finding is borne out by my research with Sharad Goel, Doug Rivers, and David Rothschild on surveys during the 2012 campaign, Alan Abramowitz's analysis of polls during the 2016 campaign showing a nearly perfect tracking between Clinton or Trump support in a survey and the proportion of Democrats or Republicans in the sample, and an analysis by Ben Lauderdale and Doug Rivers of surveys during the recent campaign.
Why, then, do the polls swing so much? This can mostly be explained by differential nonresponse to pollsters: Clinton goes up when more Democrats answer a survey, and Trump goes up when Democrats are less likely to respond. As Lauderdale and Rivers put it, "when things are going badly for a candidate, their supporters tend to stop participating in polls." This is how news events can have big effects on the polls even if they aren't changing many vote intentions.
For example, after the FBI letter on Clinton a week or so ago, I predicted that Clinton would fall in the polls — not because she was going to lose many votes but because the news would pump up a lot of Trump supporters who would then become more enthusiastic about the election and respond to surveys. The preceding weeks had been full of bad news for the Republican candidate, hence his supporters were dejected and not participating in polls. During that period, I and others suspected that Clinton's lead in the polls had been exaggerating her strength among the electorate.
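To make the mechanism concrete, here is a minimal sketch in Python. It holds true support fixed and varies only how willing each side's supporters are to answer a poll. The function name `poll_share`, the 52 percent support figure, and the response rates are all hypothetical numbers chosen for illustration, not estimates from any of the studies mentioned above.

```python
def poll_share(dem_support, dem_response_rate, rep_response_rate):
    """Clinton's two-party share among poll respondents.

    dem_support is her (fixed) share of the electorate; the two response
    rates are the chance that a Clinton or Trump supporter answers a poll.
    """
    dem_respondents = dem_support * dem_response_rate
    rep_respondents = (1 - dem_support) * rep_response_rate
    return dem_respondents / (dem_respondents + rep_respondents)


# Hypothetical: nobody changes their mind; true support stays at 52 percent.
TRUE_SUPPORT = 0.52

# Hypothetical response rates (Clinton supporters, Trump supporters).
scenarios = {
    "equal enthusiasm": (0.10, 0.10),
    "bad news week for Trump": (0.10, 0.08),
    "bad news week for Clinton": (0.08, 0.10),
}

for label, (dem_rate, rep_rate) in scenarios.items():
    share = poll_share(TRUE_SUPPORT, dem_rate, rep_rate)
    print(f"{label}: Clinton at {share:.1%} among respondents")
```

With these made-up inputs, a two-point dip in one side's response rate is enough to move the measured share between about 46 and 58 percent, even though not a single simulated voter has changed sides.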
Four questions remain:
- Why are we talking about all this now; why was differential nonresponse not a part of the conversation in previous elections?
- Is this bias a problem for poll aggregators such as Nate Silver?
- Does likely voter screening fix the problem?
- Supposing that my colleagues and I are right that much of the swing in polls is explainable by differential nonresponse: Would this then also show up as differential voter turnout? In other words, would a failure to respond to polls predict failure to turn up on Election Day?
Here are my answers.
- Differential nonresponse is a bigger deal now than it used to be, for two reasons. First, survey response rates are lower. Not too many decades ago, quality polls had response rates over 50 percent; now a survey is lucky to get 10 percent participation. As a result, responding to surveys is much more optional, and we'd expect differential nonresponse to matter more. Second, the electorate is more polarized, and fewer people change their minds during the campaign. Thus, compared with previous decades, the "signal" of actual swings is lower and the "noise" of nonresponse is higher, and we need to be concerned about this source of bias more than ever before.
- Yes, poll aggregators are subject to this bias, because a poll average or poll-based model is only as good as the surveys that go into it. It would be possible for a forecasting model to attempt to correct for differential nonresponse bias by adjusting surveys based on the partisanship and recalled vote of their respondents, but this would require more information than is usually collected by poll aggregation sites.
- No, likely voter screening won't fix differential nonresponse bias. Actually, it can make the problem worse. Poll fluctuations are driven by fluctuations in enthusiasm, and I'd expect that screening for likely voters, which is just another measure of interest in the election, would just exacerbate the bias and increase these artifactual swings.
- Finally, what about voter turnout? Voter turnout in the general election for US president is about 60 percent. Survey response rates are below 10 percent. Survey response is a much more optional thing, hence it makes sense to see much bigger swings in differential survey response than in differential turnout. So, yes, differential turnout in voting is a thing; it's just not as big as differential nonresponse in surveys.
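The correction mentioned in the answer on poll aggregators, adjusting a survey by the partisanship of its respondents, can be sketched in a few lines of Python. This is a simplified illustration of weighting to a fixed party mix, not the method used in any of the research cited here. The function names, the two sample polls, and the `target_mix` of party shares are all hypothetical; a real adjustment would need a defensible benchmark for the electorate's partisanship and would typically also use recalled vote.

```python
def raw_share(sample):
    """Unadjusted Clinton share: pool all respondents together."""
    total = sum(n for n, _ in sample.values())
    return sum(n * support for n, support in sample.values()) / total


def party_adjusted_share(sample, target_mix):
    """Reweight each party group to its assumed share of the electorate."""
    return sum(target_mix[party] * support
               for party, (_, support) in sample.items())


# Hypothetical polls: party -> (number of respondents, Clinton share in group).
# Within-party support is identical; only who answered has changed.
week_a = {"Dem": (400, 0.92), "Rep": (300, 0.06), "Ind": (300, 0.45)}
week_b = {"Dem": (330, 0.92), "Rep": (370, 0.06), "Ind": (300, 0.45)}

# Assumed party composition of the electorate (illustrative only).
target_mix = {"Dem": 0.36, "Rep": 0.33, "Ind": 0.31}

for label, sample in (("week A", week_a), ("week B", week_b)):
    print(f"{label}: raw {raw_share(sample):.1%}, "
          f"adjusted {party_adjusted_share(sample, target_mix):.1%}")
```

In this toy example the raw numbers differ by about six points between the two weeks, while the party-adjusted numbers are identical, because the only thing that changed was the mix of Democrats and Republicans who picked up the phone.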
Put it together, and what do we have? Very little evidence of opinion swings and months of polls that are consistent with a narrow but stable Clinton lead.
Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He blogs at Statistical Modeling.