Trump, Clinton, and the failures of election forecasting

Election season has come and gone. Donald Trump pulled off what has been repeatedly characterized as a “stunning upset” over Hillary Clinton. The web is awash in postmortem analyses of the failures of election polling and forecasting. But was it really that surprising?

Several polls conducted in the lead-up to the election reported a virtually deadlocked race, with the gaps between the candidates well within any reasonable margin of error:

Quinnipiac University reported on the situation in key battleground states (Nov 2):

Democrat Hillary Clinton’s October momentum comes to a halt as she clings to a small lead in Pennsylvania, while Republican Donald Trump moves ahead in Ohio, leaving Florida and North Carolina too close to call.

Probability forecast models, on the other hand, were remarkably off the mark and predicted Clinton well ahead as late as the night before the election:

  • New York Times Upshot: Clinton 84%, Trump 16%
  • FiveThirtyEight: Clinton 66.9%, Trump 33%
  • PredictWise: Clinton 89%, Trump 11%

I was watching the blitz of last-ditch rallies held by Trump and Clinton the night before election day. The candidates expressed incompatible sentiments about their chances of winning. Here is a revealing segment from Trump’s penultimate rally in New Hampshire (8pm on Nov 7):

We are going right after this to Michigan, because Michigan is in play… The polls just came out: we are leading in Michigan; we are leading in New Hampshire; we are leading in Ohio; we are leading in Iowa; leading in North Carolina; I think we are doing really, really well in Pennsylvania; and I do believe we are leading in Florida.

In the meantime, according to the New York Times:

Mrs. Clinton’s campaign was so confident in her victory that her aides popped open Champagne on the campaign plane early Tuesday.

Either way, each candidate and their popular base were clearly happy to live in their own reality right down to the wire. Some personal take-aways from this whole affair:

  • The law of total variance for polling. Uncertainty about polling results is best understood not only by assessing the variance within a single poll, but also by studying the variance across multiple competing polls.

  • Forecasting is an enormously complex empirical problem. Formal statistical methods are unable to accurately quantify the uncertainty around the web of non-statistical effects that ultimately influence a voter’s decision at the ballot box.

  • A common reaction was for people to blame journalists and cancel their newspaper subscriptions. Well-informed voters should not derive the bulk of their information from a cursory reading of social or national news media. They are deliberate in their critique of information, especially of numbers and “facts” that confirm their own hopes and beliefs.
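
To make the first take-away concrete, here is a rough sketch of the law of total variance applied to a set of polls. It rests on assumptions that are mine, not any pollster's: each poll is treated as a simple random sample (so its own variance is p(1 - p)/n), every poll is weighted equally, and the function name total_poll_variance and the numbers in the usage lines are purely hypothetical.

```python
from statistics import mean, pvariance


def total_poll_variance(polls):
    """Split polling uncertainty into within-poll and between-poll parts.

    polls: list of (share, sample_size) pairs, with share in [0, 1].
    Assumes simple random sampling and equal weight for every poll.
    Returns (within, between, total), where total = within + between,
    following Var(Y) = E[Var(Y | poll)] + Var(E[Y | poll]).
    """
    shares = [share for share, _ in polls]
    # Average sampling variance of a single poll's estimate.
    within = mean([share * (1 - share) / n for share, n in polls])
    # Spread of the estimates across competing polls.
    between = pvariance(shares)
    return within, between, within + between


# Hypothetical illustration: two polls of 1,000 respondents that disagree.
within, between, total = total_poll_variance([(0.4, 1000), (0.6, 1000)])
print(within, between, total)
```

In this made-up case the disagreement between the two polls contributes far more variance than the sampling error of either one, which is exactly what a single poll's margin of error cannot show.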