November 14, 2016

Who's afraid of the big bad US election pollsters?

By Paul Laughlin

Many column inches of newspapers this week have been filled with diagnoses of what went wrong with the polls predicting the US election outcome.

Why were most media outlets and most polls predicting a Clinton win? More relevantly to this blog, should it actually matter to Customer Insight leaders?

I believe some of the lessons that can be learnt from this & other recent failures are very relevant for insight leaders. There are two main reasons for that.

Firstly, not everyone got it wrong or got it wrong as much; so there are examples to learn from. Secondly, some of the lessons to learn relate to relative use of analytics & research methods — so their relevance is much wider than just election polling.

As might be expected, the New York Times ran a well-written piece about the failure of polls & many forecasters.

The Problem

The NY Times lists the polls that got it wrong & explains why most polls predicted that Clinton would win.

But one of the main reasons I like this article is this timely quote:

“But data science is a technology advance with trade-offs. It can see things as never before, but also can be a blunt instrument, missing context and nuance. All kinds of companies and institutions use data quietly and behind the scenes to make predictions about human behaviour. But only occasionally — as with Tuesday’s election results — do consumers get a glimpse of how these formulas work and the extent to which they can go wrong.”

Very well said. A more balanced assessment of the potential for Data Science needs to be heard, especially amidst the enthusiasm of conferences & pundits.

How Data Failed Us in Calling an Election

It was a rough night for number crunchers. And for the faith that people in every field – business, politics, sports and academia – have increasingly placed in the power of data. Donald J. Trump’s victory ran counter to almost every major forecast – undercutting the belief that analyzing reams of data can accurately predict events.

Causal factors in US election

One can normally rely on Andrew Gelman for a balanced critique of statistical methods & the weight of evidence for claims.

In this post, he does not disappoint. Drawing our attention to the fact that it was a smaller swing than you might think (2%), he usefully highlights several factors at play.

Andrew critiques a number of popular theories (for once it looks like it wasn’t another case of “Shy Tories” or “Shy Trumpsters”). Then he goes on to list numerous factors at work, including voter enthusiasm, collapses in party vote (despite what some papers say), long queues putting voters off & potentially an underestimation of the effect of non-traditional campaigning (e.g. Twitter & TV shows).

There is no simple conclusion, but my takeaway is the evidence that there was no single cause for models getting it wrong. In reality, several factors were at play.

Statistical Modeling, Causal Inference, and Social Science

The title of this post says it all. A 2% shift in public opinion is not so large and usually would not be considered shocking. In this case the race was close enough that 2% was consequential. Here’s the background: Four years ago, Mitt Romney received 48% of the two-party vote and lost the presidential election.

Well worth all statisticians reading that article.

Research Methods

As much as Andrew can be trusted for statistical robustness, GreenBook is a trusted source for research best practice.

In this reflection from Tom Anderson, he reminds us that not everyone got it wrong. His text analytics of social media sentiment proved accurate (even if published with very cautiously understated conclusions). Like Nate Silver (of FiveThirtyEight), his results showed a very real possibility of a Trump win.

What is interesting in this approach, though, is that the ‘hero’ is not clever model tweaking by skilled statisticians, but a change in the research methods used. Tom makes a good case for increased use of behavioural analytics, social media analytics or other non-traditional methods – many of which did predict a Trump win.

I agree with his conclusion, which could also have been made after the Brexit polls. The learning point is the growing evidence that: “conventional quantitative Likert-scale survey questions—the sort used in every poll—are generally not terrific predictors of actual behaviour”.

What’s Really Wrong With Polling?

Whatever your politics, I think you’ll agree that Tuesday’s election results were stunning. What is now being called an historic upset victory for Donald Trump apparently came as a complete shock to both of the campaigns, the media and, not least, the polling community.

Well worth all research leaders reading that post.

Who is the real villain?

OK, if it’s not true that all pollsters got it wrong, who is really to blame? Should the public & analysts shift their focus to another villain?

As usual, Tim Harford’s More or Less podcast points us in the right direction, highlighting once again the critical role of the Electoral College system.

But the most devastating critique I have read of the impact this system actually has on US election results is this piece from Forbes magazine:

In it, statistician Meta Brown (author of great books on Data Mining & Storytelling) lays bare the real villain of this result. I had not realised how much the Electoral College system favours white voters, nor its dubious origins in compromises over slavery.

Compared to those flaws, perhaps selecting better research & statistical methods is a minor improvement. So, don’t scapegoat the analysts.