An additional source of data to calibrate forecast models for the forthcoming general election this time around is the sudden abundance of constituency level polls, almost exclusively thanks to Lord Ashcroft. This undoubtedly is an awesome resource, but there’s at least two problems:
- Some of them must be inaccurate, writes Stephen Tall: On the basis that 1 in 20 statistical tests will produce an error if we choose a 5% level of significance, so 1 in 20 polls, statistically speaking, must be wrong. Hence with close on 200 constituency polls thus far, at least 9 must be wrong – which ones, though?
- How do we calibrate constituency polls into forecast models? In order to do so, we need some historical precedent – a previous election, for example.
As with Stephen Tall’s article, I don’t wish to reduce the importance of, and the welcome addition of Ashcroft’s polls. However, I do wish to try and dig a little deeper into both of these questions.
The only historical precedent we have for Ashcroft’s polls are by-elections, where we know the outcome. Wikipedia’s page on constituency polling, which can with a little bit of pain be turned into a use-able spreadsheet, and marshalled for this purpose.
There have been six by-elections for which constituency polling was carried out in this parliament: Clacton, Eastleigh, Heywood and Middleton, Newark, Rochester and Strood, and Wythenshawe and Sale East. For these by-elections we can plot the opinion poll vote share against actual vote share each party received in the by-election.
The 45-degree line represents a polling ideal: opinion poll vote shares are exactly equal to outcomes. Clearly this is unrealistic for every poll, but pollsters must aim to be near to this line, assuming voting intent does not change between the polling date and election date. Points above the line show that a party got more votes on election day than they were polled to, while points below suggests they got fewer.
Plots are undoubtedly informative, but quantifying potential biases needs more serious statistical work; a linear regression of by-election vote shares on poll shares can reveal the extent to which polls may be biased towards or against particular parties.
The purple dots above the 45 degree line are indicative of a downward bias in polls for Ukip’s vote share; linear regression analysis shows that this is significant, and represents about six polling points: Ukip’s actual vote share in these by-elections was six points more than it was polled to get. Hence pollsters under-estimated Ukip support. Equivalently, Labour’s red dots are generally below the line; pollsters over-estimated Labour’s vote share by three points in these by-elections.
Now, to some extent, it can be argued by-elections are not representative of reality since they often constitute protest votes by fed up voters. And these two biases (the rest are insignificantly different from zero) definitely suggest a protest vote away from the major party (Labour) to the fringe party (Ukip). But were this to be the case, it should be that pollsters pick up this sentiment when polling likely voters?
Nonetheless, this mini-analysis does suggest that, by and large, constituency polling is accurate – deviations from the 45-degree line are marginal at best (except for Labour and Ukip)…