# Forecasting the election – does anyone have any idea?

We’re into the final week before the election, and forecasts still abound. According to May2015’s website, most put the Tories ahead, but not by enough to form a majority. May2015 calculate an average of six forecasts and find the Tories 12 seats up on Labour:

A range of bookmakers have been pricing up individual constituencies for some time now, providing an alternative source of information on constituencies to the hugely illuminating Ashcroft polls.

But what do such individual betting markets imply for the national picture? These bookmakers do naturally have overall outcome markets, but to what extent are their individual constituency markets consistent with the overall national picture?

I decided to take the bookmakers’ constituency-level pricing seriously. Each set of odds implies probabilities for the possible outcomes in that seat. However, combining the probabilities for 650 seats analytically to get probabilities of national outcomes is a fearsome task. Instead I’ve simulated outcomes: based on the implied probabilities of outcomes in seats, I’ve generated 100 election outcomes for each bookmaker’s set of constituency odds (taking any constituency that bookmaker doesn’t have a market for as a certain hold for the incumbent party). The resulting set of outcomes can be thought of as the election occurring according to bookmaker X – what outcome is most likely, what outcome less likely?
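The simulation step can be sketched roughly as follows. This is a minimal illustration, not the code behind the plots: it assumes decimal odds, strips the bookmaker's margin by simple proportional rescaling, and draws each seat independently of the others. The function names and the three example seats are hypothetical.

```python
import random
from collections import Counter

def implied_probabilities(decimal_odds):
    """Convert decimal odds per party into probabilities summing to 1.

    Raw inverse odds usually sum to more than 1 because of the bookmaker's
    margin, so they are rescaled proportionally (an assumption).
    """
    raw = {party: 1.0 / odds for party, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {party: p / total for party, p in raw.items()}

def simulate_elections(markets, n_sims=100, seed=0):
    """Draw n_sims elections; each seat's winner is sampled independently."""
    rng = random.Random(seed)
    probs = {seat: implied_probabilities(odds) for seat, odds in markets.items()}
    results = []
    for _ in range(n_sims):
        totals = Counter()
        for seat, p in probs.items():
            parties = list(p)
            weights = [p[party] for party in parties]
            winner = rng.choices(parties, weights=weights)[0]
            totals[winner] += 1
        results.append(totals)
    return results

# Hypothetical prices for three seats (not real bookmaker odds).
markets = {
    "Seat A": {"CON": 1.5, "LAB": 2.8, "UKIP": 21.0},
    "Seat B": {"CON": 4.0, "LAB": 1.25},
    "Seat C": {"LAB": 1.0},  # no market: treated as a certain hold
}
sims = simulate_elections(markets)
```

Treating seats as independent is a strong simplification: in reality a national swing moves many seats together, so a simulation like this will tend to understate the spread of seat totals.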

What’s the result of this? Considerable disagreement between bookmakers!

This plot shows what the 18 bookmakers (or betting exchanges, including Betfair) imply for the seat totals of Labour and the Conservatives. Betfair’s exchange is the most conservative, only implying LAB 230 and CON 233, while Betway’s prices suggest that CON will get 300 seats to LAB’s 267. Bet Victor, Stan James, 888sport, Betway, Unibet, and 32Red are the bookmakers whose prices imply CON will win 30-50 more seats than Labour; all remaining bookmakers have the two parties roughly neck and neck.

An interesting aside from this is that as we generate distributions of outcomes according to each bookmaker, we can ask more interesting questions such as: how likely is it that Labour wins most seats? How likely is it the Conservatives get enough for a majority outright? For Betfair’s exchange, there’s a 37% chance Labour wins most seats, a 4% chance they win the same number, and a 59% chance that CON wins most seats. Betfair sees no possibility of either party winning a majority. Conversely, Betway’s prices entertain no possibility of Labour being the biggest party.
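Given a set of simulated seat totals, these questions reduce to counting. The sketch below is one way to do it; the function name and the four example draws are made up for illustration, and "most seats" here only compares the two named parties, on the assumption that no third party can overtake them.

```python
def summarise(sim_totals, a="CON", b="LAB", n_seats=650):
    """Share of simulations in which a leads, ties with, or trails b,
    and in which either party reaches an outright majority."""
    n = len(sim_totals)
    majority = n_seats // 2 + 1  # 326 of 650 seats
    return {
        f"{a}_most": sum(t[a] > t[b] for t in sim_totals) / n,
        "tie": sum(t[a] == t[b] for t in sim_totals) / n,
        f"{b}_most": sum(t[a] < t[b] for t in sim_totals) / n,
        f"{a}_majority": sum(t[a] >= majority for t in sim_totals) / n,
        f"{b}_majority": sum(t[b] >= majority for t in sim_totals) / n,
    }

# Hypothetical simulated seat totals, four draws only for illustration.
sims = [
    {"CON": 280, "LAB": 270},
    {"CON": 265, "LAB": 275},
    {"CON": 272, "LAB": 272},
    {"CON": 290, "LAB": 260},
]
summary = summarise(sims)
```

With only 100 simulations per bookmaker, a share of zero means "not seen in 100 draws" rather than strictly impossible.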

# How Accurate are Constituency Polls?

An additional source of data to calibrate forecast models for the forthcoming general election this time around is the sudden abundance of constituency-level polls, almost exclusively thanks to Lord Ashcroft. This is undoubtedly an awesome resource, but there are at least two problems:

1. Some of them must be inaccurate, writes Stephen Tall: if we choose a 5% level of significance, about 1 in 20 statistical tests will produce an error, so, statistically speaking, we should expect about 1 in 20 polls to be wrong. Hence with close on 200 constituency polls thus far, we should expect around 9 or 10 to be wrong – which ones, though?
2. How do we calibrate constituency polls into forecast models? In order to do so, we need some historical precedent – a previous election, for example.
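The "1 in 20" arithmetic in the first point can be made concrete with a binomial calculation. This is only a sketch under assumptions that are not in Tall's article: each poll misses independently with probability exactly 5%, and there are 200 polls rather than "close on 200".

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k misses in n independent polls."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_polls = 200   # assumption: rounded up from "close on 200"
p_miss = 0.05   # 5% level of significance

# Expected number of polls that miss.
expected_misses = n_polls * p_miss

# Probability that nine or more polls miss.
prob_at_least_9 = 1 - sum(binom_pmf(k, n_polls, p_miss) for k in range(9))
```

The point of the calculation is that 1 in 20 is an average, not a guarantee: the number of polls that miss is itself random, so nine or more misses is likely but not certain.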

As with Stephen Tall’s article, I don’t wish to diminish the importance of Ashcroft’s polls, which are a welcome addition. However, I do wish to try and dig a little deeper into both of these questions.

The only historical precedent we have for Ashcroft’s polls is by-elections, where we know the outcome. Wikipedia’s page on constituency polling can, with a little bit of pain, be turned into a usable spreadsheet and marshalled for this purpose.

There have been six by-elections for which constituency polling was carried out in this parliament: Clacton, Eastleigh, Heywood and Middleton, Newark, Rochester and Strood, and Wythenshawe and Sale East. For these by-elections we can plot the opinion poll vote share against actual vote share each party received in the by-election.

The 45-degree line represents a polling ideal: opinion poll vote shares are exactly equal to outcomes. Clearly this is unrealistic for every poll, but pollsters must aim to be near to this line, assuming voting intent does not change between the polling date and election date. Points above the line show that a party got more votes on election day than it was polled to get, while points below suggest it got fewer.

Plots are undoubtedly informative, but quantifying potential biases needs more serious statistical work; a linear regression of by-election vote shares on poll shares can reveal the extent to which polls may be biased towards or against particular parties.
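As a rough sketch of that kind of regression, the snippet below fits a straight line of actual vote share on polled vote share separately for each party. The numbers are invented so that the arithmetic is easy to check and are not the by-election data; `ols_line` is a hypothetical helper, and a fuller analysis would also need standard errors to judge significance.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Invented (poll share, actual share) pairs in percentage points.
data = {
    "Party A": [(20.0, 25.0), (30.0, 35.0), (40.0, 45.0)],  # under-polled
    "Party B": [(20.0, 18.0), (30.0, 28.0), (40.0, 38.0)],  # over-polled
}

fits = {}
for party, pairs in data.items():
    polls = [p for p, _ in pairs]
    actual = [a for _, a in pairs]
    intercept, slope = ols_line(polls, actual)
    # Average gap between result and poll: positive means under-polled.
    mean_gap = sum(a - p for p, a in pairs) / len(pairs)
    fits[party] = {"intercept": intercept, "slope": slope, "mean_gap": mean_gap}
```

A slope of one with a non-zero intercept corresponds to points running parallel to the 45-degree line, shifted above it for an under-polled party and below it for an over-polled one.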

The purple dots above the 45-degree line are indicative of a downward bias in polls for Ukip’s vote share; linear regression analysis shows that this is significant, and represents about six polling points: Ukip’s actual vote share in these by-elections was six points more than it was polled to get. Hence pollsters under-estimated Ukip support. Similarly, Labour’s red dots are generally below the line; pollsters over-estimated Labour’s vote share by three points in these by-elections.

Now, to some extent, it can be argued by-elections are not representative of reality since they often constitute protest votes by fed-up voters. And these two biases (the rest are insignificantly different from zero) definitely suggest a protest vote away from the major party (Labour) to the fringe party (Ukip). But were this the case, shouldn’t pollsters pick up this sentiment when polling likely voters?

Nonetheless, this mini-analysis does suggest that, by and large, constituency polling is accurate – deviations from the 45-degree line are marginal at best (except for Labour and Ukip)…

# More election forecasts

Today the phenomenon that is FiveThirtyEight has joined the UK General Election fray, announcing it’ll be running a forecast. FiveThirtyEight, or perhaps more so its head Nate Silver, is well known for forecasting prowess, particularly in US elections, but as detailed in the linked article, his forecast of the 2010 UK General Election went somewhat awry. It sounds like he’s taking things a lot more seriously this time around, which will be very interesting to see.

In the interview style of the linked post, Silver talks about the issue of going from polls to seats, and how well it works – as in, not particularly well. Which is why I’m still surprised that my simple linear regression model of polls since 1970 did as well as it did (and accounts for 90% of the variation in historical data). Unlike the other forecasts of the outcome that Silver refers to, that model actually points towards a Tory majority on May 7.

It’s a really basic model, however, and has none of the basic ingredients we would want to include in a proper election forecast model. But it certainly provides an interesting alternative forecast…

# Can Polls Predict Seats?

Opinion polls are increasingly common; UK Polling Report lists 125 polls for both the 1974 elections combined, between 1970 and 1974, while the same website lists 1868 for the 2015 election, with 67 days still to go. However, opinion polls only report the vote share implied by the pollster’s survey; while vote share will undoubtedly influence election outcomes, anomalies are still possible. For example, in February 1974 the Conservatives won more votes yet fewer seats than Labour, an outcome it is reported the Tories are preparing for this time around.

Seats matter more, and trends vary across the country for the different parties; many Tory heartlands are supposedly under threat from Ukip, while Labour’s Scottish seats appear lost to the SNP. Constituency polling, led it seems by Lord Ashcroft since 2010, is viewed as the way forward. Despite this, traditional nationwide polls continue to attract attention – not least the current neck-and-neck nature of Labour and the Tories. Can we glean anything from such polling?

Is there any kind of relationship between how much a party is polling, and how many seats they can expect to get? There naturally is, but the more pertinent question is how strong and robust that relationship is over the years. We focus on polls since 1970, hence 10 elections and 8,253 polls. We include information on the time horizon until the election (number of days), the political party (in case of any biases), and consider any kind of incumbency effect, and we interact all these variables together in a linear regression model in order to see whether the resulting model has any explanatory power. Surprisingly enough, it manages to account for almost 90% of variation in seats won, and I’m happy to provide any interested party with the regression output (I’m working on tidying up a more general set of code for this).
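To give a flavour of what such a model looks like, here is a small self-contained sketch. It is not the regression described above: the data are synthetic, the party variable is reduced to a single hypothetical Labour dummy, time is measured in years rather than days, and only pairwise interactions are included. The helper names (`design_row`, `solve`, `ols`, `r_squared`) are my own for illustration.

```python
import random
from itertools import combinations

def design_row(share, years, is_lab, incumbent):
    """Intercept, four main effects, and their six pairwise interactions."""
    base = [share, years, is_lab, incumbent]
    return [1.0] + base + [a * b for a, b in combinations(base, 2)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def r_squared(X, y, beta):
    """Share of the variation in y accounted for by the fitted values."""
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic polls (not real data): seats rise with poll share and incumbency.
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    share = rng.uniform(0.25, 0.50)        # poll share as a fraction
    years = rng.uniform(0.0, 5.0)          # time until the election
    is_lab = float(rng.random() < 0.5)     # hypothetical party dummy
    incumbent = float(rng.random() < 0.5)  # incumbency dummy
    X.append(design_row(share, years, is_lab, incumbent))
    y.append(50 + 700 * share + 25 * incumbent + rng.gauss(0, 15))

beta = ols(X, y)
fit = r_squared(X, y, beta)
```

Each row is one poll and the outcome is the number of seats the party went on to win, so polls for the same election share the same outcome value. That repetition is worth bearing in mind when reading a high R-squared from this kind of setup.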

A few plots to help:

Firstly this is a cross plot between actual opinion poll shares and election outcomes in terms of seats – the black circles signify such points. They show the Lib-Dem cluster below 100, and the Labour/Tory cluster spread between 200 and 400. It suggests something of a non-linear relationship in the Labour/Tory cluster, since polling shares in the 50s are consistent with seat totals of 300 and 400. The red apparent scribbles are the fitted values, or predicted values, from our linear regression model. They show that within the sample, to some extent, the two clusters are captured. Clearly improvements are possible, but it’s a reasonable model to begin with.

How did the model fare in 2010? Here are the implied seat forecasts from each poll against the actual seat totals:

Note this is the same model estimated over all polls prior to the 2005 election – any poll for the 2010 election is excluded in order that this is actually a forecasting exercise rather than an in-sample fitting exercise. The resulting forecasts for seats are quite surprising in that Labour were forecast to get more seats as the election neared. This appears to be the incumbency effect, which is strongly significant in the regression model. On average, forecast errors were about 11 seats for this election, but clearly biased up for Labour, and down for the Tories and Lib Dems. The same was true, but on a smaller scale, in 2005.
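The distinction between the size of the errors and their direction can be made explicit by computing both a mean absolute error and a signed mean error (bias) per party. The sketch below uses made-up forecasts and outcomes for two unnamed parties, purely to show the calculation; it does not reproduce the 2010 numbers.

```python
def forecast_errors(forecasts, actual):
    """Signed bias and mean absolute error of seat forecasts for one party."""
    errors = [f - actual for f in forecasts]
    n = len(errors)
    return {
        "bias": sum(errors) / n,                 # positive: forecast too high
        "mae": sum(abs(e) for e in errors) / n,  # typical size of a miss
    }

# Hypothetical seat forecasts (one per poll) and hypothetical outcomes.
party_a = forecast_errors([270, 262, 266], actual=258)
party_b = forecast_errors([290, 300, 310], actual=306)
```

In this toy example party A's forecasts are all too high, so its bias equals its mean absolute error, while party B's errors partly cancel and its bias is smaller in size than its typical miss.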

What does all of this mean for 2015? It’s harder to tell whether there’s an incumbency effect since there is no comparable full coalition term; the model automatically attributes this to the Conservatives. The forecasts look like:

Hence, unlike much of the current media narrative (e.g. here, here), this very simple model appears to point towards the Tories not just being the largest party in a hung parliament, but potentially winning a majority; the most recent polls indicate, based on historical data, that the Tories will win somewhere between 280 and 300 seats, but Labour only 220-240. Could the incumbency effect be too strong here, as it appears to have been in forecasts in 2010?

There are no Ukip forecasts since there is no recent historical precedent for Ukip, hence no data upon which to base a forecast. We could apply the model, estimated over Lib-Dem historical performance, to Ukip, but there’s only so far one should take a very basic statistical model such as this. On the most important question of most seats between Labour and the Tories, it has already provided a thought-provoking forecast.