Election outcome – what?!

The news outlets have now spent the last 16 hours finding as many superlatives as they possibly can to describe the election we just witnessed. As Britain went to the polls, the opinion pollsters continually had Labour and the Tories neck and neck:


The picture doesn’t make it entirely clear (it’s all polls with a smoothed line plotted through them), but a visit to, say, UK Polling Report’s polling average makes it clearer.

The outcome, however, is that the Tories polled 36.9% of the vote nationally and Labour 30.5%, and the Tories have consequently won a majority, with 330 seats as I write and one seat still to declare.

The discussion has at least in part centred on why the polls were so wrong. I want to add a minor quip to all of this. As part of my forecasting course I prepared a simple forecast model, based on linear regression, that uses nationwide polls alone to predict seat outcomes. I presented it to my students back in March (slide 10), and also referred to it briefly in a talk at Nottingham Business School on Wednesday (slide 6). Here are the forecasts:


The actual outcomes, as they stand, are marked on. The regression model took each opinion poll’s projected vote share and also controlled for the number of days between the poll and the election, and for whether a party was the incumbent. It combined these variables using interaction terms, but nonetheless remained a simple linear regression model. Nothing special.
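For readers who want to see the shape of such a model, here is a minimal Python sketch. The post does not give the exact specification, so the variables (polled vote share, days between poll and election, an incumbency dummy), the particular interaction terms and the synthetic data are all assumptions, and names like `design_matrix` and `fit_seat_model` are hypothetical.

```python
import numpy as np

def design_matrix(vote_share, days_to_election, incumbent):
    """OLS design matrix: intercept, main effects and two interaction terms.

    Which interactions to include is an assumption; the post only says
    interaction terms were used.
    """
    v = np.asarray(vote_share, dtype=float)
    d = np.asarray(days_to_election, dtype=float)
    inc = np.asarray(incumbent, dtype=float)
    return np.column_stack([
        np.ones_like(v),  # intercept
        v,                # polled vote share (%)
        d,                # days between poll and election
        inc,              # 1 if the party is the incumbent, else 0
        v * d,            # vote share x days to election
        v * inc,          # vote share x incumbency
    ])

def fit_seat_model(vote_share, days_to_election, incumbent, seats):
    """Least-squares coefficients for seats ~ design_matrix(...)."""
    X = design_matrix(vote_share, days_to_election, incumbent)
    coef, *_ = np.linalg.lstsq(X, np.asarray(seats, dtype=float), rcond=None)
    return coef

def predict_seats(coef, vote_share, days_to_election, incumbent):
    """Predicted seat totals for new polls."""
    return design_matrix(vote_share, days_to_election, incumbent) @ coef

# Hypothetical synthetic "history" generated from known coefficients,
# so we can check that the fit recovers them (no real polling data here).
rng = np.random.default_rng(0)
n = 200
vote = rng.uniform(20, 45, n)
days = rng.integers(0, 365, n)
inc = rng.integers(0, 2, n)
true_coef = np.array([-150.0, 12.0, 0.05, 20.0, -0.002, 1.5])
seats = design_matrix(vote, days, inc) @ true_coef
coef = fit_seat_model(vote, days, inc, seats)
print(predict_seats(coef, [34.0, 34.0], [0, 0], [1, 0]))
```

With real data, each row would presumably be one poll for one party, with the seats that party went on to win as the response.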

But it does get Labour’s seat total pretty much bang on. A couple of the individual poll-based forecasts were as optimistic for the Tories as what has actually occurred, and the majority are not all that far short of what the Tories ended up with; certainly this model predicted a much wider gulf between the parties than the naked polls alone did.

What does this say? It probably says that if we correct polls for their historical performance in predicting seat outcomes, they are not that far from what actually happened. This method also bias-corrects polls, should they display any bias towards one party or another, and adds a control for the incumbent party, which raises its seat total; hence the Tory total is much bigger than Labour’s despite the two polling neck and neck.

I’m sure nonetheless that pollsters will get it in the neck, but I thought I’d just point this out…

Forecasting the election – does anyone have any idea?

We’re into the final week before the election, and forecasts still abound. According to May2015’s website, most put the Tories ahead, but not by enough to form a majority. May2015 calculate an average of six forecasts and find the Tories 12 seats up on Labour:

A range of bookmakers have been pricing up individual constituencies for some time now, providing an alternative source of information on constituencies to the hugely illuminating Ashcroft polls.

But what do such individual betting markets imply for the national picture? These bookmakers do naturally have overall outcome markets, but to what extent are their individual constituency markets consistent with the overall national picture?

I decided to take the bookmakers’ constituency-level pricing seriously. Each set of odds implies probabilities for the outcomes in that seat. However, combining these probabilities analytically across 650 seats to get probabilities of national outcomes is a fearsome task. Instead I’ve simulated outcomes: based on the implied probabilities in each seat, I’ve generated 100 election outcomes for each bookmaker’s set of constituency odds (treating any constituency for which that bookmaker doesn’t have a market as a certain hold for the incumbent party). The resulting set of outcomes can be thought of as the election occurring according to bookmaker X: which outcomes are most likely, and which less likely?
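The simulation idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the plot: odds are taken as decimal odds, the bookmaker’s margin is removed by rescaling 1/odds so that each seat’s probabilities sum to one (one common choice among several), and the seat names, party labels and helper names are hypothetical.

```python
import random
from collections import Counter

def implied_probabilities(decimal_odds):
    """Turn one seat's decimal odds into probabilities that sum to 1.

    Raw implied probabilities (1/odds) usually sum to more than 1 because of
    the bookmaker's margin; here they are simply rescaled proportionally.
    """
    raw = {party: 1.0 / odds for party, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {party: p / total for party, p in raw.items()}

def simulate_elections(seat_odds, incumbents, n_sims=100, seed=None):
    """Simulate national seat totals from constituency-level odds.

    seat_odds:  {seat: {party: decimal_odds}} for seats with a market
    incumbents: {seat: party} for every seat; seats without a market are
                treated as certain holds for the incumbent.
    Returns a list of Counters {party: seats}, one per simulated election.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        totals = Counter()
        for seat, holder in incumbents.items():
            if seat in seat_odds:
                probs = implied_probabilities(seat_odds[seat])
                parties = list(probs)
                # Draw this seat's winner with the implied probabilities as weights.
                winner = rng.choices(parties, weights=[probs[p] for p in parties])[0]
            else:
                winner = holder
            totals[winner] += 1
        results.append(totals)
    return results

# Hypothetical three-seat example: only seat "A" has a market.
odds = {"A": {"LAB": 1.5, "CON": 3.0}}
holders = {"A": "LAB", "B": "CON", "C": "CON"}
sims = simulate_elections(odds, holders, n_sims=100, seed=42)
```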

What’s the result of this? Considerable disagreement between bookmakers!

Implied Seat Totals

This plot shows what the 18 bookmakers (or betting exchanges, including Betfair) imply for the seat totals of Labour and the Conservatives. Betfair’s exchange is the most conservative, only implying LAB 230 and CON 233, while Betway’s prices suggest that CON will get 300 seats to LAB’s 267. Bet Victor, Stan James, 888sport, Betway, Unibet, and 32Red are the bookmakers whose prices imply CON will win 30-50 more seats than Labour; all remaining bookmakers have the two parties roughly neck and neck.

A useful by-product is that, because we generate a distribution of outcomes for each bookmaker, we can ask more interesting questions, such as: how likely is it that Labour wins the most seats? How likely is it that the Conservatives win an outright majority? For Betfair’s exchange, there’s a 37% chance Labour wins the most seats, a 4% chance the two parties win the same number, and a 59% chance that CON wins the most seats. Betfair sees no possibility of either party winning a majority. Conversely, Betway’s prices entertain no possibility of Labour being the biggest party.
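Once each simulated election is a set of seat totals, these questions reduce to counting. A minimal sketch, assuming 326 of 650 seats for an outright majority and hypothetical party keys "LAB" and "CON":

```python
def outcome_probabilities(simulations, majority=326):
    """Summarise simulated elections, each a mapping {party: seats}."""
    n = len(simulations)

    def share(condition):
        # Fraction of simulations in which condition(lab_seats, con_seats) holds.
        return sum(1 for s in simulations
                   if condition(s.get("LAB", 0), s.get("CON", 0))) / n

    return {
        "LAB most seats": share(lambda lab, con: lab > con),
        "tie": share(lambda lab, con: lab == con),
        "CON most seats": share(lambda lab, con: con > lab),
        "LAB majority": share(lambda lab, con: lab >= majority),
        "CON majority": share(lambda lab, con: con >= majority),
    }

# Four made-up simulated elections:
sims = [
    {"LAB": 230, "CON": 233},
    {"LAB": 240, "CON": 240},
    {"LAB": 250, "CON": 230},
    {"LAB": 200, "CON": 330},
]
print(outcome_probabilities(sims))
```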

The University of Reading’s workshop on “Big Social Data: Interdisciplinary Analytics”

The University of Reading’s workshop on “Big Social Data: Interdisciplinary Analytics” was held last week, with funding contributions from the University’s RETF and the Dept. of English and Applied Linguistics (DELAL), Henley Business School (HBS), School of Politics, Economics and International Relations (SPEIR) and School of Systems Engineering (SSE). The workshop was organised by academics […]

The BBCDebate: absentees more influential?

Last night the BBC aired its debate of the challengers, as it put it, with leaders of the five opposition parties squaring up to each other. Prime Minister David Cameron and Deputy Prime Minister Nick Clegg did not participate, and the latter was at pains to point out that he wasn’t even invited.

There’s little doubt this wasn’t the biggest Twitter event of the election campaign, but nonetheless well over a thousand tweets per minute were recorded, and in total we collected 151,417 tweets surrounding the event. Most activity, understandably, came towards the end of the debate as each politician tried to leave viewers with their version of events:

Number of tweets per minute

The spike towards the end could perhaps be explained by the three “major” parties going into spin overdrive as the debate closed; this seems clearer when looking at the number of tweets per party:

Number of tweets per party

The second Ukip spike, just after 8:30pm, appears to coincide with Nigel Farage’s attack on the audience both in the studio and at home, while nearer 9pm is when the debate moved to immigration; at this point Ukip were getting more than twice as many mentions on Twitter as any other party.
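Counts like these come from bucketing tweet timestamps by minute and, for the per-party view, checking each tweet for party keywords. A rough Python sketch; the keyword lists and the simple substring matching are assumptions, not the terms actually tracked here:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical keyword lists, for illustration only.
PARTY_TERMS = {
    "Ukip": ["ukip", "farage"],
    "Labour": ["labour", "miliband"],
    "SNP": ["snp", "sturgeon"],
}

def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)

def tweets_per_minute(tweets):
    """tweets: iterable of (datetime, text). Returns Counter {minute: count}."""
    return Counter(minute_bucket(ts) for ts, _ in tweets)

def party_mentions_per_minute(tweets, party_terms=PARTY_TERMS):
    """Returns {party: Counter {minute: count}}; a tweet can count for several parties."""
    counts = defaultdict(Counter)
    for ts, text in tweets:
        lowered = text.lower()
        for party, terms in party_terms.items():
            # Crude substring match: "snp" would also match inside longer words.
            if any(term in lowered for term in terms):
                counts[party][minute_bucket(ts)] += 1
    return counts

# Usage with made-up tweets:
sample = [
    (datetime(2015, 4, 16, 20, 31, 5), "Farage turns on the audience!"),
    (datetime(2015, 4, 16, 20, 31, 50), "Labour and Ukip clash"),
    (datetime(2015, 4, 16, 20, 32, 1), "Sturgeon again"),
]
per_minute = tweets_per_minute(sample)
per_party = party_mentions_per_minute(sample)
```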

As Sylvia outlined in our last post after the seven-way debate, we’ve created our own sentiment index, and below we plot the index for each of the parties, including the two not participating in the debate:

Sentiment during #BBCDebate


What is perhaps most notable is that the Conservative index has the biggest range, despite David Cameron not participating; just before 9, not long after the question on defence, Conservative sentiment is at rock bottom, but just before the end of the debate (perhaps co-ordinated?), Tory sentiment is soaring, although in the final minute Labour’s sentiment is almost identical. The SNP, widely noted for their social media campaigning, also show a late burst, although Sturgeon’s somewhat disappointing final comments appear to be reflected in the last-minute tail-off in sentiment.

Overall it’s clear that very little is clear regarding who “won” last night, and whether indeed it was one of the two parties that didn’t participate – at least in the televised debate…

So who really won the debate? Post-match analysis of public attitudes on Twitter

Immediately after the end of the leaders’ debate, media and political analysts rushed to identify the winners and losers of the event. Various instant polls were cited: whereas YouGov proclaimed Nicola Sturgeon and Nigel Farage the winners, ICM put Miliband first. And today every party leader seems to be celebrating his or her debate victory … of course. While the focus on the party leaders is understandable in the run-up to the election, we should perhaps pause for a minute and reflect on the messages that were voiced yesterday; perhaps they could tell us a bit more about which ideas are likely to gain public support. Social media could be useful in this respect. As we have already noted (see the previous post, Democracy is Cyber-participation), the TV political debates seem to engage Twitter users. Using the Twitter streaming API to monitor ‘political’ tweets in real time yesterday, we recorded a massive rise in Twitter activity during the debate. The total count of ‘political’ tweets (that is, tweets including specific references to party terms) produced on Thursday 2nd April was 800,350, of which nearly 80% (614,800 tweets) were generated between 7pm and midnight. No doubt, Twitter users were engaging with the debate.

political tweets count

We were, however, interested in the ways in which Twitter users respond to the messages voiced by the individual party leaders, and in the extent to which what the party leaders said influenced public attitudes or sentiments. To do so, we created a ‘political’ sentiment index. The index is based on evaluative words (mainly adjectives) retrieved from the political tweets that we have been collecting over the last two weeks. Each word was given a score: +1 for positive meanings, -1 for negative meanings and 0 for neutral. In doing so, we recognised that certain words may change their evaluative meaning when used in political contexts. Nevertheless, the massive amount of available data allows valuable information to be extracted even in the presence of semantic inaccuracies and noise. This is the beauty of data-driven knowledge discovery.
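A lexicon-based score of this kind can be sketched in a few lines of Python. The lexicon below is a hypothetical toy, and summing the word scores within a tweet (rather than, say, averaging them) is an assumption: the post does not say how word scores were combined.

```python
import re

# Toy lexicon (hypothetical): +1 positive, -1 negative; unlisted words score 0.
LEXICON = {
    "strong": 1, "fair": 1, "honest": 1, "great": 1,
    "weak": -1, "disaster": -1, "racist": -1, "shambles": -1,
}

TOKEN_PATTERN = re.compile(r"[a-z']+")

def tweet_sentiment(text, lexicon=LEXICON):
    """Sum the +1/-1/0 scores of the words in a tweet."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)

print(tweet_sentiment("A strong, fair answer on the NHS"))  # prints 2
```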

Subsequently, a sentiment score was assigned to each of the roughly 600,000 political tweets generated during the debate. In this sense, our analysis is much more comprehensive than the one offered by Demos, who considered only tweets that included boos and cheers. The graphs below show the mood towards each political party as the debate evolved. Four major topics were discussed: the deficit, the NHS, immigration and the future for young people. The blue lines on the graphs mark the time slots dedicated to each theme.
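One plausible way to turn per-tweet scores into a time series per party is to average the scores of all tweets mentioning that party within each minute. The post does not specify the aggregation, so treat this Python sketch, including the minute granularity and the keyword lists, as an assumption:

```python
from collections import defaultdict
from datetime import datetime

def sentiment_index(scored_tweets, party_terms):
    """Average tweet sentiment per party per minute.

    scored_tweets: iterable of (datetime, text, score)
    party_terms:   {party: [lower-case keywords]} (hypothetical lists)
    Returns {party: {minute: mean score}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for ts, text, score in scored_tweets:
        minute = ts.replace(second=0, microsecond=0)
        lowered = text.lower()
        for party, terms in party_terms.items():
            if any(t in lowered for t in terms):
                sums[party][minute] += score
                counts[party][minute] += 1
    return {party: {m: sums[party][m] / counts[party][m] for m in counts[party]}
            for party in counts}

# Usage with made-up, already-scored tweets:
terms = {"SNP": ["snp", "sturgeon"], "Labour": ["labour", "miliband"]}
scored = [
    (datetime(2015, 4, 2, 21, 2, 10), "Sturgeon strong on immigration", 2),
    (datetime(2015, 4, 2, 21, 2, 40), "SNP again", 0),
    (datetime(2015, 4, 2, 21, 3, 0), "Miliband weak", -1),
]
index = sentiment_index(scored, terms)
```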


Twitter Sentiment Index: major parties

Twitter Sentiment Index: other parties

As can be seen, support for each party fluctuated depending on the theme. Which messages scored particularly positively in the eyes of the public? The NHS policies of Labour and the LibDems seem to have scored well. Forty minutes into the debate, Ed Miliband outlined his plans for financing the NHS, and following this statement Labour reached its peak of positive evaluation. Conversely, UKIP should seriously rethink its NHS policy: stigmatising HIV patients is not going to win public support, though UKIP’s views on immigration seemed to do the trick. The SNP appears to have been mostly positively evaluated. Having said that, certain messages seem to have been particularly endorsed. Nicola Sturgeon’s appeal for a rational debate on immigration (21:02) and her personal statement about the free education that enabled her to be where she is (21:32) won massive support, as did her final statement, in which she presented the SNP as an alternative to Westminster.

The following two word-clouds were generated from the most frequent words in the tweets associated with the SNP and Nicola Sturgeon during the two main periods of Twitter popularity. These are the two periods with the highest Political Sentiment Index, and they appear to have been inspired by Nicola Sturgeon’s key statements on immigration and education, at 20:55 and 21:35 respectively. These are the messages that appear to have been the winners of the leaders’ debate.

Word-cloud for SNP tweets from 21:02 to 21:12

Word-cloud for SNP tweets from 21:40 to 22:00
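The frequency counts behind word-clouds like these can be sketched as follows: keep tweets inside a time window that mention the SNP or Sturgeon, tokenise, drop stopwords and count. The keyword list, stopword list and tokeniser are hypothetical simplifications.

```python
import re
from collections import Counter
from datetime import datetime

# Illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for", "rt"}

def window_word_counts(tweets, start, end, keywords, top_n=50):
    """Most frequent words in tweets from [start, end) that mention any keyword.

    tweets: iterable of (datetime, text); keywords: lower-case strings.
    """
    counts = Counter()
    for ts, text in tweets:
        if start <= ts < end:
            lowered = text.lower()
            if any(k in lowered for k in keywords):
                words = re.findall(r"[a-z#@']+", lowered)
                counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Usage with made-up tweets:
sample = [
    (datetime(2015, 4, 2, 21, 3), "SNP on immigration"),
    (datetime(2015, 4, 2, 21, 5), "Sturgeon immigration debate"),
    (datetime(2015, 4, 2, 21, 30), "SNP education"),
]
top = window_word_counts(sample, datetime(2015, 4, 2, 21, 2),
                         datetime(2015, 4, 2, 21, 12), ["snp", "sturgeon"])
```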

Democracy is Cyber-participation: General Election Twitter Boom

Thursday the 26th of March 2015 was the day of the TV debate “Cameron & Miliband: The Battle for Number 10”. Arguably this has been the most remarkable day for Twitter activity related to UK politics so far. Many media outlets have reported that as many as 260,000 tweets were generated during the event.

We have been using the Twitter streaming API to monitor tweets related to UK politics in real time: a combination of more than 30 tracked terms and ad-hoc filters for a political context check is used to identify the ‘political’ tweets. According to the Twitter streaming API reports, we missed only 5% of the total traffic generated by citizens who were inspired by the event to become “cyber-chatterboxes”. On the day of the event (26/03), a total of 388,733 ‘political’ tweets were collected, a 300% increase on the previous day. In the evening alone, between 20:30 and 24:00, there were 348,993 ‘political’ tweets, of which 119,282 specifically included the term “BattleForNumber10”. This chart reports the number of recorded tweets over time, with specific counts for those that included an explicit reference to a political party.


This second chart provides a more detailed view of the frequency of party terms in the tweets. It clearly shows the shift in mentions from the Tories to Labour as the programme moved from Cameron to Miliband.
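The filtering step described above (tracked terms plus a political-context check) might look something like this Python sketch. The term lists and the rule that ambiguous terms such as "labour" need a context word are illustrative assumptions, not the actual 30-plus terms or filters used:

```python
import re

# Hypothetical term lists; the real tracker used 30+ terms not listed in the post.
UNAMBIGUOUS_TERMS = ["tories", "ukip", "libdem", "snp", "battlefornumber10"]
AMBIGUOUS_TERMS = ["labour", "conservative", "green"]
CONTEXT_WORDS = {"election", "vote", "mp", "party", "debate",
                 "cameron", "miliband", "ge2015"}

def is_political(text):
    """Crude two-stage filter: tracked-term match, then a context check for ambiguous terms."""
    lowered = text.lower()
    if any(term in lowered for term in UNAMBIGUOUS_TERMS):
        return True
    if any(term in lowered for term in AMBIGUOUS_TERMS):
        # Ambiguous terms (e.g. "labour" as in childbirth) need a political context word.
        words = set(re.findall(r"[a-z0-9]+", lowered))
        return bool(words & CONTEXT_WORDS)
    return False
```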


Racism, Farage and Clarkson

The political story of the last 24 hours is clear: Ukip’s leader Nigel Farage would scrap racial discrimination laws in order to set free our employers from the shackles that bind them. Regardless of one’s feelings about this (Fraser Nelson thinks the SNP’s anti-rich bigotry is more appalling, while naturally the Huffington Post takes a different line), there’s little doubt it’s driven much content on social media in the last 24 hours; in the last hour alone, over a thousand tweets specifically mentioning Ukip have been sent.

Here in Reading we’re collecting election-related tweets, so this seemed like a good opportunity to visualise what’s going on. Below is a word cloud composed of two types of words: firstly, terms in green, such as party names and references and other proper nouns; and secondly, plain old words. The font size is dictated by the frequency of the word or term: the more commonly it occurs, the bigger it is.

Wordcloud 12 March
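The mapping from frequency to font size, and the green colouring of party names and other proper nouns, can be sketched as below. Linear scaling between a minimum and a maximum size is an assumption (word-cloud tools often use other scalings), and the function name and size limits are hypothetical.

```python
def word_cloud_styles(frequencies, highlighted_terms, min_size=10, max_size=72):
    """Map word -> (font size, colour).

    Font size scales linearly with frequency between min_size and max_size;
    words in highlighted_terms (e.g. party names, proper nouns) are green.
    """
    lo, hi = min(frequencies.values()), max(frequencies.values())
    span = hi - lo
    styles = {}
    for word, freq in frequencies.items():
        if span == 0:
            size = max_size  # all words equally frequent
        else:
            size = min_size + (freq - lo) * (max_size - min_size) / span
        colour = "green" if word in highlighted_terms else "black"
        styles[word] = (size, colour)
    return styles

# Made-up frequencies:
styles = word_cloud_styles({"ukip": 100, "racist": 55, "clarkson": 10},
                           {"ukip", "clarkson"})
```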

Unsurprisingly Ukip figure prominently in the terms, but amongst the words we see: legislation, racist, racial, discrimination, equality, scrap, rid, laws, and Nigel.

One interesting word there is misrepresent; it’s often claimed that Ukip are misrepresented – could that be what’s happening here?

Another term, tucked away in small font, is one that keeps rumbling along: Clarkson.

How Accurate are Constituency Polls?

An additional source of data with which to calibrate forecast models for the forthcoming general election is, this time around, the sudden abundance of constituency-level polls, almost exclusively thanks to Lord Ashcroft. This is undoubtedly an awesome resource, but there are at least two problems:

  1. Some of them must be inaccurate, writes Stephen Tall: on the basis that 1 in 20 statistical tests will produce an error if we choose a 5% level of significance, 1 in 20 polls, statistically speaking, must be wrong. Hence with close on 200 constituency polls thus far, at least 9 must be wrong. Which ones, though?
  2. How do we calibrate constituency polls into forecast models? In order to do so, we need some historical precedent – a previous election, for example.
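The arithmetic behind the first point can be made explicit. If we assume each poll independently has a 5% chance of being "wrong", the number of wrong polls among n follows a binomial distribution, so about 9 or 10 wrong polls out of 190 (a hypothetical stand-in for "close on 200") is the expected count rather than a guaranteed minimum. A small Python sketch:

```python
from math import comb

def prob_at_least(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_polls = 190   # hypothetical stand-in for "close on 200"
p_wrong = 0.05  # 5% significance level, polls assumed independent
expected_wrong = n_polls * p_wrong
chance_at_least_9 = prob_at_least(9, n_polls, p_wrong)
print(expected_wrong, round(chance_at_least_9, 3))
```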

As with Stephen Tall’s article, I don’t wish to downplay the importance of Ashcroft’s polls, which are a welcome addition. However, I do wish to try to dig a little deeper into both of these questions.

The only historical precedent we have for Ashcroft’s polls is by-elections, where we know the outcome. Wikipedia’s page on constituency polling can, with a little pain, be turned into a usable spreadsheet and marshalled for this purpose.

There have been six by-elections for which constituency polling was carried out in this parliament: Clacton, Eastleigh, Heywood and Middleton, Newark, Rochester and Strood, and Wythenshawe and Sale East. For these by-elections we can plot the opinion poll vote share against actual vote share each party received in the by-election.

By-Election Opinion Polls and Outcomes

The 45-degree line represents a polling ideal: opinion poll vote shares are exactly equal to outcomes. Clearly this is unrealistic for every poll, but pollsters must aim to be near this line, assuming voting intention does not change between the polling date and the election date. Points above the line show that a party got more votes on election day than it was polled to get, while points below suggest it got fewer.

Plots are undoubtedly informative, but quantifying potential biases needs more serious statistical work; a linear regression of by-election vote shares on poll shares can reveal the extent to which polls may be biased towards or against particular parties.
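One way to set up such a regression is to regress actual by-election shares on polled shares with a separate intercept shift for each party, so that each shift estimates that party’s polling bias in points. This Python sketch uses made-up numbers chosen to mirror the biases discussed below (not the real by-election data), omits the standard errors needed for significance tests, and is not necessarily the specification behind the figures quoted.

```python
import numpy as np

def poll_bias(poll_share, actual_share, party):
    """Fit actual = b * poll + d_party by least squares.

    Returns (b, {party: d_party}). With b close to 1, d_party is roughly the
    number of points by which polls under-stated (d > 0) or over-stated
    (d < 0) that party.
    """
    poll = np.asarray(poll_share, dtype=float)
    parties = sorted(set(party))
    # One 0/1 dummy column per party; together they replace a common intercept.
    dummies = np.column_stack(
        [[1.0 if p == name else 0.0 for p in party] for name in parties])
    X = np.column_stack([poll, dummies])
    coef, *_ = np.linalg.lstsq(X, np.asarray(actual_share, dtype=float), rcond=None)
    return coef[0], dict(zip(parties, coef[1:]))

# Made-up example constructed so UKIP is under-polled by 6 and LAB over-polled by 3.
party = ["UKIP"] * 3 + ["LAB"] * 3 + ["CON"] * 3
poll = [20, 25, 30, 40, 35, 30, 25, 30, 35]
actual = [26, 31, 36, 37, 32, 27, 25, 30, 35]
slope, bias = poll_bias(poll, actual, party)
```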

The purple dots above the 45-degree line indicate a downward bias in the polls for Ukip’s vote share; linear regression analysis shows that this is significant and amounts to about six polling points: Ukip’s actual vote share in these by-elections was six points more than it was polled to get. Hence pollsters under-estimated Ukip support. Similarly, Labour’s red dots are generally below the line; pollsters over-estimated Labour’s vote share by three points in these by-elections.

Now, to some extent, it can be argued that by-elections are not representative of reality, since they often constitute protest votes by fed-up voters. And these two biases (the rest are not significantly different from zero) definitely suggest a protest vote away from the major party (Labour) towards the fringe party (Ukip). But if this were the case, shouldn’t pollsters pick up this sentiment when polling likely voters?

Nonetheless, this mini-analysis does suggest that, by and large, constituency polling is accurate: deviations from the 45-degree line are marginal at most (except for Labour and Ukip)…

Have the bookies adjusted for Ashcroft?

Last Wednesday social media was ablaze with Lord Ashcroft’s latest set of Scottish polls, which suggest that Labour are still on course for a Scottish wipeout on May 7. Has this affected what the bookies have to say?

As before, we look at mean implied probabilities across bookmakers, and this time consider the markets for banded ranges of Labour seats. Worse than previously anticipated polling in Scotland ought to be reflected in a lower seat expectation than before. Betfair, Bet365, SkyBet, Ladbrokes and William Hill offer markets on the band of seats a party wins at the election; between them, the bands offered (some overlap, as not every bookmaker uses the same bands) are:

  • less than 200 seats
  • 201-225 seats
  • 225 seats and under
  • 226-250 seats
  • 251-275 seats
  • 276-300 seats
  • 301-325 seats
  • 326-350 seats
  • 326 seats or over
  • 351-375 seats
  • 351 seats or over
  • 376-400 seats
  • 401 or more seats

Clearly the options towards the bottom of that list are hugely unlikely (bookies rate anything above 375 seats as less than 5% likely to happen), but it’s further up the list, in the 251-350 range, where the action has been:


The black vertical line marks March 4, when the Ashcroft polls were released. Prices have moved since the announcement, but with the likelihood of the 276-300 range falling only from 35% to 34%, and of 301-325 from 23% to 21%, the impact doesn’t appear to have been dramatic. Lower seat totals like 251-275 increased from 28% to 32%. Less likely events saw bigger moves, with 326-350 seats falling from 12.5% to 7% today.
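Mean implied probabilities across bookmakers can be sketched like this: convert each bookmaker’s decimal odds to probabilities, rescale within that bookmaker so they sum to one, then average each band over the bookmakers that offer it. Because bookmakers use different, overlapping bands, this averaging rule is an assumption, as are the function names and the odds in the example.

```python
def normalised_probabilities(decimal_odds):
    """{band: decimal odds} -> {band: probability}, rescaled to sum to 1."""
    raw = {band: 1.0 / odds for band, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {band: p / total for band, p in raw.items()}

def mean_implied_probabilities(bookmakers):
    """Average each band's probability over the bookmakers that price it.

    bookmakers: list of {band: decimal odds}, one dict per bookmaker.
    """
    sums, counts = {}, {}
    for book in bookmakers:
        for band, prob in normalised_probabilities(book).items():
            sums[band] = sums.get(band, 0.0) + prob
            counts[band] = counts.get(band, 0) + 1
    return {band: sums[band] / counts[band] for band in sums}

# Hypothetical odds for two bookmakers on two bands:
books = [
    {"251-275": 2.0, "276-300": 2.0},
    {"251-275": 4.0, "276-300": 4.0 / 3.0},
]
print(mean_implied_probabilities(books))
```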

Overall, the numbers would appear to suggest that the Ashcroft polls are reinforcing the current trends, at least in terms of bookmaker prices; an update to the plot of bookmaker implied probabilities for most seats from two weeks ago emphasises this:


More election forecasts

Today the phenomenon that is FiveThirtyEight has joined the UK General Election fray, announcing that it’ll be running a forecast. FiveThirtyEight, or perhaps more so its head Nate Silver, is well known for forecasting prowess, particularly in US elections, but as detailed in the linked article, Silver’s forecast of the 2010 UK General Election went somewhat awry. It sounds like he’s taking things a lot more seriously this time around, which will be very interesting to see.

In the interview-style linked post, Silver talks about the issue of going from polls to seats, and how well it works: as in, not particularly well. Which is why I’m still surprised that my simple linear regression model of polls since 1970 did as well as it did (it accounts for 90% of the variation in the historical data). Unlike the other forecasts of the outcome that Silver refers to, that model actually points towards a Tory majority on May 7.
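"Accounts for 90% of the variation" corresponds to a coefficient of determination, R² = 1 - SS_res/SS_tot, of about 0.9, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations from the mean. A minimal Python helper, for an in-sample least-squares fit with an intercept:

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` explained by `fitted` (1 - SS_res / SS_tot)."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    return 1.0 - ss_res / ss_tot
```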

It’s a very simple model, however, and has none of the basic ingredients we would want in a proper election forecast model. But it certainly provides an interesting alternative forecast…