A bunch of “other” matches took place last night, which we forecast. We also presented conditional probabilities, perhaps a bit more intuitive because, while 1-1 is often the most likely scoreline, a draw is hardly ever the most likely result.
After the event, the natural question is: how did we do? The table below shows that our standard forecasts (loads of 1-1s) got 15 results right and 8 exact scores (from 46 matches), while our conditional forecasts got 22 results right and 5 exact scores.
| Forecast      | Results | Scores | Lawro score | Sky score | Made-up score |
|---------------|---------|--------|-------------|-----------|---------------|
| Unconditional | 15      | 8      | 47          | 35        | 31            |
| Conditional   | 22      | 5      | 42          | 34.5      | 32            |
So which is better: more exact scores, or more correct results? That depends on preferences. The scoring rule for Mark Lawrenson’s forecasts on BBC Sport is 40 points for an exact score and 10 for a correct result. The Sky Super Six scoring rule, which we might attribute to Paul Merson’s forecasts, is 5 for a score and 2 for a result, thus valuing an exact score less than the BBC does. By the Lawro score (scaled down by 10 to make it comparable to Sky), our unconditional forecasts got 47 and our conditional ones 42. By the Sky score (scaled down by 2), the difference was half a point: 35 to 34.5. If we value a score at only twice a result (rather than 2.5 times, as Sky does, or 4 times, as Lawro does), then our conditional forecasts come out ahead, 32 to 31.
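The arithmetic above can be sketched in a few lines of Python. The counts and rule weights come from the text; the function name `points` and the layout are our own:

```python
def points(scores, results, score_pts, result_pts, divisor=1):
    """Total points for a set of forecasts under a scoring rule that awards
    score_pts per exact score and result_pts per correct result, scaled
    down by `divisor` to put different rules on a comparable footing."""
    return (score_pts * scores + result_pts * results) / divisor

# Counts from the 46 matches: (exact scores, correct results)
unconditional = (8, 15)
conditional = (5, 22)

# (score_pts, result_pts, divisor): Lawro scaled down by 10, Sky by 2
rules = {"Lawro": (40, 10, 10), "Sky": (5, 2, 2), "Made-up": (2, 1, 1)}

for name, rule in rules.items():
    u = points(*unconditional, *rule)
    c = points(*conditional, *rule)
    print(f"{name}: unconditional {u}, conditional {c}")
```

Only the made-up rule, where an exact score is worth just twice a result, ranks the conditional forecasts ahead.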
Scoring rules matter, and may well matter for how players play games. They probably also reflect, to some extent, our preferences and beliefs. It’s clearly much harder to get an exact score right, so why not reward it, as Lawro’s rule does, with far more points than a mere result?