A bunch of “other” matches took place last night, which we forecast. Alongside the standard forecasts we also presented conditional probabilities, which are perhaps a bit more intuitive: while 1-1 is often the single most likely exact scoreline, a draw is hardly ever the most likely outcome.
After the event, the natural question is: how did we do? The table below shows that our standard forecasts (loads of 1-1s) got 15 results and 8 exact scores right (from 46 matches), while our conditional forecasts got 22 results and 5 scores.

              Results  Scores  Lawro score  Sky score  Made-up score
Forecast           15       8           47       35             31
Conditional        22       5           42       34.5           32

So which is better: getting more scores right, or more results? That depends on preferences. The scoring rule for Mark Lawrenson’s forecasts on BBC Sport awards 40 points for an exact score and 10 for a correct result. The Sky Super Six scoring rule, which we might attribute to Paul Merson’s forecasts, awards 5 for a score and 2 for a result, thus valuing an exact score less than the BBC does (2.5 times a result rather than 4 times). By the Lawro score (scaled down by a factor of ten to make it comparable to the Sky score), our unconditional forecasts got 47 (15 × 10 + 8 × 40 = 470) and our conditional ones 42. By the Sky score (similarly halved), the difference was half a point: 35 to 34.5. If we instead value a score at only twice a result, then our conditional forecasts were better, scoring 32 to 31.
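For concreteness, here is a minimal Python sketch that recomputes the table from the raw counts. The per-rule scaling (Lawro divided by 10, Sky by 2) is our reading of the “scaled down” remark above, and we treat the result and score counts as non-overlapping, since that is what reproduces the published totals.

```python
# A minimal sketch reproducing the table above from the raw counts,
# assuming: (i) Lawro awards 40 per exact score, 10 per correct result,
# divided by 10; (ii) Sky awards 5 per score, 2 per result, divided
# by 2; (iii) the "made up" rule is 2 per score, 1 per result; and
# (iv) the result and score counts do not overlap.

RULES = {
    "Lawro":   {"score": 40, "result": 10, "scale": 10},
    "Sky":     {"score": 5,  "result": 2,  "scale": 2},
    "Made-up": {"score": 2,  "result": 1,  "scale": 1},
}

# (correct results only, correct exact scores), out of 46 matches
FORECASTS = {"Forecast": (15, 8), "Conditional": (22, 5)}

for name, (results, scores) in FORECASTS.items():
    totals = {
        rule: (results * r["result"] + scores * r["score"]) / r["scale"]
        for rule, r in RULES.items()
    }
    print(name, totals)
```

Running this prints 47, 35 and 31 for the unconditional forecasts and 42, 34.5 and 32 for the conditional ones, matching the table.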
Scoring rules matter, and may well matter for how players play these games. Scoring rules probably also reflect, to some extent, our preferences and beliefs. It’s clearly much harder to get an exact score right, so why not reward it, as Lawro’s scoring does, by a lot more than just a result?