RED has been producing picks for forthcoming football matches for over a year now. They’re interesting, and can sometimes be unexpected or even controversial.

But perhaps most importantly, up until the start of this season they have been a monstrous reduction in information. Monstrous in that the reduction creates distortion upon distortion.

RED generates a probability, in principle, for every possible scoreline (even for the unlikeliest of scorelines, like the 5-5 draw last season between Nottingham Forest and Aston Villa). And RED isn’t bad, in general, at doing so.

In the plot below, we show how often the scorelines RED forecast at a given probability actually occurred during last season’s Premier League. In any one match, a 1-1 scoreline might have a probability of 10%, and a 4-4 scoreline might have a probability of 1%.

A single result cannot tell us whether such a probability was right, so what we ask instead is “how often does an event forecast at 5% occur”? If RED’s forecasts are good, then such events should occur 5% of the time. Similarly, if RED forecasts a scoreline at 10%, then such events should occur 10% of the time.

We can check this easily, but it requires a large number of forecasts, so that small samples don’t distort the picture. We’ve taken RED’s forecasts from 370 of last season’s Premier League matches, and grouped them according to the forecast probability. We’ve then looked at how often these forecast events occurred. If RED is pretty good, then if we plot one against the other (forecasts and outcomes), the points should fall along the 45-degree line (dotted line).
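For readers who want to try the same check on their own forecasts, here is a minimal sketch of the grouping step. It is not RED’s code: the function name `calibration_table`, the one-percentage-point bin width and the input layout (one probability and one true/false outcome per forecast scoreline) are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Compare forecast probabilities with how often the events occurred.

    forecasts: probabilities between 0 and 1, one per forecast scoreline.
    outcomes:  booleans, True where that scoreline was the actual result.
    Returns rows of (bin_percent, n_forecasts, mean_forecast, observed_frequency).
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    prob_sums = defaultdict(float)

    for p, happened in zip(forecasts, outcomes):
        # Assumption: bins are one percentage point wide (nearest whole percent).
        bin_percent = round(p * 100)
        counts[bin_percent] += 1
        hits[bin_percent] += 1 if happened else 0
        prob_sums[bin_percent] += p

    # One row per bin, ordered from the lowest probability to the highest.
    return [
        (b, counts[b], prob_sums[b] / counts[b], hits[b] / counts[b])
        for b in sorted(counts)
    ]
```

Plotting the third column of each row against the fourth gives the dots, and the second column gives the number of forecasts behind each dot. A well-calibrated forecaster produces rows in which the last two columns are close to each other.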

The vertical lines show the number of forecasts RED made at each probability level (log scale, on the right-hand side). As RED made 81 scoreline forecasts per match, a lot of them (22,117) are very close to zero, and so we provide this information too.

The dots are very close to the 45-degree line for forecasts up to about 12%, the forecasts for which we have the most observations. This indicates good forecast performance, and suggests there is value in RED being unleashed to tell the world this information.

Once we get above about 12%, performance does become more erratic. This can be attributed more to a lack of observations, though, than to any intrinsic weakness in the forecasts. There are fewer than a hundred observations in each bracket above 13%, and fewer than 10 when we get above 20%.
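To get a feel for how much noise a small bracket carries, one rough guide is the binomial standard error of an observed frequency. The sketch below is an illustration only: the helper name `binomial_standard_error` is hypothetical, and it assumes the forecasts in a bracket behave like independent trials with the same probability, which is a simplification.

```python
import math


def binomial_standard_error(p, n):
    """Standard error of an observed frequency when the true probability is p
    and the bracket holds n forecasts (assumed independent)."""
    return math.sqrt(p * (1 - p) / n)


# Illustrative numbers only, not counts read from the plot.
sparse_bracket = binomial_standard_error(0.20, 10)    # a 20% bracket with 10 forecasts
dense_bracket = binomial_standard_error(0.05, 1000)   # a 5% bracket with 1,000 forecasts
```

Under these assumptions the sparse bracket has a standard error of about 0.13, so a dot sitting ten percentage points off the 45-degree line would be unremarkable, while the dense bracket has a standard error below 0.01.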

The more RED forecasts, the more such observations will be collected, and it may be that we get a better picture of forecast performance.

But the plot provided here is more than sufficient to give us confidence in unleashing the power of RED.