By: Tom Frame

Here is a question that you may think has a simple answer, but surveys have often indicated that people misinterpret it. So why is this question difficult to answer? This blog entry is about why the probability of rainfall is sometimes misunderstood. First, however, some context: in recent decades weather forecasts have moved from simply giving a definite statement of what will happen (“Tomorrow noon it will rain”) to giving probabilistic statements (“Tomorrow noon there is a 50% chance of rain”). This is particularly true of many mobile phone apps, which issue forecasts based on your location and show information about the amount of rainfall (e.g. a dark cloud with raindrops, a word such as heavy or light, or a numeric amount in mm) along with a probability value, usually expressed as a percentage.

So what does this probability actually mean?

To start, before considering rainfall, let’s consider a much simpler and more familiar problem. Think of rolling a standard six-sided unbiased die. What is the probability of rolling a six? Simple: there are six sides, each equally likely to come up, so the probability is 1 in 6. Within this there are some hidden assumptions. For example, it is unspoken, but assumed, that the die will always come to rest on one of its faces (not on a corner or edge), and that if it doesn’t, the roll is deemed invalid and it must be rolled again. This constraint guarantees that the result is always defined to be 1, 2, 3, 4, 5, or 6, and, more importantly, everyone understands what it means to “roll a die” and what the event “roll a 6” is. The same is true, for example, of gambling on sporting events. At a bookmaker’s you are given odds on the outcome of the game; the game has a set of rules and a referee to oversee their implementation, so the final score is defined exactly and everyone involved will know that it is 3-nil, even if they disagree with the referee’s decisions. The bookmaker will have some stated procedure to deal with other eventualities, e.g. cancellation of the match. Either way the event (rolling a six or a 3-nil victory) is well defined, so it can be ascribed a probability and the result can be observed and verified.

Now let us consider the case of a probability of rainfall. In order for the probability of the event to be calculated, it is first necessary to define what the event is. For weather apps, the probability shown is typically the Probability of Precipitation (PoP) rather than the probability of rainfall. For the end user this is the probability of any form of precipitation (rain, sleet, snow, hail, drizzle) occurring at their location within a specified time interval (e.g. within a particular hour-long interval). These probabilities are not static, so if you look at the app’s forecast for noon tomorrow at 6am and then look again at 6pm you might well see that the probability value has changed. These changes are associated with new information becoming available to the forecast provider. A simple (and topical!) analogy would be to imagine that this time last year you had been asked to estimate the probability of the whole of the UK being locked down in May this year. Chances are you would have given a value close to zero, whereas if you had been asked the same question in February this year you would probably have given a higher probability. The new information available to you about COVID-19 led you to revise your estimate. This is the essence of what a probabilistic forecast is: an estimate of the probability of an event occurring given the information available at the time it was issued.

So what exactly is the event that is being predicted by PoP? The simplest way to understand the definition of the event is to imagine what you could do to determine whether or not it occurs. You would simply need to stand in the same place for the designated time window (e.g. if it is a forecast of hourly precipitation, stand there for the designated hour). If there is some precipitation then the event occurred; if there is not, then it didn’t. If you do this many times you could then assess whether the probability forecasts were “correct” (meteorologists call this verification). For example, if you stand in the same location every time the PoP forecast is 10%, then on roughly 1 in 10 of those occasions you should experience precipitation (meteorologists call this property reliability).
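To make the idea of reliability concrete, here is a minimal sketch in Python of the check described above. The function name `reliability_table` and the example data are invented purely for illustration; operational verification uses far more data and more careful binning of the forecast probabilities.

```python
from collections import defaultdict


def reliability_table(forecasts, outcomes):
    """Return {issued PoP: observed frequency of precipitation}.

    forecasts: PoP values as fractions (0.1 means 10%).
    outcomes:  True if precipitation was observed on that occasion.
    """
    counts = defaultdict(lambda: [0, 0])  # PoP -> [wet occasions, all occasions]
    for pop, wet in zip(forecasts, outcomes):
        counts[pop][0] += int(wet)
        counts[pop][1] += 1
    return {pop: wet / total for pop, (wet, total) in sorted(counts.items())}


# Made-up record: ten occasions with a 10% forecast, four with a 50% forecast.
forecasts = [0.1] * 10 + [0.5] * 4
outcomes = [True] + [False] * 9 + [True, True, False, False]

print(reliability_table(forecasts, outcomes))  # {0.1: 0.1, 0.5: 0.5}
```

In this invented record the observed frequency matches the issued probability for both forecast values, which is what a reliable forecast looks like. With real data the match is only ever approximate, and it takes many occasions before the comparison becomes meaningful.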

In practice, forecasting centres use much more specific quantitative definitions of PoP, because in order to verify and improve their PoP forecasts by “post-processing” raw forecast data they need to be able to routinely observe the precipitation and recalibrate their forecasts to make them reliable. For example, the event is usually defined as precipitation exceeding some minimal value greater than zero, related to the smallest amount of precipitation observable by rain gauges (typically around 0.2 mm), although other observations such as rainfall radar may be used too. There may also be some spatial aggregation involved, so that strictly speaking probabilities are not calculated for specific geographic locations but for larger areas, with some assumptions about local homogeneity. The details of such calculations change as methodologies improve and may not be explicitly stated in publicly available forecast guidance, but the guidance *will* (or at least should) state how the PoP forecast should be interpreted by the end user, so it is well worth reading through the guidance associated with any app you use.

So why the confusion? In surveys both long past (Murphy et al., 1980) and more recent (Fleischhut et al., 2020) the confusion seems to arise from end users not knowing the definition of the event to which the probability is being assigned, rather than from misunderstanding the nature of probability itself. One interesting result is that, when surveyed, people often erroneously interpret PoP as the fraction of the area covered with rain rather than the probability of precipitation at a specific location. While this is not the correct interpretation, there are cases where the PoP may be closely related to the fraction of the area covered by rainfall, or is at least assumed to be so for practical reasons. For example, people often model rainfall statistically, particularly showers and convective cells, as Poisson point processes: essentially a stochastic process in which there is a fixed probability of a shower appearing at any location within a fairly large area and time. In such a system the PoP forecast would be approximately equivalent to the fraction of the area covered by rainfall. Similarly, in the calculation of rainfall probabilities using “neighbourhood processing” (Theis et al., 2005), the probability of rainfall at a point is estimated from the fraction of the surrounding area covered by rainfall in the forecast, making an explicit link between the two.
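As a rough illustration of the neighbourhood idea, the sketch below estimates a probability at one grid point as the fraction of nearby grid cells in which a single forecast has rain. It is a deliberately simplified toy, not the method of Theis et al. (2005): the function name `neighbourhood_pop`, the square window and the 5 × 5 grid are all assumptions made for the example.

```python
def neighbourhood_pop(rain_mask, row, col, radius):
    """Fraction of grid cells in a square window around (row, col) with rain.

    rain_mask: 2D list of 0/1 values (1 = rain forecast in that cell).
    radius:    half-width of the window in grid cells; the window is
               clipped at the edges of the grid.
    """
    n_rows, n_cols = len(rain_mask), len(rain_mask[0])
    wet = total = 0
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            wet += rain_mask[r][c]
            total += 1
    return wet / total


# Toy forecast: one small shower covering four cells of a 5 x 5 grid.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(neighbourhood_pop(grid, 2, 2, 1))  # 4 wet cells out of 9
print(neighbourhood_pop(grid, 0, 0, 1))  # 1 wet cell out of 4 (window clipped at the corner)
print(neighbourhood_pop(grid, 4, 4, 1))  # no wet cells nearby
```

Note that the point (2, 2) is inside the shower in this single forecast, yet its estimated probability is below 1, because the calculation treats the exact position of the shower as uncertain. In this toy the probability at a point is literally an area fraction, which shows how easily the two ideas become entangled.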

Speaking recently with people I know who are not meteorologists but regularly use weather apps, I realised that they associated the PoP value with the intensity of rainfall: higher PoP meaning more intense rainfall. This is of course not the correct interpretation of PoP, and in part these conversations motivated the subject of this blog. Thinking it over, I suspect I know the reason for their misinterpretation. Firstly, of course, they had not read the guidance for the app they were using, so were simply unaware of what the percentage values they see on the app actually refer to. But how did they come to associate them with rainfall intensity? My hypothesis here (which is untested) is that there is a tendency for forecasts of heavier rainfall, particularly associated with fronts in autumn and winter, to be associated with higher PoP than weaker “showery” rain, simply because showers are inherently more uncertain, whereas coherent features such as fronts can be forecast with more confidence. Therefore, as they looked at the app they saw PoP increase and decrease in line with the intensity of rainfall forecast and began to use it as a “pseudo-intensity” forecast.