As discussed in last week’s blog by Javier Amezcua, uncertainties are paramount in weather and climate forecasting; indeed, they are paramount in every branch of science. In weather and climate forecasting, significant uncertainties arise in the observations, in the model state, and in the model evolution equations.

We have two sources of information from which to generate the best forecasts: the observations and the models. How do we combine observations and models so that we get the best forecast? And we want not only a best forecast but also a measure of how accurate it is, that is, how large its uncertainty is. Obviously, we have to take the uncertainties in both observations and models into account when merging this information. The main question of this blog is how to take all these uncertainties properly into account.

The answer to this question has been known for centuries and is called Bayes’ Theorem. To understand that theorem, we first need to discuss how to represent the uncertainties mathematically. We do that via probability density functions. That sounds complicated, but it is not.

**Figure 1** – an example of a probability density function, here for temperature in Reading, UK.

In Figure 1 we show an example of a probability density function for temperature in Reading, UK. It shows how likely each temperature value is: the higher the probability density, the more likely, or probable, that temperature value is. Figure 1 shows that the most likely temperature value in Reading is 15°C, but there is quite a large range of other possible values. So 14°C is also possible, but 4°C is highly unlikely.
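To make this concrete, here is a minimal sketch of such a density. The Gaussian shape, the peak at 15°C, and the spread of 2°C are illustrative assumptions, not values taken from Figure 1:

```python
import math

def gaussian_pdf(t, mean, std):
    """Probability density of a normal distribution evaluated at temperature t."""
    return math.exp(-0.5 * ((t - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Density is highest at the peak (15 degC), lower at 14 degC,
# and essentially zero at 4 degC, as in the discussion of Figure 1.
for t in [4.0, 14.0, 15.0]:
    print(f"{t:5.1f} degC -> density {gaussian_pdf(t, mean=15.0, std=2.0):.6f}")
```

Note that a probability density does not give the probability of a single value directly; probabilities come from the area under the curve over a range of temperatures.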

Let us assume that Figure 1 came from a model forecast, taking all model uncertainties into account. Suppose now that we have a thermometer at the University of Reading (and we do, as you might know). That thermometer gives a reading of 17°C. However, we should not forget that there are uncertainties in this measurement. These uncertainties give rise to a probability density function similar to Figure 1, but now for the observations: there is a most likely value (the actual reading), but there is also uncertainty.

Now let’s get back to Bayes’ Theorem, that is, to how we should combine the probability density of the model with that of the observation to find the best estimate of what the atmosphere is doing, including a probability density describing its uncertainty. The answer can be written down compactly in a very simple equation:

p_{best}(T) = A × p_{model}(T) × p_{observation}(T)

in which p_{model}(T) denotes the probability density of the model estimate of the temperature, p_{observation}(T) that of the observation, and A is a normalisation constant. So the answer is really simple: we just multiply the probability density of the model with that of the observation, for each possible value of the temperature. Figure 2 illustrates this process, which is called **data assimilation**.
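This multiplication can be carried out numerically on a grid of candidate temperatures. The sketch below assumes, purely for illustration, Gaussian densities with the values mentioned above: a model pdf peaked at 15°C with a spread of 2°C, and an observation pdf peaked at the thermometer reading of 17°C with a spread of 1°C:

```python
import numpy as np

T = np.linspace(5.0, 25.0, 2001)   # grid of candidate temperatures (degC)
dT = T[1] - T[0]

def gaussian(t, mean, std):
    """Gaussian probability density evaluated on an array of temperatures."""
    return np.exp(-0.5 * ((t - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

p_model = gaussian(T, 15.0, 2.0)   # model forecast pdf
p_obs   = gaussian(T, 17.0, 1.0)   # observation pdf

# Bayes' Theorem: multiply pointwise, then normalise so the area under
# the curve equals 1 (this fixes the constant A).
p_best = p_model * p_obs
p_best /= p_best.sum() * dT

print("posterior peak:", T[np.argmax(p_best)])   # ~16.6 degC
```

Because the observation has the smaller spread, it carries more weight: the combined density peaks closer to 17°C than to 15°C, and its spread is narrower than either input.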

**Figure 2.** The data assimilation process

The real problem for weather and climate forecasting is that these probability densities are difficult beasts to handle. Think about the atmospheric model, which nowadays contains about a billion variables. We need the combined probability density of all these variables, which means we would have to store in the computer about 10^{billion} numbers (a 1 followed by a billion zeros). To put this number in perspective, the estimated number of atoms in the universe is 10^{80} (a 1 with 80 zeros). Clearly we will never ever be able to store all these numbers. This is a real ‘Big Data’ problem. It means that meteorologists have to make approximations to Bayes’ Theorem, and those can be made in numerous ways. We don’t know what the best affordable approximation is (perhaps it will depend on the application), but there is definitely room for improvement over present daily practice. This is one of the research areas we are working on.
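The blow-up described above comes from the joint probability density being a table over every combination of variable values. A minimal sketch of the arithmetic, assuming (hypothetically) that each model variable is discretised into ten possible values:

```python
BINS = 10  # assumption: each model variable discretised into 10 possible values

def joint_pdf_size(num_variables):
    """Number of values needed to tabulate the full joint probability density."""
    return BINS ** num_variables

print(joint_pdf_size(2))    # 100: two variables are easy
print(joint_pdf_size(80))   # 10^80: already the estimated atom count of the universe
# A weather model has ~10^9 variables, so the full table would hold
# 10^(10^9) numbers, far beyond any conceivable computer.
```

The count grows exponentially with the number of variables, which is why every practical data assimilation scheme works with a compressed approximation of the joint density rather than the density itself.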