# Dr Ross Bannister

#### Senior Research Scientist (NCEO)

**We need to talk about assimilation …**

Modelling the real world is never a perfect process. Errors and uncertainties in all models accumulate in time, consequently placing limits on the value of any forecast, whether it represents a prediction of the weather, or of flooding, etc. It is possible though to achieve good quality forecasts by correcting the model as it evolves with information from fresh observations, where and when they are available. In the case of weather forecasting, such observations constrain the model to “today’s weather”, so that “tomorrow’s weather” can be forecast as accurately as possible.

Merging observations with models is called data assimilation (DA for short). For weather forecasting, DA helps to determine the initial conditions of a numerical weather prediction model, whose output is used by weather forecasters or as input to other models, such as those that predict flooding from the expected rainfall. In practice, DA takes a forecast field valid for today and computes a correction field based on the observations. The modified field is generally a more realistic model state. The correction fields are determined by an algorithm that uses not only today’s forecast and observations, but also information about how likely each of these is to be correct (remember that the model forecast is erroneous, and even observations are never exact measures of reality). This algorithm is based as closely as possible on a set of equations called the Kalman Filter equations, which were originally developed for engineering applications in the 1950s and 60s. The Kalman Filter equations have a wide range of uses, e.g. the control of trajectories of bodies over large distances (think of space flight and warfare), and have since been adapted for weather forecasting. The weather forecasting problem, though, has vastly more degrees of freedom than the basic Kalman Filter can cope with, so the full Kalman Filter equations are not used. Instead, approximate equations are used, which are solved either with variational procedures or with a vastly reduced number of variables (e.g. the ensemble Kalman Filter).
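To make the idea of a correction concrete, here is a minimal sketch of the Kalman Filter update for a single number, assuming one forecast value and one direct observation of the same quantity. The function name `scalar_analysis` and the numbers used are illustrative assumptions, not part of any operational system, which would handle millions of variables at once.

```python
def scalar_analysis(forecast, forecast_var, obs, obs_var):
    """Combine one forecast value with one observation of the same quantity.

    forecast_var and obs_var are the error variances (standard deviation
    squared) assumed for the forecast and for the observation.
    """
    # The gain lies between 0 and 1: it is large when the forecast is
    # uncertain compared with the observation, and small otherwise.
    gain = forecast_var / (forecast_var + obs_var)
    # The correction is the gain times the forecast-observation mismatch.
    analysis = forecast + gain * (obs - forecast)
    # The corrected value is less uncertain than the forecast was.
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var, gain


# Illustrative (made-up) numbers: a forecast of 98 and an observation of 95,
# each with an error standard deviation of 2.5.
analysis, analysis_var, gain = scalar_analysis(
    forecast=98.0, forecast_var=2.5**2, obs=95.0, obs_var=2.5**2
)
```

With equally trusted forecast and observation the gain is one half, so the corrected value lands midway between the two. Trusting the observation less (a larger `obs_var`) would pull the result back towards the forecast.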

**Today’s forecast is our very best estimate of today’s weather … probably not**

Standard DA methods assume that errors in forecasts and observations obey a normal distribution (otherwise known as a Gaussian distribution). Consider a forecast of today’s relative humidity (RH) above, say, London of about 98% RH (i.e. close to saturation), and let the error (standard deviation) of this forecast be of order 2.5% RH. The Gaussian distribution with this standard deviation is shown in Fig. 1(a). One can interpret this plot by imagining that one has access to a very large number of forecasts, with the distribution showing the frequency of possible forecast outcomes. Since about 2/3 of the area of a Gaussian distribution lies within one standard deviation of the mean, about 2/3 of the forecasts would have values within 98 ± 2.5% RH. In the absence of multiple forecasts, we assume that the single forecast represents the most likely point of this distribution (the mean, or centre point), and observations can make adjustments to this forecast (examples are the arrows in Fig. 1(a)). The assumed Gaussian specifies how the relative humidity forecast is allowed to be modified by the observations. If an observation suggests that the forecast should really be drier by about 6% RH (purple arrow), then the distribution would inhibit this change because it has such a small probability. An equally large positive change is just as unlikely (red arrow). Smaller changes of either sign, though, are deemed far more likely (blue and green arrows), and so such modifications are more likely to happen. Specifying this distribution is how DA controls the way observations update the forecast.

Figure 1: Possible probability distributions of errors in a forecast of relative humidity when the forecast value is 98% RH. Panel (a) has an assumed Gaussian form of this distribution, and panel (b) has a non-Gaussian form. The arrows in panel (a) serve to illustrate likely (blue and green) and unlikely (purple and red) changes to the forecast value as a result of assimilating observations. The Gaussian has identical probabilities for positive and negative updates, but the particular non-Gaussian shown has larger probabilities of updating negatively than positively.
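The numbers behind the Gaussian in panel (a) can be checked with a few lines of Python. This is a rough sketch: the 2.5% RH standard deviation and the 6% RH change come from the text, while the 2% RH size chosen for a "small" change and the function names are assumptions made for illustration.

```python
import math

SIGMA = 2.5  # forecast error standard deviation from the text (% RH)


def fraction_within(n_sigma):
    """Fraction of a Gaussian lying within n_sigma standard deviations of the mean."""
    return math.erf(n_sigma / math.sqrt(2.0))


def relative_likelihood(change, sigma=SIGMA):
    """Gaussian density at `change` from the mean, relative to the density at the mean."""
    return math.exp(-0.5 * (change / sigma) ** 2)


within_one_sigma = fraction_within(1.0)       # about 0.68, the "2/3" quoted in the text
large_drying = relative_likelihood(-6.0)      # 6% RH drier (purple arrow)
large_moistening = relative_likelihood(+6.0)  # equally large positive change (red arrow)
small_change = relative_likelihood(-2.0)      # assumed size for a small change
```

Because the Gaussian is symmetric, `large_drying` and `large_moistening` are equal, and both are more than ten times smaller than `small_change`. This is the sense in which the assumed distribution inhibits large corrections and permits small ones.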

Actual distributions are rarely Gaussian shaped. Shown in Fig. 1(b) is a distribution whose shape is far more realistic for a forecast close to saturation (perhaps surprisingly, this distribution has the same standard deviation as the Gaussian in Fig. 1(a)). This non-Gaussian distribution is asymmetric: it says that there is a higher probability of an observation lowering the relative humidity (negative correction) than raising it. This makes sense given that the forecast is at 98% RH. It is more likely that the true atmosphere has 95% RH (3% RH lower than the forecast) than that it is supersaturated at 101% RH (3% RH higher). There is a similar picture for forecasts of very dry air (e.g. 2% RH), where qualitatively the distribution would be the mirror image of that in Fig. 1(b). These are examples where the distributions are not only non-Gaussian, but also strongly flow-dependent, which makes them difficult to specify in operational situations.
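One way to see how a distribution can share the Gaussian's standard deviation and yet be lopsided is to build a simple skewed example. The sketch below uses a gamma distribution reflected about 100% RH. This is purely an assumed stand-in chosen for convenience, not the distribution plotted in Fig. 1(b), and it imposes a hard upper limit at saturation that the real distribution need not have.

```python
import math

SIGMA = 2.5    # standard deviation from the text (% RH)
MODE = 98.0    # forecast value, taken as the most likely point (% RH)
UPPER = 100.0  # assumed hard upper bound at saturation (% RH)

# Choose the gamma shape k and scale theta so that
#   standard deviation: sqrt(k) * theta = SIGMA
#   most likely value:  (k - 1) * theta = UPPER - MODE
gap = UPPER - MODE
theta = (-gap + math.sqrt(gap**2 + 4.0 * SIGMA**2)) / 2.0
k = (SIGMA / theta) ** 2


def skewed_pdf(rh):
    """Density of the reflected gamma at relative humidity rh (% RH)."""
    g = UPPER - rh  # distance below saturation
    if g <= 0.0:
        return 0.0  # no probability at or above saturation in this sketch
    log_pdf = (k - 1.0) * math.log(g) - g / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_pdf)


def gaussian_pdf(rh):
    """Density of the Gaussian centred on the forecast value."""
    z = (rh - MODE) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))


# Compare a truth 3% RH below the forecast with one 3% RH above it.
gaussian_pair = (gaussian_pdf(95.0), gaussian_pdf(101.0))  # equal by symmetry
skewed_pair = (skewed_pdf(95.0), skewed_pdf(101.0))        # drier value favoured
```

Both distributions have a standard deviation of 2.5% RH and peak at 98% RH, but only the Gaussian treats 95% RH and 101% RH as equally plausible.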

In reality the picture is even more complicated than this, as the distribution needs to account for relationships between different variables (e.g. between different positions, and between different kinds of fields such as temperature and winds). There is also the issue of corrections that involve moisture phase changes (between ice, water, and vapour), between which there are also potentially strongly flow-dependent relationships. In addition, most observations used in weather forecasting are not of the model’s own variables but of something related to them, which is a further complication in the DA problem as a whole.

**Why is this important?**

The aim of data assimilation (DA) is to set realistic initial conditions for models, so that they deliver more accurate forecasts of the future weather than if the model were not updated with the latest observations. Such forecasts include accurate predictions of rainfall for use in models that predict flooding. Some off-line experiments that we carried out in the FRANC project suggest that if non-Gaussianity of the forecast distribution is not well accounted for, DA can actually give worse initial conditions, in the sense that they can be unphysical (for instance containing negative or supersaturated humidities).

A pragmatic solution might be simply to set negative humidities to zero and supersaturated humidities to 100% RH after the assimilation step and before the model is run. This ‘cure’, though, could have side-effects, since such adjustments do not necessarily obey the subtle balances that might be at play, e.g. between humidity and temperature. As always, prevention is better than cure, and in this case we believe that prevention may well lie in improving the assimilation.
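The pragmatic ‘cure’ described above amounts to a simple clipping step, sketched below. The function name `clip_relative_humidity` and the sample values are hypothetical and serve only to illustrate the idea.

```python
def clip_relative_humidity(rh_values, lower=0.0, upper=100.0):
    """Force each relative humidity value (% RH) into the physical range."""
    return [min(max(rh, lower), upper) for rh in rh_values]


# Made-up values after an assimilation step: one negative, one supersaturated.
analysed = [-1.5, 42.0, 98.0, 101.2]
clipped = clip_relative_humidity(analysed)
# clipped is [0.0, 42.0, 98.0, 100.0]
```

The clipping touches only the humidity values. Any temperature or other field that the assimilation adjusted in step with humidity is left as it was, which is how the side-effects mentioned above can arise.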