By: Alberto Carrassi
Numerical models of the climate are made of many mathematical equations that describe our knowledge of the physical laws governing the atmosphere, the ocean, the sea ice, etc. These equations are solved using computers that “see” the Earth system at discrete points only, for instance at the vertices of a grid where the physical quantities are defined. The density of the grid defines the model resolution: the denser the grid, the higher the resolution and, in principle, the better the match between the simulated and the real climate.
Resolution is inevitably finite and to a large extent constrained by computer power. As a consequence, our numerical climate models do not see what occurs in between grid points and offer only a partial description of reality. This source of model error is called “subgrid” or “unresolved scale” model error. Reducing or correcting for it is a major endeavour of our scientific community, and a lot has been achieved in the past decades thanks to increased computational power and to an improved understanding of the subgrid processes and of their effects on the resolved scales.
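To make this concrete, here is a minimal sketch (in Python, with purely illustrative numbers of our own choosing) of how a feature smaller than the grid spacing simply disappears from the model's view; the residual is the subgrid error.

```python
# A toy 1D field with a large-scale and a small-scale component, sampled
# on a coarse model grid. The small-scale part falls between grid points
# and is invisible to the model. All values here are illustrative.
import numpy as np

L = 1.0                                 # domain length
x_fine = np.linspace(0.0, L, 1000)      # "reality": densely sampled
x_coarse = np.linspace(0.0, L, 10)      # model grid: 10 points only

def field(x):
    large_scale = np.sin(2 * np.pi * x)         # resolved by both grids
    small_scale = 0.3 * np.sin(40 * np.pi * x)  # wavelength < grid spacing
    return large_scale + small_scale

# Interpolating the coarse-grid values back to the fine grid shows what
# the model "sees"; the residual is the subgrid (unresolved scale) error.
seen_by_model = np.interp(x_fine, x_coarse, field(x_coarse))
subgrid_error = field(x_fine) - seen_by_model
print(f"RMS subgrid error: {np.sqrt(np.mean(subgrid_error**2)):.3f}")
```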
Inspired by the astonishing success of artificial intelligence in so many different areas of science and social life, in our recent study (Brajard et al., 2021) we investigated whether artificial intelligence could also be used to improve current numerical climate models by estimating and correcting for the unresolved scale error. Artificial intelligence, and machine learning in particular, extracts and emulates behavioural patterns from observed data. Because it is driven by data alone, a machine learning forecast can only reproduce behaviour that has previously been observed. The quality and completeness of the training data are therefore extremely important.
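The following toy example (again Python, all numbers illustrative) shows this limitation: a simple data-driven model fitted on one regime reproduces it well but can fail badly outside the range of the observed data.

```python
# A data-driven fit reproduces behaviour it has seen, but may break down
# where no training data exist. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, np.pi, 200)                     # observed regime only
y_train = np.sin(x_train) + 0.05 * rng.normal(size=200)   # noisy observations

coeffs = np.polyfit(x_train, y_train, deg=5)               # simple fitted model

for x in (1.0, 2.0, 5.0):  # x = 5.0 is far outside the training range
    print(f"x={x}: truth={np.sin(x):+.2f}, model={np.polyval(coeffs, x):+.2f}")
```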
To overcome this limitation our approach relies on data assimilation, another key component of today's operational weather and ocean prediction routines. Data assimilation is the process by which data are incorporated into models to obtain a more accurate description of reality. After many years of research and development, data assimilation now provides a range of methods that handle noisy and sparse data with great efficiency.
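The core idea of data assimilation can be illustrated with a one-variable example. The sketch below is a scalar Kalman-type update, a deliberately minimal stand-in for the far richer methods used operationally: it combines a model forecast with a noisy observation, weighting each by its accuracy.

```python
# A minimal analysis step, assuming a scalar state with Gaussian errors.
def analysis(forecast, obs, forecast_var, obs_var):
    """Combine a model forecast with a noisy observation."""
    gain = forecast_var / (forecast_var + obs_var)  # Kalman gain
    state = forecast + gain * (obs - forecast)      # corrected state
    var = (1.0 - gain) * forecast_var               # reduced uncertainty
    return state, var

# Example: an uncertain forecast (var=4.0) meets a more accurate observation.
state, var = analysis(forecast=10.0, obs=12.0, forecast_var=4.0, obs_var=1.0)
print(f"analysis state = {state:.2f}, variance = {var:.2f}")  # 11.60, 0.80
```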
In our approach, we combine data assimilation and machine learning in the following way. First, we assimilate the raw (sparse and noisy) data into the physical model. This step outputs a sequence of snapshots, like a “movie” of the climate over the observed period, whose accuracy depends on the unresolved scale error in the model. The differences between this movie and the model's own forecasts contain information about the unresolved scale error that we wish to correct. In the machine learning step, these differences are used to train a neural network to estimate the model error: at the end of the training, the network is optimised to output an estimate of the model error given the model state as input. The final step consists of constructing a new, possibly more accurate, hybrid numerical model of the climate, made of the original physical model plus the data-driven correction obtained in this way.
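The sketch below puts the whole pipeline together on a toy scalar system. It is only illustrative: the analysis “movie” is mimicked here by a noisy copy of the true trajectory (in the actual study it is produced by data assimilation), and the dynamics, network size and names are assumptions of ours, not those of Brajard et al. (2021).

```python
# Toy version of the pipeline: analysis movie -> model-error training
# pairs -> neural network -> hybrid (physical + data-driven) model.
import numpy as np
from sklearn.neural_network import MLPRegressor

def truth_step(x):                # "reality", including a small-scale term
    return x + 0.1 * np.sin(x) + 0.02 * np.sin(10 * x)

def model_step(x):                # physical model missing the subgrid term
    return x + 0.1 * np.sin(x)

# 1. Build the analysis "movie" (stand-in for data assimilation output).
rng = np.random.default_rng(1)
x, analyses = 0.5, []
for _ in range(2000):
    x = truth_step(x)
    analyses.append(x + 0.001 * rng.normal())  # small analysis error
analyses = np.array(analyses)

# 2. The analysis-minus-forecast differences carry the unresolved scale error.
inputs = analyses[:-1].reshape(-1, 1)
errors = analyses[1:] - model_step(analyses[:-1])

# 3. Train a neural network to map model state -> model error.
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(inputs, errors)

# 4. Hybrid model = physical model + data-driven correction.
def hybrid_step(x):
    return model_step(x) + net.predict(np.array([[x]]))[0]

print("truth :", truth_step(1.0))
print("model :", model_step(1.0))
print("hybrid:", hybrid_step(1.0))
```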
Figure 1: Model prediction error as a function of time: the longer the forecast horizon, the larger the error. The dashed black line shows the original physical model. The solid lines refer to hybrid (physical plus data-driven) models trained on a complete and perfect dataset (black) or on different amounts (p) of noisy observations. The hybrid models perform much better than the original model. *MTU – Model Time Unit
The data assimilation-machine learning approach has been tested on idealised models and observational scenarios with very encouraging results. A key advantage of the method is that it relies on data assimilation methods already routinely applied in weather and ocean prediction centres: we expect this type of approach to be widely implemented operationally in the future.