How do we actually run very high resolution climate simulations?

By: Annette Osprey

High resolution modelling

Running very detailed, fine-scale (“high resolution”) simulations of the Earth’s atmosphere is vital for understanding changes to the Earth’s climate, particularly extreme events and high-impact weather [1]. However, each simulation is 1) time-consuming to set up – scientists spend a lot of time designing the experiments and perfecting the underlying science, and 2) expensive to run – it may take many months to complete a multi-decade simulation on thousands of CPUs. Yet the data from each simulation may be used many times for many different purposes.

Under the hood

A lot of technical work is done “under the hood” to make sure the simulations run as seamlessly and efficiently as possible, and that the results are safely moved to a data archive where they can be made available to others. This is the work we do in NCAS-CMS (the National Centre for Atmospheric Science’s Computational Modelling Services group), alongside our colleagues at CEDA (the Centre for Environmental Data Analysis) and the UK Met Office. My role is to work with the HRCM (High Resolution Climate Modelling) team, helping scientists set up and manage these very large-scale simulations.

CMS is responsible for making sure the simulation code, the Met Office Unified Model (UM), runs on the national supercomputer, Archer2, for academic researchers around the UK. As well as building, testing and debugging different versions of the code, we install the supporting software required to actually run the UM (we call this the “software infrastructure”). This includes code libraries, experiment and workflow management tools [2], and software for processing input and output data. It is all specialist code that we need to configure for our particular systems and the needs of our users, and sometimes we supplement it with code of our own.

Robust workflows

We call the end-to-end process of running a simulation the “workflow”. This involves 1) setting up the experiment (selecting the code version, scientific settings and input data), 2) running the simulation on the supercomputer, 3) processing the output data, and 4) archiving the data to the national data centre, Jasmin, where we can look at the results and share them with other scientists. When running very high resolution or long-running simulations we need this process to be as seamless as possible: we don’t want to keep manually restarting the experiment or troubleshooting technical issues.
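
To give a flavour of how these stages chain together, here is a deliberately simplified Python sketch of a single simulation chunk. The function bodies are placeholders only – the real steps are model runs and data transfers handled by our workflow management tools, not print statements.

# A toy, hypothetical sketch of the four workflow stages for one chunk.
# The print statements stand in for the real model run and data transfers.
def setup_experiment() -> None:
    print("1) choose code version, science settings and input data")

def run_simulation(month: str) -> str:
    print(f"2) run the model for {month} on the supercomputer")
    return f"raw output for {month}"

def process_output(data: str) -> str:
    print(f"3) post-process {data}")
    return f"processed {data}"

def archive_data(data: str) -> None:
    print(f"4) copy {data} to the Jasmin archive")

setup_experiment()
archive_data(process_output(run_simulation("1950-01")))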

Furthermore, the volume of data generated by these high resolution simulations is enormous: too large to store it all on the supercomputer, and moving it to the archive can sometimes take as long as the simulation itself. The solution, therefore, is to process and archive the data while the simulation is running. We build this into the workflow so that it happens automatically, with as many tasks running at the same time as possible (known as “concurrency”).
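
As a rough illustration of that overlap (not our actual workflow engine, which takes care of the scheduling for us), the short Python sketch below starts processing and archiving one month’s data while the next month’s “simulation” runs; the sleep calls stand in for the real tasks.

import time
from concurrent.futures import ThreadPoolExecutor

def run_month(month: str) -> str:
    time.sleep(2)                  # stand-in for a month of model run
    return f"raw data for {month}"

def process_and_archive(data: str) -> None:
    time.sleep(1)                  # stand-in for post-processing and transfer
    print(f"archived {data}")

months = ["1950-01", "1950-02", "1950-03"]
with ThreadPoolExecutor() as pool:
    previous = None
    for month in months:
        if previous is not None:
            # Archive last month's data while this month is running.
            pool.submit(process_and_archive, previous)
        previous = run_month(month)
    process_and_archive(previous)  # deal with the final month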

The HRCM workflow

Figure 1: An example workflow for a UM simulation with data archiving to Jasmin, showing several tasks running concurrently.

Figure 1 shows the workflow we have set up for our latest high resolution simulations. We split the simulation into chunks, running one month at a time. Once a month has completed, we set the next month running and begin processing the data we have just produced. The workflow design means that this processing happens at the same time as the next simulation month runs. First we perform any transformations on the data, then we begin copying it to Jasmin. We generate unique hashes (checksums) to verify that the copy is identical to the original, so that we can safely delete the data on the supercomputer, clearing space for what comes next. Finally we upload the data to the Jasmin long-term tape archive, and we may also put some files in a workspace where scientists can review the progress of the simulation.
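
The checksum step can be illustrated with a small Python sketch (the paths and copy call here are hypothetical – the real transfer goes over the network to Jasmin): we hash the file before and after the copy, and only delete the original once the two digests match.

import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    # Hash the file in 1 MB blocks so large model output doesn't fill memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

def copy_and_verify(src: Path, dst: Path) -> None:
    original = sha256(src)
    shutil.copy2(src, dst)         # stands in for the real transfer to Jasmin
    if sha256(dst) != original:
        raise IOError(f"checksum mismatch for {src}; keeping the original")
    src.unlink()                   # verified, so safe to clear space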

Helping climate scientists get on with science

The advances we make for the high resolution simulations are made available to all our other users, whatever the size of the run. Ideally, the only user involvement is to set the run going. In reality, of course, sometimes the machine goes down, connections are lost, or the model crashes (or the experiment wasn’t set up correctly!). We have therefore built a level of resilience into the workflow that lets us deal with failures effectively, so scientists can focus on setting up experiments and analysing the results, without worrying too much about how the simulation runs.
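
The details are handled by our workflow management tools, but the basic idea can be sketched in a few lines of Python: wrap each task so that a transient failure leads to a pause and a resubmission rather than a dead workflow (the task function and retry settings here are purely illustrative).

import time

def with_retries(task, attempts: int = 3, wait_seconds: float = 60.0):
    # Run a task, retrying after transient failures before giving up.
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as err:
            print(f"attempt {attempt} failed: {err}")
            if attempt == attempts:
                raise
            time.sleep(wait_seconds)   # back off before resubmitting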

References

[1] Roberts, M. J., et al. (2018). “The Benefits of Global High Resolution for Climate Simulation: Process Understanding and the Enabling of Stakeholder Decisions at the Regional Scale”, Bulletin of the American Meteorological Society, 99(11), 2341-2359, doi: https://doi.org/10.1175/BAMS-D-15-00320.1

[2] Oliver, H., et al. (2019). “Workflow Automation for Cycling Systems”, Computing in Science & Engineering, 21(4), 7-21, doi: https://doi.org/10.1109/MCSE.2019.2906593
