An overview of a dataset digitized by citizen science volunteers – the 1900-1910 Daily Weather Reports

By: Philip Craig

Two years ago the citizen science project Weather Rescue was used to digitize handwritten weather observations from the Met Office’s Daily Weather Reports for the years 1900-1910 (Figure 1). These were, as the name suggests, daily documents that published weather observations from various locations around Great Britain and Ireland, plus some countries in western Europe (Figure 2). This was the second phase of Weather Rescue and built on a very successful effort to digitize hourly observations from the Ben Nevis observatory and two stations in Fort William.

The Daily Weather Reports were digitized by 2148 volunteers between December 2017 and July 2018, with five volunteers asked to transcribe each observation of pressure, temperature and rainfall. If the volunteers entered the same value, it was stored in a spreadsheet for the appropriate day; if enough volunteers disagreed on a value, it was flagged as an error in the spreadsheet and subjected to quality control.
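As a rough illustration of this kind of agreement check (a sketch only, not the actual Weather Rescue pipeline), the idea can be written in a few lines of Python. The function name `consensus` and the threshold `min_agree=3` are assumptions made for the example; the real agreement rule used by the project is not given here.

```python
from collections import Counter


def consensus(transcriptions, min_agree=3):
    """Return the value most volunteers agree on, or None to flag the entry.

    `min_agree` is a hypothetical threshold chosen for illustration; the
    project's real agreement rule is not stated in this post.
    """
    # Ignore empty entries and surrounding whitespace before comparing.
    entries = [t.strip() for t in transcriptions if t and t.strip()]
    if not entries:
        return None
    value, count = Counter(entries).most_common(1)[0]
    # Too little agreement: leave the value for manual quality control.
    return value if count >= min_agree else None


# Three of five volunteers agree, so the value is accepted.
accepted = consensus(["29.83", "29.83", "29.83", "29.88", "29.33"])

# No value reaches the threshold, so the entry is flagged (None).
flagged = consensus(["29.83", "29.88", "29.33", "29.83", "29.38"])
```

Comparing the entries as text rather than as numbers keeps the volunteers' transcriptions exactly as typed, which matters when the question is whether they read the same digits.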

Figure 1: the top half of page 1 of the Daily Weather Report from Wednesday 1st July 1903.

This is where I came in. For six months beginning in July 2018 I conducted the quality control on the entire dataset of observations recovered from the Daily Weather Reports. That was 4017 spreadsheets, with a growing number of stations each year. To quality control the dataset, I compared every flagged error to the entry in the original documents (available online from the Met Office’s National Meteorological Library and Archive). Where I could read the original value with confidence, I replaced the error in the spreadsheet with that value; where the value was illegible, I deleted it from the spreadsheet. Using multiple volunteers for each observation helped to avoid transcription errors such as confusing a 3 with an 8 or typing the wrong number. These are easy mistakes to make, but comparing several transcriptions removes the obvious ones.

It’s fair to say that processing 4017 daily spreadsheets over six months was a pretty tedious task. I mostly identified the errors by eye but also used a simple Python script to show any errors I had missed. Most spreadsheets only had a small number of errors, but some required substantially more work. For example, there were some spreadsheets with lots of errors that may have been caused by misaligned images from the scanned documents. Although this was a tedious task, it was generally very straightforward, and since I’d just spent months writing my PhD thesis it was a nice change! I also learned to understand the old Imperial units for pressure and temperature for the first time, having only ever used metric units!

The new data recovered from the Daily Weather Reports have filled some gaps and corrected errors in the existing observational records. For example, in the International Surface Pressure Databank version 4.7 (ISPDv4.7) there are no stations in England and Wales for 1900-1910, with four in Scotland, three in Ireland and one in the Channel Islands. Weather Rescue has provided new pressure observations from 28 stations in Great Britain, Ireland and the Channel Islands (Figure 2); data from Stornoway, Aberdeen and Valentia are already in ISPDv4.7. The new pressure data will help to constrain the ensemble of the Twentieth Century Reanalysis (20CR). The lack of pressure observations means that there is often large uncertainty in the atmospheric circulation across the 80 realizations in 20CR version 3 (20CRv3), particularly for high impact weather events such as cyclones.

Figure 2: map of stations in the 1900-1910 Daily Weather Reports. The country boundaries are the modern day borders.

The full observations dataset is available from the Centre for Environmental Data Analysis. It contains 1,832,926 observations of pressure, temperature and rainfall from 72 stations in Great Britain, Ireland and Western Europe (Figure 2). The data are stored in daily spreadsheets and in the Station Exchange Format (SEF), which is to be the international standard for exchanging historical weather data. In the daily spreadsheets, the data are stored in their original Imperial units: pressure in inches of mercury (in Hg), temperature in degrees Fahrenheit (°F) and rainfall in inches (in). These were converted into SI units for the SEF files: pressure in hectopascals (hPa), temperature in degrees Celsius (°C) and rainfall in millimetres (mm).
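For anyone working from the daily spreadsheets, the basic unit conversions can be sketched as below. This is a minimal example using standard conversion factors, not the processing actually used to create the SEF files, which may include additional barometer corrections that are not described here. The function names are made up for the illustration.

```python
# Conventional conversion factor: 1 inch of mercury is about 33.8639 hPa.
INHG_TO_HPA = 33.8639
# Exact by definition: 1 inch = 25.4 mm.
INCH_TO_MM = 25.4


def inhg_to_hpa(pressure_inhg):
    """Convert pressure from inches of mercury to hectopascals."""
    return pressure_inhg * INHG_TO_HPA


def fahrenheit_to_celsius(temp_f):
    """Convert temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inches_to_mm(rain_in):
    """Convert rainfall from inches to millimetres."""
    return rain_in * INCH_TO_MM


# Example: a reading of 29.92 inHg, 59 °F and 0.10 in of rain.
pressure_hpa = inhg_to_hpa(29.92)
temp_c = fahrenheit_to_celsius(59.0)
rain_mm = inches_to_mm(0.10)
```

Any rounding is left to the caller so that the converted values can be rounded to whatever precision suits the original observations.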

Please also keep an eye out for my forthcoming paper in Geoscience Data Journal, which describes the dataset and quality control process in more detail. In it I also compare the recovered observations to 20CRv3 and the Met Office’s gridded precipitation dataset.


