By Jon Blower
It goes without saying these days that we live in an immensely interconnected world. Scientists are typically connected to the Web around the clock, accessing and publishing papers, educational material, opinions and, of course, data. But all this information is not, in fact, very well connected. It’s hard to find the exact data that were used in a study (particularly one that’s more than a couple of years old), or to find everything that is known about the strengths and limitations of a certain observing system. A lot of this information is out there somewhere, but it is published by many different organizations that don’t always know about each other, so those vital links never get created.
To help scientists (and in fact all users of the Web), we need to impose a little more structure on all this information, and this is where the idea of “Linked Data” comes in. “Linked Data” encapsulates a set of principles that everyone can apply to make the content on the Web a bit less chaotic. The goal is to express information so precisely that even computers can understand it (at least to some degree), opening the door for more intelligent web sites, web browsers and search engines.
This blog isn’t the right place to go into technical details but, in essence, Linked Data means that we need to:
- Create persistent, global and unique identifiers for the “things” that we care about in our community, such as researchers, universities, projects, datasets, papers and even concepts (see Figure 1 below).
- Allow users to look up these identifiers on the Web to find out more about these things.
- If a computer looks up an identifier, provide information in a machine-readable form. Without going into details, the preferred format is RDF (the Resource Description Framework), which is a kind of “lingua franca” for data.
- Embed links to other related things in this description using appropriate identifiers.
- Include information on why things are linked. This is a key feature of the RDF format. For example, we can say that Paper A describes a dataset, whereas Paper B uses it.
Figure 1. A (simplified) representation of some “things” that are important in the Earth Observation community, showing how they are linked.
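For readers who want a glimpse of what the principles listed above look like in practice, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how any of the projects mentioned here actually publish their data: every identifier under `http://example.org/`, the `vocab/describes` and `vocab/uses` relationships, and the helper names `iri`, `to_ntriples` and `describe` are hypothetical. Statements are stored as RDF-style (subject, predicate, object) triples and written out as N-Triples, one of the standard RDF syntaxes, while `describe` mimics a look-up that returns RDF to a computer and a simple web page to a person, based on the HTTP `Accept` header.

```python
# Minimal sketch of the Linked Data principles using only the standard library.
# All identifiers under http://example.org/ and the "vocab" terms are
# hypothetical placeholders, not a real published vocabulary.

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local: str) -> str:
    """Mint a (hypothetical) persistent, global identifier for a thing."""
    return EX + local


DATASET = iri("dataset/sst-v1")

# Each triple is (subject, predicate, object); the predicate records *why*
# two things are linked (Paper A describes the dataset, Paper B uses it).
TRIPLES = [
    (iri("paper/A"), iri("vocab/describes"), DATASET),
    (iri("paper/B"), iri("vocab/uses"), DATASET),
    (DATASET, RDF_TYPE, iri("vocab/Dataset")),
]


def to_ntriples(triples) -> str:
    """Serialize IRI-only triples in the N-Triples RDF syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def describe(thing: str, accept: str):
    """Answer a look-up of `thing`, choosing a format from the Accept header.

    Simplified content negotiation: q-values are ignored; any request that
    mentions N-Triples gets RDF, everything else gets a human-readable page.
    """
    # Collect every statement in which the thing appears as subject or object.
    related = [t for t in TRIPLES if thing in (t[0], t[2])]
    if "application/n-triples" in accept:
        return "application/n-triples", to_ntriples(related)
    items = "".join(f"<li>{s} {p} {o}</li>" for s, p, o in related)
    return "text/html", f"<ul>{items}</ul>"


# Usage: a computer looking up the dataset identifier gets machine-readable RDF.
content_type, body = describe(DATASET, "application/n-triples")
print(content_type)
print(body)
```

A real deployment would more likely use an established RDF library and shared vocabularies rather than hand-built strings, and would honour `Accept`-header priorities properly; the sketch only shows how identifiers, look-ups and typed links fit together.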
Of course, there are many tough challenges in making this all work properly. It’s hard to reach global agreement on identifiers and their definitions. Publishing Linked Data is not easy at the moment, but remember that 20 years ago, publishing basic web pages was also difficult for most non-specialists.
But this is a rapidly maturing area. Here at the Department of Meteorology we are involved in a number of projects to investigate how Linked Data can improve scientific research. These include:
- CHARMe, in which we developed a Linked Data system to enable users to publish “commentary” about climate data, thereby sharing vital information about how data have been used in the community.
- MELODIES, in which we are developing eight new applications of environmental Open Data, using Linked Data to join up diverse sources of data.
The Web is truly for everyone, and now it’s time to make it work better for science!
Further reading