5 things to do with data in 2017

Let us not curse them with the name of New Year’s Resolutions. It’s a bit late for that anyway. But here are five simple and positive things you can do with your research data this year.

1. Be ‘as open as possible, as closed as necessary’

Make this your mantra. Say it once a day, when you brush your teeth in the morning. It is the governing principle of the European Commission’s Open Research Data Pilot, which from the start of this year has been extended beyond its original pilot areas to cover all thematic areas of the Horizon 2020 programme. It is a good principle: let it inform how you think about the data you collect, and how you manage them. Consider what actions will enable you to share the data you collect as openly as possible, while honouring your ethical and legal obligations and any contractual restrictions. If you collect data from participants, ensure you obtain consent for data sharing, and use robust methods of anonymisation to make data safe for sharing. The UK Data Service offers excellent guidance on legal and ethical issues.

2. Have a data spring-clean

Clear out the cupboards: those USB sticks lying in your drawer, that external hard drive gathering dust on the shelf, your Dropbox folder, your personal drive on the University network, your project fileshare. Get rid of what you don’t need. If the data support published research and/or have long-term value, archive them in a data repository. If they are part of your working capital, make sure they are properly stored and backed up. Use the University network as the primary storage for working data: data held there are automatically replicated to separate data centres, backed up daily, and recoverable in case of disaster.

While you’re sorting things out, why not also rationalise that monstrous proliferation of folders on your network drive? Organise your data so that you can navigate them and find what you want: arrange them by project, by experiment, by date, etc., and use folder and file names that make sense and help you manage versions, e.g. by including the date.
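One way to make a naming convention stick is to generate names rather than type them by hand. The sketch below is purely illustrative: the project_experiment_description_date pattern and the helper names `versioned_name` and `data_path` are hypothetical, not a standard. It writes dates in ISO 8601 form (YYYY-MM-DD), which has the useful property that sorting file names alphabetically also sorts them chronologically.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def versioned_name(project: str, experiment: str, description: str,
                   ext: str, on: Optional[date] = None) -> str:
    """Build a name such as 'PROJ_exp01_cell-counts_2017-01-20.csv'.

    Hypothetical convention: project, experiment, short description and
    an ISO 8601 date (YYYY-MM-DD), joined by underscores.
    """
    on = on or date.today()
    # Spaces become hyphens so the name is safe on any file system.
    slug = description.strip().replace(" ", "-")
    return f"{project}_{experiment}_{slug}_{on.isoformat()}{ext}"


def data_path(root: str, project: str, experiment: str, filename: str) -> Path:
    """Place a file in a root/project/experiment folder hierarchy."""
    return Path(root) / project / experiment / filename


if __name__ == "__main__":
    names = [
        versioned_name("PROJ", "exp01", "cell counts", ".csv", date(2017, 2, 3)),
        versioned_name("PROJ", "exp01", "cell counts", ".csv", date(2017, 1, 20)),
    ]
    # With YYYY-MM-DD dates, a plain text sort puts versions in date order.
    for name in sorted(names):
        print(data_path("projects", "PROJ", "exp01", name))
```

The same idea works with any pattern your group agrees on; what matters is that the pattern is consistent and that the date format sorts sensibly.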

3. Plan for data management

When you prepare a new research project, one of the first documents you start should be your data management plan. Data are the foundation of your research – so don’t build on sand. Start with an outline plan with the bare essentials, and fill it in as your research proposal evolves and your collaborators contribute their input. Your plan should identify: what data will be collected; how data will be managed during the project; and how they will be preserved and shared after the end of the project. Use DMPonline or the Checklist for a Data Management Plan to help you write the plan and make sure you cover everything.

If you will be applying for funding, your funder may ask you to complete a data management plan as part of the application; starting one early in proposal development will improve the quality of the plan and make the application process easier.

4. Don’t rely on supplementary information – use a repository

When you submit your next paper, don’t submit supporting primary data as supplementary information files to be published alongside your article on the publisher’s website. Instead, deposit the data in a suitable data repository and link to them from your article. Here are some reasons why:

  • What is provided as supplementary information is often not primary data, but selected derived data, in the form of graphs, charts, tables reporting mean values, etc. Without access to the full primary dataset, your results cannot be properly validated or replicated, and the data themselves have limited re-use value.
  • Supplementary data are often provided in PDF, one of the least user-friendly data formats ever invented: numerical and textual data cannot be manipulated within the file format, or easily extracted and imported into other formats (e.g. tabular formats for numerical data, or simple text formats) where they are amenable to manipulation and further analysis.
  • While many publishers allow access to supplementary information even where the articles themselves are concealed behind a paywall, this is not always the case. Even where the data are made freely accessible, publishers may require you to transfer copyright to them, and may not allow others to reproduce or redistribute the data. Most data repositories, on the other hand, will simply ask for a licence to manage the data on the rights-holder’s behalf.
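To illustrate the point about formats: the kind of table that often ends up locked in a PDF can just as easily be saved as plain CSV, which spreadsheets and analysis tools can read directly. This is a minimal sketch using Python’s standard-library `csv` module; the function names, column names and values are made up for illustration.

```python
import csv
import io


def table_to_csv(header, rows) -> str:
    """Serialise a header row and data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return buffer.getvalue()


def column_values(csv_text: str, column: str) -> list:
    """Read one column back as floats, e.g. to recompute a reported mean."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


if __name__ == "__main__":
    # Hypothetical data: one row per sample, not just a summary mean.
    text = table_to_csv(
        ["sample_id", "temperature_c"],
        [["S1", 21.5], ["S2", 22.0], ["S3", 20.5]],
    )
    print(text)
    temps = column_values(text, "temperature_c")
    print(sum(temps) / len(temps))
```

To save to disk instead, open the file with `open(path, "w", newline="", encoding="utf-8")` and pass it to `csv.writer`; the `newline=""` argument is what the `csv` module documentation recommends.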

5. Make the data FAIR

When you do make your primary data available, make sure they meet the FAIR Data Principles:

  • Findable: a detailed metadata record describing the data, including a unique persistent identifier assigned to them, is published and indexed online.
  • Accessible: the data are retrievable and accessible, preferably openly, or with as few intermediate steps or restrictions as possible.
  • Interoperable: the data are made available and described using open and/or widely-used formats and metadata standards, enabling the greatest possible opportunities for integration and interoperation with other data and systems.
  • Re-usable: the data are well-described and documented, so that the conditions in which they were collected or generated can be clearly understood, and they are accompanied by a licence stating the terms of use.

This may seem a lot to achieve, but most of it will be done simply by archiving your data in a suitable data repository. As a matter of course, the repository will ensure that a standards-compliant metadata record describing the data is created and published, including a Digital Object Identifier (DOI) or other unique persistent identifier; that the data themselves are stored in suitable formats for access and re-use, with relevant documentation; and that the data are made accessible under an appropriate licence.
