An open data snapshot

The State of Open Data, a selection of articles and analyses based on a survey of over 2,000 researchers, was published in October this year by Digital Science, owners of the popular figshare research outputs repository.

Key survey findings are summarised in this infographic:

Digital Science, The State of Open Data Infographic

Three findings stand out for me:

1. There is broad interest in and enthusiasm for use of open data practices on the part of researchers across disciplines, career stages and throughout the world, and many researchers (approximately 75% of those surveyed) have experience of sharing data and place value on the credit they receive for doing so.

It’s always good to hear this: the means of sharing data are many and various and it is not easy to get an overview of the totality of data sharing practices.

But I wonder if the picture is quite as rosy as the Digital Science headline suggests, for two reasons.

First, neither the report nor the underlying data explain how the survey sample was obtained or provide any evidence of how representative it is. The survey dataset does not contain a protocol or any explanation of sampling methodology. The survey report merely states: ‘Figshare has garnered many insights from its users in the past, from formal surveys and informal feedback […] Working with Springer Nature and Digital Science, we surveyed researchers […] over 2,000 researchers responded to the survey, spread across continents and disciplines, from all types of institution and researchers at different career stages’ (p. 12), which would suggest (although it is not clear) that researchers were selected from the companies’ contact lists. Given that figshare is a data sharing service and Springer Nature has a strong data policy for its journals, one might expect survey respondents to be more active in data sharing than the global average. One might compare the 76% of researchers in this survey who shared data with the 51% of Wellcome Trust and ESRC-funded researchers recently surveyed who had made data available to the research community by one means or another (see survey report, p. 27).

Secondly, the report does not define data sharing with sufficient precision. The survey asked respondents how often they made data ‘freely available’. It’s not clear how ‘freely available’ was defined in the survey, if at all. A survey question about tools researchers used to share data reported responses in the following categories: Email; Google Drive; Dropbox; Figshare; GitHub; Other. This seems a curious list to me, as it identifies tools that I would associate primarily with restricted sharing (e.g. within a project team or among selected peers), such as email and cloud file-sharing services, and does not specify the key categories of open data sharing vehicles: data repositories and journal platforms (which may publish data as supplementary information alongside articles). Only one data repository is identified: figshare, which is owned by… Digital Science, the publisher of this report. Presumably, all the other data repositories in the world are subsumed under the Other category. I would have liked to see in the report a clearer definition of what survey respondents were given to understand ‘freely available’ meant, and whether their responses fully justify the claim that ‘approximately three quarters of researchers have made research data openly available at some point’ (my emphasis). I would again make a comparison with the Wellcome Trust report (see above), which arrived at its 51% figure by asking researchers if they had made data ‘available to the research community’ (my emphasis) and specifically excluded informal sharing or sharing on request – since if you have to ask for the data it clearly isn’t open or ‘freely available’.

I believe the rate of open data sharing is in fact considerably lower than the Digital Science report suggests.

2. Where open data practices are adopted, there are likely to be positive correlations with overall research quality as well as with good practice in management and documentation of data.

This is something I find interesting, as it indicates that the basics of good research practice, such as internal discussion and challenge of assumptions and methods, documentation of methods and values, and rigorous quality control in collection and processing, can be reinforced where it is known that data will be made publicly available and open to the same level of scrutiny as peer-reviewed papers, ultimately resulting in findings that are higher in quality, contain fewer errors, and prove more reliable in the long term. It is an effect that has been reported in the literature, and which I think merits greater emphasis as we seek to persuade our researchers of the benefit to them of being open with their data. For more on this issue, see Wicherts JM, Bakker M, Molenaar D (2011) Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828. http://dx.doi.org/10.1371/journal.pone.0026828.

3. Researchers can be uncertain of the benefits of sharing data, may be unsure how to manage their data effectively or obtain resources for open data practices, and would welcome more support in these areas from their funders and institutions.

This is definitely the case in my experience: it can be hard to persuade researchers of the benefits of sharing their data, since there is rarely a direct correlation between effort invested and return to the researcher in terms of recognition and reward. It has been said many times, but in spite of funders’, institutions’ and publishers’ data sharing policies, the systemic incentives to share are weak: this is why out of nearly 191,000 research outputs submitted to the 2014 REF, only 68 outputs – that is 0.04% of the total – were Research datasets and databases (see this presentation from Ben Johnson of HEFCE and the REF Research outputs submissions data).

Professionals such as myself providing institutional services supporting research data management need to persuade researchers of the benefits to themselves, to scholarship and to society of sharing their research data; and we need to deliver services that meet their needs in intelligent and efficient ways.

But funders, policy-makers and research organisations also need to restructure the incentive frameworks that define how researchers progress in their careers and receive recognition and reward for the communication of research. Academic reward systems need to look well beyond the published peer-reviewed paper reporting positive, novel and exciting results, and extend both to the papers reporting the less headline-grabbing outcomes (the negative, the null and the apparently nugatory), and to other kinds of output, including the datasets that can serve to validate research results or establish a foundation for future research.
