Discovering data – new models proposed for effective data sharing in epidemiology

The challenge of making datasets 'discoverable' is one of the most fundamental barriers to effective data sharing. New research published today examines how research funders could make it easier for researchers to identify, access and use public health and epidemiological data to accelerate improvements in public health.

The research was commissioned by the Wellcome Trust on behalf of the Public Health Research Data Forum. The Forum brings together major international funders of global health research committed to increasing the availability of health research data in ways that are equitable, ethical and efficient.

The research was conducted by a multidisciplinary team with expertise spanning epidemiological research, data management and data publication. It brought together researchers from the University of Cambridge, the Farr Institute at University College London, the London School of Hygiene and Tropical Medicine, the UK Data Service (University of Essex), the Open Data Foundation and Ubiquity Press. The research incorporated a review of existing data resources, an online survey that received over 250 responses, in-depth interviews with researchers, and an analysis of existing models for enhancing the discoverability of data.

The key finding was that the public health and epidemiology research field could improve data discoverability by drawing on the experience and best practice of related research fields. The team identified three potential models that could be employed:

  1. A centralised portal model – creating a searchable 'catalogue' of datasets, well-documented to the variable level.
  2. A data journal model – utilising the growing number of open-access journals that focus on publishing descriptions of high-value datasets with links to the resources.
  3. A linked data model – adopting an emerging decentralised approach that links and searches data and documentation published across the web.

The report suggests that, while the optimum solution may combine all three of these approaches, the centralised portal model is the most strongly favoured by the research community, and funders should consider it as an initial priority. All three models depend on the wider adoption of appropriate standards for data and its documentation across the field, and a key priority for funders is therefore to encourage the establishment of best practice.

The report is intended to stimulate debate within the research community around the challenge of data discoverability. To begin this process, the Public Health Research Data Forum partners will hold a workshop at the London School of Hygiene and Tropical Medicine on 30 July for researchers, funders and other stakeholders to discuss and debate the findings. A further major output of the project will be the establishment by Ubiquity Press of a special collection of data papers.

Nicola Perrin, Head of Policy at the Wellcome Trust, said: "As funders we are committed to maximising the value of the vast and rich data generated by the research we support. Making data discoverable is a key first step in enabling wider access and re-use of data, and one we are committed to working with our partners on the Public Health Research Data Forum to address."