On 31 January 2020, Wellcome published a statement calling on researchers, journals and funders to 'share interim and final research data relating to the outbreak… as rapidly and widely as possible'.
This statement has now been signed by more than 150 organisations including publishers, scientific institutions and preprint repositories.
Signing a statement is one thing, acting on it something else. Has the research community done enough to share its data openly and transparently? And will these commitments lead to a collaborative and transparent research culture?
Things started well with the open sharing of the first SARS-CoV-2 genome sequence on 10 January 2020 by Professor Edward C. Holmes, University of Sydney, on behalf of the consortium led by Professor Yong-Zhen Zhang, Fudan University, Shanghai.
The sharing of the SARS-CoV-2 sequence data was crucial – it helped to inform how public health officials should respond and gave researchers a starting point to develop the tools needed to tackle the virus.
Professor Holmes shared the following comment when I asked him for his reflections on data sharing in the past year:
'In terms of basic science, the data sharing has been hugely positive! Amazingly so. People were delighted and almost saw [it] as the start of the fundamental science for Covid-19. It was an important moment because it directly led to the first PCR test for the virus and helped fuel vaccine development.'
A quick search of Europe PMC – an open science platform that enables access to a worldwide collection of life science publications and preprints from trusted sources – shows that more than 1,100 articles have cited this genome sequence in their publications.
'I think Covid-19 has totally changed the landscape for the better. This is reflected in the use of preprints.' – Professor Holmes
During the pandemic we have seen an increase in the use of preprints. A preprint is a version of a scientific paper that precedes formal peer review. Publishing data in a preprint means that results are being shared faster.
Data from Dimensions shows that 13% (46,006/349,828) of Covid-19-related publications in 2020 were preprints. This indicates a positive change, with research findings about the virus becoming available and accessible at speed. But what about the data behind the results?
Sadly there is little change in the sharing of the data underlying publications. Only 9% of Covid-19-related research articles in Europe PMC have any data availability statement – which tells the reader where and under what conditions the data can be accessed – compared to 22% of all research articles published in 2020.
Even when authors do include a data availability statement, the criteria for accessing the data are often ambiguous. For example, a recent study of Covid-19-related preprints submitted to medRxiv (the preprint server for health sciences) reported that 8% of data availability statements mention ‘reasonableness’ as one of the criteria for granting access.
This behaviour is mirrored in data related to Covid-19 clinical trials. A recent study found that data is only being shared in a minority of cases (15.7%), with nearly half (47.8%) of the trial registry entries explicitly saying they are ‘not willing to share data’.
These examples highlight that there is still significant work to be done to make all research more open.
We know that a huge amount of investment in Covid-19 research has been made over the past 14 months. According to the UK Collaborative on Development Research Covid-19 tracker, over 8,500 studies across 205 countries have been funded, mobilised or repurposed, while the World Health Organization ICTRP Registry lists over 8,000 Covid-19 clinical trials. This all means an increasing amount of valuable data is being generated and collected.
We also know that the pandemic has encouraged or required researchers to turn to data analysis while they’ve been working from home. The latest ‘State of Open Data’ report, which surveyed over 3,400 researchers, shows that more researchers than in previous years are likely to re-use their own (64%) or others’ data (50%), and more than a third expect to see an increase in the number of collaborations as a result of the pandemic.
Enormous efforts have gone into building or expanding repositories and platforms to store and share data, including collaborative work to develop agreed data sharing principles and standards.
A scoping exercise performed by the International Covid-19 Data Alliance found nearly 100 data repositories, platforms, databases and libraries of data relevant to Covid-19, and there are still new ones being developed.
However, with this explosion of data and the rapid development of data storage platforms come risks. Making data available by storing it in a repository may not be enough. Data must be accessible, and systems need to be integrated. As Dr Tim Smith, Head of Collaboration at CERN, stated at a recent data sharing conference: 'Open is not enough, you have to prepare your data to make it understandable and useable to others'.
While we celebrate the immense progress that’s been made, we must be mindful of previous efforts to improve research culture, particularly in the context of public health emergencies, that did not result in long-term change. We hope that the scale and reach of the Covid-19 pandemic will realise sustained change in the research culture, with openness and collaboration firmly embedded.
To help better understand which aspects of data sharing have worked well and where lessons can be learned, the Open Research team, in partnership with other research funders, will commission an evaluation of the impact of the January 2020 Covid-19 statement. We anticipate that the study will be completed by December 2021.