Data for science and health: trustworthy data science

Data science is central to the future of health and the scientific endeavour. 

Why it's important 

Data science is transforming how science solves urgent health challenges, but progress in this field is inconsistent and is hindered by three systemic blockers:

  • there is a lack of trust in how health data and technologies are built, used and governed
  • there is little funding for the foundational tools needed for health data science to thrive globally
  • there are few opportunities to employ the talents of data scientists and research software engineers to solve health challenges.

What is health data?

Health data is information about physical and mental wellbeing, and the biological, environmental or socioeconomic factors that contribute to it. The data can be about an individual or a population. It is often collected as part of routine clinical care or research studies. Other sources of health-related data include climate monitoring data, location data or data from phone apps.

What we want to achieve 

We have two goals:

  1. Put trust into practice by changing how data and software in health are funded, developed and governed.
  2. Equip and motivate data scientists with the tools and opportunities to innovate with health data in the public interest.

Our programme is global, and we want to ensure that people in low- and middle-income countries benefit from innovation with health data, as well as those in high-income countries.

What we’re doing 

In each area below, we’re working to advance the objectives of Wellcome’s new strategy. These include a broad programme of discovery research and a focus on the urgent health challenges of mental health, infectious disease and climate.

Tools to transform

Disproportionate effort is spent preparing health data for analysis, and computational methods now dominate modern science. Despite their importance, little funding is available for the foundational data and software tools used in health data research.

Without the right tools, work with health data is slow, the barrier to entry for data scientists is high, and only well-resourced institutions are able to make substantial progress.

We will fund open source foundational data and software tools for science and health, with a focus on usability, adoption and long-term sustainability. We will make it easier and more equitable for data scientists globally to innovate with health data. 

Trust in practice

Information about health feels intensely personal, and people have concerns about privacy, security and commercial access to health data. Those collecting and using health data must earn the trust of people and society by demonstrating that they are trustworthy.

There are technical, institutional and social aspects to trust, all of which need to work together in practice. These include computational and data security measures, establishing good governance and clear accountability, being transparent about how data is used, and involving people in decisions.

Talent with incentive

Effective health data innovation requires a diverse set of people from multiple disciplines to communicate and work effectively together: data scientists and research software engineers, clinicians and healthcare providers, biomedical researchers, patients and the public, and policymakers and decision-makers. These groups often don’t have a shared understanding of problems and there are few opportunities for them to collaborate effectively.

Active projects and calls for proposals  

We’re enabling trustworthy data science by supporting specific projects. These include:

  • The Wellcome Data Science Ideathon, a competition in which teams propose data science solutions to three urgent health challenges: mental health, infectious diseases, and climate and health.
  • Understanding Patient Data, a programme hosted by Data for Science and Health that combines research, policy and advocacy to make the way patient data is used more visible, understandable and trustworthy. For example, the team recently started the Black and South Asian public-led change: equitable data collection project, to hear the views of Black and South Asian people in the UK on the collection and use of health data. 
  • Running a series of Wellcome Data Prizes to create multi-disciplinary collaborations aimed at solving urgent health challenges. This builds on the scoping done by the Open Data Institute. The first data prize aims to understand effective interventions against anxiety and depression in young people and will be run with Wellcome’s mental health team, Social Finance and DataKind UK.
  • Funding teams to develop open source software tools with a particular focus on community, usability, accessibility, adoption and long-term sustainability. This includes grants awarded to:
    • OpenSAFELY, a new approach to analysing NHS electronic health records that has proved instrumental in understanding risk factors for Covid-19 mortality
    • the MIT J-Clinic to develop methods for trustworthy clinical artificial intelligence
    • IDDO and Vivli, two data sharing platforms, as part of a vision for a FAIR Data Network of Infectious Diseases (FAIR meaning Findable, Accessible, Interoperable, Reusable).
  • Ensuring the benefits of trustworthy data science are experienced equitably in all communities. We are supporting Outreachy to provide paid internships to people who are underrepresented in open source technology. We aim to understand barriers to data science careers in health and tackle them by changing research culture. We encourage and reward non-traditional research outputs and career paths, such as open data sharing and re-usable code.
  • Landscaping and scoping activities to understand existing work being conducted in health data science. This includes:
    • mapping the health data science ecosystem in East Africa
    • mapping existing open source software used by applied clinical researchers with Hetco Design
    • landscaping the availability of, and need for, software tools to model the relationship between climate and infectious disease, with the Inter-American Institute for Global Change Research. The incidence, transmission and severity of some infectious diseases can be affected by changes in climate. Model-based approaches offer the opportunity to identify this relationship, and allow predictions that can inform policy.
  • Supporting the Lancet and Financial Times Commission, Governing health futures 2030: Growing up in a digital world. The Commission is exploring the convergence of digital health, artificial intelligence (AI) and other frontier technologies with universal health coverage. Read the Financial Times’ report on the Future of AI and Digital Healthcare.

Calls for proposals

We often put out requests for proposals for specific projects. These will be published alongside all Wellcome contract opportunities.

