
The problem with ethnicity categories in UK health data

Ethnicity categories in UK healthcare are inconsistent and not always accurate. This has real-world consequences, including for how we estimate the risks of Covid-19 for different groups. In its early days, Wellcome's Data for Science and Health team kicked off several research and engagement projects to investigate. Here's what they learned.

An abstract, colourful illustration shows columns of different personal identity cards. The four cards most visible contain a blue hexagon, a green rectangle, a yellow triangle and a red square – the other four are only partially visible.


Licence: Attribution CC BY

Grace Annan-Callcott


If we want to determine how health outcomes vary across different groups, we first need categories that people can identify with.

One way we currently do this is by using ethnicity categories. But in reality, the terms we use to identify ourselves, how others see us, and how institutions define us are fluid and context-dependent.

That’s one reason why, depending on which sources of health data you look at, you can end up with a different answer to the same question.

Inconsistent ethnicity categories can have real-world consequences 

A recent example is the research carried out during the pandemic to identify how ethnicity is associated with the risk of dying from Covid-19. While research by OpenSAFELY suggested that people of mixed ethnicity are at increased risk, similar research by the Office for National Statistics came to different conclusions. 

This is partly due to the methods they used, but it’s also down to issues with the data itself. The teams used different national datasets, and the way ethnicity is recorded varies between datasets. 

For example, in the census, someone might tick the box that places them as part of a specific ethnic group, while their health record might have the ‘Other’ category selected. In addition, some datasets still use the ethnicity categories from the 1991 census – further contributing to inconsistencies in data recording. 

This inconsistency has real-world consequences for healthcare and policy decisions.

Supporting researchers to investigate ethnicity data 

The Data for Science and Health team at Wellcome has supported several projects to investigate ethnicity data in UK healthcare.

This research contributes to the ongoing debates and discussions surrounding ethnicity data recording by: 

  • highlighting the challenges faced in capturing reliable data 
  • identifying the gaps and comparing different ways of handling ethnicity data 
  • emphasising the importance of data quality 
  • calling for immediate attention to improve the overall effectiveness of data collection in addressing health inequalities

Four key takeaways from investigating ethnicity data in UK healthcare 

1. Ethnicity categories vary across the different data sources, and understanding of ethnicity varies between groups.

And the categories have changed over time too. For example, in the 2001 Census, there wasn’t a category for ‘Arab’, so anyone who might ordinarily use that category to identify themselves probably had to use ‘Other’ instead.

This has started to change with the recent addition of an ‘Arab’ category in the Census and the General Practice Extraction Service, and the ONS refined the categories further for the 2021 Census. Even so, these categories can fail to capture the complexity of ethnicity. Community participants pointed out that ethnicity means different things to different people, and can encompass nationality, heritage, geographical region or religious group.

2. Ethnicity recording is less accurate for minoritised ethnic groups than for White British people.

Taking the Census as the best source we have for self-recorded ethnicity, we can see which groups are most impacted by poor-quality ethnicity data recording in routinely collected health datasets.

For example, across all the administrative data sources, the ‘Mixed’ ethnic groups had some of the greatest discrepancies with the Census. In more than a third of cases, the electronic health record didn’t match the individual's Census response. Consistency was also low for the ‘Other Asian’, ‘Other White’, ‘Other Mixed’, ‘Other Black’ and ‘Any Other’ ethnic groups.
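For readers who work with data, here is a rough sketch of how this kind of comparison could be made. It is not the method used in the research described here. It assumes each person's Census response has already been linked to their health record, and every value in `linked_records` is made up purely for illustration. The function name `agreement_by_group` is also hypothetical.

```python
from collections import defaultdict

# Hypothetical linked records: (ethnicity in Census, ethnicity in health record).
# These toy values are invented for illustration only; they are not real data.
linked_records = [
    ("White British", "White British"),
    ("White British", "White British"),
    ("White British", "White British"),
    ("White British", "Other White"),
    ("Mixed", "Mixed"),
    ("Mixed", "Any Other"),
    ("Mixed", "White British"),
    ("Other Asian", "Other Asian"),
    ("Other Asian", "Any Other"),
]


def agreement_by_group(records):
    """Return, per Census group, the share of health records that match it."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for census_group, health_record_group in records:
        totals[census_group] += 1
        if census_group == health_record_group:
            matches[census_group] += 1
    return {group: matches[group] / totals[group] for group in totals}


rates = agreement_by_group(linked_records)

# List the groups with the lowest agreement first.
for group, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{group}: {rate:.0%} of health records match the Census")
```

Treating the Census as the reference point mirrors the approach described above: the lower a group's agreement rate, the more that group is affected by poor-quality recording in the health dataset.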

3. Healthcare workers need to be better equipped to talk to patients about health data.

In the survey carried out by Liberating Knowledge, only one in five respondents reported having a conversation with a healthcare worker about how patient data is used.

This shows the need for much greater engagement by the NHS with the public regarding the collection and use of data. The research also found that conversations with NHS staff about data could have a significant impact on willingness to share personal data and a broader understanding of the use of patient data, including the benefits. As such, it is important to equip healthcare workers with the tools to hold meaningful conversations about data.

4. Communities want to be, and should be, actively involved in creating data collection processes and deciding how the data is used.

It’s clear from these studies that people want to know how their data is collected and used to inform research and address inequalities.

Requesting sensitive data for patient health records without explanation can leave patients feeling coerced and suspicious of data-sharing activities. People want agency in the decision-making process, and they want to be involved in designing and implementing these processes. Doing so will also ensure that processes and policies consider the specific needs of communities and have a greater chance of succeeding. Based on the findings from this research, communities should also be involved in any decision-making and research prioritisation involving data. 

We need better quality data to address health disparities 

If we want research and analysis that addresses health disparities among Black, Asian, and minority ethnic communities, we need better quality data – urgently.

This research is deepening our understanding of the concerns people have about existing ethnicity categories and how to improve them to better reflect how people identify themselves.

Ultimately, our goal is to support the development of trustworthy practices for collecting and using ethnicity data, to promote greater health equity.

With thanks to Theint Theint Thu, Emily Jesper-Mir and Rebecca Asher for contributing to this article.

  • Grace Annan-Callcott

    Program Adviser


    Grace leads the learning, communications and influence work for the Data for Science and Health team at Wellcome. She has an interest and background in communicating the impact of emerging technologies on people, politics and society.
