The problem with ethnicity categories in UK health data
Ethnicity categories in UK healthcare are inconsistent and not always accurate. This has real-world consequences, like how we estimate the risks of Covid-19 for different groups. In its early days, Wellcome's Data for Science and Health team kicked off several research and engagement projects to investigate. Here's what they learned.
If we want to determine how health outcomes vary across different groups, we first need categories that people can identify with.
One way we currently do this is by using ethnicity categories. But in reality, the terms we use to identify ourselves, how others see us, and how institutions define us are fluid and context-dependent.
That’s one reason why, depending on what sources of health data you look at, you can end up with a different answer to the same question.
Inconsistent ethnicity categories can have real-world consequences
A recent example is the research carried out during the pandemic to identify how ethnicity is associated with the risk of dying from Covid-19. While research by OpenSAFELY suggested that people of mixed ethnicity are at increased risk, similar research by the Office for National Statistics came to different conclusions.
This is partly due to the methods they used, but it’s also down to issues with the data itself. The teams used different national datasets, and the way ethnicity is recorded varies between datasets.
For example, in the census, someone might tick the box that places them as part of a specific ethnic group, while their health record might have the ‘Other’ category selected. In addition, some datasets still use the ethnicity categories from the 1991 census – further contributing to inconsistencies in data recording.
This inconsistency has real-world consequences for healthcare and policy decisions.
Supporting researchers to investigate ethnicity data
The Data for Science and Health team at Wellcome has supported several projects to investigate ethnicity data in UK healthcare.
We supported the Office for National Statistics' work to understand the consistency and completeness of ethnicity datasets.
Their analysis looks across national datasets to quantify some of the issues around the recording of ethnicity in health datasets in England. It takes the 2011 Census as a baseline (the most recent census data available at the time of research) and compares it with ethnicity information from:
- the Hospital Episode Statistics (HES), a database containing details of all admissions, accident and emergency attendances and outpatient appointments at hospitals in England
- and the General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), a fortnightly extraction of data from General Practices to support the response to the pandemic.
Read the Office for National Statistics analysis
The Race Equality Foundation engaged with community participants and healthcare workers to investigate different aspects of ethnicity data collection in healthcare settings and related these to the quality of the data produced.
Community participants and healthcare workers explained the barriers to recording ethnicity in healthcare settings: from systemic hurdles, such as a historical distrust of the healthcare system and concerns about racism, to practical blockers, such as a lack of time or the absence of a standardised definition of ethnicity. The research also found that communities and healthcare workers were often unaware of the importance of ethnicity data or why they were being asked to record it, with no guidance available to explain this.
From this work, it was clear that the reliability of ethnicity data is affected more by methodological factors than by the willingness of communities to participate.
Read the Race Equality Foundation report [PDF]
Understanding Patient Data, at the time hosted at Wellcome, partnered with ClearView Research and Liberating Knowledge to explore the views and experiences of Black and South Asian people and healthcare workers on patient data, and to produce guides for NHS professionals collecting health data, including data on ethnicity.
ClearView Research focused on working with Black and South Asian members of the public, using community research and exploration labs to listen to their concerns, questions and aspirations about how health data is collected and used.
Building on this public engagement, Liberating Knowledge explored the views of healthcare professionals through focus groups and interviews and carried out a public survey to bring quantitative insight. They also collaborated with healthcare professionals to develop guides to data collection for healthcare staff, leaders and policy professionals with the aim of supporting healthcare workers to have better conversations about health data with patients.
Read the Closing gaps in patient data for Black and South Asian communities [PDF] report
Read the Diverse voices on data [PDF] report
This research contributes to the ongoing debates and discussions surrounding ethnicity data recording by:
- highlighting the challenges faced in capturing reliable data
- identifying the gaps and comparing different ways of handling ethnicity data
- emphasising the importance of data quality
- and calling for immediate attention to improve the overall effectiveness of data collection in addressing health inequalities
Four key takeaways from investigating ethnicity data in UK healthcare
1. Ethnicity categories vary across the different data sources, and understanding of ethnicity varies between groups.
And the categories have changed over time too. For example, in the 2001 Census, there wasn’t a category for ‘Arab’, so anyone who might ordinarily use that category to identify themselves probably had to use ‘Other’ instead.
This has started to change with the recent addition of an ‘Arab’ category in the Census and the General Practice Extraction Service, and the ONS refined the categories further for the 2021 Census. Even so, these categories can fail to capture the complexity of ethnicity, with community participants pointing out that ethnicity means different things to different people, including nationality, heritage, geographical region or religious group.
2. Ethnicity recording is less accurate for minoritised ethnic groups than for White British people.
Taking the Census as the best source we have for self-recorded ethnicity, we can see which groups are most impacted by poor-quality ethnicity data recording in routinely collected health datasets.
For example, across all the administrative data sources, the ‘Mixed’ ethnic groups had some of the greatest discrepancies with the Census. In more than a third of cases, the electronic health record didn’t match the individual's Census response, and consistency was also low for the ‘Other Asian’, ‘Other White’, ‘Other Mixed’, ‘Other Black’ and ‘Any Other’ ethnic groups.
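To make "consistency with the Census" concrete, here is a minimal sketch in Python of how an agreement rate per ethnic group could be calculated once Census responses and health records have been linked for the same individuals. It is an illustration only, not the ONS's method: the function name, the category labels and the toy records are all invented for this example, and real analyses also have to deal with record linkage, differing category lists and missing values.

```python
from collections import defaultdict


def agreement_by_group(linked_records):
    """For each Census ethnic group, return the share of linked health
    records whose ethnicity matches the person's Census response.

    `linked_records` is an iterable of (census_group, health_group) pairs.
    A missing health-record value (None) is counted as a mismatch.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for census_group, health_group in linked_records:
        totals[census_group] += 1
        if health_group == census_group:
            matches[census_group] += 1
    return {group: matches[group] / totals[group] for group in totals}


# Invented toy data for illustration only; these are not real records
# and do not reproduce any published figures.
toy_records = [
    ("White British", "White British"),
    ("White British", "White British"),
    ("White British", "White British"),
    ("White British", "Other White"),
    ("Mixed", "Mixed"),
    ("Mixed", "Other"),
    ("Mixed", None),
]

rates = agreement_by_group(toy_records)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of health records match the Census")
```

Treating the Census response as the reference point mirrors the approach described above, where the Census is taken as the best available source of self-recorded ethnicity. Counting a missing value as a mismatch is a design choice in this sketch; an analysis could equally report completeness and agreement separately.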
3. Healthcare workers need to be better equipped to talk to patients about health data.
In the survey carried out by Liberating Knowledge, only one in five respondents reported having a conversation with a healthcare worker about how patient data is used.
This shows the need for much greater engagement by the NHS with the public regarding the collection and use of data. The research also found that conversations with NHS staff about data could have a significant impact on willingness to share personal data and a broader understanding of the use of patient data, including the benefits. As such, it is important to equip healthcare workers with the tools to hold meaningful conversations about data.
4. Communities want to be, and should be, actively involved in creating data collection processes and deciding how the data is used.
It’s clear from these studies that people want to know how their data is collected and used to inform research and address inequalities.
Requesting sensitive data for patient health records without explanation can leave patients feeling coerced and suspicious of data-sharing activities. People want agency in the decision-making process, and they want to be involved in designing and implementing these processes. Doing so will also ensure that processes and policies consider the specific needs of communities and have a greater chance of succeeding. Based on the findings from this research, communities should also be involved in any decision-making and research prioritisation involving data.
We need better quality data to address health disparities
If we want research and analysis that addresses health disparities among Black, Asian, and minority ethnic communities, we need better quality data – urgently.
This research is deepening our understanding of the concerns people have about existing ethnicity categories and how to improve them to better reflect how people identify themselves.
Ultimately, our goal is to support the development of trustworthy practices for collecting and using ethnicity data, to promote greater health equity.
With thanks to Theint Theint Thu, Emily Jesper-Mir and Rebecca Asher for contributing to this article.