A framework for ethical governance of mental health databanks
Wellcome has commissioned a series of projects to explore global mental health datasets (a collection of data from a single source or for a single purpose) and databanks (large aggregations of data from many datasets).
This critical analysis framework supports the management of ethical risks arising from the creation, enrichment and aggregation of potentially sensitive datasets.
Who the framework is designed for
We developed this framework to guide teams through the key ethical considerations associated with building and running a databank. We also provide a library of mitigation strategies that teams can consider implementing to address these considerations.
It is openly available under a CC-BY 4.0 licence (credit: Wellcome) for anyone to use, share and adapt.
While not compulsory, teams may find this framework useful when writing grant applications for Wellcome or other funding organisations.
Others who may also find this framework useful
- Funders, governments, and civil society/non-governmental organisations might use or build from this framework to guide the evaluation of proposals for data collection, enrichment, aggregation and/or use.
- Research institutions and individual researchers could use this framework to extend the conceptualisation, planning, and execution of their work towards more equitable and community-centring outcomes.
- Communities and people with lived experience are invited to use and adapt this framework to meet their needs as they evaluate opportunities for engaging with the research ecosystem.
What’s included in the framework
The framework takes readers through two phases: developing a databank and using a databank. Each phase explores a range of risks and mitigations, highlighting specific points where lived experience and community participation provide mutual benefit. On this page you can explore the key questions. For examples of mitigations, please download the full document.
Phase 1 – Developing a databank
Risks include:
- Possibility of uninformed/poorly informed requisitioning of large amounts of data ('data grab').
- Environmental impact of big data collection, use and storage.
- Privacy protection comes at a cost to scientific problem-solving and/or return of value.
- The scientific hype cycle can create artificial urgency and pressure on researchers.
- Implicit prioritisation of high-income countries' (HIC) research interests over low-and-middle-income countries' (LMIC) research interests.
Risks for regulatorily sensitive data:
- Regulatory regimes may reinforce performative approaches to anonymisation, ascribing greater value to anonymisation and other privacy preserving approaches than is actually realised in practice.
- Data type and subject definitions (and related rights/protections) may vary widely by jurisdiction, particularly in regard to protected groups and the types of data that can be collected from them, including 'minors'.
- Regulatory norms may impose localisation rules for different types of data which may affect centralisation, harmonisation and storage.
- Some data types may have shorter shelf lives because of jurisdiction-specific regulation.
- Data subjects may have limited or variable autonomy/decision-making capacity to consent at the point of data collection, and that capacity may change over time; there are no standards for addressing variable autonomy over time in big data/secondary-use research.
- National security exceptions to data requests.
- Data used to criminalise or marginalise people in state jurisdiction.
- Use of data for surveillance of communities, individuals or targeted groups especially by government agencies.
- Use of data to drive malicious or corrupt state interests.
- Regulatory regimes may prioritise consideration of individual harms over potential harms to groups or exclude the consideration of group harms entirely. Read more about consideration of harm in the United States.
- In data-rich research, implicated 'communities' include not only self-identified or 'claimed' communities but also imposed, algorithmically defined groups.
Risks for culturally sensitive data:
- Even in the absence of regulatory requirements, there still may need to be limits on the way data is collected, aggregated and/or used for cultural reasons.
- Participants’ desire for adherence to cultural norms regarding biological sample (or other data) collection, storage and destruction may impact the equity and inclusivity of the dataset over time.
Risks if the data already exists:
- Data is available only from a low trust source, for example: data from corrupt state actors or misaligned private sector actors.
- Drive for creating a 'representative' dataset could lead to pressure to aggregate data from low-trust sources.
- Acquiring data from a low-trust source may perversely incentivise further extractive data processes.
- Misalignment of consent conditions when data is aggregated.
- The conditions of consent under which the data were collected are unclear or suspect (for example: click to agree, terms of service, or consent to data/sample collection as a condition of receiving healthcare).
- Data exists but is analogue.
Risks if the data does not already exist and this prompts data collection:
- Risk of inadequate data collection (inequitable, exclusive).
- Data type is not available equally from all contexts because of different access to technology (for example: wearable data from youth in low-income countries).
- Data collection cost/sustainability may vary widely across contexts.
- Data may be pragmatically challenging to obtain and these challenges may not be equal across all jurisdictions. For example: if target data is in medical records, its extraction may be near impossible if those records are paper, stored in a conflict zone, etc.
- Lack of awareness and literacy among participants regarding impact of data collection (for example: people don’t know what they are giving away).
- Drive for creating a 'representative' dataset could lead to coercion in collection, for example: through use of disproportionate incentives.
- Lack of specific, unambiguous informed consent obtained from participants.
- The research demands of a databank are not conducive to individual participant sign-off on particular secondary uses (such as granular consent).
Risks when aggregating data include:
- As data are aggregated for secondary use, they are further removed from their community context.
- The demands of data scale and the speed of databank building drive partnerships with technology platform vendor(s) that may have low or lower public trust.
- Platform vendor as source of noncompliance with the spirit of data regulations (for example: legal compliance in the absence of ethical, cultural, or moral compliance).
- It is difficult to institute community decision-making because access may already be determined by the data broker or set at collection.
- If access controls are not universal for all data within the databank (for example: due to federation or specific access requirements set at data collection) some data may become overused or hypervisibilised which can result in exploitation and/or inequity/exclusion.
Read more about recognising and preventing the strain of hypervisibility.
Risks when harmonising and curating data include:
- When data is collected using standard measures across diverse cultures, equivalent data values may not have equivalent meaning (for example: due to differences in diagnostic criteria, in diagnosis rates, in access). See an example of adaptation in the Malawi Longitudinal Study of Families and Health.
- Data dictionary may not align with different community groups’ self-definitions.
- Choice of data harmonisation strategy for data from varied origins (geographic, commercial) implicitly prioritises some cultural norms over others.
- Data generated for commercial purposes may be harmonised/curated to different data standards than research data.
- There are no data harmonisation or curation standards that are community-informed.
- Curation of some forms of data (for example: biological samples) may need to be localised due to regulation; cost of this curation may limit inclusion of data from lower resourced contexts.
Risks when storing data and samples include:
- If the databank builder bears data storage fees, this may unequally incentivise some data holders to allow centralisation or cede local control.
- Regulatory or cultural restrictions on storage may reduce the inclusion of some groups (for example: indigenous communities) due to the removal of samples.
- If regulations or cultural norms require periodic re-consent or recontact, not all data/sample contributors will be equally reachable, especially people who lack stable housing or people living in areas with unstable infrastructure.
- The environmental impact of data storage.
- The costs (maintenance, compliance) of storage of primary data/samples may impact the sustainability of the databank as a whole and must be balanced against the potential benefit of storing primary data/samples for reinterpretation over time.
- If data are not centrally held, long term storage costs may be borne inequitably.
Phase 2 – Using the databank
Risks when controlling access to the databank include:
- Access controls may block/prioritise some researchers because of bandwidth requirements, credentialing (for example: US eRA Commons ID) requirements, and/or external ethics review requirements.
- Having low access controls for certain requesters (private, state actors) may result in misuse of data.
- Risk assessment models for providing access to requesters may be based on biases.
- Lack of recourse in case of misuse of data.
Risks around use of the databank include:
- The databank is underused by target users (for example: databank not fulfilling its ethical obligation to participants).
- Databank users drawn exclusively from the pool of people who are already known.
- Inequitable distribution of resources and capacity leads to inequity in data use.
- Databank infrastructure is exclusionary due to resource/capacity requirements for its use.
- Risk of systemic inequities or exclusion of research subject groups being unwittingly furthered by well-meaning researchers.
- Lack of transparency regarding use of data.
- Potential for misinformed or biased research design/agenda by data users.
- Potential for use of well-meaning data analyses for targeted activities (selling, campaigning, etc) by malicious or misaligned actors.
- There are malicious community actors with agendas.
- Mechanisms of research funding may skew which analyses are proposed and/or research design and/or methods of research.
- The funding behind a research request may affect the research agenda.
- Profit motives and shareholder interests may skew research design, or research may be covertly designed solely toward profitability.
Risks when analysing data include:
- Limitations on computational equity posed by cost of computation or local infrastructure (for example: power grid stability, hardware) in low-resource settings.
- Bias in the dataset may result in bias generating/perpetuating research and/or spurious findings.
- Research may generate systemic harm and/or discrimination.
- Researchers prioritising advantaged (high-income country, white, global north) contexts over disadvantaged contexts.
- Research does not recognise structural harms, systemic biases, historical contexts, and/or is unaligned with community desires.
- Databank used for 'diversity washing' of insights gleaned from homogenous populations.
Risks around distribution of value include:
- Individual-level decisions on data use may, over time, disproportionately prioritise value toward certain stakeholders or disenfranchise others.
- Barriers to distribution of value from the databank – such as language, geography and income – may undermine the value the research holds for communities.
- Proper distribution of value among communities may be difficult due to the nature of value itself.
- Lack of accessible databank infrastructure may limit how much value communities can make of the databank as a resource themselves.
Download the framework
To learn more about the risks and how to mitigate them, download the full framework.
About this framework
Wellcome worked in partnership to create this guidance:
- Aapti Institute – a public research organisation that works at the intersection of technology and society. With a keen focus on data governance and rights from a community-centric lens, Aapti explores stakeholder-wide pathways to responsibly unlock the societal value of data.
- Sage Bionetworks – a non-profit health research organisation focused on the translation of science into medicine.
Wellcome's lived experience advisers and external subject matter experts reviewed the framework, ensuring that critical nuance was not lost through the synthesis process.