Yasmin considers the ways in which we measure and discuss race, ethnicity and ancestry in human research. When scientists use these terms incorrectly, they reinforce and perpetuate inaccurate and often racist narratives.


Dr Yasmin Ahmadzadeh


Everyone loses when science is not diverse. But how do we measure diversity?


In scientific research, the information (or variables) that we measure must be valid. Whole careers and research fields are built around the quest to achieve accurate measurement of human traits and characteristics. This is to ensure the validity of our results and evidence-based recommendations. 

Race, ethnicity and ancestry are among the most commonly used variables in human research. Many journals require inclusion of these data before research is accepted for publication. However, measurement protocols for these variables remain murky and, in some places, collecting them is considered unacceptable (some countries [e.g., Germany] do not collect demographic data on race or ethnicity).

In an ideal world, we would use categories that allow us to group populations according to relevant common features in an accurate and systematic way, so that the resulting statistics can be easily reproduced and compared over time and between sources. However, the categories used for measuring race, ethnicity and ancestry are inconsistent and can be harmful. When scientists use these terms incorrectly, they reinforce and perpetuate inaccurate and often racist narratives.

In search of valid categories for labelling populations


Race, ethnicity and ancestry are multifaceted terms subject to shifting definitions. They are related, but they are not all the same. These terms should not be used interchangeably to define participant groups in research. Unfortunately, this mistake is often made, resulting in the misrepresentation of participant groups.

Race categories were developed as a taxonomic grouping of humans, formed during the heyday of European colonialism. They are typically associated with physical characteristics such as skin colour or hair texture (e.g., ‘Black’ or ‘white’). They are not valid indicators of any underlying biological differences between individuals (e.g., genetics), or of place of origin. Racial categories are a social construct. That is, there are no observable, biological measures that can be used to reliably classify individuals within racial groups. The history of racial categorising is entrenched in the oppression and subjugation of non-European people. Many agree that race is the invented product of racism.

Ethnicity refers to the grouping of humans based on cultural expression and identification (e.g., ‘Latino’, ‘European’, ‘Chinese’). This can include factors such as shared ancestry, language or traditions. As with race, ethnicity is a social construct, categorised into groups that are deemed distinct by society. There is no consensus on what constitutes an ethnic group, and membership is self-defined and subjectively meaningful to the person concerned.

Ancestry typically refers to the geographical origin of populations (e.g., ‘European ancestry’ or ‘African Americans’). It can also be used in terms of the heritage or descent of a group (e.g., ‘Ashkenazi Jewish ancestry’). Ancestry is considered the most objective of the three defined terms, but it is not without flaws. Populations are never homogeneous in their ancestry because genetic diversity is continuous, not categorical. There is no such thing as a population with a single direct (or, as some have put it, ‘pure’) ancestry. Therefore, the borders that we place around ancestries are arbitrary and, again, socially constructed.

Categories that we currently use in research


The ‘race’ and ‘ethnic group’ categories currently recommended for use in the USA (by the NIH) and the UK (by the ONS) are outdated and unsystematic. They were not designed for research. They pool a range of racial, ethnic and ancestry terms within single categories (e.g., ‘Black/African/Caribbean/Black British’). They include derogatory words (e.g., ‘American Indian’) and group people by skin colour (e.g., ‘white’). They include some cultural groups (e.g., ‘Arab’ or ‘Hispanic’), but not others. Some countries are listed individually (e.g., ‘Ireland’ or ‘India’), while other countries are clustered in large geographical regions (e.g., ‘Africa’ or ‘Asia’). Personally, I find that my Middle Eastern ancestry is grouped with ‘white’ Europeans on one scale, then with ‘non-white’ Asians on another. 

In sum, these categories are inadequate for representing the global population equally and fairly. As researchers, we must work to break perpetual cycles of racism in research, and not remain complicit. We can strive to do things differently and develop new tools for research. We must not stop talking about race, but we must understand and use racial terms correctly. Abandoning racial categories would be a mistake. If we stop using racial categories, then we will not be able to identify racial inequity in research.

A call for action


Correct your vocabulary, and the vocabulary of others. Consider the context.


* Racial terms (e.g., ‘Black’) are only appropriate when discussing the racialisation of participants. (see here for why many are now deciding to capitalise the B in Black).

* By using racial categories in the wrong context, we risk reinforcing the notion that they are biologically and scientifically valid.

* Stop referring to race, ethnicity or ancestry as ‘risk factors’ in health research. There is no evidence to suggest that ethnicity is a causal risk factor for health problems. There is evidence to suggest that racism and marginalisation are.

* Stop using the term ‘Caucasian’ in research. This is a pseudoscientific term, derived from an 18th-century racist classification system. You should use either ‘white’ or ‘European ancestry’, depending on your context. (see more here)


Consider how the data that you collect will be used in your research.

Is it valid? Who will it serve? What can it really tell you? Wherever possible, be specific.


* Do not pool all ‘Black, Asian and Minority Ethnic’ (BAME) groups into one category. The BAME label (and related acronyms, such as BME) is insufficient to reflect, and therefore help, the many groups captured within that definition. The BAME categorisation creates a dichotomy of ‘white’ versus ‘non-white’ (or ‘majority’ versus ‘minority’ groups), which serves to reinforce racism in research. If you must refer to multiple communities under a single label, the terms ‘marginalised groups’ or ‘racialised communities’ are preferable. (see here for further critique of the BAME label). 

* Categories need to be consistent across research and health services, to support research efforts and policy recommendations. But categories also need to capture what it is that matters for the study at hand. For example, ancestry will be more important for geneticists exploring the role of population stratification, while racial terms will be more important for psychologists exploring the influence of racial inequality.

* Where possible, data collection should support an intersectional approach to understanding the multiple identities that individuals hold, encapsulating different forms of privilege or disadvantage in society. Information on participant race, ethnicity and ancestry cannot tell the whole story. Within groups there is diversity of experience. 
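For researchers who handle participant data in code, the points above can be sketched roughly as follows. This is only an illustrative sketch, not a recommended schema: the `ParticipantRecord` class, its field names, the helper functions and the example labels are all hypothetical. It keeps ethnicity, racialisation and ancestry as separate variables, and it contrasts reporting by specific group with the pooled binary split that the BAME label produces.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParticipantRecord:
    """Hypothetical record: the three concepts are stored separately, never merged."""
    participant_id: str
    self_described_ethnicity: Optional[str] = None  # in the participant's own words
    racialised_as: Optional[str] = None             # only relevant when studying racialisation
    genetic_ancestry: Optional[str] = None          # only relevant for genetic analyses


def count_by(records, field_name):
    """Count participants per specific group; missing values stay visible as 'not recorded'."""
    return Counter(getattr(r, field_name) or "not recorded" for r in records)


def pooled_binary(records, field_name, reference):
    """The pooled 'reference group vs everyone else' split, shown only for comparison."""
    return Counter(
        reference if getattr(r, field_name) == reference else f"not {reference}"
        for r in records
    )


# Illustrative, made-up participants (labels are assumptions for the example only).
records = [
    ParticipantRecord("p1", self_described_ethnicity="Iranian"),
    ParticipantRecord("p2", self_described_ethnicity="Bangladeshi"),
    ParticipantRecord("p3", self_described_ethnicity="Irish"),
    ParticipantRecord("p4", self_described_ethnicity="Nigerian"),
    ParticipantRecord("p5"),  # participant chose not to answer
]

specific = count_by(records, "self_described_ethnicity")
pooled = pooled_binary(records, "self_described_ethnicity", "Irish")

print("Specific groups:", dict(specific))
print("Pooled binary:  ", dict(pooled))
```

In this sketch the pooled split collapses four distinct answers, including the missing one, into a single "not Irish" bucket, which is the kind of information loss the text warns about. Which fields are collected at all should still follow from the research question.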


Individuals of non-European descent – i.e., the majority of the global population – remain under-represented in human research (among research teams and research participants). Their narratives are excluded and their data are under-reported. Many teams and institutions are now striving to change and improve their research strategies to address this imbalance. This involves confronting the systemic racism that exists in human research. An important first step will be to consider how we define diversity among participants in the first place. We must be meticulous in our planning as to what data we need for our research, and why we need it. We must involve participant perspectives in this process. We must be honest with participants about the limitations of the variables that we use, and communicate how we intend to use any data that we collect about their race, ethnicity or ancestry in our work. 

We must see this as an active learning process, requiring constant reflection and challenging of assumptions that are so deeply ingrained in our practice and institutions.


Saini, A. (2019). Superior: the return of race science. Boston: Beacon Press.

Rutherford, A. (2020). How to Argue with a Racist: History, Science, Race and Reality. London: Hachette UK.

Kendi, I. X. (2019). How to Be an Antiracist. New York: One World.

Miika Tervonen, historian and social scientist, on “the myth of a monocultural Finland”.

Professor Dorothy Roberts, professor of law and sociology, on race-based medicine.

Recording of a recent webinar chaired by the Mental Elf: “Mental health research is racist, so what are we all going to do about it?”.


NB: the last national survey of racial and ethnic minority mental health in the UK was published almost 20 years ago, in which researchers pooled all “Black, Asian and Minority Ethnic” (BAME) groups into one category.

