In our next EDIT Lab blog installment, Zain discusses the replication crisis facing the field of psychiatry and how self-reported measures may, in part, be a contributing factor.



It is widely accepted that psychiatry faces a ‘replication crisis’. The term refers to a situation in which the results of studies prove difficult or impossible to reproduce when the same experiment is repeated (Leichsenring et al. 2017). This has been a topic of much discussion since the 1990s, but methodologies still need work before findings become more reproducible. There are numerous reasons why a set of results fails to replicate, and among these is the heterogeneity of the self-reported measures often relied upon in studies.

In psychiatry, self-reported measures are usually contrasted with clinician-reported measures. In self-reported measures, the patient answers questions and provides information about their own symptoms and feelings, whereas in clinician-reported measures, the clinician interviews the patient and uses the interview to inform their report of the patient’s symptoms and feelings. Clinician-reported measures are subject to their own biases, but are generally more reliable than self-reported measures. One of the biggest confounders in self-reported measures is social desirability bias.

Patients tend to give answers in self-reported questionnaires that present a favourable image of themselves. This is referred to as ‘socially desirable responding’ (SDR) (Paulhus 2002). Numerous measures are subject to social desirability, including income, illicit drug use, alcohol consumption and feelings of low mood. As a result, self-reported measures can give an inaccurate picture of the patient’s true situation, in which ‘socially undesirable’ traits and symptoms are diminished and ‘socially desirable’ traits are enhanced. Because this bias is not uniform, and affects individuals, samples and questions differently, it significantly complicates later replication attempts. All is not lost, however.

Xue et al. showed in a 2020 paper that it was possible to correct, at least partially, for inaccuracies in alcohol consumption measures from the UK Biobank sample by using other measures, and inconsistencies across measures, to deduce when patient responses to alcohol consumption questions were not accurate. For example, patients with a history of alcoholic liver disease who had listed themselves as ‘never drinkers’ were possibly demonstrating social desirability and not being completely truthful in their answers.
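For readers who work with this kind of data, the cross-check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in the paper: the `Participant` fields, the status coding ("never", "former", "current") and the `flag_inconsistent_never_drinkers` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    self_reported_status: str      # hypothetical coding: "never", "former" or "current"
    alcoholic_liver_disease: bool  # hypothetical flag taken from a separate health record


def flag_inconsistent_never_drinkers(participants):
    """Return IDs of self-reported never drinkers whose records contradict that answer."""
    return [
        p.pid
        for p in participants
        if p.self_reported_status == "never" and p.alcoholic_liver_disease
    ]


# Made-up records, purely for illustration.
sample = [
    Participant("A", "never", False),
    Participant("B", "never", True),    # denies drinking, but has alcoholic liver disease
    Participant("C", "current", True),
]

flagged = flag_inconsistent_never_drinkers(sample)
print(flagged)
```

A flag like this does not prove that an answer was untruthful. It only marks responses that deserve a second look, or that might be re-categorised or excluded in a sensitivity analysis.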

Applying this method changed the ‘J-shaped curve’ relationship between alcohol consumption and disease risk. Where previous results had made it appear that low alcohol consumption was beneficial for health, this turned out to be due to mis-categorisation of these patients. In reality, no amount of alcohol is beneficial for health. There is hope for a similar approach in measures of other diseases too.

For example, with low mood and depressive symptoms, an experimenter could review a patient’s drug history and categorise patients who have historically been prescribed antidepressants as having a history of low mood, or the same for current low mood and current antidepressant prescription. Measures such as these would not eliminate the effects of social desirability, but would at least correct for it to some degree, and give researchers some insight into the size of the SDR effect in their sample. 
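As a rough sketch of how such a check might give some insight into the size of the effect, the snippet below computes the share of participants who deny low mood but nonetheless hold an antidepressant prescription. The `discordance_rate` helper and the record layout are assumptions made for this example. It also assumes that a prescription implies low mood, which will not always hold, since antidepressants are prescribed for other indications too.

```python
def discordance_rate(records):
    """Share of participants denying low mood who hold an antidepressant prescription.

    Each record is a (reports_low_mood, has_antidepressant_prescription) pair of
    booleans. Returns None when nobody in the sample denies low mood.
    """
    deniers = [has_rx for reports_low_mood, has_rx in records if not reports_low_mood]
    if not deniers:
        return None
    return sum(deniers) / len(deniers)


# Made-up records, purely for illustration.
records = [
    (True, True),    # reports low mood, prescribed: consistent
    (False, True),   # denies low mood, prescribed: discordant
    (False, False),  # denies low mood, not prescribed: consistent
    (False, False),
    (False, True),
]

rate = discordance_rate(records)
print(rate)
```

A high rate would not tell us which individual answers are wrong, but it would suggest that self-reported mood in that sample should be interpreted with some caution.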

As well as acknowledging that the replication crisis exists, researchers might also benefit from appreciating that social desirability partly contributes to it. We can bear this in mind when designing our studies, as we attempt to maximise reproducibility. To achieve optimal reproducibility, we need to continue to develop and use better measures of symptoms and traits. In studies where self-reported measures are the only feasible method of data collection, we must always ask ourselves ‘How much can I trust what the patient is telling me and how can I verify it?’


Leichsenring, F., Abbass, A., Hilsenroth, M. J., Leweke, F., Luyten, P., Keefe, J. R., Midgley, N., Rabung, S., Salzer, S. and Steinert, C. (2017) “Biases in research: risk factors for non-replicability in psychotherapy and pharmacotherapy research,” Psychological Medicine. Cambridge University Press, 47(6), pp. 1000–1011.

Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49–69).

Xue A, Jiang L, Zhu Z, Wray NR, Visscher PM, Zeng J, et al. Genome-wide analyses of behavioural traits biased by misreports and longitudinal changes. medRxiv [Internet]. 2020 Jan 1

Zain-Ul-Abideen Ahmad

Join the discussion 2 Comments

  • Katrina Davis says:

    This is a really good point, and it does require some thought. I do wonder, however, whether we need to be careful with language. “Trust” is an important and bidirectional factor in research and clinical practice. If a patient truly trusted us, and didn’t feel judged, would they be more likely to give an answer that was accurate, or more likely to tell us what they thought we wanted to hear? ‘How much can I trust what the patient is telling me..?’ puts the blame on the research participant, when in reality we can’t change the fact that there is an innate desire to please, but we can change our research practices and questions to adapt to what we know about SDB. For example, I was taught always to ask: “How much are you drinking, say in a day, is it one bottle, two, ten?” Supposedly the patient then knows they cannot shock you with how much they are drinking. Can we not work some of that into our patient-reported outcomes? In the UK Biobank alcohol consumption item, participants had to move consumption of each alcoholic beverage up from zero: how different would the results have been if they had to move them down from 10? We need to make it easier for participants to answer accurately, and respond to any bias we uncover, and trust (as I understand the term) should remain (or be enhanced) between researcher and participant by this process.

    • Zain-Ul-Abideen Ahmad says:

      Thank you for taking the time to read the post and to comment. You raise an important point which I admit I had perhaps not considered deeply enough. I think it is certain that there is work to be done on improving and maintaining the level of trust between researcher and participant, and that this will continue to be something to improve upon, perhaps forever. In terms of the language, in hindsight perhaps ‘rely’ would better serve the purpose I intended. The intention here is not to place blame in any direction, but merely to acknowledge that the information provided warrants questioning, as does any information in science. Indeed the scientific method (as I understand it at least) revolves around positing statements which are subsequently disproven by evidence obtained from experimentation. I think the approach of improving patient-researcher trust would work in tandem with verification of self-report responses, but I do not think it diminishes the need for measures which verify patient responses.
      In my opinion at least, building such trust is very difficult in the case of a self-report measure, where you are not necessarily interacting directly with a patient, nor directly gaining their trust. They are not telling ‘us’ what we want to hear in the conventional sense, they are instead interacting with a construct: the faceless author(s) of the questionnaire they are filling in. I agree that this places great importance on how questions are phrased and structured. I would ask, however, what is the relative carryover compared to using checks and balances?
      I think that this approach is indeed modifying our research practices to incorporate our knowledge of SDB: we know that patients may be inclined to answer with a certain bias, and hence we can find a way of correcting for that.
      The relationship between researcher and participant is absolutely crucial to the success of our work, but it is far harder to build without being there with the participant.