In our next EDIT Lab blog installment, Zain discusses the replication crisis facing the field of psychiatry and how self-reported measures may, in part, be a contributing factor.
The ‘replication crisis’ in psychiatry is widely acknowledged. The term refers to results from studies that prove difficult or impossible to replicate when the same experiment is repeated (Leichsenring et al. 2017). The problem has been discussed since the 1990s, yet methodologies still need work to become more reproducible. Results can fail to replicate for numerous reasons, and among these is the heterogeneity of the self-reported measures that studies often rely on.
In psychiatry, self-reported measures are usually contrasted with clinician-reported measures. In a self-reported measure, the patient answers questions and provides information about their own symptoms and feelings, whereas in a clinician-reported measure, the clinician interviews the patient and uses those responses to inform a report of the patient’s symptoms and feelings. Clinician-reported measures are subject to their own biases, but they are generally more reliable than self-reported ones. One of the biggest confounders in self-reported measures is social desirability bias.
Patients tend to give answers in self-reported questionnaires that present a favourable image of themselves. This is referred to as ‘socially desirable responding’ (SDR) (Paulhus 2002). Numerous measures are subject to social desirability, including income, illicit drug use, alcohol consumption and feelings of low mood. As a result, self-reported measures can give an inaccurate picture of the patient’s true situation, in which ‘socially undesirable’ traits and symptoms are diminished and ‘socially desirable’ traits are enhanced. Because this bias is not uniform — it affects individuals, samples and questions differently — it significantly complicates later replication attempts. All is not lost, however.
Xue et al. showed in a 2020 preprint that it was possible to correct, at least partially, for inaccuracies in alcohol-consumption measures in the UK Biobank sample by using other measures, and inconsistencies across measures, to deduce when participants’ responses to alcohol-consumption questions were not accurate. For example, participants with a history of alcoholic liver disease who had listed themselves as ‘never drinkers’ were possibly demonstrating social desirability and not being completely truthful in their answers.
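The cross-checking idea can be illustrated with a minimal sketch. This is not the authors' actual pipeline; the record structure and field names (`drinker_status`, `has_alcoholic_liver_disease`) are invented for the example, which simply flags self-reports that conflict with the clinical record:

```python
def flag_implausible_never_drinkers(records):
    """Return records whose 'never drinker' self-report conflicts with
    a recorded history of alcoholic liver disease."""
    return [
        r for r in records
        if r["drinker_status"] == "never" and r["has_alcoholic_liver_disease"]
    ]

# Toy data: participant 1's self-report is inconsistent with their history.
participants = [
    {"id": 1, "drinker_status": "never", "has_alcoholic_liver_disease": True},
    {"id": 2, "drinker_status": "never", "has_alcoholic_liver_disease": False},
    {"id": 3, "drinker_status": "current", "has_alcoholic_liver_disease": True},
]

flagged = flag_implausible_never_drinkers(participants)
print([r["id"] for r in flagged])  # [1]
```

Flagged records could then be excluded or re-categorised before analysis, which is the spirit of the correction described above.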
Applying this method actually modified the well-known ‘J-shaped curve’ relationship between alcohol consumption and disease risk. Where previous results had made it appear that low alcohol consumption was beneficial for health, this turned out to be due to mis-categorisation of these participants. In reality, no amount of alcohol is beneficial for health. There is hope for a similar approach in measures of other diseases too.
For example, with low mood and depressive symptoms, an experimenter could review a patient’s prescription history and categorise patients who have historically been prescribed antidepressants as having a history of low mood (or, analogously, treat a current antidepressant prescription as an indicator of current low mood). Measures such as these would not eliminate the effects of social desirability, but they would correct for it to some degree, and give researchers some insight into the size of the SDR effect in their sample.
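As a hypothetical sketch of that cross-check, prescription history could serve as a partial proxy for low mood, independent of self-report. The field names and the drug list here are illustrative assumptions, not a validated instrument:

```python
# Illustrative list of common antidepressants (assumption, not exhaustive).
ANTIDEPRESSANTS = {"sertraline", "fluoxetine", "citalopram"}

def history_of_low_mood(prescriptions):
    """True if any recorded prescription is a known antidepressant."""
    return any(drug.lower() in ANTIDEPRESSANTS for drug in prescriptions)

def self_report_discrepancy(patient):
    """Flag a possible SDR effect: the self-report denies low mood,
    but the prescription record suggests otherwise."""
    proxy = history_of_low_mood(patient["prescriptions"])
    return proxy and not patient["self_reported_low_mood"]

patient = {
    "self_reported_low_mood": False,
    "prescriptions": ["Sertraline", "ibuprofen"],
}
print(self_report_discrepancy(patient))  # True
```

Counting such discrepancies across a sample would give a rough lower bound on the size of the SDR effect, as suggested above.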
As well as acknowledging that the replication crisis exists, researchers might also benefit from appreciating that social desirability partly contributes to it. We can bear this idea in mind when designing our studies as we attempt to maximise reproducibility. To achieve optimal reproducibility, we need to continue to develop and use better measures of symptoms and traits. In studies where self-reported measures are the only feasible method of data collection, we must always ask ourselves: ‘How much can I trust what the patient is telling me, and how can I verify it?’
Leichsenring, F., Abbass, A., Hilsenroth, M. J., Leweke, F., Luyten, P., Keefe, J. R., Midgley, N., Rabung, S., Salzer, S. and Steinert, C. (2017) “Biases in research: risk factors for non-replicability in psychotherapy and pharmacotherapy research,” Psychological Medicine. Cambridge University Press, 47(6), pp. 1000–1011.
Paulhus, D. L. (2002) “Socially desirable responding: the evolution of a construct,” in Braun, H. I., Jackson, D. N. and Wiley, D. E. (eds) The role of constructs in psychological and educational measurement, pp. 49–69.
Xue, A., Jiang, L., Zhu, Z., Wray, N. R., Visscher, P. M., Zeng, J., et al. (2020) “Genome-wide analyses of behavioural traits biased by misreports and longitudinal changes,” medRxiv [Preprint].