Measurement and the heritability gap for childhood behaviour problems

Decades of twin studies have shown that childhood behaviour problems, including anxiety, depression, conduct problems and hyperactivity, are substantially heritable. However, our recent research found that individual differences in behaviour problems are not significantly influenced by the common DNA differences that we directly measure. This finding held across diverse domains of psychopathology, across parent-, teacher-, and self-ratings, and across ages 12 and 16, in a large sample. A previous EDIT lab blog and a ‘TEDS talk’ spell out our methods and results, as well as the implications for research into the aetiology of behaviour problems. This article focuses on one set of explanations for low DNA heritability: the difficulties associated with measuring behaviour problems.


Amongst the many potential explanations for the low DNA-based heritability of behaviour problems in TEDS*, one that Elise Robinson highlighted, in her comment for a Spectrum News article about our paper, is the issue of measurement. Elise said:

“These findings don’t mean that we shouldn’t be using genetics to study childhood behaviour problems. But they do suggest that we need to figure out a better way to measure behavioural problems in children.”

But could so many diverse measures – 37 across many phenotypes (traits) and raters (i.e. self-, parent- and teacher-ratings of behaviour) – really all be failing? Twin heritability estimates for behaviour problems in TEDS match those from many other studies. The measures are well-validated and have good psychometric properties. Moreover, measurement issues should affect twin and SNP (DNA-based) estimates equally, so it’s hard to see how measurement could explain the huge gap between them.

However, other clues do suggest that suboptimal measurement could explain low DNA heritability estimates. For example, height and weight, which are easier to measure objectively and with little error, had substantial DNA heritability estimates in the same sample.


What are some potential issues with measurement of behaviour problems, and to what extent might they explain the low SNP heritability?

Issue 1: behaviour problem measures are skewed

Accumulating behaviour genetic evidence finds that common disorders like anxiety or ADHD are the quantitative extremes of heritable continua. TEDS questionnaire items ask about a range of symptoms, and each has at least 3 response options to help capture the full distribution of each trait.

However, behaviour problem measures don’t form the ideal ‘bell-curve’, and they do not always assess individual differences sensitively across the whole distribution. This skew reflects the purpose of the measures: to assess behaviour ‘problems’ by capturing variation at the negative ‘pathology’ end of each trait. When such measures are applied to a population-representative sample like TEDS, the data are usually skewed. For example, the figure below shows 12 measures of anxiety and depression in TEDS. The measures are all positively skewed, meaning that a large majority of individuals have very low scores; it takes high levels of anxiety to get a high score.

Missing variation at the low non-pathology end could result in reduced power to estimate genetic influence on the ‘core’ quantitative trait. A large proportion of the sample have low scores, leaving little information for heritability estimates to work with.

However, the presence of skew does not necessarily suggest that the questionnaires aren’t measuring behaviour problems well. For one thing, twin heritability estimates of the same skewed measures in TEDS match those in other studies. Moreover, we conducted sensitivity analyses in which we compared heritability estimates resulting from different ways of manipulating the distributions of the measures. We found that transforming the distributions to be ‘normal’ did not increase DNA-based or twin estimates.
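To make the idea of ‘normalising’ a skewed measure concrete, here is a minimal sketch of one common approach, a rank-based inverse normal transformation. This is an assumption for illustration only: the article does not say which transformations were used in the sensitivity analyses, and the function name, the offset of 0.5 and the example scores are all hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(scores, offset=0.5):
    """Map scores to standard-normal quantiles by rank.

    Tied scores share their average rank, so they stay tied after the
    transformation. The offset of 0.5 is one common convention (an assumption).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical, positively skewed questionnaire totals (not TEDS data).
example_scores = [0, 0, 0, 0, 1, 1, 2, 3, 5, 9]
transformed = rank_inverse_normal(example_scores)
```

Note that the four children scoring zero still share a single transformed value. A transformation can reshape the distribution, but it cannot recover variation that the questionnaire never recorded at the low end, which may be one reason why it would not be expected to raise heritability estimates.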

Another way we could test the effect of skew on heritability would be to use measures that tap into the full range of each behavioural trait, asking about ‘positive’ aspects, and compare the heritability of these to that of the psychopathology measures. For example, in TEDS, we could compare the SWAN measure of ADHD-like traits (which is informative across positive and negative dimensions) to the Conners ADHD measure. The SWAN might capture a more heritable trait. However, if we are to develop and use more normally-distributed measures, it isn’t necessarily obvious what the lower ends are characterised by. Depression could be on the same spectrum as ‘subjective wellbeing’, but might the opposite end of anxiety be an unusually low level of fear/worry?

Issue 2: there is little agreement between raters of behaviour problems

Heritability estimates are reduced when a trait is measured with a lot of error, because error inflates the total variance and is counted as environmental variance. One definition of reliability – the inverse of error – is consistency across raters.
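The arithmetic behind this point can be sketched as follows, assuming a simple additive model in which measurement error is independent of the genetic and environmental components. The function name and the variance values are illustrative assumptions, not estimates from TEDS.

```python
def observed_heritability(v_additive, v_environment, v_error):
    """Share of observed variance due to additive genetic effects.

    Assumes observed variance = genetic + environmental + error variance,
    with the three components independent of one another.
    """
    total = v_additive + v_environment + v_error
    return v_additive / total


# Illustrative variance components (hypothetical values).
without_error = observed_heritability(0.5, 0.5, 0.0)  # 0.5
with_error = observed_heritability(0.5, 0.5, 1.0)     # 0.25
```

In this toy example, adding error variance equal to the true trait variance halves the heritability estimate, even though the genetic variance itself is unchanged.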

In this sense, behaviour problem measures are unreliable. Parents, teachers and children themselves do not agree much about the severity of children’s behavioural problems: their ratings tend to have a correlation of only ~0.3.

For example, the figure below shows correlations between anxiety and depression variables in TEDS across raters (and across time) – stronger positive correlations are a deeper red.

The low correlations might reflect that the individual measures do not tap into core behavioural traits. This idea seems to be backed up by research showing that the strongest genetic influences are on what raters see in common about children’s behaviour (‘trans-situational behaviour’). In other words, removing rater bias tends to yield a more accurate measure.

However, rather than emphasising disagreement and reporter-specific error, many researchers now highlight that parents, teachers and children have different insights, and are reporting on different aspects of a child’s behaviour. For example, children don’t necessarily behave in the same way at home as at school. These aspects of behaviour seem to have different genetic and environmental influences on them. Indeed, there is evidence for rater-specific genetic and environmental influences as well as rater-common influences.

Making better USE of our measures

In summary, the combination of skewed behaviour problem measures and low concordance between raters is likely to reduce power for heritability analyses. But the way forward is not to give up on rich, multi-reporter, longitudinal data. Instead, researchers should focus on analysing what has been measured, and making better use of our data. For example, childhood behavioural and emotional problems vary a lot over time, such that single time-point measures are inherently inaccurate in a developing population. We are writing a paper showing that capturing stability across time and across raters helps to obtain a ‘core’ stable trait. This trait has higher DNA and twin heritability.
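As a rough illustration of why combining raters might help, the Spearman-Brown formula from classical test theory gives the reliability of an average of several ratings. This is only a sketch. It assumes that the raters are interchangeable (‘parallel’) measures of a single trait, which the evidence for rater-specific influences calls into question, and it is not the method used in the paper mentioned above. The function name and the choice of three raters are illustrative.

```python
def composite_reliability(mean_inter_rater_r, n_raters):
    """Spearman-Brown reliability of the mean of n parallel ratings."""
    r, k = mean_inter_rater_r, n_raters
    return k * r / (1 + (k - 1) * r)


# With a typical inter-rater correlation of about 0.3:
single_rater = composite_reliability(0.3, 1)  # 0.3
three_raters = composite_reliability(0.3, 3)  # about 0.56
```

Under these assumptions, averaging parent, teacher and self-ratings would nearly double the reliability of the score, from about 0.3 to about 0.56.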


Improved measurement of behaviour problems is unlikely to narrow the gap between twin and DNA heritability estimates. However, anything that increases DNA heritability for behaviour problems could be helpful in relation to future gene-finding and prediction studies. The young age of onset of behaviour problems, and their persistent, wide-ranging negative outcomes, mean that prediction should be prioritised.

In this article, I have assumed that there is something wrong with our estimates of DNA heritability for childhood behaviour problems. This optimism can be justified by the fact that estimates from studies of psychopathology in adults, once samples are very large, are considerably higher. A recent genomic study from our group of anxiety in UK Biobank (25,453 probable cases and 58,113 controls) found a substantial DNA heritability of ~30%. This suggests that anxiety is highly polygenic, but that homogeneous phenotyping (measurement) and large samples are required to capture it. The question is whether this pattern will hold for childhood psychopathology.

Many thanks to Geli Ronald for her contribution to this blog and to the paper.


*Some key explanations include modest study sample size (resulting in inadequate power to detect small effects of common DNA variants) and the importance of other factors that aren’t captured in current DNA-heritability estimates (e.g. gene-gene interactions, gene-by-shared-environment interactions, and rare genetic variants). Notably, these factors probably vary in their importance in explaining the heritability gap for different traits; for example, rare variants are more relevant to autistic traits.