Next in the ‘A-Z’ series is M for Minimal Phenotyping. Reaching the huge sample sizes needed for identifying the genetic basis of psychological traits is challenging. In this blog, Alicia and Patrycja look at minimal phenotyping as an approach to overcome sample size challenges.

Alicia Peel, Former EDIT Lab PhD student






One of the most robust findings from decades of behavioural genetic research is that all psychological traits show significant genetic influence. In fact, it is number 1 on the list of the Top 10 Replicated Findings from Behavioral Genetics. These estimates of genetic influences on psychological traits have been largely obtained from family and twin studies, which estimate heritability based on similarity between relatives. Taking depression as an example, twin studies consistently estimate that genetic influences account for ~40% of variance in this disorder (see our previous post on D for Depression). 
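To make the idea of estimating heritability from twin similarity concrete, here is a minimal sketch using Falconer's formula, a classical approximation that doubles the difference between identical (MZ) and fraternal (DZ) twin correlations. Modern twin studies typically fit more elaborate models, and the correlations below are hypothetical values chosen only so that the result lands near the ~40% figure quoted for depression.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, for illustration only (not real data).
r_mz = 0.45  # similarity between identical (monozygotic) twins
r_dz = 0.25  # similarity between fraternal (dizygotic) twins

h2 = falconer_heritability(r_mz, r_dz)
print(f"Estimated heritability: {h2:.2f}")
```

The logic is that identical twins share roughly twice as much of their segregating genetic variation as fraternal twins, so any extra similarity between identical twins is attributed to genetic influence.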

More recently, there has been an increase in molecular genetic research, in which heritability is estimated from differences in single genetic variants. However, despite consistent evidence of heritability from family studies, molecular genetic research has struggled to identify the specific variants that contribute to this heritability. This is likely because genetic risk for psychiatric disorders is made up of thousands of genetic variants, each of which explains only a very small proportion of the overall effect.


In genome-wide association studies (GWAS), the effects of up to ~10 million single genetic variants are estimated for a specific trait. Because so many tests are run in GWAS, researchers apply a very strict correction to their p-value threshold, the cut-off below which a result is considered unlikely to be due to chance (conventionally p < 5 × 10⁻⁸ for genome-wide significance). Applying this correction limits the risk of false positive findings by ensuring that only the most robust associations are identified. However, even the most robustly associated genetic variants still only have a small effect on their own. To achieve the power to detect these small associations at this corrected significance level, samples in the hundreds of thousands to millions are typically needed.
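The arithmetic behind these numbers can be sketched in a few lines. The example below is a rough illustration, not a full power analysis: it assumes a Bonferroni correction for about one million independent tests (the usual rationale for the conventional genome-wide threshold), a quantitative trait, and a normal approximation to the test statistic. The function names and the chosen effect size (a variant explaining 0.01% of trait variance) are our own assumptions.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Divide the overall error rate by the number of independent tests."""
    return alpha / n_tests


def required_sample_size(variance_explained: float, alpha: float,
                         power: float = 0.8) -> float:
    """Approximate N needed to detect a variant explaining a given
    fraction of trait variance (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = -z.inv_cdf(alpha / 2.0)  # critical value at the corrected threshold
    z_power = z.inv_cdf(power)         # quantile for the desired power
    q = variance_explained
    return (z_alpha + z_power) ** 2 * (1.0 - q) / q


# Assumption: roughly one million independent common-variant tests.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical variant explaining 0.01% of the variance in the trait.
n_needed = required_sample_size(variance_explained=0.0001, alpha=threshold)

print(f"Corrected threshold: {threshold:.1e}")
print(f"Approximate sample size for 80% power: {n_needed:,.0f}")
```

Under these assumptions the required sample lands in the hundreds of thousands, which is why a single variant with a tiny effect is so hard to detect in a small, deeply phenotyped cohort.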

The power needed to detect these associations causes an issue for genetic research of psychological traits. The samples that we have with detailed measures assessing phenotypes such as depression often aren’t large enough to detect genome-wide significant hits or to robustly identify the genetic risk loci underlying psychiatric disease (Cai et al., 2020).

Reaching the huge sample sizes needed for identifying the genetic basis of psychological traits is challenging. This is mostly because of the financial and time constraints of collecting symptom data for the hundreds of thousands of individuals required for well-powered GWAS. In addition, completing long and detailed phenotypic measures can be burdensome for participants, especially if they are currently struggling with their symptoms.

One approach that can be taken to overcome these sample size challenges is to use ‘minimal phenotyping’, whereby single self-reported questionnaire items are used to capture the trait of interest (Sanchez-Roige & Palmer, 2020; Cai et al., 2020). This strategy decreases phenotyping costs and reduces the data to one or a few self-reported answers, which makes data collection quicker and easier for researchers and participants. Hence, it can be more easily applied to bigger samples, including population-based cohorts such as the UK Biobank.


A recreation of the illustration from Sanchez-Roige & Palmer (2020) depicting the trade-off between phenotyping depth and sample size: Deep phenotyping is more expensive and time consuming; therefore, when the available budget is fixed, greater phenotyping depth comes at the expense of sample size. In contrast, scalable phenotyping strategies, which are more commonly used in population-based cohorts, allow for larger sample sizes.
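The trade-off in the figure can be put into rough numbers. The sketch below assumes a fixed budget and two per-participant costs, all of them hypothetical, and estimates the power to detect one small-effect variant under each strategy using a normal approximation. It deliberately ignores any difference in how precisely each strategy measures the trait, so it only illustrates the sample size side of the trade-off.

```python
from statistics import NormalDist


def affordable_sample_size(budget: float, cost_per_participant: float) -> int:
    """Number of participants that can be phenotyped within the budget."""
    return int(budget // cost_per_participant)


def approximate_power(n: int, variance_explained: float,
                      alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided test at significance level alpha."""
    z = NormalDist()
    z_alpha = -z.inv_cdf(alpha / 2.0)
    q = variance_explained
    shift = (n * q / (1.0 - q)) ** 0.5  # expected size of the test statistic
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)


# All figures below are hypothetical and chosen only for illustration.
budget = 5_000_000.0
cost_per_participant = {
    "deep phenotyping": 100.0,    # e.g. a full clinical interview
    "minimal phenotyping": 10.0,  # e.g. a single questionnaire item
}
variance_explained = 0.0001  # a variant explaining 0.01% of trait variance

results = {}
for strategy, cost in cost_per_participant.items():
    n = affordable_sample_size(budget, cost)
    results[strategy] = (n, approximate_power(n, variance_explained))
    print(f"{strategy}: N = {n:,}, power = {results[strategy][1]:.3f}")
```

With these made-up costs, the same budget buys ten times as many participants under minimal phenotyping, and power for this variant moves from close to zero to above 90%.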

Minimal phenotyping has been used in GWAS of psychiatric phenotypes, such as depression, to achieve the sample sizes needed to detect genome-wide significant loci. One example is a study conducted in the UK Biobank by Howard et al. (2018). They derived three phenotypes capturing depression using different assessment methods. These phenotypes were:

  1. ‘Broad depression’ phenotype: self-reported past help-seeking for problems with “nerves, anxiety, tension or depression”, 
  2. Probable MDD phenotype: depressive symptoms with associated impairment assessed through diagnostic algorithms,
  3. ICD-coded MDD phenotype: depression identified from International Classification of Diseases (ICD)-9 or ICD-10-coded hospital admission records.

While the latter two phenotypes are more structured definitions of depression based on diagnostic criteria, the first is designed to broadly capture individuals who are likely to have experienced depression in their lifetime.

The researchers conducted a GWAS for each of these phenotypes in 322,580 UK Biobank participants, and identified 17 independent genetic loci that were significantly associated with all three. This suggests that each phenotype was capturing a trait that shared some genetic basis. Fourteen of these loci were never previously identified, showcasing the power of minimal phenotyping for the discovery of new genetic variants associated with psychiatric disorders. Their findings were also replicated in an independent sample, indicating that they were not just specific to the participants in the UK Biobank. Using the ‘broad depression’ phenotype enabled the discovery of more genes, advancing understanding of the biological pathways underlying depression.

However, it has also been argued that depression phenotyped through only self-reported measures is not representative of a clinical disorder, and instead identifies many cases with nonspecific or subclinical depressive symptoms, or symptoms of comorbid conditions. 

To assess the implications of a minimal phenotyping strategy for GWAS of depression, Cai et al. (2020) compared the genetic architecture of minimally phenotyped definitions of depression with that of definitions using full diagnostic criteria in the UK Biobank. They found that GWAS based on minimal phenotyping preferentially identify loci that are not specific to depression. This means that genetic loci identified through this approach could actually bias our understanding of the genetic architecture of the disorder in question.

Nevertheless, two minimally assessed depression phenotypes – ‘broad depression’ and depression assessed through self-reported clinical diagnosis or treatment history – have been found to have strong genetic correlations with clinically derived phenotypes for major depressive disorder (Howard et al., 2019). This suggests that, despite the use of different phenotypic instruments, each assessment of depression was likely to be capturing the underlying genetic architecture of a similar trait.

Combining the results of all of these phenotypic strategies in a meta-analysis resulted in a sample size of over 800,000 individuals, and has enabled the most successful genetic investigation of depression to date, identifying over 100 independent variants (Howard et al., 2019). This demonstrates how, despite the criticisms of minimal phenotyping, this strategy can be combined with more traditional ways of capturing depression, or, more broadly, psychiatric phenotypes, to boost power and increase understanding of the genetic basis of disorders.
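To show why pooling results boosts power, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single variant. This is one common way of combining GWAS summary statistics and is not necessarily the exact procedure used by Howard et al. (2019). The effect sizes and standard errors are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for an effect estimate under a normal approximation."""
    return erfc(abs(beta / se) / sqrt(2.0))


def fixed_effect_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Pool per-study effects, weighting each by the inverse of its variance."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = sqrt(1.0 / total_weight)
    return beta, se, two_sided_p(beta, se)


# Hypothetical per-study estimates for one variant (not real data).
betas = [0.030, 0.025, 0.035]
ses = [0.008, 0.007, 0.009]

for b, s in zip(betas, ses):
    print(f"single study: beta = {b:.3f}, p = {two_sided_p(b, s):.1e}")

beta, se, p = fixed_effect_meta(betas, ses)
print(f"pooled:       beta = {beta:.3f}, se = {se:.4f}, p = {p:.1e}")
```

In this made-up example no single study reaches the genome-wide threshold of 5 × 10⁻⁸, but the pooled estimate does, because combining studies shrinks the standard error.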



Plomin, R., DeFries, J. C., Knopik, V. S., & Neiderhiser, J. M. (2016). Top 10 replicated findings from behavioral genetics. Perspectives on Psychological Science, 11(1), 3–23.

Sanchez-Roige, S., & Palmer, A. A. (2020). Emerging phenotyping strategies will advance our understanding of psychiatric genetics. Nature Neuroscience, 23(4), 475–480.

Howard, D. M., Adams, M. J., Shirali, M., Clarke, T. K., Marioni, R. E., Davies, G., … & McIntosh, A. M. (2018). Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nature Communications, 9(1), 1–10.

Cai, N., MDD Working Group of the Psychiatric Genomics Consortium, Revez, J. A., Adams, M. J., Andlauer, T. F. M., Breen, G., Byrne, E. M., Clarke, T.-K., Forstner, A. J., Grabe, H. J., Hamilton, S. P., Levinson, D. F., Lewis, C. M., Lewis, G., Martin, N. G., Milaneschi, Y., Mors, O., Müller-Myhsok, B., Penninx, B. W. J., … Flint, J. (2020). Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nature Genetics, 52(4), 437–447.

Howard, D. M., Adams, M. J., Clarke, T. K., Hafferty, J. D., Gibson, J., Shirali, M., … & McIntosh, A. M. (2019). Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nature Neuroscience, 22(3), 343–352.
