{"id":1773,"date":"2017-11-20T15:00:10","date_gmt":"2017-11-20T14:00:10","guid":{"rendered":"http:\/\/blogs.kcl.ac.uk\/editlab\/?p=1773"},"modified":"2018-02-16T15:50:44","modified_gmt":"2018-02-16T14:50:44","slug":"book-review-how-to-lie-with-statistics","status":"publish","type":"post","link":"https:\/\/blogs.kcl.ac.uk\/editlab\/2017\/11\/20\/book-review-how-to-lie-with-statistics\/","title":{"rendered":"Book review: &#8216;How to Lie with Statistics&#8217;"},"content":{"rendered":"<h2 style=\"text-align: right\"><strong>Methods and concepts in Behavioural Genetics are intrinsically statistical, and jargon and acronyms abound. This often makes the research difficult for people outside of the field (and in the field) to understand and critique. In light of this, Rosa [EditLab PhD student] takes a look at the world\u2019s most famous statistics book, pulling out examples of everyday statistical slip-ups, and applying them to behavioural genetics.<\/strong><\/h2>\n<h2>\u00a0<a href=\"http:\/\/blogs.kcl.ac.uk\/editlab\/wp-content\/blogs.dir\/166\/files\/2016\/07\/Rosa1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-thumbnail wp-image-243\" src=\"http:\/\/blogs.kcl.ac.uk\/editlab\/wp-content\/blogs.dir\/166\/files\/2016\/07\/Rosa1-150x150.jpg\" alt=\"Rosa(1)\" width=\"150\" height=\"150\" srcset=\"https:\/\/blogs.kcl.ac.uk\/editlab\/files\/2016\/07\/Rosa1-150x150.jpg 150w, https:\/\/blogs.kcl.ac.uk\/editlab\/files\/2016\/07\/Rosa1-50x50.jpg 50w, https:\/\/blogs.kcl.ac.uk\/editlab\/files\/2016\/07\/Rosa1-100x100.jpg 100w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/a><\/h2>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Most behavioural geneticists probably hope that their research will be understood by their family and friends, other researchers, and even journalists and politicians. 
It is not very handy, then, that an accurate definition of heritability, the most important concept in behavioural genetics, cannot be reduced to much more than \u2018the proportion of variance in a trait that is explained by genetic variance\u2019<strong>*<\/strong>. For us all to interrogate and interpret evidence, and spread good ideas, we need to demystify definitions, and get everyone on the same page with some of the basics of statistical fluency.<\/p>\n<p>\u2018How to Lie with Statistics\u2019, a humorous statistics primer for the general reader (first published in 1954), offers much inspiration. The writer, Darrell Huff, was not a trained statistician, but the relatable, vivid and breezy language resulting from this makes the book an absorbing read. Many of the \u2018ways to lie with statistics\u2019 included in the book are crude and apply more to advertising in the 1950s than to modern peer-reviewed science (e.g. chopping off much of the axes on a graph to make a curve appear steep). However, the uncomfortable truth is that behavioural research is difficult and messy. Sometimes, research articles tread the fine line between a neat summary and a purposeful misrepresentation. Bias of journals and media outlets towards \u2018exciting\u2019, \u2018novel\u2019, \u2018positive\u2019 results is <a href=\"https:\/\/www.nature.com\/articles\/s41562-016-0021\">only one of the problems<\/a> reducing the reliability and transparency of research. 
It is essential to think about how things can go wrong in research right from the initial design to analysis and publication, in order to spot what information is <em>not<\/em> presented as well as potential problems with what is.<\/p>\n<p>Here are three examples of general statistical concepts from Huff\u2019s book and how they relate to Behavioural Genetics.<\/p>\n<h3><span style=\"text-decoration: underline\">\u2018The sample with the built-in bias\u2019 (selection bias)<\/span><\/h3>\n<p>\u201cA psychiatrist reported once that practically everybody is neurotic. Aside from the fact that such use destroys any meaning in the word \u2018neurotic\u2019, take a look at the man\u2019s sample\u201d. The psychiatrist\u2019s impression has been biased by their line of work.<\/p>\n<p>To be useful, a sample for a scientific study should be representative of the population under investigation. But by definition, a sample doesn\u2019t give a complete, objective picture of the whole human population. Often, it has selected itself. We need to work out where the bias is coming from and take it into account. This is difficult when the people we want to know about are precisely the people who aren\u2019t participating. These people could be less likely to agree to take part and to remain in a study over time for many reasons, such as old or young age, lack of interest, ill health, living far away, not being able to afford a bus trip and so on. 
So when we use our biased sample to investigate conscientiousness, we might reach the conclusion, like the psychiatrist in the book, that \u2018practically everyone is conscientious\u2019.<\/p>\n<div style=\"width: 399px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/mathbabe.files.wordpress.com\/2014\/02\/screen-shot-2014-02-03-at-7-02-22-am.png\" alt=\"\" width=\"389\" height=\"362\" \/><p class=\"wp-caption-text\">Our samples are not necessarily representative of the population we&#8217;re interested in &#8211; and this can bias our results.<\/p><\/div>\n<p>Turning to genetic research, we know that conscientiousness, as well as all the other factors listed, is heritable. If we search for genetic variants that are more common in very conscientious people than in non-conscientious people, we could be scanning for genetic markers that influence correlated traits more than they influence conscientiousness. Researchers often adjust for such heritable correlated traits in order to discover genetic variants <em>independently<\/em> associated with the primary trait. However, it isn\u2019t as simple as that \u2013 adjusting for covariates <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4320269\/\">can lead to false positives<\/a>.<\/p>\n<p>There are many sample-related problems for genetic association studies, such as <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2975875\/\">genetic ancestry differences between cases and controls<\/a> and the <a href=\"https:\/\/www.nature.com\/news\/genomics-is-failing-on-diversity-1.20759\">neglect of individuals with non-European ancestry<\/a>. 
This is particularly problematic if we come to use genetic variants (amongst other factors) to predict the onset of medical and psychiatric issues, because our \u2018precision medicine\u2019 solutions won\u2019t be helpful for everyone around the world.<\/p>\n<h3><span style=\"text-decoration: underline\">Difficulties with reporting and measuring<\/span><\/h3>\n<p>When samples rely on people to tell the truth about themselves, we might find out more about who they want to be than who they are. Huff uses the example of the university alumni questionnaire. In addition to the problem that the questionnaire reaches a sub-sample that is probably biased (e.g. only graduates with known addresses), there is the problem that people can\u2019t be trusted to give accurate self-reports. Bragging likely inflates universities\u2019 figures for the average graduate salary, although this could be balanced out by people understating their salaries to evade tax.<\/p>\n<p>Unfortunately, the traits that are most interesting to behavioural scientists are often <a href=\"http:\/\/blogs.kcl.ac.uk\/editlab\/2017\/05\/31\/data-real-or-reified\/\">not easily measured and quantified<\/a>. For example, studies of mental health problems rely on self-reports of symptoms such as abnormal experiences and beliefs, which are vulnerable to response bias, rather than \u2018objective\u2019 biomarkers. Some samples, such as the Twins Early Development Study, have valuable cross-reporter data. Interestingly, parents, teachers and children themselves do not agree much about the severity of children\u2019s behavioural problems: their ratings tend to have a correlation of only ~0.3. Genetic research shows that the strongest genetic influences are on what raters see in common about children\u2019s behaviour (\u2018trans-situational behaviour\u2019), but there are also <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/14566173\">rater-specific genetic and environmental influences<\/a>. 
We might assume that tapping into what is shared, and removing rater-specificity, gives a better, more accurate measure. However, rather than emphasising disagreement and reporter-specific error, many researchers now highlight that parents, teachers and children have different insights, and are reporting on different aspects of a child\u2019s behaviour. For example, children don\u2019t necessarily behave in the same way at home as at school. These aspects of behaviour seem to be subject to different genetic and environmental influences.<\/p>\n<h3><span style=\"text-decoration: underline\">Correlation vs causation<\/span><\/h3>\n<p>Huff discusses the \u2018fallacy that if B follows A, then A caused B\u2019. Smokers may end up with worse grades than non-smokers, but smoking isn\u2019t necessarily what is causing the worse results. The relationship could be the other way round, with poor grades driving individuals to chain-smoking, or there could be no real relationship at all. More likely, some third factor influences both such that they appear related. Perhaps more extraverted people, or less intelligent people, are more likely to both smoke and achieve worse results. 
There are countless reasonable explanations.<\/p>\n<div style=\"width: 469px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"irc_mi\" src=\"https:\/\/media.licdn.com\/mpr\/mpr\/shrinknp_800_800\/p\/7\/005\/0a6\/0bd\/19ccc13.jpg\" alt=\"Related image\" width=\"459\" height=\"185\" \/><p class=\"wp-caption-text\">Correlation does not equal causation<em>\u00a0 (Image from<a title=\"xkcd: correlation\" href=\"https:\/\/www.google.co.uk\/url?sa=i&amp;rct=j&amp;q=&amp;esrc=s&amp;source=images&amp;cd=&amp;cad=rja&amp;uact=8&amp;ved=0ahUKEwj1jLuFr8vXAhVQKOwKHeVlDQAQjRwIBw&amp;url=%2Furl%3Fsa%3Di%26rct%3Dj%26q%3D%26esrc%3Ds%26source%3Dimages%26cd%3D%26cad%3Drja%26uact%3D8%26ved%3D%26url%3Dhttps%253A%252F%252Fwww.linkedin.com%252Fpulse%252Fcausation-vs-correlation-alex-jones%26psig%3DAOvVaw2sOJdVtbEKCcLqrMD-ypqZ%26ust%3D1511205581549012&amp;psig=AOvVaw2sOJdVtbEKCcLqrMD-ypqZ&amp;ust=1511205581549012\" target=\"_blank\"> xkcd<\/a>)<\/em><\/p><\/div>\n<p>In terms of <a title=\"The intergenerational transmission of anxiety\" href=\"http:\/\/ajp.psychiatryonline.org\/doi\/abs\/10.1176\/appi.ajp.2015.14070818\" target=\"_blank\">our research<\/a>, it is well-established that the development of anxiety is partly down to environmental factors. But negative parenting, for example, doesn\u2019t necessarily have a simple, one-way influence on anxiety. It can go the other way around, with childhood anxiety <a title=\"Does childhood anxiety evoke maternal control?\" href=\"http:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/j.1469-7610.2010.02227.x\/full\" target=\"_blank\">shaping negative parenting<\/a>. Importantly, we need to account for the role of genetics (analogous to the third factor in Huff\u2019s example) in studies. This is because parenting, like most measures of the environment, shows significant genetic influence. Consequently, genetic influences on both parenting and child anxiety may account for their association. 
Genetic influence on exposure to environments is termed \u2018genotype-environment correlation\u2019. <em>Passive <\/em>genotype-environment correlation reflects the fact that biological parents provide both genes and the environment to their children, leading the two to be correlated. For example, the offspring of anxious mothers will likely receive a genetic predisposition for anxiety as well as the environmental effects of an anxious parent. Another mechanism is <em>evocative<\/em> genotype-environment correlation, where environmental responses are evoked by genetically-influenced behaviour. For example, infants who cry easily might be more likely to evoke negative parenting.<\/p>\n<p>The Children-of-Twins (CoT) design can control for shared genes between generations and so help unravel the mechanisms of intergenerational transmission. CoT data indicate that the association between parental and adolescent offspring anxiety largely arises because of a direct association between parents and their children (i.e. through living together), independent of genetic confounds <a title=\"The intergenerational transmission of anxiety\" href=\"http:\/\/ajp.psychiatryonline.org\/doi\/full\/10.1176\/appi.ajp.2015.14070818\" target=\"_blank\">(Eley et al., 2015).<\/a>\u00a0<em>[Edit:\u00a0For more on the CoT design and intergenerational\u00a0associations see <a title=\"Heritability and intergenerational associations\" href=\"http:\/\/blogs.kcl.ac.uk\/editlab\/2016\/11\/11\/heritability-and-intergenerational-associations\/\" target=\"_blank\">this great post<\/a> by Tom McAdams]<\/em><\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p><strong>*<\/strong> Or, \u2018the extent that trait differences can be explained by inherited differences in our DNA\u2019. Heritability is a property of the population and its variability, not the individual \u2013 it does not capture why a particular person has a trait\/disease. 
It is specific to how the trait was measured, and to the population sample it was measured in.\u00a0 Therefore, we look more at overall patterns across different studies rather than specific estimates.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Methods and concepts in Behavioural Genetics are intrinsically statistical, and jargon and acronyms abound. This often makes the research difficult for people outside of the field (and in the field) to understand and critique. In light of this, Rosa [EditLab PhD student] takes a look at the world\u2019s most famous&#8230;<\/p>\n","protected":false},"author":45,"featured_media":1784,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[220,221,222,168,219],"class_list":["post-1773","post","type-post","status-publish","format-standard","has-post-thumbnail","category-research-matters","tag-behavioural-genetics","tag-bias","tag-interpretation","tag-science-communication","tag-statistics"],"_links":{"self":[{"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/posts\/1773","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/users\/45"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/comments?post=1773"}],"version-history":[{"count":8,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/posts\/1773\/revisions"}],"predecessor-version":[{"id":1782,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/posts\/1773\/revisions\/1782"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/media\/1784"}],"wp:attachment":[{"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/media?pa
rent=1773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/categories?post=1773"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/editlab\/wp-json\/wp\/v2\/tags?post=1773"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}