Reflecting on an interest gained during his MSc, EDIT Lab PhD student Tim explains what reproducibility is, what contributes to non-reproducibility, how the scientific community set about fixing it, and what the future holds for the quality of scientific output.

 

Tim Kerr, EDIT Lab PhD student


In my late teens and early twenties I read Ben Goldacre’s Bad Science and Bad Pharma. Both books augmented my prevailing scepticism of claims made in studies, with a new understanding that science could be misused by bad actors with something to gain. I forevermore checked any study’s funder for industrial ties, and looked out for blinding in clinical trials. I continued on under the assumption that the rest of science was all above board. That I could believe what I read.

It was some years later, during my MSc in Neuroscience, that the carpeted floor of the Wolfson Lecture Theatre was pulled from under me. I unexpectedly encountered the concept of reproducibility and wondered why a crisis had spawned in its name.

Reproducibility

Reproducibility, strictly, means that published results can be duplicated from the methods and materials outlined in the original publication. A kind of scientific hallmark, it increases the likelihood that the finding is true.

A few high-profile replication studies conducted in the early 2010s examined findings in fields including psychology, oncology, and the social sciences. The analyses discredited hundreds of individual studies for having non-reproducible results. The conclusion was that the findings could be untrue, and a reproducibility crisis was declared at this iceberg tip (1).

Some felt the crisis narrative was overblown. Treating replication as the hill upon which results went to die was seen as arbitrary: the non-replication of a result does not necessarily prove a finding to be false, and there are legitimate reasons why a study may fail to replicate.

Nevertheless, it signalled to the scientific community that there was a pervasive yet subtle effect diminishing the robustness of research. This nuance was what I found interesting. Published falsehoods weren’t typically the product of fraud and deception, which are limited to studies by single individuals and comparatively easy to spot. Instead, they stem from very human biases – the same biases I once had, assuming that science inherently self-corrected and did not reflect the imperfections of the scientists involved.

Consequently, the concept of reproducibility has shifted to include other hallmarks of scientific rigour which impact the quality of results. Belief in a published result is today likely to be based on the number of good research practices used in the making of the paper, acting as a proxy for likely reproducibility. This is handy, as replications are rarely undertaken in practice.

Journals. The cause of, and solution to, all of a researcher’s problems.

What contributes to irreproducible results?

Enthused by the topic, I wrote an essay about the factors contributing to the crisis for my MSc. Two thousand words, plus or minus ten percent, of spellbinding prose. I intricately outlined publication bias, the file drawer problem, underpowered studies, small sample sizes, analytic flexibility, HARKing, p-hacking, perverse incentives, and predatory journals.

Despite having definitely read all of the 100 references I used to paint my hyperbolic picture, these concepts remained quite abstract. I wasn’t yet a researcher. I couldn’t quite understand why, in reality, scientists weren’t always funded to use massive sample sizes, might not grasp the minutiae of statistical inference, or might be unable to outline a flawless analytic plan a priori, requiring no future amendment.
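With hindsight, a small simulation would have made at least one of those concepts concrete. The sketch below is my own illustration rather than anything from the essay; the number of studies, group sizes, and choice of five outcomes are arbitrary assumptions, and the point is simply to show how testing several outcomes and reporting whichever one “works” inflates the false-positive rate well beyond the nominal 5%.

```python
# A rough sketch (illustrative only): how analytic flexibility inflates false
# positives. We simulate many "studies" in which no true effect exists, test
# five arbitrary outcomes per study, and call a study "positive" if any single
# outcome crosses p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2016)
n_studies, n_outcomes, n_per_group = 10_000, 5, 30

positives = 0
for _ in range(n_studies):
    p_values = []
    for _ in range(n_outcomes):
        # Both groups are drawn from the same distribution: any "effect" is noise.
        control = rng.normal(0, 1, n_per_group)
        treatment = rng.normal(0, 1, n_per_group)
        p_values.append(stats.ttest_ind(control, treatment).pvalue)
    # The p-hack: report the study as a success if any outcome is significant.
    if min(p_values) < 0.05:
        positives += 1

print(f"Share of null studies reported as 'significant': {positives / n_studies:.1%}")
```

Each individual test controls its error rate at 5%, yet the freedom to choose among five outcomes pushes the chance of a spurious finding to roughly one in four – the kind of quiet, unintentional inflation that pre-registration is designed to close off.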

But a few years on from this essay, approaching the end of the first year of my PhD, these once-abstract contributory factors are far more tangible, helped by the intervening pandemic bringing paradoxical life to the existential topic of how science should be conducted.

Perhaps owing to my time in the SGDP, I now view research behaviour as the output of an interaction between the researcher and their environment. The individual researcher, with their own biases, levels of expertise, insight, and accountability, often shoulders much of the blame for an irreproducible result. It is their name atop the paper.

But they are shaped by an environment, both local and national. They are trained on curricula they did not set. They publish their research in journals which profit more from positive results garnering greater citation counts. They are employed by universities which lust after high levels of research output, reducing the time available for rigour. And they are in turn funded by research bodies seeking value for money, tempering sample sizes.

Therefore, solutions to these problems which focus exclusively on the individual will fail to address the environment, and thus fail to resolve irreproducibility.

Steps taken to promote reproducibility

False findings masquerading as truth can remain in the literature for some time, usually only uncovered through the glacial pace of science via meta-analyses, or by determined individuals following a hunch. The costs associated with these retrospectively discovered false findings are many. They hinder the progress of science. They render it inefficient, costly, demoralising, and possibly dangerous.

Proactive corrective measures are a pragmatic way of preventing false findings from ever reaching the page. They increase the likelihood of findings being true and inspire confidence in both science and the scientist.

The first big steps taken were to increase transparency and openness of research, as described in this EDIT lab blog. A shift to pre-registrations prevents the analytic flexibility and p-hacking endemic to much non-reproducible research. Online repositories and the publication of code allow mistakes to be identified and results promptly corrected.

Some journals now commit to publishing non-significant results. Many funders commit to open access to research outputs and prioritise reproducibility in grant assessments. A few universities commit to hiring plans not based exclusively on research outputs and h-indices. Fuelled by the zeitgeist, reproducibility networks and organisations have sprung into existence, encouraging better research practice through cultural and educational initiatives.

Whether these measures have an actual impact on the overall quality of science is as yet unclear; it may simply be too early to tell.

A reproduction of the room the committee convened in. Despite using similar apparatus, the results of the original committee are unlikely to be reproduced in this setup, as the methods section failed to mention the need to invite participants to fill the chairs.

The future of reproducibility

I think there is work to be done on one particular component of reproducibility, one which is desired and obvious, but often omitted.

When scientists were asked in 2016 what could improve reproducibility, they overwhelmingly responded with calls for better education and training (2). I was quite surprised, then, that as a new PhD student on a scientific doctoral training pathway in 2022, there was zero mention of reproducibility within our training (though it was provided by the team I joined).

For the ethos of reproducibility to be ingrained in national scientific culture, it must be taught early and widely. In practice, this would require centrally set educational guidelines, ultimately by those we democratically elect to lead us.

To this end, a number of key players in the reproducibility world gave evidence to a recent parliamentary inquiry into Reproducibility and Research Integrity. On the back of this evidence, the committee produced a report listing its conclusions and policy suggestions (3); the report is itself pleasingly reproducible, with full transcripts and evidence available online.

The government, or more specifically the Department for Science, Innovation & Technology (DSIT), then had the option to accept or reject these suggestions. It is, alas, the best mechanism we appear to have for experts in their fields to contribute to policy.

Predictably, DSIT rejected many of the suggestions put forward. There will be no scientific misconduct regulator. No dedicated funding for statistical experts within research teams. No programme to specifically fund replication studies. The slow pace of science may well have met its match in the speed of the legislature.

However, one suggestion was fully accepted: the need for reproducibility to be embedded within university science education. And there was specific mention of doctoral training pathways like mine being required to incorporate such training within their programmes.

I think this is a pleasing and pragmatic outcome. Unlike strategy, which can be implemented instantly, culture is something which develops over time. By mandating the teaching of reproducibility at the start of scientific careers, the ethos will persist in the minds of researchers long after the crisis narrative has simmered down. And we will wonder why we ever cared so much about impact factors.

 

(1) https://www.bbc.co.uk/news/science-environment-39054778

(2) Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016). https://doi.org/10.1038/533452a


(3) https://publications.parliament.uk/pa/cm5803/cmselect/cmsctech/101/summary.html
