Joni Coleman tackles the complex subject of delivering transparent, robust science to all – and almost manages not to talk about Twitter.


If you’ve ever volunteered to take part in research, you should be angry.
If you pay your taxes, you should be angry.
If you believe the scientific method is central to rationalism, modernity, and progress, YOU SHOULD BE ANGRY.

Because much of science is failing you. Last year, well over a million papers were published in the biomedical sciences alone, many of which will have been funded through public or charitable monies, and will have relied inescapably on the generosity of participants. But (unless you are lucky enough to have access to a subscription) many of them will be unavailable for you to read. Science is, to a great extent, a closed shop.

 

Pictured: Deep metaphor

 

But, increasingly, things are changing. Open science initiatives are blooming, with the intent of making science more accessible, more transparent, and more rapid. In this blog, I will introduce some of the key enterprises of open science, and assess how well they are going. Check out the links, or squawk at me on Twitter (@Joni_Coleman) if you want to know more.

Doing science: Data sharing, open source, and collaboration

It is possible (if crude) to reduce science to three pillars: hypothesis (what you think should happen in a given situation), data (what actually happens) and interpretation (how the data relate to the hypothesis). Scientists are generally pretty free with communicating their hypotheses and interpretations (although more on that later…), but the communication of data is often slower. Sometimes there are good reasons for that – anyone with an email address now knows all too well that data protection is a hot topic, and sometimes making data openly available would risk the privacy of participants. In other cases, scientists have an understandable desire to protect the investment they have made in gathering data, an expensive and strenuous task that is often undervalued in the reward systems of science.

“Data sharing and open data were among the central concepts in the mapping of the human genome”

However, there are clear benefits to sharing data, both in terms of the rigour of science (I might trust your conclusions far more if I am able to reach the same answer from the data) and in terms of the speed of progress. This has been a cornerstone of my own field, genomics. Data sharing and open data were among the central concepts in the mapping of the human genome; the Bermuda Accord (1996) established a precedent that raw human sequence data should be made publicly accessible within 24 hours of being generated. In part, this was a necessary step to enable the international collaboration to function effectively, but it also set the background for an approach to data sharing that has arguably stimulated the proliferation of genomic studies over the last 10 years. For example, the summary results from each Psychiatric Genomics Consortium study are made publicly available, which enables further discoveries to be made much more quickly. Genomics is by no means the only field to adopt such an approach.

 

Collaboration: this can also be achieved with Windows or Linux laptops

Another key driver of the surge in genomic data generation has been the open code movement, and the idea of sharable, reproducible analysis scripts. Freely available programming languages like R and Python build on community-written “packages” that enable others to perform specific analyses, while literate programming initiatives encourage clearly written scripts. Increasingly, journals are asking for code to be made available at review. While this is somewhat terrifying for the analyst (who has to let their ugly, esoteric code out in public), it ensures that reviewers are able to understand precisely how the results were derived from the data. Combined with open data, open code allows any reader to understand (and build upon) the original science.
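To make the idea concrete, here is a minimal sketch of what a sharable, reproducible analysis script might look like. It is illustrative only: the filenames, column names and the simple group comparison are hypothetical rather than taken from any study mentioned above, and the packages used (pandas and SciPy) are just common, freely available examples.

```python
# reproducible_analysis.py
# A minimal, self-contained analysis script that could be shared alongside a paper.
# All filenames and column names below are illustrative placeholders.
import pandas as pd
from scipy import stats

INPUT = "summary_data.csv"          # hypothetical data file shared with the code
OUTPUT = "association_results.csv"  # results regenerated exactly by re-running this script

def main():
    data = pd.read_csv(INPUT)

    # Hypothetical example analysis: compare a 'score' between cases and controls.
    cases = data.loc[data["group"] == "case", "score"]
    controls = data.loc[data["group"] == "control", "score"]
    t_stat, p_value = stats.ttest_ind(cases, controls, equal_var=False)

    # Write the result to disk so readers can check it against the published figure.
    pd.DataFrame({"statistic": [t_stat], "p_value": [p_value]}).to_csv(OUTPUT, index=False)

if __name__ == "__main__":
    main()
```

Because the script reads a shared data file and writes its results deterministically, a reviewer (or any reader) can re-run it end to end and check that the numbers in the paper really do follow from the data.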

 

Publishing science: Open access and pre-printing, transparency in reviews, and registered reports

So once the study’s done, that’s the hard bit over, right? Au contraire. Publishing a scientific paper can be a long, involved and expensive process, and it often ends with the paper locked behind a paywall that costs $10 a pop to read. Let’s dissect the differences the open access approach can make.

From submission to publication, the traditional route can commonly take months – our GWAS of cognitive behavioural therapy, for example, was published approximately two years after it was completed. In a field like genomics, the fast pace of innovation means that papers can be outdated by the time they emerge. Here is where open science has scored considerable success in biology in the last few years. The concept of pre-printing (making an early, pre-peer-review version of an article publicly available) is far from novel (mathematics has been posting preprints to arXiv for years), but it has been instrumental in disseminating research quickly. There are risks to this, however, particularly in terms of the citation of preprint articles. Peer review can be a valuable system, allowing a critical eye to catch errors of logic or weaknesses in argument that the authors may have missed. Without that eye, preprints remain liable to change, rendering citations of them potentially erroneous further down the line. Arguably, this is also the case with published papers – there is generally poor recognition of the number of citations given to flawed studies that have since been retracted. Preprints should be read carefully to assess the validity of their argument – as should every article cited.

“if most good science doesn’t produce headline-grabbing outcomes, agreeing to publish good science will leave many headlines ungrabbed”

Careful and considered reading can enable you to form a well-reasoned opinion on a paper. But you can only judge what is presented to you. How do you, as a reader, know whether the results in the paper address the hypothesis the authors originally set out to test? A certain degree of trust is required, and, unfortunately, multiple well-publicised cases of scientific fraud have shown that the incentives of science drive some authors to lie. Open science can combat this through pre-registration: outlining exactly what you are going to do before you do it. This could also extend to writing a registered report – agreeing with a journal (and with the peer reviewers appointed by that journal) that the question you are asking and the methods you will use are scientifically interesting and valid, and so your article should be published regardless of the results. Both seem like fine ideas. However, they have been slow to be adopted. In part, that is driven by the commercial concerns of publishers – if most good science doesn’t produce headline-grabbing outcomes, agreeing to publish good science will leave many headlines ungrabbed. However, there are also concerns from the scientists’ end. Scientists could (and do) “pre-register” studies wrongly, or could register a study that has already been completed and produced positive results (making it unclear whether there were negative results that went unreported). A more honest concern is that many papers are inherently exploratory, and as such not all analyses can be pre-registered – it is not always clear how such results would be dealt with in a pre-registered study. The concerns around pre-registration and registered reports need to be carefully and clearly resolved and communicated – making such a change will need further changes to the culture of science as a whole.

Back to that article you wrote. It’s gone in, and it has changed considerably in peer review. Ideally, that was because the editor picked careful, wise reviewers. Or maybe they picked terrible reviewers, who barely read the paper other than to demand the authors cite a bunch of papers written by the reviewers themselves. How can the future reader tell? In the majority of cases, peer reviews remain confidential to the editor, journal and authors. However, there is an increasing drive to publish reviews openly, so everyone can judge how reasonable the reviewer was in dissecting the paper. Even then, most reviews are performed anonymously. Again, there is a movement within open science towards signed reviews – if you are writing honestly and fairly, why wouldn’t you want your name attached to your review? Given that reviewers largely go unrewarded, and that science relies to a great extent on reputation, building a name for yourself as a good, fair reviewer would be beneficial. But. One of the biggest criticisms of signed reviews is that they could restrict the reviewer from being honest and robust in their review of a bad paper. Science relies on reputation – if a named reviewer makes a powerful author look foolish, what’s to prevent that author from damaging the reviewer’s career? It’s unclear whether open, signed reviews would alleviate this – sunlight may be the best disinfectant, or it could simply leave the honest reviewer more exposed to such covert attacks.

Finally, the paper is reviewed and published. How can readers access it? Currently, there are three main scenarios. Firstly, some papers are housed in subscription-only journals with no free options, despite the best efforts of the funding agencies (although you could try contacting the author. You certainly shouldn’t visit websites of dubious morality). Secondly, the paper might remain behind a paywall for 6–18 months before being released onto an open-access repository. Thirdly, the authors might pay a fee to have the paper published openly from the beginning. These article processing charges allow the costs of publication to be met – however, they have also led to the creation of predatory pay-to-publish journals with little concern for the quality of science.

“enterprises like the Open Science Framework will not bring open science to the forefront alone – that will require a wholesale shift in the culture and approaches of science”

There is much to be done across many areas to make science more open; allow me to highlight one particular effort to meet the challenge. The Open Science Framework (OSF) is an open community for performing science, stretching across disciplines. Amongst the many initiatives the OSF is attempting to implement are recognition for good open science practices in individual papers; integration of robust and reproducible coding and data workflows; forums to encourage and enable collaborations; and the capacity to register, manage and archive a project’s data. Such enterprises will not bring open science to the forefront alone – that will require a wholesale shift in the culture and approaches of science. However, the OSF, and efforts like it, have an important role to play in making a more robust science that can deliver reliable progress.
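As a flavour of what managing and archiving a project’s data on such a platform can look like in practice, here is a minimal sketch using the third-party osfclient Python package (not mentioned above; it is just one of several ways to interact with the OSF). The project ID, token and filenames are placeholders, and the calls shown reflect my understanding of the osfclient interface, so check them against its documentation before relying on them.

```python
# A sketch of archiving and retrieving project files on the OSF using the
# community-maintained osfclient package (pip install osfclient).
# The project ID, token and filenames below are placeholders.
from osfclient import OSF

osf = OSF(token="YOUR_PERSONAL_ACCESS_TOKEN")  # token generated in your OSF account settings
project = osf.project("abc12")                 # hypothetical five-character OSF project ID
storage = project.storage("osfstorage")        # the default storage provider on an OSF project

# Archive an analysis script and its results alongside the project record.
for local_name in ("reproducible_analysis.py", "association_results.csv"):
    with open(local_name, "rb") as fp:
        storage.create_file(local_name, fp)

# List everything currently stored, so collaborators can see what is archived.
for remote_file in storage.files:
    print(remote_file.path)
```

The point is not the particular tool: it is that registering, versioning and archiving the materials behind a paper can be scripted into the everyday workflow, rather than bolted on at the end.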

Finally, it wouldn’t be a blog by me if it didn’t mention Twitter… With the ability to communicate globally, there is a huge opportunity for scientists to engage with each other and with members of the public in discussing their results and explaining the cool things they find. This can happen through many avenues, be it Twitter, Reddit (check out their Ask Me Anything sections) or blogs, covering research, methods and much, much more. Such discussions are invaluable to science – and were also invaluable to this blog in particular. As such, I must thank those listed below for taking the time to point out several exciting areas of open science that I hadn’t thought of! Any errors, inaccuracies or omissions in this blog are mine alone.

Thanks to the following for responding to spontaneous evening Twitter:
Jordan Anaya (@OmnesResNetwork), Lisa DeBruine (@lisadebruine), Nick Brown (@sTeamTraen), Chris Chambers (@chrisdc77), Mark Adams (@mja), Gustav Nilsonne (@GustavNilsonne), Mira van der Naald (@MiravdNaald), and Amy Riegelman (@amylibrarian).

Jonathan Coleman
