This has been a week chock-full of bias! First nature ran a cover story on it, with an editorial, and a very nice introduction into the subject by Regina Nuzzo. Then Malcolm Macleod and colleagues published a perspective in Plos Biology demonstrating limited reporting of measures to reduce the risk of bias in life sciences publications, and that there may be an inverse correlation between journal rank or prestige of the University from which the research originated and presence of measures to prevent bias. At the same time Jonathan Kimmelman’s group came out with a report in eLife in which they meta-analytically explored preclinical studies of an anticancer drug (sunitinib) to demonstrate that only a fraction of drugs that show promise in animals end up proving safe and effective in humans, partly because of design flaws, such as lack of prevention of bias, and partly due to positive publication bias. Both articles resulted in a worldwide media frenzy, including coverage by Nature and the lay press, here is an example from the Guardian. Retraction Watch interviewed Jonathan, while Malcolm spoke on BBC4.
Eine Sendung des Deutschlandfunk (ausgestrahlt 20.9.15) von Martin Hubert. Aus der Ankündigung: ‘Biomediziner sollen in ihren Laboren unter anderem nach Substanzen gegen Krebs oder Schlaganfall suchen. Sie experimentieren mit Zellkulturen und Versuchstieren, testen gewollte Wirkungen und ergründen ungewollte. Neuere Studien zeigen jedoch, dass sich bis zu 80 Prozent dieser präklinischen Studien nicht reproduzieren lassen.’ Hier der Link zum Audiostream bzw. zum Transkript.
(German only, sorry!)
The crisis in scientific reproducibility has crystalized as it has become increasingly clear that the faithfulness of the majority of high-profile scientific reports is with little foundation, and that the societal burden of low reproducibility is enormous. In todays issue of Nature, C. Glenn Begley, Alastair Buchan, and myself suggest measures by which academic institutions can improve the quality and value of their research. To read the article, click here.
Our main point is that research institutions that receive public funding should be required to demonstrate standards and behaviors that comply with “Good Institutional Practice”. Here is a selection of potential measures, implementation of which shuld be verified, certified and approved by major funding agencies.
Compliance with agreed guidelines: Ensure compliance with established guidelines such as ARRIVE, MIAME, data access (as required by National Science Foundation and National Institutes of Health, USA).
Full access to the institution’s research results: Foster open access and open data; preregistration of preclinical study designs.
Electronic laboratory notebooks: Provide electronic record keeping compliant with FDA Code of Federal Regulations Title 21 (CFR Title 21 part 11). Electronic laboratory notebooks allow data and project sharing, supervision, time stamping, version control, and directly link records and original data.
Institutional Standard for Experimental Research Conduct (ISERC): Establish ISERC (e.g. blinding, inclusion of controls, replicates and repeats etc); ensure dissemination, training and compliance with IMSERC.
Quality management: Organize regular and random audits of laboratories and departments with reviews of record keeping and measures to prevent bias (such as randomization and blinding).
Critical incidence reporting: Implement a system to allow the anonymous reporting of critical incidences during research. Organize regular critical incidence conferences in which such ‘never events’ are discussed to prevent them in the future and create a culture of research rigor and accountability.
Incentives and disincentives: Develop and implement novel indices to appraise and reward research of high quality. Honor robustness and mentoring as well as originality of research. Define appropriate penalties for substandard research conduct or noncompliance with guidelines. These might include decreased laboratory space, lack of access to trainees, reduced access to core facilities.
Training: Establish mandatory programs to train academic clinicians and basic researchers at all professional levels in experimental design, data analysis and interpretation, as well as reporting standards.
Research quality mainstreaming: Bundle established performance measures plus novel institution-unique measures to allow a flexible, institution-focused algorithm that can serve as the basis for competitive funding applications.
Research review meetings: create forum for routine assessment of institutional publications with focus on robust methods: the process rather than result.
Biomedicine currently suffers a ‚replication crisis‘: Numerous articles from academia and industry prove John Ioannidis’ prescient theoretical 2005 paper ‘Why most published research findings are false’ (Why most published research findings are false) to be true. On the positive side, however, the academic community appears to have taken up the challenge, and we are witnessing a surge in international collaborations to replicate key findings of biomedical and psychological research. Three important articles appeared over the last weeks which on the one hand further demonstrated that the replication crisis is real, but on the other hand suggested remedies for it:
Two consortia have pioneered the concept of preclinical randomized controlled trials, very much inspired by how clinical trials minimize bias (prespecification of a primary endpoint, randomization, blinding, etc.), and with much improved statistical power compared to single laboratory studies. One of them (Llovera et al.) replicated the effect of a neuroprotectant (CD49 antibody) in one, but not another model of stroke, while the study by Kleikers et al. failed to reproduce previous findings claiming that NOX2 inhibition is neuroprotective in experimental stroke. In Psychology, the Open Science Collaboration conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Disapointingly but not surprisingly, replication rates were low, and studies that replicated did so with much reduced effect sizes.
On March 17, 2015 five panelists from cognitive neuroscience and psychology (Sam Schwarzkopf, Chris Chambers, Sophie Scott, Dorothy Bishop, and Neuroskeptic) publicly debated “Is science broken? If so, how can we fix it?” . The event was organized by Experimental Psychology, UCL Division of Psychology and Language Sciences / Faculty of Brain Sciences in London.
The debate revolved around the ‘reproducibility crisis’, and covered false positive rates, replication, faulty statistics, lack of power, publication bias, study preregistration, data sharing, peer review, you name it. Understandably the event caused a stir in the press, journals, and the blogosphere (Nature, Biomed central, Aidan’s Aviary, The Psychologist, etc…).
Remarkably, some of the panelists (notably Sam Schwarzkopf) respectfully opposed the current ‘crusade for true science’ (to which I must confess I subscribe) by arguing that science is not broken at all, but rather, by trying to fix it we run the risk to wreck it for good. Already a few days before the official debate, he and Neuroskeptic had started to exchange arguments on Neuroskeptic’s blog. While both parties appear to agree that science can be improved, they completely disagree in their analysis of the current status of the scientific enterprise, and consequently also on action points.
This predebate argument directed my attention to a blog, which was run by Sam Schwarzkopf, or rather his alter ego, the ‘Devil’s neuroscientist’ for a short, but very productive period. Curiously, the Devil’s neuroscientist retired from blogging the night before the debate, threatening that there will be no future posts! This is sad, because albeit somewhat aggressively, but very much to the point, the Devil’s neuroscientist tried to debunk the thesis that there is any reproducibility crisis, that science is not self-correcting, that studies should be preregistered, etc. In other words, he was arguing against most of the issues raised and remedies suggested also on my pages. In passing, he provided a lot of interesting links to proponents on either side of the fence. Although I do not agree with many of his conclusions, his is by far the most thoughtful treatment on the subject. Most of the time I discuss with fellow scientist who dismiss problems of the current model of biomedical research I get rather unreflected comments. They usually simply celebrate the status quo as the best of all possible worlds and don’t get beyond the statement that there may be a few glitches, but that the model has evolved over centuries of undeniable progress. “If it’s not broken, don’t fix it.”
The Devil’s blog stimulated me to produce a short summary of key arguments of the current debate, to organize my own thoughts and as a courtesy to the busy reader. Continue reading
‘Scientific rigor and the art of motorcycle maintenance‘ was another recent fine analysis on reliability issues in current biomedicine (Munafo et al. Nat Biotechnol. 32:871-3). If you only want to read one article, this may be it. It nicely sums up the problems and suggests all the relevant measures (see most of my previous posts). But besides the reference to Robert Pirsig’s 1974 novel, what is new on the article is the comparison of the scientific enterprise to the automobile industry, which successfully responded to quality problems with structured quality control (for a more thorough treatment, see the previous post on trust and auditing). Here is their conclusion:
‘Science is conducted on the principle that it is self-correcting, but the extent to which this is true is an empirical question. The more that quality control becomes integrated into the scientific process itself, the more the whole process becomes one of continual improvement. Implementing this at the level of production implies a culture of incentivizing, educating and empowering those responsible for production, rather than policing quality after the fact with ‘quality inspectors’ (i.e., peer reviewers) or, even more distally, requiring attempts at replication. We think this insight, applied successfully to automobile manufacturing in the 1970s, can also be profitably applied to the practice of scientific research to build a more solid foundation of knowledge and accelerate the research endeavor.’
It is time to act!
Many of the concrete measures proposed to improve the quality and robustness of biomedical research are greeted with great skepticism: ‘Good idea, but how can we implement it, and will it work?’. So here are a few recent best practice examples regarding two key areas: Replication, and the review process. Continue reading
Amidst what has been termed ‘reproducibility crisis’ (see also a number of previous posts) in June 2014 the National Institutes of Health and Nature Publishing Group had convened a workshop on the rigour and reproducibility of preclinical biomedicine. As a result, last week the NIH published ‘Principles and Guidelines for Reporting Preclinical Research‘, and Nature as well as Science ran editorials on it. More than 30 journals, including the Journal of Cerebral Blood Flow and Metabolism, are endorsing the guidelines. The guidelines cover rigour in statistical analysis, transparent reporting and standards (including randomization and blinding as well as sample size justification), and data mining and sharing. This is an important step forward, but implementation has to be enforced and monitored: The ARRIVE guidelines (many items of which reappear in the NIH guidelines) have not been adapted widely yet (see previous post). In this context I highly recommend the article by Henderson et al in Plos Medicine in which they systematically review existing guidelines for in vivo animal experiments. From this the STREAM collaboration distilled a checklist on internal, external, and construct validity which I found more comprehensive and relevant than the one published now by the NIH. In the end, however, it is not so relevant to which guideline (ARRIVE, NIH, STREAM, etc.) researchers, reviewers, editors, or funders comply, but rather whether they use one at all!
Note added 12/15/2014: Check out the PubPeer postpublication discussion on the NIH/Nature/Science initiative (click here)!
In 2005 PLOS Medicine published John Ioannidis’ paper ‘Why most published research findings are false’ . The article was a wake up call for many, and now is probably the most influential publication in biomedicine of the last decade (>1.14 Mio views on the PLOS Med webside, thousands of citations in the scientific and lay press, featured in numerous blog posts, etc.). Its title has never been refuted, if anything, it has been replicated, for examples see some of the posts of this blog. Almost 10 years after, Ioannidis now revisits his paper, and the more constructive title ‘How to make more published research true” (PLoS Med. 2014 Oct 21;11(10):e1001747. doi: 10.1371/journal.pmed.1001747.) already indicates that the thrust this time is more forward looking. The article contains numerous suggestions to improve the research enterprise, some subtle and evolutionary, some disruptive and revolutionary, but all of them make a lot of sense. A must read for scientists, funders, journal editors, university administrators, professionals in the health industry, in other words: all stakeholders within the system!
About a year ago Seok et al. shocked the biomedical world with the verdict that mice are not humans, or more specifically, that the blood genomic responses in various inflammatory conditions do not correlate at all between human patients, and the corresponding disease models (see previous post , as well as this one). Now another paper, by Takao et al. and also published in PNAS, concludes the exact opposite, that is that there is a near perfect correlation between blood genomic responses in mouse and man.
Meanwhile, the initial publication is among the top cited medical publications of the last year, and hundreds of newspapers and blogs (including this one) have covered it. It will be interesting to see how much media coverage the Takao paper will receive, probably much less. But what happened, which paper should we believe?