- Let’s get this out of the way: Reproducibility is a cornerstone of science: Bacon, Boyle, Popper, Rheinberger
- A ‘lexicon’ of reproducibility: Goodman et al.
- What do we mean by ‘reproducible’? Open Science collaboration, Psychology replication
- Reproducible – non reproducible – A false dichotomy: Sizeless science, almost as bad as ‘significant vs non-significant’
- The emptiness of failed replication? How informative is non-replication?
- Hidden moderators – Contextual sensitivity – Tacit knowledge
- “Standardization fallacy”: Low external validity, poor reproducibility
- The stigma of nonreplication (‘incompetence’)- The stigma of the replicator (‘boring science’).
- How likely is strict replication?
- Non-reproducibility must occur at the scientific frontier: Low base rate (prior probability), low hanging fruit already picked: Many false positives – non-reproducibility
- Confirmation – weeding out the false positives of exploration
- Reward the replicators and the replicated – fund replications. Do not stigmatize non-replication, or the replicators.
- Resolving the tension: The Siamese Twins of discovery & replication
- Conclusion: No scientific progress without nonreproducibility: Essential non-reproducibility vs . detrimental non-reproducibility
- Further reading
Tuberculosis kills far more than a million people worldwide per year. The situation is particularly problematic in southern Africa, eastern Europe and Central Asia. There is no truely effective vaccination for tuberculosis (TB). In countries with a high incidence, a live vaccination is carried out with the diluted vaccination strain Bacillus Calmette-Guérin (BCG), but BCG gives very little protection against tuberculosis of the lungs, and in all cases the vaccination is highly variable and unpredictable. For years, a worldwide search has been going on for a better TB vaccination.
Recently, the British Medical Journal has published an investigation in which serious charges have been raised against researchers and their universities: conflicts of interest, animal experiments of questionable quality, selective use of data, deception of grant-givers and ethics commissions, all the way up to endangerment of study participants. There was also a whistle blower… who had to pack his bags. It all happened in Oxford, at one of the most prestigious virological institutes on earth, and the study on humans was carried out on infants of the most destitute layers of the population. Let’s have a closer look at this explosive mix in more detail, for we have much to learn from it about
- the ethical dimension of preclinical research and the dire consequences that low quality in animal experiments and selective reporting can have;
- the important role of systematic reviews of preclinical research, and finally also about
- the selective (or non) availability and scrutiny of preclinical evidence when commissions and authorities decide on clinical studies.
Recently, NIH Scientists B. Ian Hutchins and colleagues have (pre)published “The Relative Citation Ratio (RCR). A new metric that uses citation rates to measure influence at the article level”. [Note added 9.9.2016: A peer reviewed version of the article has now appeared in PLOS Biol]. Just as Stefano Bertuzzi, the Executive Director of the American Society for Cell Biology, I am enthusiastic about the RCR. The RCR appears to be a viable alternative to the widely (ab)used Journal Impact Factor (JIF).
The RCR has been recently discussed in several blogs and editorials (e.g. NIH metric that assesses article impact stirs debate; NIH’s new citation metric: A step forward in quantifying scientific impact? ). At a recent workshop organized by the National Library of Medicine (NLM) I learned that the NIH is planning to widely use the RCR in its own grant assessments as an antidote to JIF, raw article citations, h-factors, and other highly problematic or outright flawed metrics. Continue reading
Using metaanalysis and computer simulation we studied the effects of attrition in experimental research on cancer and stroke. The results were published this week in the new meta-research section of PLOS Biology. Not surprisingly, given the small sample sizes of preclinical experimentation, loss of animals in experiments can dramatically alter results. However, effects of attrition on distortion of results were unknown. We used a simulation study to analyze the effects of random and biased attrition. As expected, random loss of samples decreased statistical power, but biased removal, including that of outliers, dramatically increased probability of false positive results. Next, we performed a meta-analysis of animal reporting and attrition in stroke and cancer. Most papers did not adequately report attrition, and extrapolating from the results of the simulation data, we suggest that their effect sizes were likely overestimated. Continue reading
Eine Sendung des Deutschlandfunk (ausgestrahlt 20.9.15) von Martin Hubert. Aus der Ankündigung: ‘Biomediziner sollen in ihren Laboren unter anderem nach Substanzen gegen Krebs oder Schlaganfall suchen. Sie experimentieren mit Zellkulturen und Versuchstieren, testen gewollte Wirkungen und ergründen ungewollte. Neuere Studien zeigen jedoch, dass sich bis zu 80 Prozent dieser präklinischen Studien nicht reproduzieren lassen.’ Hier der Link zum Audiostream bzw. zum Transkript.
(German only, sorry!)
The crisis in scientific reproducibility has crystalized as it has become increasingly clear that the faithfulness of the majority of high-profile scientific reports is with little foundation, and that the societal burden of low reproducibility is enormous. In todays issue of Nature, C. Glenn Begley, Alastair Buchan, and myself suggest measures by which academic institutions can improve the quality and value of their research. To read the article, click here.
Our main point is that research institutions that receive public funding should be required to demonstrate standards and behaviors that comply with “Good Institutional Practice”. Here is a selection of potential measures, implementation of which shuld be verified, certified and approved by major funding agencies.
Compliance with agreed guidelines: Ensure compliance with established guidelines such as ARRIVE, MIAME, data access (as required by National Science Foundation and National Institutes of Health, USA).
Full access to the institution’s research results: Foster open access and open data; preregistration of preclinical study designs.
Electronic laboratory notebooks: Provide electronic record keeping compliant with FDA Code of Federal Regulations Title 21 (CFR Title 21 part 11). Electronic laboratory notebooks allow data and project sharing, supervision, time stamping, version control, and directly link records and original data.
Institutional Standard for Experimental Research Conduct (ISERC): Establish ISERC (e.g. blinding, inclusion of controls, replicates and repeats etc); ensure dissemination, training and compliance with IMSERC.
Quality management: Organize regular and random audits of laboratories and departments with reviews of record keeping and measures to prevent bias (such as randomization and blinding).
Critical incidence reporting: Implement a system to allow the anonymous reporting of critical incidences during research. Organize regular critical incidence conferences in which such ‘never events’ are discussed to prevent them in the future and create a culture of research rigor and accountability.
Incentives and disincentives: Develop and implement novel indices to appraise and reward research of high quality. Honor robustness and mentoring as well as originality of research. Define appropriate penalties for substandard research conduct or noncompliance with guidelines. These might include decreased laboratory space, lack of access to trainees, reduced access to core facilities.
Training: Establish mandatory programs to train academic clinicians and basic researchers at all professional levels in experimental design, data analysis and interpretation, as well as reporting standards.
Research quality mainstreaming: Bundle established performance measures plus novel institution-unique measures to allow a flexible, institution-focused algorithm that can serve as the basis for competitive funding applications.
Research review meetings: create forum for routine assessment of institutional publications with focus on robust methods: the process rather than result.
Biomedicine currently suffers a ‚replication crisis‘: Numerous articles from academia and industry prove John Ioannidis’ prescient theoretical 2005 paper ‘Why most published research findings are false’ (Why most published research findings are false) to be true. On the positive side, however, the academic community appears to have taken up the challenge, and we are witnessing a surge in international collaborations to replicate key findings of biomedical and psychological research. Three important articles appeared over the last weeks which on the one hand further demonstrated that the replication crisis is real, but on the other hand suggested remedies for it:
Two consortia have pioneered the concept of preclinical randomized controlled trials, very much inspired by how clinical trials minimize bias (prespecification of a primary endpoint, randomization, blinding, etc.), and with much improved statistical power compared to single laboratory studies. One of them (Llovera et al.) replicated the effect of a neuroprotectant (CD49 antibody) in one, but not another model of stroke, while the study by Kleikers et al. failed to reproduce previous findings claiming that NOX2 inhibition is neuroprotective in experimental stroke. In Psychology, the Open Science Collaboration conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Disapointingly but not surprisingly, replication rates were low, and studies that replicated did so with much reduced effect sizes.