
Trust but verify: Institutions must do their part for reproducibility

The crisis in scientific reproducibility has crystallized as it has become increasingly clear that faith in the majority of high-profile scientific reports rests on little foundation, and that the societal burden of low reproducibility is enormous. In today's issue of Nature, C. Glenn Begley, Alastair Buchan, and I suggest measures by which academic institutions can improve the quality and value of their research. To read the article, click here.

Our main point is that research institutions that receive public funding should be required to demonstrate standards and behaviors that comply with “Good Institutional Practice”. Here is a selection of potential measures, implementation of which should be verified, certified and approved by major funding agencies.

Compliance with agreed guidelines: Ensure compliance with established guidelines such as ARRIVE, MIAME, and data-access requirements (as mandated by the US National Science Foundation and National Institutes of Health).

Full access to the institution’s research results: Foster open access and open data; preregistration of preclinical study designs.

Electronic laboratory notebooks: Provide electronic record keeping compliant with FDA Code of Federal Regulations Title 21 (CFR Title 21 part 11). Electronic laboratory notebooks allow data and project sharing, supervision, time stamping, version control, and directly link records and original data.

Institutional Standard for Experimental Research Conduct (ISERC): Establish an ISERC (e.g. blinding, inclusion of controls, replicates and repeats, etc.); ensure dissemination, training and compliance with the ISERC.

Quality management: Organize regular and random audits of laboratories and departments with reviews of record keeping and measures to prevent bias (such as randomization and blinding).

Critical incident reporting: Implement a system to allow the anonymous reporting of critical incidents during research. Organize regular critical incident conferences in which such ‘never events’ are discussed to prevent them in the future and to create a culture of research rigor and accountability.

Incentives and disincentives: Develop and implement novel indices to appraise and reward research of high quality. Honor robustness and mentoring as well as originality of research. Define appropriate penalties for substandard research conduct or noncompliance with guidelines. These might include decreased laboratory space, lack of access to trainees, or reduced access to core facilities.

Training:  Establish mandatory programs to train academic clinicians and basic researchers at all professional levels in experimental design, data analysis and interpretation, as well as reporting standards.

Research quality mainstreaming: Bundle established performance measures with novel, institution-specific measures to allow a flexible, institution-focused algorithm that can serve as the basis for competitive funding applications.

Research review meetings: Create a forum for the routine assessment of institutional publications with a focus on robust methods: the process rather than the result.


10 years after: Ioannidis revisits his classic paper

In 2005 PLOS Medicine published John Ioannidis’ paper ‘Why most published research findings are false’. The article was a wake-up call for many and is now probably the most influential publication in biomedicine of the last decade (more than 1.14 million views on the PLOS Medicine website, thousands of citations in the scientific and lay press, featured in numerous blog posts, etc.). Its title has never been refuted; if anything, it has been replicated (for examples, see some of the posts on this blog). Almost 10 years later, Ioannidis revisits his paper, and the more constructive title ‘How to make more published research true’ (PLoS Med. 2014 Oct 21;11(10):e1001747. doi: 10.1371/journal.pmed.1001747) already indicates that the thrust this time is more forward-looking. The article contains numerous suggestions to improve the research enterprise, some subtle and evolutionary, some disruptive and revolutionary, but all of them make a lot of sense. A must-read for scientists, funders, journal editors, university administrators, professionals in the health industry; in other words: all stakeholders within the system!

Can mice be trusted?

Katharina Frisch ‘Mouse and man’

I started this blog with a post on a PNAS paper which at that time had received a lot of attention in the scientific community and the lay press. In this article, Seok et al. argued that ‘genomic responses in mouse models poorly mimic human inflammatory diseases‘. With this post I am returning to that article, as I was recently asked by the journal STROKE to contribute to their ‘Controversies in Stroke’ series. The Seok article had disturbed the stroke community, so a pro/con discussion seemed timely. In the upcoming issue of STROKE, Sharp and Jickling will argue that ‘the peripheral inflammatory response in rodent ischemic stroke models is different than in human stroke. Given the important role of the immune system in stroke, this could be a major handicap in translating results in rodent stroke models to clinical trials in patients with stroke.‘ This is of course true! Nevertheless, I counter by providing some examples of translational successes regarding stroke and the immune system, and conclude that ‘the physiology and pathophysiology of rodents is sufficiently similar to humans to make them a highly relevant model organism but also sufficiently different to mandate an awareness of potential resulting pitfalls. In any case, before hastily discarding highly relevant past, present, and future findings, experimental stroke research needs to improve dramatically its internal and external validity to overcome its apparent translational failures.’ For an in-depth treatment, follow the debate:

Article: Dirnagl: Can mice be trusted

Article: Sharp Jickling: Differences between mice and humans

Why post-hoc power calculation is not helpful

Statistical power is a rare commodity in experimental biomedicine (see previous post), as most studies have very low n’s and are therefore severely underpowered. The concept of statistical power, although almost embarrassingly simple (for a very nice treatment see Button et al.), is shrouded in ignorance, mysteries and misunderstandings among many researchers. A simple definition states that power is the probability that, given a specified true difference between two groups, the quantitative results of a study will be deemed statistically significant. The most common misunderstanding may be that power should only be a concern to the researcher if the Null hypothesis could not be rejected (p>0.05). I need to deal with this dangerous fallacy in a future post. Another common, albeit less perilous, misunderstanding is that calculating post-hoc (or ‘retrospective’) power can explain why an analysis did not achieve significance. Besides revealing a severe bias of the researcher towards rejecting the Null hypothesis (‘There must be another reason for not obtaining a significant result than that the hypothesis is incorrect!’), this is the equivalent of a statistical tautology: of course the study was not powerful enough, that is why the result was not significant! To look at this from another standpoint: provided enough n’s, the Null hypothesis of every study must be rejected. This, by the way, is one of the most basic criticisms of Null hypothesis significance testing. Power calculations are useful for the design of studies, but not for their analysis. This was nicely explained by Steven Goodman in his classic article ‘The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results’ (Ann Intern Med 1994):

First, [post-hoc Power analysis] will always show that there is low power (< 50%) with respect to a nonsignificant difference, making tautological and uninformative the claim that a study is “underpowered” with respect to an observed nonsignificant result. Second, its rationale has an Alice-in-Wonderland feel, and any attempt to sort it out is guaranteed to confuse. The conundrum is the result of a direct collision between the incompatible pretrial and post-trial perspectives. […] Knowledge of the observed difference naturally shifts our perspective toward estimating differences, rather than deciding between them, and makes equal treatment of all nonsignificant results impossible. Once the data are in, the only way to avoid confusion is to not compress results into dichotomous significance verdicts and to avoid post hoc power estimates entirely.  
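To see the tautology in numbers, here is a minimal sketch, assuming a simple two-sample comparison analyzed with a t-test and using statsmodels (the data are simulated and purely illustrative): the ‘observed’ power is obtained by plugging the observed effect size back into the power formula, and it carries no information beyond the p-value itself.

```python
# Minimal sketch: 'observed' (post-hoc) power is just a restatement of the p-value.
# Assumes a two-sample t-test with equal group sizes; data are simulated, purely illustrative.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(42)
n = 10                                              # small group size, typical for preclinical work
group_a = rng.normal(loc=0.0, scale=1.0, size=n)
group_b = rng.normal(loc=0.5, scale=1.0, size=n)    # a true effect exists, but n is small

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# 'Observed' effect size (Cohen's d) estimated from the very same data ...
pooled_sd = np.sqrt((np.var(group_a, ddof=1) + np.var(group_b, ddof=1)) / 2)
d_observed = abs(np.mean(group_b) - np.mean(group_a)) / pooled_sd

# ... plugged straight back into the power formula: this is 'post-hoc power'.
posthoc_power = TTestIndPower().power(effect_size=d_observed, nobs1=n, alpha=0.05)

print(f"p = {p_value:.3f}, observed d = {d_observed:.2f}, 'observed power' = {posthoc_power:.2f}")
# Whenever p > 0.05 (two-sided), this 'observed power' comes out below roughly 50% by
# construction, which is exactly Goodman's point: the exercise is tautological.
```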

NB: To avoid misunderstandings: calculating the n’s needed in future experiments to achieve a certain statistical power, based on effect sizes and variance obtained post hoc from a (pilot) experiment, is not called post-hoc power analysis (and is not the subject of this post), but rather sample size calculation.
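Such a prospective sample size calculation is done before the experiment, from a prespecified effect size (minimally relevant, or derived from a pilot). A minimal sketch, again assuming a two-sample t-test design and statsmodels; the numbers are illustrative:

```python
# Prospective sample size calculation (not post-hoc power analysis):
# how many subjects per group are needed to detect a prespecified effect?
import math
from statsmodels.stats.power import TTestIndPower

d_expected = 0.8   # effect size (Cohen's d) deemed biologically relevant, or taken from a pilot
alpha = 0.05       # two-sided significance level
power = 0.80       # desired probability of detecting the effect if it is real

n_per_group = TTestIndPower().solve_power(effect_size=d_expected, alpha=alpha, power=power)
print(f"~{math.ceil(n_per_group)} subjects per group")   # about 26 per group for d = 0.8
```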


How physics and engineering resolve reproducibility issues


A few decades ago engineers discovered that adding nanometre-sized particles to a liquid makes it far more effective at carrying away heat than anyone expected. If true, this would have tremendous implications for designing more effective coolants! However, while some researchers confirmed the results, several laboratories were unable to reproduce the findings. Some even suggested that nanoparticles make heat transfer worse.

In experimental medicine such controversies are rare, as reproduction of pivotal findings is an uncommon exercise. If controversy arises (e.g. a recent series of attempted reproductions regarding the effect of APO-E directed therapy on ß-amyloid clearance in mouse models of Alzheimer’s, Science 24 May 2013: 924), it is rarely resolved.

How did the physics/engineering community resolve the issue? By assembling the ‘International Nanofluid Property Benchmark Exercise’ (INPBE), in which over 30 organizations worldwide measured the thermal conductivity of identical samples of colloidally stable nanoparticle dispersions (nanofluids), using a variety of experimental approaches (Buongiorno et al. http://dx.doi.org/10.1063/1.3245330). Result: no anomalous enhancement of thermal conductivity!
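As an aside, ‘anomalous’ here has a concrete quantitative baseline: classical (Maxwell) effective medium theory predicts how much the thermal conductivity of a fluid should rise when well-dispersed spherical particles are added, and the benchmark question was whether measurements exceed that prediction. A minimal sketch of the baseline; the material values are illustrative assumptions (roughly water plus alumina), not INPBE data:

```python
# Classical Maxwell effective-medium prediction for a dilute suspension of spheres.
# Measurements far above this baseline would count as 'anomalous enhancement'.
# Material values are illustrative assumptions (water + alumina), not INPBE data.

def maxwell_k_eff(k_fluid: float, k_particle: float, phi: float) -> float:
    """Effective thermal conductivity in W/(m*K) at particle volume fraction phi."""
    num = k_particle + 2 * k_fluid + 2 * phi * (k_particle - k_fluid)
    den = k_particle + 2 * k_fluid - phi * (k_particle - k_fluid)
    return k_fluid * num / den

k_water = 0.6      # W/(m*K), approximate
k_alumina = 30.0   # W/(m*K), approximate
for phi in (0.01, 0.03, 0.05):
    enhancement = maxwell_k_eff(k_water, k_alumina, phi) / k_water - 1
    print(f"volume fraction {phi:.0%}: predicted enhancement {enhancement:.1%}")
# Roughly 3% enhancement per volume percent of particles - the modest, classical
# effect that the INPBE measurements were consistent with.
```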

Another best practice example from the physics community. Life sciences, take note!

Domestic cats kill billions of mice in US


A new study estimates that domestic cats kill 1.4–3.7 billion birds and 6.9–20.7 billion mammals (mostly mice) annually in the United States. PETA posits that 100 million mice and rats are used in animal experiments per year in the US.
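For scale, a back-of-the-envelope comparison of the two figures, taken at face value:

```python
# Back-of-the-envelope comparison of the figures quoted above.
cats_low, cats_high = 6.9e9, 20.7e9   # mammals (mostly mice) killed by US cats per year
lab_rodents = 100e6                   # mice and rats used in US experiments per year (PETA estimate)

print(f"Cats kill roughly {cats_low / lab_rodents:.0f} to {cats_high / lab_rodents:.0f} times "
      f"as many small mammals as are used in US animal experiments each year.")
# roughly 69 to 207 times
```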