There is a lot of thinking going on today about how research can be made more efficient, more robust, and more reproducible. At the top of the list are measures for improving internal validity (for example randomizing and blinding, prespecified inclusion and exclusion criteria etc.), measures for increasing sample sizes and thus statistical power, putting an end to the fetishization of the p-value, and open access to original data (open science). Funders and journals are raising the bar for applicants and authors by demanding measures to safeguard the validity of the research submitted to them.
Students and young researchers have taken note, too. I teach, among other things, statistics, good scientific practice and experimental design, and am impressed every time by the enthusiasm of the students and young postdocs, and by how they leap into the adventure of their scientific projects with the unbending will to “do it right”. They soak up suggestions for improving the reproducibility and robustness of their research projects like a dry sponge soaks up water. Often, however, the discussion ends unsatisfyingly, especially when we discuss the students’ own experiments and approaches to research work. I often hear: “That’s all well and good, but it won’t get past my group leader.” Group leaders tell them: “That is the way we have always done it, and it got us published in Nature and Science”, “If we do it the way you suggest, it won’t get through the review process”, or “Then we could only get it published in PLOS One (or PeerJ, F1000 Research etc.), and the paper will contaminate your CV”, etc.
I often wish that not only the students were sitting in the seminar room, but their supervisors with them!
I failed to reproduce the results of my experiments! Some of us are haunted by this nightmare. The scientific academies, the journals and by now the funders themselves are all calling for reproducibility, replicability and robustness of research. A movement for “reproducible science” has developed. Funding programs for the replication of research papers are now in the works. In some branches of science, especially in psychology, but also in fields like cancer research, results are now being systematically replicated… or not, and so we are now in the throes of a “reproducibility crisis”.
Now Daniele Fanelli, a scientist who up to now could be expected to side with those who support the reproducible science movement, has raised a warning voice. In the prestigious Proceedings of the National Academy of Sciences he asked rhetorically: “Is science really facing a reproducibility crisis, and if so, do we need it?” So today, on the eve, perhaps, of a budding opposition movement, I want to have a look at some of the objections to the “reproducible science” mantra. Is reproducibility of results really the foundation of the scientific method?
It is for good reason that researchers are the object of envy. When not stuck with bothersome tasks such as grant applications, reviews, or preparing lectures, they actually get paid for pursuing their wildest ideas! To boldly go where no human has gone before! We poke about in the scientific literature and carry out pilot experiments that, surprisingly, almost always succeed. Then we do a series of carefully planned and costly experiments. Sometimes they turn out well, often not, but they do lead us into the unknown. This is how ideas become hypotheses; one hypothesis leads to those that follow, and voilà, lo and behold, we confirm them! In the end, sometimes only after several years and considerable wear and tear on personnel and material, we manage to weave a “story” out of them (see also). Through a complex chain of results the story closes with a “happy end”, perhaps in the form of a new biological mechanism, but at least as a little piece to fit the puzzle, and it is always presented to the world by means of a publication. Sometimes even in one of the top journals.
Tuberculosis kills far more than a million people worldwide per year. The situation is particularly problematic in southern Africa, eastern Europe and Central Asia. There is no truly effective vaccine against tuberculosis (TB). In countries with a high incidence, a live vaccine based on the attenuated strain Bacillus Calmette-Guérin (BCG) is used, but BCG gives very little protection against tuberculosis of the lungs, and in any case its protective effect is highly variable and unpredictable. For years, a worldwide search has been going on for a better TB vaccine.
Recently, the British Medical Journal published an investigation in which serious charges were raised against researchers and their universities: conflicts of interest, animal experiments of questionable quality, selective use of data, deception of funders and ethics committees, all the way up to endangerment of study participants. There was also a whistleblower… who had to pack his bags. It all happened in Oxford, at one of the most prestigious virological institutes on earth, and the study on humans was carried out on infants from the poorest strata of the population. Let’s have a closer look at this explosive mix, for we have much to learn from it about
- the ethical dimension of preclinical research and the dire consequences that low quality in animal experiments and selective reporting can have;
- the important role of systematic reviews of preclinical research, and finally also about
- the selective (or non) availability and scrutiny of preclinical evidence when commissions and authorities decide on clinical studies.
Based on research, mainly in rodents, tremendous progress has been made in our basic understanding of the pathophysiology of stroke. After many failures, however, few scientists today deny that bench-to-bedside translation in stroke has a disappointing track record. Here I summarize many measures to improve the predictiveness of preclinical stroke research, some of which are currently in various stages of implementation: We must reduce preventable (detrimental) attrition. Key measures for this revolve around improving preclinical study design. Internal validity must be improved by reducing bias; external validity will improve by including aged, comorbid rodents of both sexes in our modeling. False positives and inflated effect sizes can be reduced by increasing statistical power, which necessitates increasing group sizes. Compliance with reporting guidelines and checklists needs to be enforced by journals and funders. Customizing study designs to exploratory and confirmatory studies will leverage the complementary strengths of both modes of investigation. All studies should publish their full data sets. On the other hand, we should embrace inevitable NULL results. This entails planning experiments in such a way that they produce high-quality evidence when NULL results are obtained and making these available to the community. A collaborative effort is needed to implement some of these recommendations. Just as in clinical medicine, multicenter approaches help to obtain sufficient group sizes and robust results. Translational stroke research is not broken, but its engine needs an overhaul to yield more predictive results.
Read the full article at the publisher’s site (STROKE/AHA). If your library does not have a subscription, here is the author’s manuscript (Stroke/AHA did not allow me to even pay for open access, as it is ‘a special article…’).
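To make the link between statistical power and group size in the abstract above concrete, here is a minimal sketch of the textbook normal-approximation formula for comparing two group means. It is not taken from the article; the function name `group_size` and the defaults (two-sided alpha of 0.05, 80% power) are assumptions chosen for illustration, and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def group_size(effect_size, alpha=0.05, power=0.8):
    """Approximate animals needed per group for a two-sided comparison
    of two group means (normal approximation, equal group sizes).

    effect_size is the standardized difference (Cohen's d): the expected
    difference between group means divided by the common standard deviation.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, from very large to moderate.
    for d in (1.5, 1.0, 0.5):
        print(f"d = {d}: about {group_size(d)} animals per group")
```

Because the required group size scales with the inverse square of the effect size, halving the expected effect roughly quadruples the number of animals needed, which is one reason multicenter approaches become attractive.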
Recently, NIH scientists B. Ian Hutchins and colleagues (pre)published “The Relative Citation Ratio (RCR). A new metric that uses citation rates to measure influence at the article level”. [Note added 9.9.2016: A peer reviewed version of the article has now appeared in PLOS Biol]. Like Stefano Bertuzzi, the Executive Director of the American Society for Cell Biology, I am enthusiastic about the RCR. The RCR appears to be a viable alternative to the widely (ab)used Journal Impact Factor (JIF).
The RCR has recently been discussed in several blogs and editorials (e.g. NIH metric that assesses article impact stirs debate; NIH’s new citation metric: A step forward in quantifying scientific impact?). At a recent workshop organized by the National Library of Medicine (NLM) I learned that the NIH is planning to use the RCR widely in its own grant assessments as an antidote to the JIF, raw article citations, h-factors, and other highly problematic or outright flawed metrics.
Using meta-analysis and computer simulation we studied the effects of attrition in experimental research on cancer and stroke. The results were published this week in the new meta-research section of PLOS Biology. Not surprisingly, given the small sample sizes of preclinical experimentation, loss of animals in experiments can dramatically alter results. However, how strongly attrition distorts results was unknown. We used a simulation study to analyze the effects of random and biased attrition. As expected, random loss of samples decreased statistical power, but biased removal, including that of outliers, dramatically increased the probability of false positive results. Next, we performed a meta-analysis of animal reporting and attrition in stroke and cancer. Most papers did not adequately report attrition, and extrapolating from the results of the simulation, we suggest that their effect sizes were likely overestimated.
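The difference between random and biased attrition can be illustrated with a small simulation. The sketch below is not the simulation from the paper: the group size of 10, the removal of 3 animals from the treated group, the 5% significance level, and all function names are assumptions chosen for illustration. It compares two groups with a pooled-variance t-test and counts how often the test comes out significant.

```python
import random
import statistics

# Two-sided 5% critical values of Student's t by degrees of freedom.
# Only the range needed for groups of about 10 animals is tabulated.
T_CRIT = {10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145,
          15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101}


def significant(a, b):
    """Pooled-variance two-sample t-test at alpha = 0.05 (two-sided)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / (
        pooled_var * (1 / na + 1 / nb)) ** 0.5
    return abs(t) > T_CRIT[df]


def drop_random(group, k, rng):
    """Random attrition: lose k animals regardless of their outcome."""
    kept = group[:]
    rng.shuffle(kept)
    return kept[k:]


def drop_lowest(group, k, rng):
    """Biased attrition: exclude the k animals with the lowest outcome."""
    return sorted(group)[k:]


def rejection_rate(effect, attrition, k, n=10, runs=4000, seed=1):
    """Fraction of simulated experiments with a significant result.

    With effect == 0 this estimates the false positive rate;
    with effect > 0 it estimates statistical power.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n)]
        treated = attrition(treated, k, rng)
        hits += significant(treated, control)
    return hits / runs


if __name__ == "__main__":
    print("false positives, random loss:", rejection_rate(0.0, drop_random, 3))
    print("false positives, biased loss:", rejection_rate(0.0, drop_lowest, 3))
    print("power, no loss:              ", rejection_rate(1.0, drop_random, 0))
    print("power, random loss:          ", rejection_rate(1.0, drop_random, 3))
```

Under these assumptions, random loss leaves the false positive rate near the nominal 5% but costs power, while selectively excluding the animals that did worst in the treated group pushes the false positive rate well above 5% even though there is no true effect.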
This has been a week chock-full of bias! First Nature ran a cover story on it, with an editorial and a very nice introduction to the subject by Regina Nuzzo. Then Malcolm Macleod and colleagues published a perspective in PLOS Biology demonstrating limited reporting of measures to reduce the risk of bias in life sciences publications, and that there may be an inverse correlation between journal rank, or the prestige of the university from which the research originated, and the presence of measures to prevent bias. At the same time Jonathan Kimmelman’s group came out with a report in eLife in which they meta-analytically explored preclinical studies of an anticancer drug (sunitinib) to demonstrate that only a fraction of drugs that show promise in animals end up proving safe and effective in humans, partly because of design flaws, such as lack of prevention of bias, and partly due to positive publication bias. Both articles resulted in a worldwide media frenzy, including coverage by Nature and the lay press; here is an example from the Guardian. Retraction Watch interviewed Jonathan, while Malcolm spoke on BBC4.
A Deutschlandfunk broadcast (aired 20.9.15) by Martin Hubert. From the announcement: ‘Biomedical researchers are supposed to search in their laboratories for, among other things, substances against cancer or stroke. They experiment with cell cultures and laboratory animals, test intended effects and probe unintended ones. Recent studies show, however, that up to 80 percent of these preclinical studies cannot be reproduced.’ Here is the link to the audio stream and to the transcript.
(German only, sorry!)
Biomedicine currently suffers a ‘replication crisis’: Numerous articles from academia and industry prove John Ioannidis’ prescient theoretical 2005 paper ‘Why most published research findings are false’ to be true. On the positive side, however, the academic community appears to have taken up the challenge, and we are witnessing a surge in international collaborations to replicate key findings of biomedical and psychological research. Three important articles appeared over the last weeks which on the one hand further demonstrated that the replication crisis is real, but on the other hand suggested remedies for it:
Two consortia have pioneered the concept of preclinical randomized controlled trials, very much inspired by how clinical trials minimize bias (prespecification of a primary endpoint, randomization, blinding, etc.), and with much improved statistical power compared to single laboratory studies. One of them (Llovera et al.) replicated the effect of a neuroprotectant (CD49d antibody) in one, but not another model of stroke, while the study by Kleikers et al. failed to reproduce previous findings claiming that NOX2 inhibition is neuroprotective in experimental stroke. In psychology, the Open Science Collaboration conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Disappointingly but not surprisingly, replication rates were low, and studies that replicated did so with much reduced effect sizes.