Category: Quality – Predictiveness

Due diligence in ALS research

Steve Perrin, in a recent issue of Nature (Vol. 507, p. 423), summarizes the struggle of the amyotrophic lateral sclerosis (ALS) field to explain the multiple failures of clinical trials testing compounds to improve the symptoms and survival of patients with this disease. He reports the efforts of the ALS Therapy Development Institute (TDI) in Cambridge, Massachusetts, to reproduce the results of around 100 mouse studies that had yielded promising results. As it turned out, most of them, including the ones that led to clinical trials, could not be reproduced, and where an effect was seen at all, it was dramatically smaller than the one reported initially. He discusses a number of measures that need to be taken to improve this situation, all of which have been emphasized independently in other fields of biomedicine where bench-to-bedside translation has failed.

Have the ARRIVE guidelines been implemented?

 

Research on animals generally lacks transparent reporting of study design and implementation, as well as of results. As a consequence of poor reporting, we face problems in replicating published findings, publication of underpowered studies with excessive false positives or false negatives, publication bias, and, as a result, difficulties in translating promising preclinical results into effective therapies for human disease. To improve the situation, the ARRIVE guidelines for the reporting of animal research (www.nc3rs.org.uk/ARRIVEpdf) were formulated in 2010 and adopted by over 300 scientific journals, including the Journal of Cerebral Blood Flow and Metabolism (www.nature.com/jcbfm). Four years later, Baker et al. (PLoS Biol 12(1): e1001756. doi:10.1371/journal.pbio.1001756) have systematically investigated the effect of the implementation of the ARRIVE guidelines on the reporting of in vivo research, with a particular focus on the multiple sclerosis field. The results are highly disappointing:

‘86%–87% of experimental articles do not give any indication that the animals in the study were properly randomized, and 95% do not demonstrate that their study had a sample size sufficient to detect an effect of the treatment were there to be one. Moreover, they show that 13% of studies of rodents with experimental autoimmune encephalomyelitis (an animal model of multiple sclerosis) failed to report any statistical analyses at all, and 55% included inappropriate statistics. And while you might expect that publications in ‘‘higher ranked’’ journals would have better reporting and a more rigorous methodology, Baker et al. reveal that higher ranked journals (with an impact factor greater than ten) are twice as likely to report either no or inappropriate statistics’ (Editorial by Eisen et al., PLoS Biol 12(1): e1001757. doi:10.1371/journal.pbio.1001757).

It is highly likely that other fields in biomedicine have a similarly dismal record. Clearly, journal editors and publishers need to enforce the ARRIVE guidelines and to monitor their implementation!
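To make the sample size point concrete, here is a minimal sketch of the kind of a priori power calculation the ARRIVE guidelines ask authors to report. The effect size, alpha, and power below are illustrative assumptions, not values from any of the studies discussed:

```python
# A minimal a priori sample size calculation for a two-group animal experiment.
# Effect size, alpha and power are illustrative assumptions only.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.8   # assumed standardized difference (Cohen's d) between treated and control
alpha = 0.05        # two-sided significance level
power = 0.80        # desired probability of detecting the assumed effect

n_per_group = TTestIndPower().solve_power(effect_size=effect_size, alpha=alpha,
                                          power=power, alternative='two-sided')
print(f"Animals needed per group: {n_per_group:.1f}")  # about 26 per group under these assumptions
```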

 

Found in translation

Lost or found in translation? Stroke is a major cause of global morbidity and mortality, yet therapeutic options are very limited. Numerous preclinical studies promised highly effective novel treatments, none of which have made it into practice despite a plethora of clinical trials. This failure to bridge the gap between bench and bedside deeply frustrates researchers, clinicians, the pharmaceutical industry, and patients. Dirnagl and Endres argue that despite the apparent translational failures in neuroprotection research, and counter to the current nihilism, basic and preclinical stroke research has in fact been able to predict human pathophysiology, clinical phenotypes, and therapeutic outcomes. The understanding of stroke pathobiology achieved through basic research has led to changes in stroke care whose value can be demonstrated. Preclinical investigations have informed the clinical realm even in the absence of intermediary phase 2 or phase 3 trials. Their arguments rest on examples of successful bench-to-bedside translation in which experimental studies preceded human trials and successfully predicted outcomes or phenotypes, as well as on examples of successful ‘back-translation’, where studies in animals recapitulated what we already knew to be true in human beings. An analysis of the reasons for the apparent (or only perceived) translational failures further strengthens their proposition and suggests measures to improve the positive predictive value of preclinical stroke research. Researchers, funding agencies, academic institutions, publishers, and professional societies should work together to harness the tremendous potential of basic and preclinical research, in stroke research as well as in other fields of medicine.

Ulrich Dirnagl and Matthias Endres. Found in Translation: Preclinical Stroke Research Predicts Human Pathophysiology, Clinical Phenotypes, and Therapeutic Outcomes. Stroke. 2014;45:1510-1518.

Nachkochen unmöglich (impossible to reproduce)

METRICS

The Economist reported that John Ioannidis, together with Steven Goodman, will open the Meta-Research Innovation Center at Stanford University (METRICS) later this month. Generously supported by the Buck Foundation, it will fight bad science, bias, and lack of evidence in all areas of biomedicine. The institute’s motto is to ‘identify and minimise persistent threats to medical research quality’. Those who have followed the work of Ioannidis and Goodman know that this is good news indeed! A concise overview of Ioannidis’ research can be found in this online article at Maclean’s.

The probability of replicating ‘true’ findings is low…

Due to small group sizes and the presence of substantial bias, experimental medicine produces a large number of false positive results (see previous post). It has been claimed that 50–90% of all results may be false (see previous post). In support of these claims is the staggeringly low number of experiments that can be replicated. But what are the chances of reproducing a finding that is actually true?
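A back-of-the-envelope sketch illustrates the point. The effect size and group size below are assumptions chosen for illustration, not figures from any particular study: even when an effect is real, the chance that a typical small study detects it, and that an equally small replication detects it again, is modest.

```python
# Illustrative calculation: the chance of detecting a genuinely true effect in a small
# two-group study, and of then 'replicating' that detection in an equally small study.
# Effect size and group size are assumptions for illustration only.
from statsmodels.stats.power import TTestIndPower

d = 0.5           # assumed true standardized effect (Cohen's d)
n_per_group = 8   # assumed typical small group size

power = TTestIndPower().power(effect_size=d, nobs1=n_per_group,
                              alpha=0.05, alternative='two-sided')
print(f"Power of a single study:                  {power:.2f}")
print(f"Chance that original AND replication hit: {power**2:.2f}")
```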


Is more than 80% of medical research waste?

The Lancet has published a landmark series of five papers on quality problems in biomedical research, which also propose a number of measures to increase value and reduce waste. Here is our commentary and summary. All articles are freely available on the internet (rather unusual for an Elsevier journal…).

From the Lancet pages:

The Lancet presents a Series of five papers about research. In the first report Iain Chalmers et al discuss how decisions about which research to fund should be based on issues relevant to users of research. Next, John Ioannidis et al consider improvements in the appropriateness of research design, methods, and analysis. Rustam Al-Shahi Salman et al then turn to issues of efficient research regulation and management. Next, An-Wen Chan et al examine the role of fully accessible research information. Finally, Paul Glasziou et al discuss the importance of unbiased and usable research reports. These papers set out some of the most pressing issues, recommend how to increase value and reduce waste in biomedical research, and propose metrics for stakeholders to monitor the implementation of these recommendations.

How Science goes wrong

Scepticism regarding the quality and predictiveness of modern science has finally arrived in the lay press. This week The Economist has devoted its issue, including cover, editorial, and leader, to what they call ‘unreliable research’. Even closer to home, this week’s New Scientist (also with cover, editorial, and leader) turns on neuroscience, with a similar message and material, and the bottom line that ‘the vast majority of brain research is now drowning in uncertainty.’ A clear signal that it is either time to abandon ship, or to clean up the mess!

 

 

The failure of peer review – A game of chance?

 


In 2000, two undisclosed neuroscience journals opened their databases for an interesting study, which was subsequently published in Brain: Rothwell and Martyn set out to determine the ‘reproducibility’ of the assessments of submitted articles by independent reviewers. They found, not surprisingly, that the recommendations of the reviewers had a strong influence on the acceptance of the articles. However, there was little or no agreement between reviewers regarding priority, and the agreement between reviewers regarding the recommendation (accept, reject, revise) was no better than chance.
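To see what ‘no better than chance’ means for categorical recommendations, here is a small illustrative sketch using Cohen’s kappa. The toy recommendations are invented for illustration and are not data from the Rothwell and Martyn study:

```python
# Cohen's kappa for two reviewers' recommendations (accept / revise / reject).
# The recommendations below are toy data invented for illustration.
from sklearn.metrics import cohen_kappa_score

reviewer_a = ["accept", "reject", "revise", "revise", "reject", "accept", "revise", "reject"]
reviewer_b = ["revise", "accept", "revise", "reject", "reject", "revise", "accept", "reject"]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")  # close to 0 here, i.e. essentially chance-level agreement
```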

Two recent publications have picked up this thread, and found rather horrifying results:

In Science this week, John Bohannon reports the results of an interesting experiment. He deliberately fabricated completely flawed studies reporting the anticancer effects of non-existent phytodrugs, following the template:

‘Molecule X from lichen species Y inhibits the growth of cancer cell Z. To substitute for those variables, [he] created a database of molecules, lichens, and cancer cell lines and wrote a computer program to generate hundreds of unique papers. Other than those differences, the scientific content of each paper [was] identical.’

The studies were riddled with ethical problems, reported results that were not supported by the experiments, used a flawed study design, and so on. He then submitted them to 304 open access journals; 157 accepted them for publication! While this may mostly reflect a problem of certain open access journals dedicated to so-called ‘predatory publishing’ (skimming off publication fees from willing authors), some of the accepting journals belonged to respectable publishers.
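The ‘mail-merge’ style of paper generation Bohannon describes can be sketched in a few lines. Everything below (molecules, lichens, cell lines) is an invented placeholder, not his actual database:

```python
# Toy sketch of template-based paper generation: one template, many combinations
# of molecule, lichen and cancer cell line. All names are invented placeholders.
from itertools import product

molecules  = ["compound A", "compound B"]
lichens    = ["Usnea placeholder", "Cladonia placeholder"]
cell_lines = ["cancer cell line X", "cancer cell line Y"]

template = "{mol} from lichen {lichen} inhibits the growth of {cells}."

papers = [template.format(mol=m, lichen=l, cells=c)
          for m, l, c in product(molecules, lichens, cell_lines)]

print(f"{len(papers)} unique 'papers' generated")
print(papers[0])
```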

In the same week, Eyre-Walker and Stoletzki published an article in PLoS Biology comparing peer review, impact factor, and number of citations as measures of the ‘merit’ of a paper. They used a dataset of some 6,500 articles (e.g. from the F1000 database) for which post-publication peer review by at least two assessors was available. Again, just as in the Rothwell and Martyn study, agreement between reviewers was not much better than chance (r² = 0.07). The scores of the assessors also correlated only very weakly with the number of citations drawn by those articles (r² = 0.06). They summarize that ‘we have shown that none of the measures of scientific merit that we have investigated are reliable.’
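For a feel of how weak an r² of about 0.07 is, here is a simulated illustration (the scores below are randomly generated, not the F1000 data):

```python
# Simulated illustration of how little two assessors agree when r^2 is around 0.07:
# both scores share a small common 'merit' signal, the rest is independent noise.
import numpy as np

rng = np.random.default_rng(42)
n_papers = 500
true_merit = rng.normal(size=n_papers)                        # latent merit of each paper
score_a = true_merit + rng.normal(scale=1.7, size=n_papers)   # assessor A: signal plus noise
score_b = true_merit + rng.normal(scale=1.7, size=n_papers)   # assessor B: signal plus noise

r = np.corrcoef(score_a, score_b)[0, 1]
print(f"r = {r:.2f}, r^2 = {r**2:.2f}")  # r^2 lands in the neighbourhood of the published 0.07
```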

What follows from all this? A good to-do list can be found in the editorial accompanying the Eyre-Walker and Stoletzki article. Eisen et al. advocate multidimensional assessment tools (‘altmetrics’), but for now: ‘Do what you can today; help disrupt and redesign the scientific norms around how we assess, search, and filter science.’

 

References

Rothwell PM, Martyn CN (2000) Reproducibility of peer review in clinical neuroscience. Is agreement between reviewers any greater than would be expected by chance alone? Brain 123(Pt 9):1964-1969

Bohannon J (2013) Who’s afraid of peer review? Science 342:60-65

Eyre-Walker A, Stoletzki N (2013) The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations. PLoS Biol 11(10): e1001675. doi:10.1371/journal.pbio.1001675

Eisen JA, MacCallum CJ, Neylon C (2013) Expert Failure: Re-evaluating Research Assessment. PLoS Biol 11(10): e1001677. doi:10.1371/journal.pbio.1001677

Too good to be true: Excess significance in experimental neuroscience

In a massive meta-analysis of animal studies of six neurological diseases (EAE/multiple sclerosis, Parkinson’s disease, ischemic stroke, spinal cord injury, intracerebral hemorrhage, and Alzheimer’s disease), Tsilidis et al. have demonstrated that the published literature in these fields contains an excess of statistically significant results due to biases in reporting (PLoS Biol. 2013 Jul;11(7):e1001609). By including more than 4,000 datasets (from more than 1,000 individual studies!), which they synthesized in 160 meta-analyses, they impressively substantiate that there are far too many ‘positive’ results in the literature! The underlying reasons are reporting bias, including study publication bias, selective outcome reporting bias (where null results are omitted), and selective analysis bias (where data are analysed with different methods that favour ‘positive’ results). Study size was low (mean 16 animals), fewer than one third of the studies randomized animals or evaluated outcomes in a blinded fashion, and only 39 of 4,140 studies reported sample size calculations!
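The logic behind the kind of excess significance test used by Tsilidis et al. can be sketched in simplified form: add up the power of the individual studies to get the expected number of ‘positive’ studies, and compare it with the observed number. The per-study power values and the observed count below are invented for illustration, and the binomial comparison is a simplification of the published method:

```python
# Simplified sketch of an excess significance test: compare the observed number of
# 'positive' (statistically significant) studies with the number expected from their power.
# The per-study power values and the observed count are invented for illustration.
from scipy.stats import binomtest

study_powers = [0.20, 0.15, 0.30, 0.25, 0.10, 0.35, 0.20, 0.15, 0.25, 0.30]
expected_positive = sum(study_powers)      # expected number of significant studies
observed_positive = 7                      # assumed observed number of significant studies

mean_power = expected_positive / len(study_powers)
result = binomtest(observed_positive, n=len(study_powers), p=mean_power, alternative='greater')
print(f"Expected about {expected_positive:.1f} positive studies, observed {observed_positive}")
print(f"One-sided p-value for excess significance: {result.pvalue:.3f}")
```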