Category: Statistics

Is more than 80% of medical research waste?

The Lancet has published a landmark series of five papers on quality problems in biomedical research, which also propose a number of measures to increase value and reduce waste. Here is our commentary and summary. All articles are freely available on the internet (rather unusual for an Elsevier journal…).

From the Lancet pages:

The Lancet presents a Series of five papers about research. In the first report Iain Chalmers et al discuss how decisions about which research to fund should be based on issues relevant to users of research. Next, John Ioannidis et al consider improvements in the appropriateness of research design, methods, and analysis. Rustam Al-Shahi Salman et al then turn to issues of efficient research regulation and management. Next, An-Wen Chan et al examine the role of fully accessible research information. Finally, Paul Glasziou et al discuss the importance of unbiased and usable research reports. These papers set out some of the most pressing issues, recommend how to increase value and reduce waste in biomedical research, and propose metrics for stakeholders to monitor the implementation of these recommendations.

The failure of peer review – A game of chance?

 


In 2000, two undisclosed neuroscience journals opened their databases for an interesting study, which was subsequently published in Brain: Rothwell and Martyn set out to determine the ‘reproducibility’ of the assessments of submitted articles by independent reviewers. They found, not surprisingly, that the recommendations of the reviewers had a strong influence on the acceptance of the articles. However, there was little or no agreement between reviewers regarding priority. The agreement between reviewers regarding the recommendation itself (accept, reject, revise) was also no better than chance.
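For readers who want to play with the numbers: the ‘no better than chance’ statement rests on chance-corrected agreement statistics such as Cohen’s kappa. Here is a minimal Python sketch with simulated reviewer recommendations (not Rothwell and Martyn’s data) showing how raw agreement is corrected for the agreement expected by chance alone:

```python
# Minimal sketch of a chance-corrected agreement statistic (Cohen's kappa);
# the reviewer recommendations below are simulated, not Rothwell & Martyn's data.
import numpy as np

categories = ["accept", "revise", "reject"]
rng = np.random.default_rng(1)
reviewer_a = rng.choice(categories, size=200)   # two reviewers judging the same 200 papers,
reviewer_b = rng.choice(categories, size=200)   # here completely independently (i.e. at chance)

observed = np.mean(reviewer_a == reviewer_b)    # raw proportion of identical recommendations
# Agreement expected by chance, from each reviewer's marginal category frequencies
p_a = np.array([np.mean(reviewer_a == c) for c in categories])
p_b = np.array([np.mean(reviewer_b == c) for c in categories])
expected = float(p_a @ p_b)

kappa = (observed - expected) / (1 - expected)  # ~0 means: no better than chance
print(f"observed agreement {observed:.2f}, chance level {expected:.2f}, kappa {kappa:.2f}")
```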

Two recent publications have picked up this thread and found rather horrifying results:

In Science this week, John Bohannon reports the results of an interesting experiment. He deliberately fabricated completely flawed studies reporting anticancer effects of non-existent phytodrugs, following the template:

‘Molecule X from lichen species Y inhibits the growth of cancer cell Z. To substitute for those variables, [he] created a database of molecules, lichens, and cancer cell lines and wrote a computer program to generate hundreds of unique papers. Other than those differences, the scientific content of each paper [was] identical.’

The studies contained ethical problems, reported results that were not supported by the experiments, used a flawed study design, and so on. He then submitted them to 304 open access journals; 157 accepted the paper for publication! While this may primarily reflect a problem of some open access journals dedicated to so-called ‘predatory publishing’ (skimming off publication fees from willing authors), some of the accepting journals belonged to respectable publishers.
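For illustration only, a few lines of Python show how such a template can be expanded into ‘hundreds of unique papers’; the molecule, lichen, and cell-line lists below are placeholders, not Bohannon’s actual databases:

```python
# A toy illustration (hypothetical lists, not Bohannon's actual databases) of how a
# template with variables X, Y, Z can be expanded into many 'unique' submissions.
import itertools

molecules = ["compound A", "compound B", "compound C"]    # placeholder molecules (X)
lichens = ["Usnea sp.", "Cladonia sp.", "Parmelia sp."]   # placeholder lichen species (Y)
cell_lines = ["HeLa", "MCF-7", "A549"]                    # placeholder cancer cell lines (Z)

TEMPLATE = ("{x} extracted from the lichen {y} inhibits the growth "
            "of {z} cancer cells in a dose-dependent manner.")

# Every combination of X, Y and Z yields a formally 'unique' paper with identical,
# and identically flawed, scientific content.
papers = [TEMPLATE.format(x=x, y=y, z=z)
          for x, y, z in itertools.product(molecules, lichens, cell_lines)]

print(len(papers), "candidate submissions generated")
print(papers[0])
```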

Eyre-Walker and Stoletzki, in the same week, published an article in PLoS Biology comparing peer review, impact factor, and number of citations as measures of the ‘merit’ of a paper. They used a dataset of 6,500 articles (e.g. from the F1000 database) for which they had post-publication peer review by at least two assessors. Again, just as in the Rothwell and Martyn study, agreement between assessors was not much better than chance (r² of 0.07). The assessors’ scores also correlated only very weakly with the number of citations those articles later attracted (r² = 0.06). They summarize that ‘we have shown that none of the measures of scientific merit that we have investigated are reliable.’
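To make the r² figures concrete, here is a small simulation sketch (invented numbers, not the authors’ dataset) of how weakly two noisy assessors, and assessors and citations, can end up correlating:

```python
# Simulation sketch (invented numbers, not Eyre-Walker & Stoletzki's dataset) of why
# r^2 between two assessors, and between assessors and citations, can end up so low.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 6500                                               # roughly the size of their dataset
merit = rng.normal(size=n)                             # latent 'true merit' of each article
assessor1 = merit + rng.normal(scale=2.0, size=n)      # two very noisy assessments of it
assessor2 = merit + rng.normal(scale=2.0, size=n)
log_citations = 0.3 * merit + rng.normal(scale=1.0, size=n)  # citations depend only weakly on merit

r_aa, _ = pearsonr(assessor1, assessor2)
r_ac, _ = pearsonr(assessor1, log_citations)

print(f"assessor vs assessor:  r^2 = {r_aa**2:.2f}")   # in this toy setup, close to chance level
print(f"assessor vs citations: r^2 = {r_ac**2:.2f}")
```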

What follows from all this? A good to-do list can be found in the editorial accompanying the Eyre-Walker & Stoletzki article. Eisen et al. advocate multidimensional assessment tools (‘altmetrics’), but for now: ‘Do what you can today; help disrupt and redesign the scientific norms around how we assess, search, and filter science.’

 

References

Rothwell PM, Martyn CN (2000) Reproducibility of peer review in clinical neuroscience. Is agreement between reviewers any greater than would be expected by chance alone? Brain 123(Pt 9):1964-1969

Bohannon J (2013) Who’s afraid of peer review? Science 342:60-65

Eyre-Walker A, Stoletzki N (2013) The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations. PLoS Biol 11(10): e1001675. doi:10.1371/journal.pbio.1001675

Eisen JA, MacCallum CJ, Neylon C (2013) Expert Failure: Re-evaluating Research Assessment. PLoS Biol 11(10): e1001677. doi:10.1371/journal.pbio.1001677

Too good to be true: Excess significance in experimental neuroscience

In a massive meta-analysis of animal studies of six neurological diseases (EAE/MS, Parkinson’s disease, ischemic stroke, spinal cord injury, intracerebral hemorrhage, Alzheimer’s disease), Tsilidis et al. have demonstrated that the published literature in these fields has an excess of statistically significant results that is due to biases in reporting (PLoS Biol 11(7):e1001609). By including more than 4,000 datasets (from more than 1,000 individual studies!), which they synthesized in 160 meta-analyses, they impressively substantiate that there are far too many ‘positive’ results in the literature! Underlying reasons are reporting biases, including study publication bias, selective outcome reporting bias (where null results are omitted), and selective analysis bias (where data are analysed with different methods that favour ‘positive’ results). Study sizes were small (a mean of 16 animals), less than one third of the studies randomized treatment allocation or evaluated outcomes in a blinded fashion, and only 39 of 4,140 studies reported sample size calculations!
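The logic of such an excess significance test can be sketched in a few lines of Python (a deliberately simplified version in the spirit of Ioannidis & Trikalinos, with invented numbers): estimate each study’s power at a plausible effect size, sum the powers to get the expected number of ‘significant’ studies, and compare that with what was actually observed:

```python
# Deliberately simplified sketch of an excess-significance test in the spirit of
# Ioannidis & Trikalinos; study sizes, effect size, and the observed count are invented.
import numpy as np
from scipy.stats import binomtest, norm

alpha = 0.05
effect = 0.5                                  # plausible 'true' standardized effect size
n_per_group = np.array([8, 8, 10, 12, 16, 20, 25, 30])   # animals per group in 8 studies

# Approximate power of a two-sample z-test in each study at this effect size
se = np.sqrt(2.0 / n_per_group)
power = 1 - norm.cdf(norm.ppf(1 - alpha / 2) - effect / se)

expected = power.sum()                        # expected number of 'significant' studies
observed = 7                                  # suppose 7 of the 8 studies report p < 0.05

# Binomial test against the mean power: more significant results than power can explain?
res = binomtest(observed, len(n_per_group), power.mean(), alternative="greater")
print(f"expected ~{expected:.1f} significant studies, observed {observed}")
print(f"excess-significance p-value ~ {res.pvalue:.3f}")
```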

Power failure

 

In a highly cited paper from 2005, John Ioannidis answered the question ‘Why most published research findings are false’ (PLoS Med 2:e124). The answer, in one sentence: ‘because of low statistical power and bias’. A current analysis in Nature Reviews Neuroscience, ‘Power failure: why small sample size undermines the reliability of neuroscience’ (advance online publication; Ioannidis is a coauthor), now focuses on the neurosciences and provides empirical evidence that in a wide variety of neuroscience fields (including imaging and animal modelling), exceedingly low statistical power, and hence very low positive predictive values, are the norm. This explains low reproducibility (see, e.g., the special issue in Exp Neurol on (lack of) reproduction in spinal cord injury research, Exp Neurol 233(2):597-605) and inflated effect sizes. Besides this meta-analysis of power in neuroscience research, the article also contains a highly readable primer on the concepts of power, positive predictive value, type I and II errors, and effect size. A must-read.
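The core of the argument can be reproduced on the back of an envelope with the positive predictive value formula from the Ioannidis 2005 paper, PPV = (power × R) / (power × R + α), where R is the pre-study odds that a probed effect is real; the values of R and of the bias term below are purely illustrative choices, not estimates from the paper:

```python
# Back-of-envelope PPV calculation following the Ioannidis (2005) formula; the pre-study
# odds R and the bias term u are illustrative choices, not estimates from the paper.
def ppv(power: float, alpha: float = 0.05, R: float = 0.25, u: float = 0.0) -> float:
    """Positive predictive value of a nominally significant finding.

    R: pre-study odds that a probed effect is real; u: bias, i.e. the fraction of
    analyses that would otherwise be negative but are reported as 'positive'."""
    true_positives = power * R + u * (1 - power) * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)

for pw in (0.8, 0.3, 0.1):   # a well-powered study vs typical neuroscience power levels
    print(f"power={pw:.1f}: PPV={ppv(pw):.2f}  with bias u=0.2: PPV={ppv(pw, u=0.2):.2f}")
```

Even with a generous prior (one in five probed effects real) and no bias at all, dropping power from 0.8 to 0.1 cuts the PPV of a ‘significant’ finding roughly in half again, and adding a modest reporting bias makes things considerably worse.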