Category: Medicine

Open evaluation of scientific papers

Scientific publishing should be based on open access and open evaluation. While open access is on its way, open evaluation (OE) is still controversial and only slowly seeping into the system. Kriegeskorte, Walther, and Deca have edited a whole issue of Frontiers in Computational Neuroscience devoted to this topic, with some very scholarly and thoughtful discussions on the pros and cons of OE. I highly recommend the editorial (An emerging consensus for open evaluation), which tries to synthesize the arguments into '18 visions'. The beauty of their blueprint for the future of scientific publication (which was already published a year ago) is that it is possible to start with the current system and slowly evolve it into a full-blown OE system, while checking on the way whether the different measures deliver on their promises.

Otto Warburg’s research grant

Otto Warburg (1883–1970), the famous German biochemist and Nobel laureate, is often credited with having written the perfect, the ideal research grant. I just stumbled upon a 'reconstruction' of his classic application to the Notgemeinschaft der Deutschen Wissenschaft (Emergency Association of German Science, the forerunner of the Deutsche Forschungsgemeinschaft), probably in 1921. The application consisted of a single sentence, 'I require 10,000 marks', and was fully funded. It was (re)printed in a review in Nature Reviews Cancer entitled 'Otto Warburg's contributions to current concepts of cancer metabolism' by Koppenol et al. Thanks! Those were the days!

Is more than 80% of medical research waste?

The Lancet has published a landmark series of five papers on quality problems in biomedical research, which also propose a number of measures to increase value and reduce waste. Here is our commentary and summary. All articles are freely available on the internet (rather unusual for an Elsevier journal…).

From the Lancet pages:

The Lancet presents a Series of five papers about research. In the first report Iain Chalmers et al discuss how decisions about which research to fund should be based on issues relevant to users of research. Next, John Ioannidis et al consider improvements in the appropriateness of research design, methods, and analysis. Rustam Al-Shahi Salman et al then turn to issues of efficient research regulation and management. Next, An-Wen Chan et al examine the role of fully accessible research information. Finally, Paul Glasziou et al discuss the importance of unbiased and usable research reports. These papers set out some of the most pressing issues, recommend how to increase value and reduce waste in biomedical research, and propose metrics for stakeholders to monitor the implementation of these recommendations.

How Science goes wrong

Scepticism regarding the quality and predictiveness of modern science has finally arrived in the lay press. This week The Economist has devoted its issue, including cover, editorial, and leader, to what it calls 'unreliable research'. Even closer to home, this week's New Scientist (also with cover, editorial, and leader) turns on neuroscience, with a similar message and material, and the bottom line that 'the vast majority of brain research is now drowning in uncertainty.' A clear signal that it is either time to abandon ship, or to clean up the mess!

 

 

The failure of peer review – A game of chance?

 


In 2000, two undisclosed neuroscience journals opened their databases to an interesting study, which was subsequently published in Brain: Rothwell and Martyn set out to determine the 'reproducibility' of the assessments of submitted articles by independent reviewers. They found, not surprisingly, that the recommendations of the reviewers had a strong influence on the acceptance of the articles. However, there was little or no agreement between reviewers regarding priority, and agreement between reviewers regarding the recommendation (accept, reject, revise) was no better than chance.
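
Agreement of this kind is typically quantified with a chance-corrected statistic such as Cohen's kappa. The following minimal sketch, using invented recommendations rather than the study's actual data, shows how such a kappa is computed:

```python
# Toy example (invented data, not Rothwell & Martyn's): chance-corrected
# agreement between two reviewers' recommendations using Cohen's kappa.
from collections import Counter

reviewer_a = ['accept', 'revise', 'reject', 'revise', 'accept', 'reject', 'revise', 'accept']
reviewer_b = ['revise', 'revise', 'accept', 'reject', 'accept', 'reject', 'accept', 'revise']

n = len(reviewer_a)
observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n

# Agreement expected by chance, from each reviewer's marginal frequencies
freq_a, freq_b = Counter(reviewer_a), Counter(reviewer_b)
expected = sum(freq_a[c] * freq_b[c] for c in set(reviewer_a) | set(reviewer_b)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed = {observed:.2f}, expected by chance = {expected:.2f}, kappa = {kappa:.2f}")
```

A kappa near zero, as in this toy example, means the reviewers agree about as often as they would by guessing.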

Two recent publications have picked up this thread, and found rather horrifying results:

In Science this week, John Bohannon reports the results of an interesting experiment. He deliberately fabricated fatally flawed studies reporting the anticancer effects of non-existent phytodrugs, following the template:

‘Molecule X from lichen species Y inhibits the growth of cancer cell Z. To substitute for those variables, [he] created a database of molecules, lichens, and cancer cell lines and wrote a computer program to generate hundreds of unique papers. Other than those differences, the scientific content of each paper [was] identical.’

The studies contained ethical problems, reported results that were not supported by the experiments, used flawed study designs, etc. He then submitted them to 304 open access journals; 157 accepted the paper for publication! While this may mainly reflect a problem of open access journals dedicated to so-called 'predatory publishing' (skimming off publication fees from willing authors), some of the accepting journals were run by respectable publishers.
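
As a rough illustration of the kind of template substitution Bohannon describes, such a generator can be only a few lines of code. The molecule names below are invented, and the lichen species and cell lines are arbitrary examples, not his actual database:

```python
# Minimal sketch of template-based paper generation as described in the Science sting.
# All lists here are placeholders, not the real database used by Bohannon.
import itertools

molecules = ['epigallol A', 'cladonioside B', 'parmelin C']   # hypothetical compounds
lichens = ['Cladonia furcata', 'Parmelia sulcata']            # arbitrary lichen species
cell_lines = ['HeLa', 'MCF-7', 'U-87']                        # common cancer cell lines

template = ('{molecule}, isolated from the lichen {lichen}, '
            'inhibits the growth of {cells} cancer cells.')

papers = [template.format(molecule=m, lichen=l, cells=c)
          for m, l, c in itertools.product(molecules, lichens, cell_lines)]

print(f'{len(papers)} unique claims generated, for example:')
print(papers[0])
```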

In the same week, Eyre-Walker and Stoletzki published an article in PLoS Biology comparing peer review, impact factor, and number of citations as ways to assess the 'merit' of a paper. They used a dataset of 6,500 articles (e.g. from the F1000 database) for which post-publication peer review by at least two assessors was available. Again, just as in the Rothwell and Martyn study, agreement between reviewers was not much better than chance (r² of 0.07). The scores of the assessors also correlated only very weakly with the number of citations drawn by those articles (r² of 0.06). They conclude that 'we have shown that none of the measures of scientific merit that we have investigated are reliable.'
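
The r² values quoted here are simply squared Pearson correlations between assessor scores, and between scores and citation counts. The small simulation below, with made-up and deliberately weakly dependent data, illustrates the computation; it will not reproduce the paper's exact numbers:

```python
# Sketch: r^2 as the squared Pearson correlation between two assessors' scores,
# and between one assessor's score and (log) citation counts.
# Data are simulated purely to show the computation.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 6500
true_merit = rng.normal(size=n)
score_a = true_merit + rng.normal(scale=1.7, size=n)   # noisy assessment A
score_b = true_merit + rng.normal(scale=1.7, size=n)   # noisy assessment B
citations = np.exp(0.3 * true_merit + rng.normal(scale=1.0, size=n))

r_ab, _ = pearsonr(score_a, score_b)
r_ac, _ = pearsonr(score_a, np.log(citations))
print(f"r^2 assessor A vs assessor B: {r_ab ** 2:.2f}")
print(f"r^2 assessor A vs citations:  {r_ac ** 2:.2f}")
```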

What follows from all this? A good to-do list can be found in the editorial accompanying the Eyre-Walker and Stoletzki article. Eisen et al. advocate multidimensional assessment tools ('altmetrics'), but for now: 'Do what you can today; help disrupt and redesign the scientific norms around how we assess, search, and filter science.'

 

References

Rothwell PM, Martyn CN (2000) Reproducibility of peer review in clinical neuroscience. Is agreement between reviewers any greater than would be expected by chance alone? Brain 123(Pt 9):1964-1969.

Bohannon J (2013) Who’s afraid of peer review? Science 342:60-65

Eyre-Walker A, Stoletzki N (2013) The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations. PLoS Biol 11(10): e1001675. doi:10.1371/journal.pbio.1001675

Eisen JA, MacCallum CJ, Neylon C (2013) Expert Failure: Re-evaluating Research Assessment. PLoS Biol 11(10): e1001677. doi:10.1371/journal.pbio.1001677

Too good to be true: Excess significance in experimental neuroscience

In a massive meta-analysis of animal studies of six neurological diseases (EAE/multiple sclerosis, Parkinson's disease, ischemic stroke, spinal cord injury, intracerebral hemorrhage, and Alzheimer's disease), Tsilidis et al. have demonstrated that the published literature in these fields has an excess of statistically significant results that are due to biases in reporting (PLoS Biol. 2013 Jul;11(7):e1001609). By including more than 4,000 datasets (from more than 1,000 individual studies!), which they synthesized in 160 meta-analyses, they impressively substantiate that there are way too many 'positive' results in the literature! Underlying reasons are reporting biases, including study publication bias, selective outcome reporting bias (where null results are omitted), and selective analysis bias (where data are analysed with different methods that favour 'positive' results). Study sizes were small (mean 16 animals), less than one third of the studies randomized or evaluated outcome in a blinded fashion, and only 39 of 4,140 studies performed sample size calculations!
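
The logic of such an excess-significance analysis, in the spirit of Ioannidis and Trikalinos, is to compare how many studies in a field report nominally significant results with how many would be expected given each study's statistical power. The sketch below uses a normal-approximation power calculation and entirely invented effect and study sizes; it only illustrates the reasoning, not the paper's actual procedure:

```python
# Sketch of an excess-significance check: compare the observed number of
# 'significant' studies with the number expected from each study's power.
# Effect size, group sizes, and counts below are invented for illustration.
import numpy as np
from scipy.stats import norm, binom

alpha = 0.05
plausible_effect = 0.5                          # assumed true standardized effect size
n_per_group = [8, 10, 12, 8, 16, 9, 11, 10]     # animals per group in each study
observed_significant = 6                        # studies reporting p < 0.05

def approx_power(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison."""
    se = np.sqrt(2.0 / n)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(d / se - z_crit) + norm.cdf(-d / se - z_crit)

powers = [approx_power(plausible_effect, n) for n in n_per_group]
expected = sum(powers)

# Probability of seeing at least the observed number of significant studies
# if each study's chance of significance equalled its power
p_excess = binom.sf(observed_significant - 1, len(n_per_group), expected / len(n_per_group))
print(f"expected significant: {expected:.1f}, observed: {observed_significant}, p = {p_excess:.3f}")
```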

Call for international collaboration in preclinical research

Translational stroke medicine requires renewal, and international collaboration in preclinical research may be an important step to overcome hurdles impeding progress. The tremendous power of international research collaboration has been convincingly demonstrated in physics, and several transnational collaborations have already delivered proof of concept in the stroke field. The experience gleaned from such collaborations is paving the way for an exciting new era in stroke research, which strives to harness the multitude of benefits achievable through international collaboration. Now is the time for concrete action to advance the agenda and establish an international preclinical stroke network (click here for the full article: Concerted Appeal for International Collaboration, Stroke 2013).

Overcoming negative (positive) publication bias

F1000 Research starts an initiative to overcome 'positive publication bias' (aka 'negative publication bias'). Until the end of August, publication fees are waived for submissions of null results.

Only data that are available via publications—and, to a certain extent, via presentations at conferences—can contribute to progress in the life sciences. However, it has long been known that a strong publication bias exists, in particular against the publication of data that do not reproduce previously published material or that refute the investigators’ initial hypothesis. The latter type of contradictory evidence is commonly known as ‘negative data.’ This slightly derogatory term reflects the bias against studies in which investigators were unable to reject their null hypothesis (H0), a tool of frequentist statistics that states that there is no difference between experimental groups.

Researchers are well aware of this bias, as journals are usually not keen to publish the nonexistence of a phenomenon or treatment effect. They know that editors have little interest in publishing data that refute, or do not reproduce, previously published work—with the exception of spectacular cases that guarantee the attention of the scientific community, as well as garner extra citations (Ioannidis and Trikalinos, 2005). The authors of negative results are required to provide evidence for failure to reject the null hypothesis under numerous conditions (e.g., dosages, assays, outcome parameters, additional species or cell types), whereas a positive result would be considered worthwhile under any of these conditions. Indeed, there is a dilemma: one can never prove the absence of an effect, because, as Altman and Bland (1995) remind us, 'absence of evidence is not evidence of absence'.
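
A quick simulation makes this point concrete: with typically small group sizes, an experiment will often fail to reject H0 even when a genuine, moderately sized effect is present. The effect size and group size below are arbitrary choices for illustration:

```python
# Simulated illustration: with small groups, a real effect is usually missed,
# so a 'negative' result by itself says little about the absence of an effect.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
true_effect = 0.5          # standardized effect size (Cohen's d) assumed to be real
n_per_group = 10
n_experiments = 10_000

misses = 0
for _ in range(n_experiments):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    if ttest_ind(control, treated).pvalue >= 0.05:
        misses += 1

print(f"n = {n_per_group}/group, d = {true_effect}: "
      f"{misses / n_experiments:.0%} of experiments fail to reject H0 despite a real effect")
```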

Several journals have already opened their pages to 'negative' results. For example, the Journal of Cerebral Blood Flow and Metabolism (Fighting publication bias: introducing the Negative Results section) publishes such studies as a one-page summary (maximum 500 words, two figures) in the print edition of the journal, with the accompanying full paper online.

Power failure

 

In a highly cited paper in 2005, John Ioannidis answered the question 'Why most published research findings are false' (PLoS Med. 2, e124). The answer, in one sentence, is 'because of low statistical power and bias'. A current analysis in Nature Reviews Neuroscience, 'Power failure: why small sample size undermines the reliability of neuroscience' (advance online publication; Ioannidis is a coauthor), now focuses on the neurosciences and provides empirical evidence that in a wide variety of neuroscience fields (including imaging and animal modeling) exceedingly low statistical power, and hence very low positive predictive values, are the norm. This explains low reproducibility (e.g. the special issue in Exp. Neurol. on the (lack of) reproduction in spinal cord injury research, Exp Neurol. 2012 Feb;233(2):597-605) and inflated effect sizes. Besides this meta-analysis on power in neuroscience research, the article also contains a highly readable primer on the concepts of power, positive predictive value, type I and II errors, as well as effect size. A must-read.
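
The key quantity in that argument, the positive predictive value (PPV), follows from power, the significance threshold α, and the pre-study odds R that a probed effect is real: PPV = (power × R) / (power × R + α). A minimal sketch (with an arbitrary example value of R, not an empirical estimate) shows how quickly the PPV collapses as power drops:

```python
# Positive predictive value from power, alpha, and the pre-study odds R that a
# probed effect is real: PPV = (power * R) / (power * R + alpha).
# The value of R below is an arbitrary example.
def ppv(power, alpha=0.05, prior_odds=0.25):
    return (power * prior_odds) / (power * prior_odds + alpha)

for power in (0.8, 0.3, 0.1):
    print(f"power = {power:.1f} -> PPV = {ppv(power):.2f}")
```

With R = 0.25 and α = 0.05, dropping power from 0.8 to 0.1 pulls the PPV from 0.8 down to about one third, i.e. most 'positive' findings would then be false.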

 

Diversity outbred mice


Most rodent models of disease (in stroke research, anyway) use young, healthy, male, inbred mouse strains kept under specific pathogen-free (SPF) conditions, with restricted antigen exposure in their environment, and on a diet optimized for maximum reproduction rates (high in antioxidants, trace elements, and other supplements). It is like studying cohorts of 12-year-old male identical twins kept on an identical health diet in a single sealed room, without any contact with the outside world (the 'plastic bubble'). What may be good for reproducible modeling is potentially problematic for translational research, as patients often have comorbidities (e.g. hypertension and diabetes in stroke), already take various medicines, are elderly, and include females… Thus, the external validity of the models is often low, at least partially explaining some of the failures when moving from promising new therapeutic strategies in rodents to real-life patients in randomized clinical trials. Fortunately, external validity can be improved by studying comorbid animals of advanced age and of both sexes. It is trickier in rodents to produce a mature immune system that has had contact with pathogens and multiple antigens. The answer to reduced genetic diversity may be to use populations specifically developed to provide wide genetic variability, such as the Diversity Outbred population or the partially inbred Collaborative Cross strains developed by the Jackson Laboratory. However, in my field (stroke research), which is hit particularly hard by the 'translational roadblock', I have not seen a single study making use of these strains.