Category: Medicine
Journals unite for reproducibility
Amidst what has been termed the ‘reproducibility crisis’ (see also a number of previous posts), the National Institutes of Health and the Nature Publishing Group convened a workshop in June 2014 on the rigour and reproducibility of preclinical biomedicine. As a result, last week the NIH published ‘Principles and Guidelines for Reporting Preclinical Research’, and Nature as well as Science ran editorials on it. More than 30 journals, including the Journal of Cerebral Blood Flow and Metabolism, are endorsing the guidelines. The guidelines cover rigour in statistical analysis, transparent reporting and standards (including randomization, blinding, and sample size justification), and data mining and sharing. This is an important step forward, but implementation has to be enforced and monitored: the ARRIVE guidelines (many items of which reappear in the NIH guidelines) have not been widely adopted yet (see previous post). In this context I highly recommend the article by Henderson et al. in PLOS Medicine, in which they systematically review existing guidelines for in vivo animal experiments. From this the STREAM collaboration distilled a checklist on internal, external, and construct validity which I find more comprehensive and relevant than the one now published by the NIH. In the end, however, what matters is not so much which guideline (ARRIVE, NIH, STREAM, etc.) researchers, reviewers, editors, or funders comply with, but whether they use one at all!
Note added 12/15/2014: Check out the PubPeer post-publication discussion on the NIH/Nature/Science initiative (click here)!
PubMed Commons to the rescue
I have posted earlier on the many virtues of post-publication commenting. Only a few journals presently allow readers to comment on, discuss, or criticise papers online. Since 2013 the National Library of Medicine of the US National Institutes of Health, which provides us with more than 24 million MEDLINE citations, has offered a tool to comment on any of these millions of articles: PubMed Commons. Although it got off to a slow start, and at present fewer than 3000 comments are listed, it is almost bound to be a success and has the potential to propel biomedical publishing into the 21st century. I believe that the PubMed Commons model is superior to the post-publication commenting schemes of individual journals. Its main advantage is that every article which receives a comment is directly visible and accessible with one click for everyone retrieving the article via PubMed. Comments published in individual journals may be buried on their websites, only a tiny fraction of journals allow commenting at all, and no other model would allow commenting on papers dating back more than a few years. How does it work?
Appraising and rewarding biomedical research
In academic biomedicine, in most countries and research environments, grants, performance-based funding, positions, etc., are appraised and rewarded based on very simple quantitative metrics: the impact factor (IF) of previous publications, and the amount of third-party funding. For example, at the Charité in Berlin researchers receive ‘Bonus’ funding (to be spent only on research, of course!) which is calculated by adding the IFs of the journals in which the researcher has published over the last 3 years, and the cumulative third-party funding during that period (weighted depending on whether the source was the Deutsche Forschungsgemeinschaft (DFG, x3), the ministry of science (BMBF), the EU, foundations (x2), or others (x1)). IF and funding contribute 50% each to the bonus. In 2014, this resulted in a bonus of 108 € per IF point, and 8 € per 1000 € of funding (weighted).
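To make the arithmetic concrete, here is a minimal sketch of such a calculation, using the 2014 rates quoted above (108 € per IF point, 8 € per 1000 € of weighted funding). The weights for BMBF and EU funding are not stated above, so the values used for them below are placeholders, not the actual Charité scheme.

```python
# Minimal sketch of the 'Bonus' arithmetic described above (not the official
# Charité algorithm). Rates are the 2014 figures from the text; the BMBF and
# EU weights are assumptions for illustration only.

def bonus(impact_factors, funding_by_source,
          eur_per_if_point=108.0, eur_per_1000_eur=8.0):
    """Return the bonus in € for one researcher over a 3-year window."""
    # Source weights: DFG x3, foundations x2, others x1 are given in the text;
    # the BMBF and EU weights are placeholders.
    weights = {"DFG": 3.0, "BMBF": 1.0, "EU": 1.0, "foundation": 2.0, "other": 1.0}

    if_points = sum(impact_factors)  # summed journal IFs of the last 3 years
    weighted_funding = sum(weights.get(source, 1.0) * amount
                           for source, amount in funding_by_source.items())

    return if_points * eur_per_if_point + weighted_funding / 1000.0 * eur_per_1000_eur


# Example: three papers (IF 5.2, 3.8, 11.1), 200,000 € from the DFG and
# 50,000 € from a foundation -> roughly 2170 € + 5600 € ≈ 7770 €
print(bonus([5.2, 3.8, 11.1], {"DFG": 200_000, "foundation": 50_000}))
```

Note how the formula rewards neither the content nor the quality of the papers themselves, only the journals they appeared in and the money already raised, which is precisely the problem discussed next.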
Admittedly, IF and third-party funding are quantitative, hence easily comparable, and using them is easy and seemingly ‘just’. But is it smart to select candidates or grants based on a metric that measures the average number of citations to recent articles published in a journal? In other words, a metric of a specific journal, and not of the authors and their research findings? Similar issues affect third-party funding as an indicator: it reflects the past, not the present or future, and is influenced by numerous factors that are only loosely dependent on, or even independent of, the quality and potential of a researcher or his/her project. But it is a number, and it can be objectively measured, down to the penny! Despite widespread criticism of these conventional metrics, they remain the backbone of almost all assessment exercises. Most researchers and research administrators admit that this approach is far from perfect, but they posit that it is the least bad of the available options. In addition, they lament that there are no alternatives. To those I recommend John Ioannidis’ and Muin Khoury’s recent opinion article in JAMA (2014 Aug 6;312(5):483-4). [Sorry for continuing to feature articles by John Ioannidis, but he keeps pulling brilliant ideas out of his hat]
They propose the ‘PQRST index’ for assessing value in biomedical research. What is it?
p-value vs. positive predictive value
Riddle me this:
What does it mean if a result is reported as significant at p < 0.05?
A If we were to repeat the analysis many times, using new data each time, and if the null hypothesis were really true, then on only 5% of those occasions would we (falsely) reject it.
B Without knowing the statistical power of the experiment, and without knowing the prior probability of the hypothesis, I cannot estimate the probability that a significant research finding (p < 0.05) reflects a true effect.
C The probability that the result is a fluke (the hypothesis was wrong, the drug doesn’t work, etc.) is below 5%. In other words, there is a less than 5% chance that my results are due to chance.
(solution at the end of this post)
Be honest: although it doesn’t sound very sophisticated (as opposed to A and B), you were tempted to choose C, since it makes a lot of sense and matches your own interpretation of the p-value when reading and writing papers. You are in good company. But is C really the correct answer?
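A hint at why C is problematic, and why B matters: the chance that a significant finding reflects a true effect is its positive predictive value (PPV), which depends on the statistical power of the experiment and on the prior probability of the hypothesis, not on the p-value alone. Here is a minimal sketch, with made-up numbers purely for illustration:

```python
# Positive predictive value of a 'significant' result (p < alpha):
# the fraction of significant findings that reflect true effects.
# The prior probabilities and power values below are illustrative assumptions.

def ppv(prior, power, alpha=0.05):
    true_positives = power * prior         # real effects that reach significance
    false_positives = alpha * (1 - prior)  # null effects that reach significance anyway
    return true_positives / (true_positives + false_positives)

# Well-powered test of a plausible hypothesis:
print(ppv(prior=0.5, power=0.8))   # ~0.94

# Underpowered, exploratory test of a long-shot hypothesis:
print(ppv(prior=0.1, power=0.2))   # ~0.31 -- most such 'discoveries' are false
```

In the second scenario, roughly two out of three significant results are false positives, even though every one of them comes with p < 0.05.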
Pick one: Genomic responses in mouse models POORLY/GREATLY mimic human inflammatory diseases
About a year ago Seok et al. shocked the biomedical world with the verdict that mice are not humans, or more specifically, that the blood genomic responses in various inflammatory conditions do not correlate at all between human patients and the corresponding mouse disease models (see previous post, as well as this one). Now another paper, by Takao et al. and also published in PNAS, concludes the exact opposite, namely that there is a near perfect correlation between the blood genomic responses of mouse and man.
Meanwhile, the initial publication is among the most cited medical publications of the last year, and hundreds of newspapers and blogs (including this one) have covered it. It will be interesting to see how much media coverage the Takao paper receives; probably much less. But what happened, and which paper should we believe?
Are most research findings actually false? (Sind die meisten Forschungsergebnisse tatsächlich falsch?)
In July, Laborjournal (‘LabTimes’), a free German monthly for life scientists (sort of a hybrid between the Economist and the British tabloid The Sun), celebrated its 20th anniversary with a special issue. I was asked to contribute an article, in which I try to answer the question whether most published research findings are false, as John Ioannidis rhetorically asked in 2005.
To find out, you have to be able to read German, and click here for a pdf of the article (in German).
Higgs’ boson and the certainty of knowledge
“Five sigma” is the gold standard for statistical significance in physics when claiming a particle discovery. When the New Scientist reported on the putative confirmation of the Higgs boson, they wrote:
‘Five-sigma corresponds to a p-value, or probability, of 3×10⁻⁷, or about 1 in 3.5 million. There’s a 5-in-10 million chance that the Higgs is a fluke.’
Does that mean that p-values can tell us the probability of being correct about our hypotheses? Can we use p-values to decide about the truth (correctness) of hypotheses? Does p<0.05 mean that there is a smaller than 5 % chance that an experimental hypothesis is wrong?
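As a reminder of what the five-sigma number actually is: it is, by convention, the one-tailed tail probability of a standard normal distribution, i.e. a statement about how often a background fluctuation of at least that size would occur if there were no Higgs, not the probability that the Higgs is a fluke. A short sketch (assuming SciPy is available) recovers the figure quoted above:

```python
# Convert a sigma threshold into the corresponding (one-tailed) p-value
# under a standard normal null distribution.
from scipy.stats import norm

for sigma in (2, 3, 5):
    p = norm.sf(sigma)  # survival function: P(Z >= sigma) under the null
    print(f"{sigma} sigma -> p = {p:.2e}")

# 5 sigma -> p = 2.87e-07, i.e. roughly 3 in 10 million (about 1 in 3.5 million)
```

Turning that number into ‘the probability that the Higgs is a fluke’ would require a prior on the hypothesis, which is exactly the confusion the questions above are getting at.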
Can mice be trusted?
I started this blog with a post on a PNAS paper which at that time had received a lot of attention in the scientific community and lay press. In this article, Seok et al. argued that ‘genomic responses in mouse models poorly mimic human inflammatory diseases’. With this post I am returning to that article, as I was recently asked by the journal Stroke to contribute to their ‘Controversies in Stroke’ series. The Seok article had disturbed the stroke community, so a pro/con discussion seemed timely. In the upcoming issue of Stroke, Sharp and Jickling will argue that ‘the peripheral inflammatory response in rodent ischemic stroke models is different than in human stroke. Given the important role of the immune system in stroke, this could be a major handicap in translating results in rodent stroke models to clinical trials in patients with stroke.’ This is of course true! Nevertheless, I counter by providing some examples of translational successes regarding stroke and the immune system, and conclude that ‘the physiology and pathophysiology of rodents is sufficiently similar to humans to make them a highly relevant model organism but also sufficiently different to mandate an awareness of potential resulting pitfalls. In any case, before hastily discarding highly relevant past, present, and future findings, experimental stroke research needs to improve dramatically its internal and external validity to overcome its apparent translational failures.’ For an in-depth treatment, follow the debate:
Article: Dirnagl: Can mice be trusted
Article: Sharp Jickling: Differences between mice and humans
Quality assurance and management in experimental research
Currently, a worldwide discussion among stakeholders of the biomedical research enterprise revolves around the recent realization that a significant proportion of the resources spent on medical research is wasted, and around potential actions to increase its value. The reproducibility of results in experimental biomedicine is generally low, and the vast majority of medical interventions introduced into clinical testing after successful preclinical development prove unsafe or ineffective. One prominent explanation for these problems is flawed preclinical research. There is consensus that the quality of biomedical research needs to be improved. ‘Quality’ is a broad and generic term, and it is clear that a plethora of factors together determine the robustness and predictiveness of basic and preclinical research results. Against this background, the experimental laboratories of the Center for Stroke Research Berlin (CSB, Dept. of Experimental Neurology) have decided to take a systematic approach and to implement a structured quality management system. In a process involving all members of the department, from student to technician, postdoc, and group leader, and extending over more than one year, we have established standard operating procedures, defined common goals and indicators, improved communication structures and document management, implemented error management, and are developing an electronic laboratory notebook, among other measures. On July 3rd, 2014, this quality management system passed an ISO 9001 certification process (Certificate 12 100 48301 TMS). The auditors were impressed by the quality-oriented ‘spirit’ of all members of the department, and by the fact that, to their knowledge, the CSB is the first academic institution worldwide to have established a structured quality management system of this standard and reach in experimental research. The CSB is fully aware that implementing a certified quality management system does not by itself guarantee translational success. However, we believe that innovation will only improve patient outcomes if it is built on the highest possible standards of quality. Certification of our standards renders them transparent and verifiable to the research community, and serves as a first step towards a preclinical medicine in which research conduct and results can be monitored and audited by peers.
Post-publication commenting
Amidst a flurry of retractions of research articles from high-profile journals and growing concerns about the non-reproducibility of research findings, the time-honored (some say old-fashioned) closed pre-publication mode of peer review has come under criticism. Major issues concern the quality of reviews and the lack of transparency. A number of modifications and alternative models have been proposed (e.g. Front. Comput. Neurosci. 6:94;2012), including open post-publication review. Most publications are no longer read in printed and bound issues of a journal, but rather accessed in digital form via the internet. This allows for novel forms of readership participation, such as post-publication review and online commenting on and discussion of articles. Several journals are experimenting with such features (e.g. PLOS One), and some are built around them (e.g. F1000Research or eLife). Nevertheless, most established journals are hesitant to give up their time-honored modes of publishing. They argue that closed pre-publication review may not be perfect, but that the alternatives are untested and may actually be worse. Post-publication commenting requires software upgrades to journal websites, as well as monitoring and moderation of content, and there may be legal issues. Another problem is the troubling fact that a substantial fraction of the biomedical literature is not read at all (even if cited!), which means that we may not be able to rely solely on processes that take place after publication.






