On March 17, 2015, five panelists from cognitive neuroscience and psychology (Sam Schwarzkopf, Chris Chambers, Sophie Scott, Dorothy Bishop, and Neuroskeptic) publicly debated “Is science broken? If so, how can we fix it?”. The event was organized by Experimental Psychology, UCL Division of Psychology and Language Sciences / Faculty of Brain Sciences in London.
The debate revolved around the ‘reproducibility crisis’, and covered false positive rates, replication, faulty statistics, lack of power, publication bias, study preregistration, data sharing, peer review, you name it. Understandably, the event caused a stir in the press, journals, and the blogosphere (Nature, BioMed Central, Aidan’s Aviary, The Psychologist, etc.).
Remarkably, some of the panelists (notably Sam Schwarzkopf) respectfully opposed the current ‘crusade for true science’ (to which I must confess I subscribe) by arguing that science is not broken at all, and that by trying to fix it we run the risk of wrecking it for good. Already a few days before the official debate, he and Neuroskeptic had started to exchange arguments on Neuroskeptic’s blog. While both parties appear to agree that science can be improved, they completely disagree in their analysis of the current status of the scientific enterprise, and consequently also on action points.
This predebate argument directed my attention to a blog which was run by Sam Schwarzkopf, or rather his alter ego, the ‘Devil’s neuroscientist’, for a short but very productive period. Curiously, the Devil’s neuroscientist retired from blogging the night before the debate, threatening that there would be no future posts! This is sad because, somewhat aggressively but very much to the point, the Devil’s neuroscientist tried to debunk the theses that there is a reproducibility crisis, that science is not self-correcting, that studies should be preregistered, etc. In other words, he was arguing against most of the issues raised and remedies suggested on my pages, too. In passing, he provided a lot of interesting links to proponents on either side of the fence. Although I do not agree with many of his conclusions, his is by far the most thoughtful treatment of the subject. Most of the time, when I talk with fellow scientists who dismiss the problems of the current model of biomedical research, I get rather unreflective comments. They usually simply celebrate the status quo as the best of all possible worlds and don’t get beyond the statement that there may be a few glitches, but that the model has evolved over centuries of undeniable progress. “If it’s not broken, don’t fix it.”
The Devil’s blog stimulated me to produce a short summary of key arguments of the current debate, to organize my own thoughts and as a courtesy to the busy reader.
In 2009, Chalmers and Glasziou investigated sources of avoidable waste in biomedical research and estimated that their cumulative effect was that about 85% of research investment is wasted (Lancet 2009; 374: 86–89). Critical voices have since questioned this exceedingly high number, or claimed that because of the non-linearities and idiosyncrasies of the biomedical research process a large number of failures is needed to produce a comparably small number of breakthroughs, and have therefore hailed the remaining 15%. Waste is defined as ‘resources consumed by inefficient or non-essential activities’. Does progress really thrive on waste?
Many of the concrete measures proposed to improve the quality and robustness of biomedical research are greeted with great skepticism: ‘Good idea, but how can we implement it, and will it work?’. So here are a few recent best practice examples regarding two key areas: Replication, and the review process.
Since the 17th century, when gentlemen scientists were typically seen as trustworthy sources for the truth about humankind and the natural order, the tenet has been generally accepted that ‘science is based on trust’. This refers to trust between scientists, as they build on each other’s data and may question a hypothesis or a conclusion, but not the quality of the scientific method applied or the faithfulness of the report, such as a publication. But it also refers to the trust of the public in the scientists whom societies support via tax-funded academic systems. Consistently, scientists (in particular in biomedicine) score highest among all professions in ‘trustworthiness’ ratings. Although they often question the trustworthiness of their competitors when chatting over a beer or two, in public they vehemently argue against any measure proposed to underpin confidence in their work by any form of scrutiny (e.g. auditing). They instead swiftly invoke Orwellian visions of a ‘science police’ and point out that scrutiny would undermine trust and jeopardize the creativity and ingenuity inherent to the scientific process. I find this quite remarkable. Why should science be exempt from scrutiny and control, when other areas of public and private life sport numerous checks and balances? Science may indeed be the only domain in society which is funded by the public and gets away with strictly rejecting accountability. So why do we trust scientists, but not bankers?
Amidst what has been termed the ‘reproducibility crisis’ (see also a number of previous posts), in June 2014 the National Institutes of Health and Nature Publishing Group convened a workshop on the rigour and reproducibility of preclinical biomedicine. As a result, last week the NIH published ‘Principles and Guidelines for Reporting Preclinical Research’, and Nature as well as Science ran editorials on it. More than 30 journals, including the Journal of Cerebral Blood Flow and Metabolism, are endorsing the guidelines. The guidelines cover rigour in statistical analysis, transparent reporting and standards (including randomization and blinding as well as sample size justification), and data mining and sharing. This is an important step forward, but implementation has to be enforced and monitored: The ARRIVE guidelines (many items of which reappear in the NIH guidelines) have not been adopted widely yet (see previous post). In this context I highly recommend the article by Henderson et al. in PLOS Medicine, in which they systematically review existing guidelines for in vivo animal experiments. From this the STREAM collaboration distilled a checklist on internal, external, and construct validity which I found more comprehensive and relevant than the one now published by the NIH. In the end, however, it matters less which guideline (ARRIVE, NIH, STREAM, etc.) researchers, reviewers, editors, or funders comply with than whether they use one at all!
Note added 12/15/2014: Check out the PubPeer postpublication discussion on the NIH/Nature/Science initiative (click here)!
In academic biomedicine, and in most countries and research environments, grants, performance based funding, positions, etc., are appraised and rewarded based on very simple quantitative metrics: The impact factor (IF) of previous publications, and the amount of third party funding. For example, at the Charité in Berlin researchers receive ‘Bonus’ funding (to be spent only in research, of course!) which is calculated by adding the IFs of the journals in which the researcher has published over the last 3 years, and the cumulative third party funding during that period (weighted depending on whether the source was the Deutsche Forschungsgemeinschaft (DFG, x3), the ministry of science (BMBF), the EU, foundations (x2), or others (x1)). IF and funding contribute 50% each to the bonus. In 2014, this resulted in a bonus of 108 € per IF point, and 8 € per 1000 € funding (weighted).
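To make the arithmetic tangible, here is a minimal sketch of how such a bonus could be computed from the 2014 rates quoted above. It is my own illustration, not the Charité's actual procedure: the function name, the source labels, and the example figures are hypothetical, and I am assuming that the x2 weight applies to BMBF, EU, and foundation money alike.

```python
# Rates quoted in the post for 2014.
EUR_PER_IF_POINT = 108        # euros per impact factor point
EUR_PER_1000_WEIGHTED = 8     # euros per 1000 euros of weighted funding

# Funding weights by source. ASSUMPTION: BMBF, EU, and foundations
# all carry the x2 weight; the post states x3 for DFG and x1 for others.
WEIGHTS = {"DFG": 3, "BMBF": 2, "EU": 2, "foundation": 2, "other": 1}


def bonus(journal_ifs, funding_by_source):
    """Return the bonus in euros.

    journal_ifs: impact factors of the journals published in (last 3 years).
    funding_by_source: dict mapping a source label to euros raised (last 3 years).
    """
    if_part = EUR_PER_IF_POINT * sum(journal_ifs)
    weighted_funding = sum(
        WEIGHTS[source] * amount for source, amount in funding_by_source.items()
    )
    funding_part = EUR_PER_1000_WEIGHTED * weighted_funding / 1000
    return if_part + funding_part


# Hypothetical researcher: two papers (IF 5 and IF 10),
# 100,000 euros from the DFG and 50,000 euros from another source.
example = bonus([5.0, 10.0], {"DFG": 100_000, "other": 50_000})
print(f"Bonus: {example:.0f} EUR")
```

Note what never enters the calculation: how often the researcher's own papers were cited, or anything else about the papers themselves.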
Admittedly, IF and third party funding are quantitative, hence easily comparable, and using them is easy and ‘just’. But is it smart to select candidates or grants based on a metric that measures the average number of citations to recent articles published in a journal? In other words, a metric of a specific journal, and not of authors and their research findings. Similar issues concern third party funding as an indicator: It reflects the past, and not the present or future, and is affected by numerous factors that are only loosely dependent on, or even independent of, the quality and potential of a researcher or his/her project. But it is a number, and it can be objectively measured, down to the penny! Despite widespread criticism of these conventional metrics, they remain the backbone of almost all assessment exercises. Most researchers and research administrators admit that this approach is far from perfect, but they posit that it is the best of all bad solutions. In addition, they lament that there are no alternatives. To those I recommend John Ioannidis’ and Muin Khoury’s recent opinion article in JAMA (2014 Aug 6;312(5):483-4). [Sorry for continuing to feature articles by John Ioannidis, but he keeps on pulling brilliant ideas out of his hat]
They propose the ‘PQRST index’ for assessing value in biomedical research. What is it?
In 2005 PLOS Medicine published John Ioannidis’ paper ‘Why most published research findings are false’. The article was a wake-up call for many, and is now probably the most influential publication in biomedicine of the last decade (>1.14 million views on the PLOS Med website, thousands of citations in the scientific and lay press, featured in numerous blog posts, etc.). Its title has never been refuted. If anything, it has been replicated; for examples see some of the posts of this blog. Almost 10 years later, Ioannidis now revisits his paper, and the more constructive title ‘How to make more published research true’ (PLoS Med. 2014 Oct 21;11(10):e1001747. doi: 10.1371/journal.pmed.1001747.) already indicates that the thrust this time is more forward looking. The article contains numerous suggestions to improve the research enterprise, some subtle and evolutionary, some disruptive and revolutionary, but all of them make a lot of sense. A must read for scientists, funders, journal editors, university administrators, professionals in the health industry, in other words: all stakeholders within the system!
Riddle me this:
What does it mean if a result is reported as significant at p < 0.05?
A If we were to repeat the analysis many times, using new data each time, and if the null hypothesis were really true, then on only 5% of those occasions would we (falsely) reject it.
B Without knowing the statistical power of the experiment and the prior probability of the hypothesis, I cannot estimate the probability that a significant research finding (p < 0.05) reflects a true effect.
C The probability that the result is a fluke (the hypothesis was wrong, the drug doesn’t work, etc.), is below 5 %. In other words, there is a less than 5 % chance that my results are due to chance.
(solution at the end of this post)
Be honest: although it doesn’t sound very sophisticated (as opposed to A and B), you were tempted to choose C, since it makes a lot of sense and represents your own interpretation of the p-value when reading and writing papers. You are in good company. But is C really the correct answer?
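The two quantities named in option B can be plugged into a standard textbook formula to see how much they matter. The sketch below computes the positive predictive value of a significant finding, i.e. the share of significant results that reflect a true effect. The function name and the example values for power and prior probability are my own illustrative assumptions, and the calculation ignores bias and multiple testing.

```python
def positive_predictive_value(alpha, power, prior):
    """Probability that a significant result reflects a true effect.

    alpha: significance threshold (false positive rate under the null)
    power: probability of detecting an effect that really exists
    prior: probability, before the experiment, that the hypothesis is true
    """
    true_positives = power * prior          # real effects that reach significance
    false_positives = alpha * (1 - prior)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Illustrative scenarios (assumed numbers, not data):
# a well-powered and an underpowered study of a hypothesis
# that has a 1 in 10 chance of being true.
for power in (0.8, 0.2):
    ppv = positive_predictive_value(alpha=0.05, power=power, prior=0.1)
    print(f"power={power:.1f}, prior=0.1 -> PPV={ppv:.2f}")
```

The same threshold of p < 0.05 thus goes along with very different chances that a finding is real, depending on power and prior probability.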
About a year ago Seok et al. shocked the biomedical world with the verdict that mice are not humans, or more specifically, that the blood genomic responses in various inflammatory conditions do not correlate at all between human patients and the corresponding disease models (see previous post, as well as this one). Now another paper, by Takao et al. and also published in PNAS, concludes the exact opposite, namely that there is a near perfect correlation between blood genomic responses in mouse and man.
Meanwhile, the initial publication is among the top cited medical publications of the last year, and hundreds of newspapers and blogs (including this one) have covered it. It will be interesting to see how much media coverage the Takao paper will receive; probably much less. But what happened, and which paper should we believe?
In July, Laborjournal (‘LabTimes’), a free German monthly for life scientists (sort of a hybrid between the Economist and the British tabloid The Sun), celebrated its 20th anniversary with a special issue. I was asked to contribute an article. In it I try to answer the question of whether most published research findings are false, as John Ioannidis provocatively claimed in 2005.
To find out, you have to be able to read German, and click here for a pdf of the article (in German).