Because of small group sizes and substantial bias, experimental medicine produces a large number of false-positive results (see previous post). It has been claimed that 50–90% of all results may be false (see previous post). These claims are supported by the staggeringly low number of experiments that can be replicated. But what are the chances of reproducing a finding that is actually true?
In a massive meta-analysis of animal studies of six neurological diseases (EAE/MS, Parkinson’s disease, ischemic stroke, spinal cord injury, intracerebral hemorrhage, Alzheimer’s disease), Tsilidis et al. have demonstrated that the published literature in these fields has an excess of statistically significant results that is due to biases in reporting (PLoS Biol. 2013 Jul;11(7):e1001609). By including more than 4000 datasets (from more than 1000 individual studies!), which they synthesized in 160 meta-analyses, they impressively substantiate that there are far too many ‘positive’ results in the literature! The underlying reasons are reporting biases, including study publication bias, selective outcome reporting bias (where null results are omitted) and selective analysis bias (where data are analysed with different methods that favour ‘positive’ results). Study size was small (mean 16 animals), fewer than one third of the studies were randomized or evaluated outcome in a blinded fashion, and only 39 of 4140 studies performed sample size calculations!
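To get a feel for what a mean study size of 16 animals implies, here is a rough sketch of a power calculation. It is not taken from Tsilidis et al.; it assumes a two-group comparison with the 16 animals split into 8 per group, a two-sided test at alpha = 0.05, and a normal approximation (which is somewhat optimistic for groups this small). The function name `approx_power` and the effect sizes are illustrative assumptions.

```python
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation instead of the t distribution, so it
    slightly overestimates power for very small groups. effect_size_d
    is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    # Expected test statistic if the effect is real
    noncentrality = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability of exceeding the critical value in the expected
    # direction (the opposite tail is negligible and ignored here)
    return z.cdf(noncentrality - z_crit)


if __name__ == "__main__":
    # Assumption: 16 animals split into two groups of 8
    for d in (0.5, 1.0, 1.5):
        print(f"d = {d}: power ~ {approx_power(d, 8):.2f}")
```

Under these assumptions, even a large effect (d = 1.0) would be detected only about half of the time, which is why a prospective sample size calculation matters.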
In a highly cited 2005 paper, John Ioannidis answered the question ‘Why most published research findings are false’ (PLoS Med. 2, e124). The answer, in one sentence, is ‘because of low statistical power and bias’. A recent analysis in Nature Reviews Neuroscience, ‘Power failure: why small sample size undermines the reliability of neuroscience’ (advance online publication; Ioannidis is a coauthor), now focuses on the neurosciences and provides empirical evidence that in a wide variety of neuroscience fields (including imaging and animal modeling) exceedingly low statistical power, and hence very low positive predictive values, are the norm. This explains low reproducibility (e.g. the special issue in Exp. Neurol. on (lack of) reproduction in spinal cord injury research, Exp Neurol. 2012 Feb;233(2):597-605) and inflated effect sizes. Besides this meta-analysis of power in neuroscience research, the article also contains a highly readable primer on the concepts of power, positive predictive value, type I and type II error, and effect size. A must-read.
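The link between low power and low positive predictive value (PPV) can be made concrete with a small sketch. It uses the standard relation PPV = (power × R) / (power × R + alpha), where R is the pre-study odds that a tested effect is real, and it ignores bias, which would lower the PPV further. The function name and the example numbers (one in five tested hypotheses true, i.e. R = 0.25) are illustrative assumptions, not figures from the papers.

```python
def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Probability that a statistically significant finding is true.

    power      : probability of detecting a real effect (1 - beta)
    prior_odds : pre-study odds R that a tested effect is real
    alpha      : type I error rate (significance threshold)
    Bias is not modelled; it would reduce the PPV further.
    """
    true_positives = power * prior_odds  # real effects that reach significance
    false_positives = alpha              # null effects that reach significance
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Assumption: 1 in 5 tested hypotheses is true, so R = 1/4
    for power in (0.2, 0.8):
        ppv = positive_predictive_value(power, prior_odds=0.25)
        print(f"power = {power}: PPV = {ppv:.2f}")
    # power = 0.2: PPV = 0.50
    # power = 0.8: PPV = 0.80
```

With these assumed odds, a significant result from a study with 20% power is as likely to be false as true, whereas at 80% power four out of five significant results would be real.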