Category: Research

The Higgs boson and the certainty of knowledge

Peter Higgs

“Five sigma” is the gold standard of statistical significance for particle discoveries in physics. When New Scientist reported on the putative confirmation of the Higgs boson, they wrote:

‘Five-sigma corresponds to a p-value, or probability, of 3×10⁻⁷, or about 1 in 3.5 million. There’s a 5-in-10 million chance that the Higgs is a fluke.’

Does that mean that p-values can tell us the probability of being correct about our hypotheses? Can we use p-values to decide about the truth (correctness) of hypotheses? Does p<0.05 mean that there is a smaller than 5% chance that an experimental hypothesis is wrong?
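As a back-of-the-envelope illustration of why the answer is no, consider the following sketch (my addition, not part of the original post; the prior probability and power figures are assumptions chosen purely for illustration). Even with p just below 0.05, the probability that the tested hypothesis is wrong can be far larger than 5%:

```python
# Minimal sketch: why p < 0.05 does not mean "a < 5% chance the hypothesis is wrong".
# The prior probability and power below are assumptions chosen for illustration.

prior_true = 0.1   # assumed: 10% of tested hypotheses are actually true
power      = 0.5   # assumed: typical power of an experimental study
alpha      = 0.05  # conventional significance threshold

# Among many such studies, the share of "significant" results that are in fact
# false positives, i.e. P(hypothesis is wrong | p < alpha), via Bayes' rule:
true_positives  = prior_true * power
false_positives = (1 - prior_true) * alpha
false_discovery = false_positives / (true_positives + false_positives)

print(f"P(hypothesis is wrong | significant result) ≈ {false_discovery:.2f}")
# ≈ 0.47 with these assumed numbers, an order of magnitude larger than 0.05
```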


Can mice be trusted?

Katharina Frisch ‘Mouse and man’

I started this blog with a post on a PNAS paper which at the time had received a lot of attention in the scientific community and the lay press. In this article, Seok et al. argued that ‘genomic responses in mouse models poorly mimic human inflammatory diseases‘. With this post I am returning to this article, as I was recently asked by the journal Stroke to contribute to their ‘Controversies in Stroke’ series. The Seok article had disturbed the stroke community, so a pro/con discussion seemed timely.

In the upcoming issue of Stroke, Sharp and Jickling will argue that ‘the peripheral inflammatory response in rodent ischemic stroke models is different than in human stroke. Given the important role of the immune system in stroke, this could be a major handicap in translating results in rodent stroke models to clinical trials in patients with stroke.‘ This is of course true! Nevertheless, I counter by providing some examples of translational successes regarding stroke and the immune system, and conclude that ‘the physiology and pathophysiology of rodents is sufficiently similar to humans to make them a highly relevant model organism but also sufficiently different to mandate an awareness of potential resulting pitfalls. In any case, before hastily discarding highly relevant past, present, and future findings, experimental stroke research needs to improve dramatically its internal and external validity to overcome its apparent translational failures.’ For an in-depth treatment, follow the debate:

Article: Dirnagl: Can mice be trusted

Article: Sharp Jickling: Differences between mice and humans

Quality assurance and management in experimental research


Currently, a worldwide discussion among stakeholders of the biomedical research enterprise revolves around the recent realization that a significant proportion of the resources spent on medical research is wasted, as well as around potential actions to increase its value. The reproducibility of results in experimental biomedicine is generally low, and the vast majority of medical interventions introduced into clinical testing after successful preclinical development prove unsafe or ineffective. One prominent explanation for these problems is flawed preclinical research. There is consensus that the quality of biomedical research needs to be improved. ‘Quality’ is a broad and generic term, and it is clear that a plethora of factors together determine the robustness and predictiveness of basic and preclinical research results.

Against this background, the experimental laboratories of the Center for Stroke Research Berlin (CSB, Dept. of Experimental Neurology) have decided to take a systematic approach and to implement a structured quality management system. In a process involving all members of the department, from students and technicians to postdocs and group leaders, and extending over more than one year, we have established standard operating procedures, defined common goals and indicators, improved communication structures and document management, implemented error management, and begun developing an electronic laboratory notebook, among other measures. On July 3rd 2014 this quality management system successfully passed an ISO 9001 certification process (Certificate 12 100 48301 TMS). The auditors were impressed by the quality-oriented ‘spirit’ of all members of the Department, and by the fact that, to their knowledge, the CSB is the first academic institution worldwide to have established a structured quality management system in experimental research of this standard and reach.

The CSB is fully aware that the mere implementation of a certified quality management system does not guarantee translational success. However, we believe that innovation will only improve patient outcomes if it thrives on the highest possible standards of quality. Certification of our standards renders them transparent and verifiable to the research community, and serves as a first step towards a preclinical medicine in which research conduct and results can be monitored and audited by peers.

 

Why post-hoc power calculation is not helpful

Statistical power is a rare commodity in experimental biomedicine (see previous post), as most studies have very low n’s and are therefore severely underpowered. The concept of statistical power, although almost embarrassingly simple (for a very nice treatment see Button et al.), is shrouded in ignorance, mysteries and misunderstandings among many researchers. A simple definition states that power is the probability that, given a specified true difference between two groups, the quantitative results of a study will be deemed statistically significant. The most common misunderstanding may be that power should only be a concern to the researcher if the Null hypothesis could not be rejected (p>0.05). I need to deal with this dangerous fallacy in a future post. Another common, albeit less perilous, misunderstanding is that calculating post-hoc (or ‘retrospective’) power can explain why an analysis did not achieve significance. Besides betraying a severe bias of the researcher towards rejecting the Null hypothesis (‘There must be another reason for not obtaining a significant result than that the hypothesis is incorrect!’), this is the equivalent of a statistical tautology: of course the study was not powerful enough, that is why the result was not significant! To look at this from another standpoint: given enough n’s, the Null hypothesis of every study must be rejected. This, by the way, is one of the most basic criticisms of Null hypothesis significance testing. Power calculations are useful for the design of studies, but not for their analysis. This was nicely explained by Steven Goodman in his classic article ‘The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results’ (Ann Intern Med 1994):

First, [post-hoc Power analysis] will always show that there is low power (< 50%) with respect to a nonsignificant difference, making tautological and uninformative the claim that a study is “underpowered” with respect to an observed nonsignificant result. Second, its rationale has an Alice-in-Wonderland feel, and any attempt to sort it out is guaranteed to confuse. The conundrum is the result of a direct collision between the incompatible pretrial and post-trial perspectives. […] Knowledge of the observed difference naturally shifts our perspective toward estimating differences, rather than deciding between them, and makes equal treatment of all nonsignificant results impossible. Once the data are in, the only way to avoid confusion is to not compress results into dichotomous significance verdicts and to avoid post hoc power estimates entirely.  

NB: To avoid misunderstandings: calculating the n’s needed in future experiments to achieve a certain statistical power, based on effect sizes and variance obtained post hoc from a (pilot) experiment, is not post-hoc power analysis (the subject of this post), but rather sample size calculation.
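To make the distinction concrete, here is a minimal sketch of the legitimate, prospective use of power analysis: planning the sample size of a future experiment from an assumed effect size. This is my own illustration, not taken from the post; the library (statsmodels) and the effect size of d = 0.8 are assumptions chosen for the example.

```python
# Prospective sample size calculation (not post-hoc power analysis):
# how many animals per group are needed to detect an assumed effect
# with 80% power at alpha = 0.05 in a two-sample t-test?

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.8,           # assumed standardized effect size (Cohen's d)
    alpha=0.05,                # two-sided significance level
    power=0.8,                 # desired probability of detecting the effect
    alternative='two-sided',
)
print(f"Required n per group: {n_per_group:.1f}")  # about 26 per group
```

Running the same machinery ‘backwards’ on an observed, nonsignificant result adds nothing: it will, by construction, report low power.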

For further reading:

Post-publication commenting

 

Amidst a flurry of retractions of research articles from high-profile journals and growing concerns about the non-reproducibility of research findings, the time-honored (some say old-fashioned) closed pre-publication mode of peer review has come under critique. Major issues concern the quality of reviews and the lack of transparency. A number of modifications and alternative models have been proposed (e.g. Front. Comput. Neurosci. 6:94;2012), including open post-publication review. Most publications are no longer read in printed and bound issues of a journal, but rather accessed in digital form via the internet. This allows for novel forms of readership participation, such as post-publication review and online commenting and discussion of articles. Several journals are experimenting with such novel features (e.g. PLOS ONE), and some are built around them (e.g. F1000Research or eLife). Nevertheless, most established journals are hesitant to give up their time-honored modes of publishing. They argue that closed pre-publication review may not be perfect, but that the alternatives are untested and may actually be worse. Post-publication commenting requires software upgrades to journal websites, as well as monitoring and moderation of content, and there may be legal issues. Another problem relates to the troubling fact that a substantial fraction of the biomedical literature is not read at all (even if cited!), which means that we may not be able to rely solely on processes that take place after publication.


Loose cable, significant at p<0.0000002

I just stumbled upon a very instructive example which illustrates that p-values should not be misinterpreted as measures of the probability with which a research hypothesis is true. In 2011 the OPERA collaboration reported evidence that neutrinos travel faster than light, a finding which violates Einstein’s theory of relativity and, if true, would have shattered physics as we know it! Their analysis was significant at the 6 sigma level, even more stringent than the accepted but already brutal 5 sigma level of particle discovery (p ≈ 3×10⁻⁷). Extraordinary claims require extraordinary evidence! The results were replicated by the same group, published, and hailed by the scientific and lay press worldwide. A short while later it turned out that the GPS systems were not properly synchronized, and a cable was loose. Neutrinos are back at the speed of light, and we can learn from this that p-values are ignorant of simple systematic errors!
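For readers who want to see where these tiny p-values come from, here is a small calculation of the tail probabilities behind the sigma levels (my own sketch; it assumes the one-sided standard normal tail convention used in particle physics and uses SciPy):

```python
# Convert "n sigma" significance levels into one-sided tail p-values
# of the standard normal distribution.

from scipy.stats import norm

for sigma in (5, 6):
    p = norm.sf(sigma)  # upper-tail probability beyond `sigma` standard deviations
    print(f"{sigma} sigma -> p = {p:.1e}")

# 5 sigma -> p = 2.9e-07  (about 1 in 3.5 million, the particle-discovery standard)
# 6 sigma -> p = 9.9e-10  (the level claimed for the OPERA faster-than-light result)
```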

Exploratory and confirmatory preclinical research

In the current issue of PLOS Biology, Kimmelman, Mogil, and Dirnagl argue that distinguishing between exploratory and confirmatory preclinical research will improve translation: ‘Preclinical researchers confront two overarching agendas related to drug development: selecting interventions amid a vast field of candidates, and producing rigorous evidence of clinical promise for a small number of interventions.’ They suggest that each challenge is best met by one of two different, complementary modes of investigation. In the first (exploratory investigation), researchers should aim at generating robust pathophysiological theories of disease. In the second (confirmatory investigation), researchers should aim at demonstrating strong and reproducible treatment effects in relevant animal models. Each mode entails different study designs, confronts different validity threats, and supports different kinds of inferences. Research policies should seek to disentangle the two modes and leverage their complementarity. In particular, policies should discourage the common use of exploratory studies to support confirmatory inferences, promote a greater volume of confirmatory investigation, and customize design and reporting guidelines for each mode.

For full article click here.

Blogging as a form of post-publication review

In a Neuron View article, Zen Faulkes argues for blogging as a kind of post-publication peer review. He is a veteran blogger and science tweeter, and knows what he is talking about. The article compares social media to the classical forms of scientific discourse (from letter to the editor to talk at a conference) and likens science blogging to an online research conference, although with a much wider reach, even into the lay community. Read it here: Faulkes Neuron View

 

 

Systemic flaws of biomedical research ecosystem

In the current issue of the Proceedings of the National Academy of Sciences (USA), four heavyweights, Bruce Alberts, Marc W. Kirschner, Shirley Tilghman, and Harold Varmus, provide fundamental criticism of the US biomedical research system, and offer ideas for ‘Rescuing US biomedical research from its systemic flaws’. Their main point is that ‘The long-held but erroneous assumption of never-ending rapid growth in biomedical science has created an unsustainable hypercompetitive system that is discouraging even the most outstanding prospective students from entering our profession—and making it difficult for seasoned investigators to produce their best work. This is a recipe for long-term decline, and the problems cannot be solved with simplistic approaches.’ Most of the issues they raise are equally applicable to European biomedical research. Full article: PNAS-2014-Alberts-5773-7

 

Correlation vs causation


Still not convinced that US spending on science, space, and technology correlates with suicides by hanging, strangulation and suffocation? Or that the number of people who drowned by falling into a swimming pool correlates highly with the number of films Nicolas Cage appeared in? Check out Spurious Correlations, a website that teaches you to understand the difference between correlation and causation!
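If you prefer to convince yourself numerically, here is a toy simulation (my own sketch, not from the site; all numbers are arbitrary): two completely independent trending series, such as random walks, frequently show sizeable correlations even though neither causes the other.

```python
# Toy demonstration: independent random walks (trending series with no causal
# link whatsoever) frequently produce large correlation coefficients.

import numpy as np

rng = np.random.default_rng(42)
n_years, n_pairs = 15, 10_000

def random_walk():
    return np.cumsum(rng.normal(size=n_years))

# Correlate many pairs of mutually independent random walks
corrs = [np.corrcoef(random_walk(), random_walk())[0, 1] for _ in range(n_pairs)]
share_strong = np.mean(np.abs(corrs) > 0.5)
print(f"Share of unrelated trend pairs with |r| > 0.5: {share_strong:.0%}")
```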