Damn! What an effort: generation of a knockout mouse line, backcrossing into the background strain, breeding littermates, all the genotyping. Followed by a plethora of experiments in a disease model: surgery, magnetic resonance imaging, histology, behavioral studies, and so on. Finally, the result: no phenotype! The knockout mouse appears to be a mouse like any other, not different from the wild type background strain. But wait, we need to phrase it like this: we did not find a statistically significant difference between knockout and wild type. So we cannot even conclude that wild type mice are like knockout mice, but rather: if there is a difference, it might be smaller than the detectable effect size, which depends on sample size, error levels (alpha and beta), and the variance of our results. But we had planned our experiments well: the sample size was determined a priori, and chosen so that we would have been able to detect a difference on the order of one standard deviation. This is what statisticians call a Cohen’s d of 1, which is considered a substantial effect. We could not have used more animals than the 34 (!) we had, because of limited resources, the duration of the PhD thesis, and the timing of the grant. But what now? Write a paper? Reporting a NULL result? What would that look like in a resume? Besides, who cares about NULL results, and which reputable journal would publish them at all?
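The a priori sample size in this anecdote can be reconstructed with a standard power calculation. The sketch below (Python, standard library only; the function name is ours, not from any power-analysis package) uses the normal approximation for a two-sided, two-sample comparison:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample test detecting an effect of size Cohen's d."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(sample_size_per_group(d=1.0))  # 16 per group (Cohen's d = 1)
```

The normal approximation yields 16 animals per group; the exact t-distribution-based calculation adds roughly one animal per group, which is where a figure like 17 per group, 34 animals in total, comes from. Note how quickly the numbers grow for smaller effects: at d = 0.5, the same formula already demands 63 animals per group.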
It is quite likely that a sequence of events like this, not necessarily involving knockout mice, occurs frequently in laboratories worldwide. The experiments were carried out properly, but the results did not allow rejection of the NULL hypothesis, and consequently disappeared into the file drawer.
This is a huge mistake, because we should love our NULL results like our highly significant ones! But isn’t that nonsense? Isn’t a result that takes us one step closer to curing Alzheimer’s disease or breast cancer much more valuable than a NULL?
Consider Christopher Columbus. The discovery of America was a significant result, much better than cruising around the ocean and just seeing the sea. But wait: to create a nautical chart, which you need to discover foreign countries, you have to know where there are no islands and no shoals. Columbus would not have been funded by the King of Spain, nor would he have dared to set sail, without such a map. A map that preceding seafarers had drawn.
But let’s get back to a setting closer to home: an experiment that delivers the key to the cure of Alzheimer’s disease, compared with an experiment without a statistically significant result. Let’s be honest: how many of these game-changing results can there be at all? And how likely is it that we will be the ones to win this jackpot? Not zero, but low. Isn’t it reassuring to have at least contributed to making the ‘map’ of biology, and of all that can go wrong on it (which we call disease mechanisms), more precise? So that now we can all ‘navigate’ a little better?
Furthermore, we usually overestimate the importance of our statistically ‘significant’ results! Even a statistically significant result does not tell us how likely it is that our hypothesis was correct, just as a NULL result does not tell us that our hypothesis was wrong. This is because we never know how likely the hypothesis was in the first place, and because in most instances our statistical power is too low. With sufficiently large samples you can make almost any comparison statistically significant, that is, reject the NULL hypothesis. And vice versa: with sample sizes that are too small, you will hardly ever be able to reject any NULL hypothesis. Furthermore, many of our hypotheses (hopefully!) are quite unlikely. Otherwise we would be boring scientists. But unfortunately, the less likely our hypotheses are to be true, the higher the rate of false positive results will be.
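The dependence of the false positive rate on the prior plausibility of the hypothesis can be made concrete with a short calculation. In the sketch below (the function name is ours), the positive predictive value (PPV) is the fraction of ‘significant’ results that reflect true effects, given alpha, power, and the prior probability that the tested hypothesis is correct:

```python
def ppv(prior, alpha=0.05, power=0.80):
    """Positive predictive value: probability that a statistically
    significant result reflects a true effect, given the prior
    probability that the tested hypothesis is correct."""
    true_pos = power * prior          # true effects correctly detected
    false_pos = alpha * (1 - prior)   # null effects wrongly declared significant
    return true_pos / (true_pos + false_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:>4}: PPV = {ppv(prior):.2f}")
# prior 0.5 -> 0.94; prior 0.1 -> 0.64; prior 0.01 -> 0.14
```

At alpha = 0.05 and 80% power, a hypothesis with a 50% prior chance of being true yields a PPV of about 0.94; a long-shot hypothesis with a 1% prior yields a PPV of only about 0.14. For daring hypotheses, most ‘significant’ results are false positives.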
We should therefore design our experimental studies in such a way that the results are interesting, i.e. informative, even if the NULL hypothesis cannot be rejected. The focus should not be on the statistical significance of the result, but on the research question, the appropriate methodology and analysis, and the effect size and its variance. As scientists we can only influence the experimental design, not the results! Unless we cheat….
We are justifiably proud of the fact that science is self-correcting, that false conclusions are regularly eliminated by subsequent experiments. However, this cannot work properly if unwanted results disappear into the drawer (the ‘file drawer effect’).
But when can we call NULL results informative? If they are planned and carried out according to the state of the art and with sufficient statistical power. If they contribute to the current state of knowledge. If they are potentially useful to the community of researchers. If they stop us from going down the wrong path or doing unnecessary experiments, or if we can aggregate them into meta-analyses.
NULL results have a number of great properties: they are more robust than statistically significant ones. In other words, as paradoxical as it may sound, a NULL result is much more likely to be correct than a statistically significant one. NULL results can prevent our colleagues from unnecessarily entering experimental dead ends. NULL results, if published, make evidence synthesis in the form of meta-analyses meaningful. NULL results create boundaries of knowledge, waymarks within which statistically significant results unfold their full power.
But what about the argument that they are more difficult to publish? This may indeed have been the case a number of years ago. It is true that they can seldom be published in ‘top journals’, unless it is a NULL result that questions a dogma or textbook knowledge and comes from a prominent lab. But the insight into the usefulness of NULL results, and into the damage caused by preferring statistically significant results over NULL results, has led to a paradigm shift among many publishers. New journals have been brought onto the scene, and established journals now sometimes have ‘NULL and negative result sections’ (e.g. J Cereb Blood Flow Metab). PLOS One, PeerJ, and F1000Research publish studies regardless of their statistical outcome: if research question, methodology, and analysis are sound, the results will be published. The web tool FIDDLE of the QUEST Center can help you find the right publication path for NULL results.
Are NULL results bad for your career, do they ‘contaminate’ resumes? The Charité, for example, rewards the publication of NULL results with additional research funds. We also ask applicants for professorships whether they have ever published NULL results or replicated studies of others, and whether they intend to start or continue doing so. A small step, but an indication that institutions are beginning to modify their incentives and rewards.
‘In science, the only failed experiment is one that does not lead to a conclusion.’ (Chris Mack)
Inspired by Anne Scheel’s excellent post ‘Why we should love null results’: http://www.the100.ci/2017/06/01/why-we-should-love-null-results/
A German version of this post has been published as part of my monthly column in the Laborjournal: http://www.laborjournal-archiv.de/epaper/LJ_19_05/24/index.html
Marcus Munafò, Jo Neill. Null is beautiful: On the importance of publishing null results. Journal of Psychopharmacology 30 (2016): 585. https://journals.sagepub.com/doi/full/10.1177/0269881116638813
Chris A. Mack. In Praise of the Null Result. J. of Micro/Nanolithography, MEMS, and MOEMS, 13(3), 030101 (2014). https://www.spiedigitallibrary.org/journals/Journal-of-MicroNanolithography-MEMS-and-MOEMS/volume-13/issue-03/030101/In-Praise-of-the-Null-Result/10.1117/1.JMM.13.3.030101.full?SSO=1