It is for good reason that researchers are the object of envy. When not stuck with bothersome tasks such as grant applications, reviews, or preparing lectures, they actually get paid for pursuing their wildest ideas! To boldly go where no human has gone before! We poke about in the scientific literature and carry out pilot experiments that, surprisingly, almost always succeed. Then we do a series of carefully planned and costly experiments. Sometimes they turn out well, often not, but they do lead us into the unknown. This is how ideas become hypotheses; one hypothesis leads to the next, and voilà, lo and behold, we confirm them! In the end, sometimes only after several years and considerable wear and tear on personnel and material, we manage to weave a “story” out of them (see also). Through a complex chain of results the story closes with a “happy ending”, perhaps in the form of a new biological mechanism, or at least as a little piece that fits the puzzle, and it is always presented to the world by means of a publication. Sometimes even in one of the top journals.
In his short story The Garden of Forking Paths (1941), Jorge Luis Borges (1899 – 1986) describes the mysterious work of the fictional Chinese writer Ts’ui Pen. When several developments are possible in Ts’ui Pen’s stories, they do not occur as alternatives! They occur simultaneously, so that the story branches out into a universe of multiple possible story lines that themselves branch apart, but can also draw together. Borges’ metaphor of the garden of forking paths, an endless labyrinth, has inspired many artists, particularly in the domain of hyperfiction. And the statisticians Gelman and Loken recently imported it into the methodological critique of psychological and biomedical research. They compare the procedure of scientists with Ts’ui Pen’s garden: in their research, scientists walk along branching trails through a garden of knowledge. And as poetic as this wandering might seem, Gelman and Loken claim that it harbors certain dangers. Today I want to look into this, because these dangers are scarcely known to many experimenters.
Let us follow a fictional scientist in the garden of his/her research (see animated gif below). There we find a veritable labyrinth of paths. He (or she, naturally) chooses a path according to his results, the analyses thereof, and any available evidence from other researchers. He enters the labyrinth with an idea (he will say: with a hypothesis). Right away he performs a first experiment to verify it and is pleased to find the result statistically significant: a band at the right spot in the Western Blot! He takes a left turn. In a subsequent experiment, unfortunately, the p-value overshoots the magical 5 %. He therefore takes the path to the right. While wandering, he reads another current paper that confirms his thinking up to now and gives him a new idea for his next experiment: he turns into a path leading to the left. There, lo and behold, the following experiment yields a statistically significant difference. He thus continues straight on, but unfortunately this does not produce usable results. Our researcher goes back to the previous fork in the road. Here his mood picks up again: the result of the knock-out mouse can be replicated in the pharmacological test! Two paths converge again, the path broadens, and far ahead he spies the exit from the labyrinth. The next experiment also works: a protein suspected to be in the signaling pathway is confirmed by immunohistochemistry. And: blocking it leads to a statistically significant difference from the control group! In the literature he finds that the signaling pathway had already been described in another disease model, which is also good news. He thereupon turns left and it’s done: he can leave the labyrinth, after many competently performed experiments and a number of statistically significant comparisons, entirely without p-hacking (testing over and over until you get statistical significance) or HARKing (hypothesizing after the results are known). The prize awaits him: an article in a journal of repute.
Animated gif to illustrate what it means to wander in the garden of the forking paths:
So good research leads us through the labyrinth of complex biology! So why spoil the party? Well, here I would like to point out a tricky problem. On his (her) way through the labyrinth the researcher proceeds in an inductively deterministic manner. He does not at all notice the many degrees of freedom available to him. These arise, for example, through an alternative analysis (or interpretation) of the experiment. Or through false positive or false negative results. Or from the choice of another article as the basis for further experiments and interpretations. And so on. The labyrinth is of infinite size! There is not only one way through it, but many, and there are many exits. And since our researcher is proceeding exploratively, he has set up no rules in advance according to which he should carry out his analyses or plan further experiments. So the many other possible results escape his notice, since he is following a trail that he himself laid. As a result, he overestimates the strength of the evidence that he generates! In particular, he overestimates what a significant p-value means in the context of his explorative wanderings. Strictly speaking, he would have to compare his results with all the alternative analyses and interpretations that he could have carried out. An absurd suggestion, of course, that wouldn’t fly. To adapt a famous quote of the American baseball-philosopher of the New York Yankees, Yogi Berra (1925-2015): the researcher, upon encountering a fork in the road, must take it! In Borges’ garden of forking paths this means: always turn both to the right and to the left!
In the garden of forking paths, the classic definition of statistical significance (e.g. p<0.05) does not apply. P<0.05 states that, in the absence of an effect, the probability of stumbling by chance upon a result at least as extreme as the one observed is lower than 5 %. For that statement to hold in the garden, you would have to average over all the data and analyses that would have been possible there, because each of these other paths could also have led to statistically significant results. Such a comparison, however, is impossible in explorative research. If you nonetheless do generate p-values, you will, according to Gelman and Loken, get a “machine for producing and publicizing random patterns”. And this, mind you, even though the published analyses of researchers are absolutely congruent with the hypotheses that originally motivated their experiments.
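To make this a little more tangible, here is a minimal simulation sketch. It is my own toy illustration, not Gelman and Loken's analysis, and every name and number in it is an assumption: each simulated study has two groups of 20 animals and 4 independent outcome measures, there is never any true effect, and the variance is taken as known so that a simple z-test suffices. One researcher fixes the outcome to be tested in advance; the other tests only the outcome that looks most promising once the data are in.

```python
import math
import random


def z_test_p(group_a, group_b):
    """Two-sided p-value for a difference in means, assuming unit variance."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_studies=4000, n_per_group=20, n_outcomes=4, alpha=0.05, seed=1):
    """Return the false positive rates of a fixed and a data-dependent analysis."""
    rng = random.Random(seed)
    fixed_hits = 0   # analysis chosen before seeing the data
    forked_hits = 0  # analysis chosen after seeing the data
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # No true effect: both groups come from the same distribution.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(z_test_p(a, b))
        # Fixed path: always test outcome 0, whatever the data look like.
        if p_values[0] < alpha:
            fixed_hits += 1
        # Forking path: test only the outcome with the largest observed
        # difference, which here is the one with the smallest p-value.
        if min(p_values) < alpha:
            forked_hits += 1
    return fixed_hits / n_studies, forked_hits / n_studies


fixed_rate, forked_rate = simulate()
print(f"fixed analysis:   {fixed_rate:.3f}")
print(f"forking analysis: {forked_rate:.3f}")
```

Under these assumptions the fixed analysis should produce false positives in roughly 5 % of studies, while the data-dependent one should land near 1 - 0.95^4, i.e. about 19 %, although only a single test is ever "reported" per study. Real forks (which paper to follow, which analysis to run) are far harder to enumerate than four outcome measures, which is exactly the problem.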
So what follows from these only seemingly esoteric ruminations? By no means do they argue against exploration, the avid wandering through the garden of forking paths! They do however point out that our knowledge, the fruit plucked on our wanderings, is less robust than the chain of statistically significant results might have us believe. Consequently, the use of test statistics in exploration is of little help, and therefore actually superfluous, if not outright misleading. I have already pointed out a number of further substantial arguments for greater skepticism toward our own results as well as the pitfalls of using statistical tests (see also). And another point: a good guide through the labyrinth is confirmation, i.e. experiments with a predetermined procedure and analysis and with a sufficiently high number of cases.
A German version of this post has been published as part of my monthly column in the Laborjournal:
Animated gif added April 15, 2018
Gelman A and Loken E (2013) The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
Gelman A and Loken E (2014) The statistical crisis in science. American Scientist 102:460-465
de Groot AD (1956) The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas]. Acta Psychologica (2014) 148:188-194. doi: 10.1016/j.actpsy.2014.02.001. http://www.sciencedirect.com/science/article/pii/S0001691814000304?via%3Dihub
Yogi Berra: When you come to a fork in the road, take it! https://quoteinvestigator.com/2013/07/25/fork-road/