Can (Non)-Replication be a Sin?

I failed to reproduce the results of my experiments! Some of us are haunted by this nightmare. The scientific academies, the journals and, by now, the funders themselves are all calling for reproducibility, replicability and robustness of research. A movement for “reproducible science” has developed. Funding programs for the replication of research papers are now in the works. In some branches of science, especially in psychology, but also in fields like cancer research, results are now being systematically replicated… or not, and so we find ourselves in the throes of a “reproducibility crisis”.
Now Daniele Fanelli, a scientist who up to now could be expected to side with the supporters of the reproducible science movement, has raised a warning voice. In the prestigious Proceedings of the National Academy of Sciences he asked rhetorically: “Is science really facing a reproducibility crisis, and if so, do we need it?” So today, on the eve, perhaps, of a budding opposition movement, I want to have a look at some of the objections to the “reproducible science” mantra. Is reproducibility of results really the foundation of the scientific method?

When you come to a fork in the road: Take it

It is for good reason that researchers are the object of envy. When not stuck with bothersome tasks such as grant applications, reviews, or preparing lectures, they actually get paid for pursuing their wildest ideas! To boldly go where no human has gone before! We poke about in the scientific literature and carry out pilot experiments that, surprisingly, almost always succeed. Then we do a series of carefully planned and costly experiments. Sometimes they turn out well, often not, but they do lead us into the unknown. This is how ideas become hypotheses; one hypothesis leads to the next, and voilà, lo and behold, we confirm them! In the end, sometimes only after several years and considerable wear and tear on personnel and material, we manage to weave a “story” out of them (see also). Through a complex chain of results the story closes with a “happy ending”, perhaps in the form of a new biological mechanism, or at least a little piece that fits the puzzle, and it is always presented to the world by means of a publication. Sometimes even in one of the top journals.

Of Mice, Macaques and Men

Tuberculosis kills far more than a million people worldwide per year. The situation is particularly problematic in southern Africa, eastern Europe and Central Asia. There is no truly effective vaccine against tuberculosis (TB). In countries with a high incidence, a live vaccine based on the attenuated strain Bacillus Calmette-Guérin (BCG) is used, but BCG gives very little protection against tuberculosis of the lungs, and in any case its protective effect is highly variable and unpredictable. For years, a worldwide search has been going on for a better TB vaccine.

Recently, the British Medical Journal published an investigation that raised serious charges against researchers and their universities: conflicts of interest, animal experiments of questionable quality, selective use of data, deception of funders and ethics committees, all the way up to endangerment of study participants. There was also a whistleblower… who had to pack his bags. It all happened in Oxford, at one of the most prestigious virological institutes on earth, and the study on humans was carried out on infants from the most destitute layers of the population. Let’s have a closer look at this explosive mix, for we have much to learn from it about

  • the ethical dimension of preclinical research and the dire consequences that low quality in animal experiments and selective reporting can have;
  • the important role of systematic reviews of preclinical research, and finally also about
  • the selective (or non) availability and scrutiny of preclinical evidence when commissions and authorities decide on clinical studies.


Believe it or not!

Medicine is full of myths. Sometimes you even get the impression that it is actually based mostly on myths. Many are so plausible that you would have to be a fool not to believe in them. And so today let us take a closer look at the placebo effect. In doing so we will run into a surprisingly little-known phenomenon: regression to the mean. This also has implications for experimenters.

Hardly anyone doubts the almost magical powers of the placebo effect, so perhaps it will surprise you to hear that the hard evidence for its existence is rather weak, and that there are some important arguments against its efficacy. Cochrane reviews, after all the gold standard for systematic reviews, did not find convincing evidence for its effectiveness. They show that placebos might be effective when it comes to patient-reported outcomes, particularly for pain and nausea. But the effects, should there be any at all, are not that impressive. For so-called “observer-reported outcomes”, i.e. whenever study doctors did the measuring, no effectiveness was found at all.

Since you probably consider the placebo effect to be one of the foundations of medicine and me to be a fool, you might just shake your head and push this post aside. Or you could allow me to offer a few arguments as to why the placebo effect is a clearly overrated phenomenon. You would then also learn something about regression to the mean. And this might even be relevant to your own research.


And the Moral of the Story is: Don’t believe your p-values!

In my previous post I had a look at the culture of science in physics, and found much that we life scientists might want to copy. Physics itself, and especially particle physics, presents a goldmine of lessons to be learned, two of which I would like to discuss with you today.

Some of you will remember: In 2011 the results of a large international experiment convulsed not only the field of physics; they shook the whole world. On September 22nd the New York Times ran it on the front page: “Einstein Roll Over? Tiny neutrinos may have broken cosmic speed limit”! What had happened?

Don’t ask what your Experiment can do for You: Ask what You can do for your Experiment!

I was planning to highlight physics as a veritable model, as a champion of publication culture and team science from which we in the life sciences could learn so much. And then this: The Nobel Prize for physics went to Rainer Weiss, Barry Barish and Kip Thorne for the “experimental evidence” for the gravitational waves predicted in 1916 by Albert Einstein. Published in a paper with more than 3,000 authors!

Once again the Nobel Prize is being criticized: that it always goes to “old white men” at American universities, or that good old Mr. Nobel actually stipulated that only one person per field be honored, and only for a discovery made in the preceding year. Oh, well… What I find more distressing is that the Nobel Prize is once again perpetuating an absolutely antiquated image of science: the lone research geniuses, of whom there are so few, or more precisely, a maximum of three per field (medicine, chemistry, physics), have “achieved the greatest benefits for humanity”. And they are honored with a spectacle worthy of the Eurovision Song Contest or the Oscars. It doesn’t surprise me that this is received enthusiastically by the public. This cartoon-like image of science has been around since Albert Einstein at the latest. And from Newton up to World War II, before the industrialization and professionalization of research, this image of science was justified. What disturbs me is that the scientific community partakes so wholeheartedly in this anachronism. You will ask why the Fool is getting so upset again. It’s harmless in the worst case, you say, and the winners are almost always worthy of the prize? And surely a bit of PR can do no harm in these post-factual times in which the opponents of vaccination and the climate-change deniers are on the rise?

“Next we….” – The history and perils of scientific storytelling

We scientists are pretty smart. We pose hypotheses and then confirm them in a series of logically connected experiments. Desired results follow in quick succession; our certainty grows with every step. Almost unfailingly the results are statistically significant, sometimes at the 5% level, sometimes with a p-value that has a whole string of zeros. Some of our experiments are independent of each other, some are dependent, for they use the same material, e.g. for molecular biology and histology. Now we turn, tired but happy, to the job of illustrating and writing up our results. Not only did we have a lucky hand with the initial hypothesis, now confirmed. No, our luck was all the lovelier when we saw that the chain of significant p-values remained unbroken. That is comparable to buying several lottery tickets which, one after the other, all turn out to be winners. If we then manage to convince the reviewers, our work will be printed just as it is.
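The lottery comparison can be made concrete with a little arithmetic. The following is only a sketch under assumptions that are mine, not data from any study: every hypothesis in the chain is true, the experiments are independent, and each one has a statistical power of 80%, which is generous for many preclinical experiments. The function name is made up for illustration.

```python
def chance_of_unbroken_chain(power: float, n_experiments: int) -> float:
    """Probability that all n independent experiments come out significant.

    Assumes every tested effect is real and is detected with
    probability `power` in each experiment (illustrative assumption).
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie between 0 and 1")
    if n_experiments < 0:
        raise ValueError("n_experiments must not be negative")
    return power ** n_experiments


# Even with 80% power per experiment, a long unbroken chain is unlikely.
for n in (1, 3, 5, 10):
    print(n, round(chance_of_unbroken_chain(0.8, n), 3))
```

With these assumed numbers, a chain of ten significant results in a row would be expected in only about one case in ten, even though every single hypothesis was correct.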

Am I exaggerating? A casual glance through the leading journals (Nature, Cell, Science, etc.) will attest that the overwhelming majority of original articles published in them follow this pattern. The pattern is linear, unclouded by aborted attempts or dead ends, and typically starts with the phrase “First we…”, followed by many repetitions of “Next we…”. A further indication is the almost complete absence of any non-significant results. Where you occasionally find an “n.s.”, it usually belongs there: a significant result at that point would have jeopardized the hypothesis. This is the case, for example, for a group that is not supposed to differ from a control in which the same gene has been manipulated with a different experimental strategy.

The naïve observer would have to come to the conclusion that the authors of these studies are not only incredibly smart but also implausibly lucky. He could even consider them braggarts or swindlers. After a few years in science, however, we know that there is something quite different behind it all. We are telling each other stories. The long years spent in the laboratory working on the story were quite different. Many things went wrong, some were ambiguous, or the results didn’t match the hypothesis. Strategies were changed. The hypothesis was revised. And so forth. So the “smooth-running” story was developed and told after the fact. So actually, it really is a “story”.

But is that a problem? We all know that it doesn’t work the way it’s been told. And besides, we are for good reason not interested in all the problems we encountered and the wrong paths we followed in our scientific explorations. They don’t make good reading and would swamp us with useless information. On the other hand, however, telling stories opens the floodgates to a number of bad habits, such as “outcome switching” and the selective use of the results obtained. This has been compared to firing a shot at a wooden wall and then drawing a bullseye around the bullet hole. With the hole in the middle. Right in the heart! That way you can “prove” any old hypothesis! And we don’t learn anything about results that didn’t make it into the story but would lead us to other hypotheses and new discoveries.

So allow us to ask at this point: Just how did it happen that reporting on scientific discoveries has almost completely detached itself from the processes in the laboratory upon which it is based? Is it a product of our preference for stories that are smooth and spectacular? Of our academic reward system, which especially rewards them when they are published in journals with a high impact factor? Surprisingly, no. The rhetoric of a linear, uninterrupted and impeccable chain, both logical and chronological, necessarily pressing forward towards proof of the initial hypothesis, is several hundred years old. Towards the end of the 17th century experiments were hardly ever published, but rather presented to an audience who would thus be their witnesses. The expansion and internationalization of the “scientific community”, which was made up first of gentlemen of independent means and then more and more of “professionals”, made broadly accessible publication necessary. These publications developed under the patronage of the scientific societies founded at this time. In the lead was the Royal Society in England with its “Philosophical Transactions”, still being published today. Insofar as the experiments were now without “witnesses”, and addressed to a very mixed and as yet barely specialized readership, the readers had to be interested in the object of the study and convinced by its results. The rest is, in the true sense of the word, “history”. The dissociation of the actual logic and practice of a study from its representation in the corresponding scientific publication in favor of a “story” is today’s standard, and not only in biomedicine.

A long tradition; we have become accustomed to it, and publication in the journals is accepted only in this manner.  So everything is just fine, right?

I don’t think so. For one thing, because substantially more studies are being published, they contain substantially more information in the form of sub-studies, and these are substantially more complex in their methods and conceptions. This means that the “degrees of freedom” have massively increased, which makes it possible for authors, by selecting “desired results”, to “substantiate” any conceivable hypothesis. Furthermore, it has become normal today to mix together in a single study the generation of hypotheses by exploration and their confirmation. And this supposedly affords the reader a clear overview. But how many of the experiments carried out did not make it into the publication? And why not? Was the hypothesis generated in an unbiased way by means of exploratory experiments and then confirmed in subsequent independent experiments? Was the hypothesis formulated unambiguously for the confirmation, was the required sample size determined and bias excluded as far as possible? I.e. were the experiments randomized and blinded?
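How quickly such degrees of freedom add up can be estimated with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions of my own: there is no true effect at all, the outcomes measured are independent of each other, and each is tested at a significance level of 5%. The function names are invented for illustration.

```python
import random


def chance_of_any_hit(alpha: float, n_outcomes: int) -> float:
    """Probability of at least one 'significant' result among n independent
    tests when no true effect exists (closed form)."""
    return 1.0 - (1.0 - alpha) ** n_outcomes


def simulated_chance_of_any_hit(alpha: float, n_outcomes: int,
                                n_studies: int = 20000, seed: int = 1) -> float:
    """Same quantity, estimated by simulating many null studies."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null hypothesis each p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_studies


for k in (1, 5, 20):
    print(k, round(chance_of_any_hit(0.05, k), 3),
          round(simulated_chance_of_any_hit(0.05, k), 3))
```

Under these assumptions, anyone who measures twenty outcomes and reports only the “desired” one has a better than even chance of finding something significant where there is nothing to find.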

But how could we minimize the risk that we and our readers are led astray by the selective use of results for the purposes of “story-telling”? How can we make the results more robust and make them available in their entirety to the scientific community?

Actually, that is easy. First, we have to separate exploration and confirmation more clearly from each other. In exploring, we are searching for new phenomena. You cannot plan everything in advance, e.g. sample sizes. You can change the direction the experiments are taking on the basis of the findings coming in. You have to give serendipity a chance. You don’t need test statistics; you only have to describe the data you have collected very well in terms of their distribution (e.g. with confidence intervals), just as you have to describe everything required to make your results sufficiently comprehensible and reproducible. The results of such discovery phases are hypotheses. Because of the originality of hypotheses acquired in this manner and because of the small sample sizes, you will necessarily produce many false positive results. The effect sizes will also be overestimated (see post “How original are your hypotheses really?”). In a subsequent phase, the results and hypotheses must then be confirmed in a separate study, insofar as you consider them interesting and important. This entails weeding out the false positives and establishing realistic effect sizes. For this, the hypothesis has to be formulated in advance, and a proper sample size estimated so that the Type I and Type II error rates are acceptable, etc. Before the experiments are started, you draw up a detailed analysis plan and no longer deviate from it or from the test statistics proposed in it. Should any deviations from the study and analysis plan become necessary in the course of the study, these will be accounted for and reported. Ideally, you should register this study before it starts (e.g. with a time stamp and, if you wish, an embargo until publication, at the Open Science Framework) so that you can demonstrate at publication that you haven’t told a “story”.
With this you will have a clean separation of exploratory and confirmatory studies, which could even be published together, as suggested in Nature by Mogil and Macleod for all experimental studies in high-grade journals.
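What “a proper sample size” means for the confirmatory phase can be sketched with the textbook normal approximation for comparing the means of two groups. Everything in it is an assumption chosen for illustration: a two-sided test, equal group sizes, an effect size given as a standardized mean difference, and a function name I made up. A real analysis plan would more likely rely on a dedicated power analysis tool and a t-distribution, which gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate animals (or samples) needed per group.

    effect_size: assumed standardized mean difference (Cohen's d)
    alpha:       accepted Type I error rate (two-sided)
    power:       1 minus the accepted Type II error rate
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / effect_size) ** 2)


# A realistic effect needs far more samples than an inflated one.
for d in (1.0, 0.5):
    print(d, n_per_group(d))
```

The point of the sketch is the direction of the effect: if the exploratory phase overestimated the effect size by a factor of two, the confirmatory study needs roughly four times as many samples per group as the inflated estimate would suggest.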

Such a simple separation in design, analysis and publication of exploratory and confirmatory studies, together with preregistration, could significantly increase the transparency, validity and reproducibility of experimental biomedical research. It only has one disadvantage: We have to forgo all spectacular (but then unreproducible) studies.

A German version of this post has been published as part of my monthly column in the Laborjournal.