In my previous post I had a look at the culture of science in physics and found much that we life scientists might want to copy. Physics itself, and especially particle physics, presents a goldmine of lessons to be learned, two of which I would like to discuss with you today.
Some of you will remember: In 2011 the results of a large international experiment convulsed not only the field of physics but the whole world. On September 22nd the New York Times ran it on the front page: “Einstein Roll Over? Tiny neutrinos may have broken cosmic speed limit”! What had happened?
I was planning to highlight physics as a veritable model, as champion of publication culture and team science from which we in the life sciences could learn so much. And then this: The Nobel Prize for physics went to Rainer Weiss, Barry Barish and Kip Thorne for the “experimental evidence” for the gravitational waves foreseen in 1916 by Albert Einstein. Published in a paper with more than 3,000 authors!
Once again the Nobel Prize is being criticized: that it is always awarded to the “old white men” at American universities, or that good old Mr. Nobel actually stipulated that only one person per area of research be awarded, and only for a discovery of the past year. Oh, well… I find it more distressing that the Nobel Prize is once again perpetuating an absolutely antiquated image of science: the lone research geniuses, of whom there are so few, or more precisely a maximum of three per field (medicine, chemistry, physics), have “achieved the greatest benefits for humanity”. Awarded with a spectacle that would do honor to the Eurovision Song Contest or the Oscars. It doesn’t surprise me that the public receives this enthusiastically. This cartoon-like image of science has been around since Albert Einstein at the latest. And from Newton up to World War II, before the industrialization and professionalization of research, it may even have been justified. What disturbs me is that the scientific community partakes so fulsomely in this anachronism. You will ask why the Fool is getting so worked up again; it’s harmless in the worst case, you say, and the winners are almost always worthy of the prize? And surely a bit of PR can do no harm in these post-factual times, when vaccination opponents and climate-change deniers are on the rise?
We scientists are pretty smart. We pose hypotheses and then confirm them in a series of logically connected experiments. Desired results follow in quick succession; our certainty grows with every step. Almost unfailingly the results reach statistical significance, sometimes at the 5% level, sometimes with a p-value trailed by a whole string of zeros. Some of our experiments are independent of each other, some are dependent because they use the same material, e.g. for molecular biology and histology. Now we turn, tired but happy, to the job of illustrating and writing up our results. Not only did we have a lucky hand with the initial hypothesis, now confirmed; our luck was all the lovelier when we saw that the chain of significant p-values remained unbroken. That is comparable to buying several lottery tickets which, one after the other, all turn out to be winners. If we then manage to convince the reviewers, our work will be printed just as it is.
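The lottery analogy can be made quantitative. A minimal sketch, assuming a (generous) statistical power of 0.8 for each experiment: the chance of an unbroken chain of significant results shrinks rapidly with the chain’s length, even when the underlying hypothesis is true.

```python
# Chance that a chain of independent experiments ALL reach significance,
# assuming the hypothesis is true and every experiment has the same
# statistical power (0.8 here -- an optimistic assumption).

def p_unbroken_chain(power: float, n_experiments: int) -> float:
    """Probability that all n independent experiments come out significant."""
    return power ** n_experiments

for n in (2, 4, 8):
    print(f"{n} experiments: P(all significant) = {p_unbroken_chain(0.8, n):.2f}")
```

With eight sub-experiments, an all-significant chain occurs in fewer than one in five honest attempts; a literature full of such chains is telling us stories.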
Am I exaggerating? A casual glance through the leading journals (Nature, Cell, Science, etc.) will attest that the overwhelming majority of original articles published in them follow this pattern. The pattern is linear, unclouded by abandoned attempts or dead ends, typically starting with the phrase “First we…” followed by many repetitions of “Next we…”. A further indication is the almost complete absence of non-significant results. Where you occasionally find an “n.s.”, it usually belongs there: had significance been reached, it would have jeopardized the hypothesis, as in a group that is not supposed to differ from the control because, for example, the same gene had been manipulated with a different experimental strategy.
The naïve observer would have to conclude that the authors of these studies are not only incredibly smart but also implausibly lucky. He might even consider them braggarts or swindlers. After a few years in science, however, we know that there is something quite different behind it all: we are telling each other stories. The long years spent in the laboratory working on the story were quite different. Many things went wrong, some results were ambiguous, or they didn’t match the hypothesis. Strategies were changed. The hypothesis was revised. And so forth. The “smoothly running” story was developed and told ex post. So it really is, quite literally, a “story”.
But is that a problem? We all know that it doesn’t work the way it’s told. And besides, for good reason we are not interested in all the problems encountered and wrong paths followed in our scientific explorations. They don’t make good reading and would swamp us with useless information. On the other hand, telling stories opens the flood gates to a number of bad habits, such as “outcome switching” and the selective use of results. This has been compared to firing a shot at a wooden wall and then drawing a bullseye around the bullet hole. With the hole in the middle. Right in the heart! That way you can “prove” any old hypothesis! And we learn nothing about the results that didn’t make it into the story but could have led us to other hypotheses and new discoveries.
So allow me to ask at this point: Just how did it happen that the reporting of scientific discoveries has almost completely detached itself from the laboratory processes upon which it is based? Is it a product of our preference for stories that are smooth and spectacular? Of our academic reward system, which rewards them especially when they are published in journals with a high impact factor? Surprisingly, no. The rhetoric of a linear, uninterrupted, impeccably logical chain, pressing necessarily forward towards proof of the initial hypothesis, is several hundred years old. Towards the end of the 17th century experiments were hardly ever published; rather, they were performed before an audience who thereby served as witnesses. The expansion and internationalization of the “scientific community”, carried forward first by gentleman scholars of private means and then more and more by “professionals”, made broadly accessible publication necessary. These publications developed under the patronage of the scientific societies founded at the time, led by the Royal Society in England with its “Philosophical Transactions”, still being published today. Insofar as experiments were now conducted without “witnesses” and addressed to a mixed and hardly specialized public, the readers had to be interested in the object of a study and convinced by its results. The rest is, in the true sense of the word, “history”. The dissociation of the actual logic and practice of a study from its representation in the corresponding publication, in favor of a “story”, is today’s standard, and not only in biomedicine.
A long tradition; we have become accustomed to it, and the journals accept publications only in this manner. So everything is just fine, right?
I don’t think so. For one thing, because substantially more studies are being published, they contain substantially more information in the form of sub-studies, and they are substantially more complex in methods and conception. This means that the “degrees of freedom” have massively increased, which makes it possible for authors, by selecting the “desired” results, to “substantiate” every conceivable hypothesis. Furthermore, it has become normal today to mix together in a single study the generation of hypotheses by exploration and their confirmation, supposedly to afford the reader a clear overview. But how many of the experiments carried out did not make it into the publication? And why not? Was the hypothesis generated without bias by means of explorative experiments and then confirmed in subsequent independent experiments? Was the hypothesis formulated unambiguously before the confirmation, was the required sample size determined, and was bias excluded as far as possible? That is, were the experiments randomized and blinded?
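How quickly those degrees of freedom inflate false positives can be simulated. A sketch under simple assumptions: the hypothesis is actually false, every measured outcome is independent, and the authors report whichever outcome happens to cross p < 0.05.

```python
import random

def false_positive_rate(m_outcomes: int, alpha: float = 0.05,
                        n_studies: int = 100_000, seed: int = 1) -> float:
    """Fraction of null 'studies' with at least one significant outcome.

    Under the null hypothesis p-values are uniform on [0, 1], so each
    outcome crosses the significance threshold with probability alpha.
    """
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m_outcomes))
        for _ in range(n_studies)
    )
    return hits / n_studies

for m in (1, 5, 10, 20):
    print(f"{m:2d} outcomes: false-positive rate "
          f"~ {false_positive_rate(m):.2f} (analytic: {1 - 0.95 ** m:.2f})")
```

With twenty outcomes to choose from, nearly two out of three null studies can tell a “significant” story.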
But how can we minimize the risk that we and our readers are led astray by the selective use of results for the purposes of “story-telling”? How can we make the results more robust and make them available in their entirety to the scientific community?
Actually, that is easy. First, we have to separate exploration and confirmation more clearly from each other. In exploring, we are searching for new phenomena. You cannot plan everything in advance, e.g. sample sizes. You can change the direction of the experiments on the basis of incoming findings. You have to give serendipity a chance. You don’t need test statistics; you only have to describe the data you have gathered very well in terms of their distribution (e.g. with confidence intervals), just as you have to describe everything required to make your results sufficiently comprehensible and reproducible. The results of such discovery phases are hypotheses. Because of the originality of hypotheses acquired in this manner, and because of the low case numbers, you will necessarily produce many false positives. The effect sizes will also be overestimated (see the post “How original are your hypotheses really?”). In a subsequent phase, the results and hypotheses must then be confirmed in a separate study, insofar as you consider them interesting and important. This entails sorting out the false positives and establishing realistic effect sizes. The hypothesis now has to be formulated in advance, and a proper sample size estimated so that the Type I and II error rates are acceptable, etc. Before the experiments are started, you draw up a detailed analysis plan and no longer deviate from it or from the test statistics proposed in it. Should any deviations from the study and analysis plan become necessary in the course of the study, these are accounted for and reported. Ideally, you register this study before it starts (e.g. with a time stamp, and if you wish under embargo until publication, at the Open Science Framework) so that you can demonstrate at publication that you haven’t told a “story”.
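For the sample-size step, here is a minimal sketch of an a-priori power calculation, using the common normal approximation for comparing two group means (the effect size d is Cohen’s d; the numbers are illustrative, not a substitute for a proper statistical consultation):

```python
import math
from statistics import NormalDist

def sample_size_per_group(d: float, alpha: float = 0.05,
                          power: float = 0.8) -> int:
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_beta = z.inv_cdf(power)           # Type II error = 1 - power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A 'medium' effect (d = 0.5) already demands over 60 samples per group;
# only a huge effect (d = 1.0) gets by with fewer than 20.
for d in (0.5, 1.0):
    print(f"d = {d}: n = {sample_size_per_group(d)} per group")
```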
With this you will have a clean separation of explorative and confirmatory studies, which could even be published together, as suggested in Nature by Mogil and Macleod for all experimental studies in high-profile journals.
Such a simple separation in the design, analysis and publication of explorative and confirmatory studies, together with preregistration, could significantly increase the transparency, validity and reproducibility of experimental biomedical research. It has only one disadvantage: we have to forgo the spectacular (but then unreproducible) studies.
A German version of this post has been published as part of my monthly column in the Laborjournal: http://www.laborjournal-archiv.de/epaper/LJ_17_10/28/
I recently read “Excellence R Us”: university research and the fetishisation of excellence by Samuel Moore, Cameron Neylon, Martin Paul Eve, Daniel Paul O’Donnell and Damian Pattinson. This excellent (!) article, and Germany’s upcoming third round of the ‘Excellence Strategy’, incited me to the following remarks on ‘excellence’.
So much has already been written on excellence. (In)famously, in German, Richard Münch devoted a 500-page tome to the “academic elite”. In it he characterized the concept of excellence as a social construct for the distribution of research funds and derided the text bubbles socialized under that concept. He castigated each and every holy cow in the scientific landscape of Germany, including the Deutsche Forschungsgemeinschaft (DFG, Germany’s leading funding organization). Upon its publication in 2007, shortly after the German Excellence Initiative was launched for the first time, Münch’s book filled the representatives of what he disparaged as ‘cartels, monopolies and oligarchies’ with indignation, and a mighty flurry rustled through the feuilletons of the republic.
Today, on the eve of the third round of the Excellence Initiative (now the Excellence Strategy), only the old-timers remember that, which is precisely why I am going to tackle the topic once again, and fundamentally: I believe there is a direct connection between the much-touted crisis of the (life) sciences and the rhetoric of excellence.
What do we mean by ‘excellence’? Isn’t that an easy question? It means the pinnacle, the elite, the extraordinary, the exceptional. A closer look will tell you that the concept is void of content. In the world of science we find excellent biologists, physicists, experts in German studies, sociologists. That they are excellent or extraordinary means only that they are far better than others in their fields. But by what measure? We only learn that they are the few at the extreme of a Gaussian distribution. These few are considered worthy of reward: with a professorship, with more research funds, or with entire initiatives. And that is not only a German phenomenon. The English, for example, have their Research Excellence Framework (REF), through which whole universities receive their funds relative to their scientific excellence. And you will say: Quite rightly! And I will say: Think again!
The first question to ask is who actually sorts researchers, projects and universities into excellent and non-excellent. And according to what criteria? Jack Stilgoe formulated it in the Guardian: “‘Excellence’ is an old-fashioned word appealing to an old-fashioned ideal. ‘Excellence’ tells us nothing about how important the science is, and everything about who makes the selection.” Because this is how it goes: the search for excellence will be successful according to the criteria that were set for the search. In biomedicine, these criteria are set: publications in a handful of select journals, or, in more practical terms, the most abstract of all metrics, the Journal Impact Factor (JIF). What is excellent? Publications in journals with a high JIF. How do we select excellent researchers and their projects? By counting publications with a high JIF. How does the excellence of a project manifest itself? Through publications with a high JIF. For those of you who find this self-referential loop too simplistic: you can add a few more criteria, and the loop will just get bigger. What is excellent? Plenty of external funding, preferably from the DFG (or the NIH, MRC, etc.). How do you get a lot of third-party funding? By publishing in journals with a high JIF. And so on and so forth.
But is a top publication not a good predictor of future pioneering results? Unfortunately not, because we, the scientists who rated the paper as publishable in peer review, find it difficult to judge the significance and future relevance of research. Many studies have demonstrated this. For example, the evaluation of NIH applications (to be exact, the “percentile” score) correlates very poorly with the relevance of the funded projects as extrapolated from citations. [A footnote here: with DFG applications you cannot investigate this connection at all, because the DFG does not give access to the relevant information.] Most striking is our inability to recognize projects or publications of high relevance. Quite a number of papers with a “rejected” history were awarded the Nobel Prize years later. “Breakthrough findings” are not advertised in funding programs or hyped into excellence. Usually they just happen, when “chance favors the prepared mind”, as Louis Pasteur put it.
Thus it seems that the sensitivity and specificity of reviewing and assessing top research are exceedingly low. Some of the false negatives will perhaps be discovered years later; the false positives will just drain system resources. Moreover, the rhetoric around excellence has other corrosive effects. It promotes narratives that exaggerate the importance and effect sizes of one’s own results. It rewards “shortcuts” in the form of “more flexible” analysis and publication on the way to supposedly spectacular results. This explains the inflation of significant p-values and effect sizes, the allegedly imminent breakthroughs in the therapy of diseases, etc. Some researchers even fall prey to the temptation of misconduct to achieve guaranteed and immediate excellent results. Where the drive for excellence entices into questionable research practices, it obstructs “normal science”. Normal science, in Thomas Samuel Kuhn’s sense, means the daily, unspectacular theorizing, observing and experimenting of researchers, which leads to knowledge and consolidation; it is merely given an occasional thorough shaking by a “paradigm shift” and then set up anew. Normal science does not lead to spectacular findings (“stories”); it is based on competent methods, high rigor and transparency. It is replicable. In a word, everything that might be swept under the table in the quest for excellence. At the same time, normal science is the very substrate for “breakthrough science”, the paradigm shift. But breakthrough science cannot be called up at will; it happens serendipitously and cannot be flushed out with a call for applications. So even if it sounds paradoxical: if we want top research we must fund normal science! Anyone who funds only ‘excellence’ gets excellence, with all its effects and side effects.
That of course includes top publications, which in and of themselves do not constitute value… apart from boosting researchers, initiatives, universities and national excellence ratings to the top.
Selection according to excellence criteria also leads to funder homophily, the tendency to support scientists doing research similar to one’s own. And it leads to a concentration of resources (the Matthew effect: “he who has, gets”), usually to the disadvantage of non-excellent areas of research, i.e. normal science. The rhetoric of excellence is inherently regressive: it makes decisions based on past excellence. That reduces the chance of funding something really new, while rigor, creativity and diversity fall through the net.
The rhetoric of excellence does, however, have one essential function that at first glance seems irreplaceable. It delivers science a criterion for the distribution of scarce state research funds, and arguments for increasing research funding that the man on the street can understand. Have a look: “With us you are supporting excellence!” And when the politicians hear those golden words, they jump on stage too. How drab it would be to call for funding for “normal science”!
In Germany the stage is set again for an ‘Excellence Show’, but wouldn’t it be time to change the production, or at least the set? We could ever so gently slip in some “sound science” rhetoric to stand beside the excellence rhetoric. The English notion of “soundness” is great for this, for it means conclusiveness, validity, reliability, dependability. A more pluralistic starting point for distributing research funds would be to fund sound science, incorporating the many qualities that constitute (good) science. Can we evaluate “soundness”, or is that as “empty” a “signifier” as “excellence”? Team science and cooperation, open science, transparency, adherence to scientific and ethical standards, replicability: all this and more can not only be named but, to a certain point, even quantified. These would then be criteria for broad funding. No additional finances are required, because less excellence would be funded, with the side effect that the funders would be buying more “tickets” in the lottery. And funding research without predictive criteria for breakthrough science is indeed a lottery: she who has more tickets wins more often. This explains the paradox that funding less excellence can produce more of it. The top research, the new therapies, the paradigm shifts rise out of a larger number of high-quality projects in normal science. But only a fool would think that feasible?
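The lottery arithmetic behind this paradox is simple. A sketch with purely illustrative numbers: a fixed budget, an assumed small and equal breakthrough probability per funded project (since we cannot predict breakthroughs anyway), and project costs that shrink as the funding scheme becomes less ‘excellent’.

```python
# With a fixed budget, cheaper (less 'excellent') projects mean more
# lottery tickets -- and a better chance of at least one breakthrough.
# All numbers are illustrative assumptions.

def p_at_least_one_breakthrough(p: float, n_projects: int) -> float:
    """Probability that at least one of n funded projects breaks through."""
    return 1 - (1 - p) ** n_projects

BUDGET = 100           # arbitrary units
P_BREAKTHROUGH = 0.01  # assumed chance per project

for cost_per_project in (20, 10, 2):
    n = BUDGET // cost_per_project
    print(f"{n:3d} projects funded: P(>=1 breakthrough) = "
          f"{p_at_least_one_breakthrough(P_BREAKTHROUGH, n):.2f}")
```

Fifty small grants buy a far better chance of a paradigm shift than five prestigious ones, if breakthroughs really are unpredictable.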
A German version of this post has been published as part of my monthly column in the Laborjournal: http://www.laborjournal-archiv.de/epaper/LJ_17_09/22/
It’s noon on Saturday, the sun is shining. I am evaluating nine applications for a call by a German science ministry (each approximately 50 pages). Fortunately, last weekend I was already able to finish evaluating four applications for an international foundation (each approximately 60 pages). Just to relax, I am working on a proposal of my own for the Deutsche Forschungsgemeinschaft (DFG) and on one for the European Union. I’ve lost track of how many article reviews I have agreed to do but have not yet delivered. But tomorrow is Sunday; I can still get a few done. Does this agenda ring a bell? Are you one of those average scientists who, according to various independent statistics, spend 40% of their work time reviewing papers or proposals? No problem, because there are 24 hours in every day, and then there are still the nights, too, for doing your research.
I don’t want to complain, though, but rather make a suggestion for how to get more time for research. Interested? Careful, it is only for those with nerves of steel. I want to break a lance for a scattergun approach and whet your appetite for this idea: we allot research money not per proposal but give it to everyone as basic support, with the tiny modification that a fraction of the funds received must be passed on to other researchers. You think that sounds completely crazy, as in NFG (North Korean Research Community)?
Scarcely noticed by the scientific community in Germany, an astounding development is taking place: the alliance of German scientific organizations with the German Rectors’ Conference (the DEAL consortium) is flexing its muscles at the publishing houses. We are witnessing the beginning of the end of the current business model of scientific publishing: an exodus from institutional library subscriptions to journals towards open access (OA) to the scientific literature for all, financed by a one-time article processing charge (APC). The motive for this move is convincing: knowledge financed by society must be freely accessible to society, and the costs of accessing scientific publications have risen immensely, increasing by over 5% every year and all but devouring the last resources of the universities.
The big publishing houses are merrily pocketing fantastic returns on research that is financed by taxes and produced, curated, formatted and peer reviewed by us. These returns run at a fulsome 20 to 40%, which would probably not be legal in any other line of business. At the bottom of this whole thing is a bizarre swap: with our tax money we buy back our own product (scientific knowledge in manuscript form) after having handed it over to the publishers up front. It gets even wilder: the publishing houses give us our product back on loan only, with limited access and without any rights over the articles. The taxpayer, having paid for it all, cannot access it, meaning that not only Joe Blow the taxpayer is left standing in the cold, but with him practicing doctors and clinicians, and scientists outside the universities.
Have you ever wondered what percentage of your scientific hypotheses are actually correct? I do not mean the rate of statistically significant results you get when you dive into new experiments. I mean the rate of hypotheses that were confirmed by others, or that postulated a drug which actually proved effective in other labs or even in patients. Nowadays, unfortunately, only very few studies are independently repeated (more on that later), and even long-established therapies are often withdrawn from the market as ineffective or even harmful. One can only hope to approximate such a rate of “success”, and that is exactly what I will now attempt. You may wonder why I am posing this apparently esoteric question: because knowing roughly what percentage of hypotheses actually prove correct has wide-reaching consequences for evaluating research results, your own as well as those of others. The question has an astonishingly direct relevance to the discussion of the current crisis in biomedical science. Indeed, a ghost is haunting biomedical research!
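As a teaser, the expected “success” rate can be sketched with Bayes’ rule. The numbers below are illustrative assumptions, not measurements: a prior (the share of tested hypotheses that are actually true), the conventional Type I error rate, and a typical statistical power.

```python
def ppv(prior: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Positive predictive value: P(hypothesis true | significant result)."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# The more original (i.e. a-priori unlikely) the hypotheses we test,
# the smaller the fraction of significant results that are actually true.
for prior in (0.5, 0.1, 0.01):
    print(f"prior = {prior:4.2f}: PPV = {ppv(prior):.2f}")
```

If only one in a hundred tested hypotheses is true to begin with, the vast majority of significant results are false positives, even with perfect methods.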