Judging science by proxies Part II: Back to the future
Science gobbles up massive amounts of societal resources, not just financial ones. For academic research in particular, which is self-governing and likes to invoke the freedom of research (which in Germany is even enshrined in the constitution), this raises the question of how it allocates the resources made available to it by society. There is no natural limit to how much research can be done – but there is certainly a limit to the resources that society can and will allocate to research. So which research should be funded, which scientists should be supported?
Mechanisms for answering these questions, which are central to the academic enterprise, have evolved over many decades. However, these mechanisms control not only the distribution of funds among researchers and within as well as between institutions, but ultimately also the content and quality of research. The mechanisms by which research funds and tenure are evaluated and allocated, and the metrics used in these processes, determine scientists’ daily routines and the way they do research more than their reading of the literature, their views through a microscope, or their presentations at conferences.
In the previous post I discussed how the current system of tenure, promotion, and peer review has evolved, from the beginnings of modern science in the 17th century to a worldwide de facto standard today. How did it come about that quantitative indicators such as the Journal Impact Factor (JIF) and the amount of external funding today play a more important role in the evaluation of researchers and their grant applications than the content, relevance, or quality of their research? ‘Tell me your JIF and I’ll tell you whether you’ll be promoted or not.’
It turns out that the way we evaluate research and researchers is only a few decades old. Probably only one generation of scientists has been completely socialized into it. The system essentially grew out of two developments. First, the industrialization and massive expansion of academic research. This expansion is the result of science’s own immense success, but it has also brought a simultaneous decline in the efficiency of research. Because the ‘fruits of knowledge’ hang higher and higher, more and more research and researchers are needed just to keep the gain of knowledge constant, let alone increase it. A classic Red Queen scenario! This flood of researchers, projects, proposals, and articles at some point began to overwhelm the evaluation system. To cope nevertheless, we adopted evaluation criteria that are easy and quick to collect. Preferably ones that can be applied without going to the trouble of assessing the actual content, quality, and impact of the science.
The second major driving force behind the emergence of today’s assessment system was the understandable desire for distributive justice. We want objective criteria that are reproducible, not subject to arbitrariness, and that allow clear discrimination between applicants or applications, preferably even a simple ranking. No one should be promoted because a powerful mentor has interfered behind the scenes. JIF and accumulated third-party funding provide simple, objective, and quantifiable metrics. You need to have read neither the candidates’ articles nor their entire CVs. A glance at the bibliography and the list of third-party funds is sufficient. If you are experienced and frequently sit on review panels, you can easily do this in a few minutes per candidate.
So can we improve a system that somehow appears to work – science is indeed progressing, think CRISPR, the SARS-CoV-2 vaccines, etc. – and that is used all over the world?
Let me start with three premises:
- Research can only be judged by those who competently scrutinize its contents, methods, results, and interpretations. This is of course very unpleasant, because such assessments are time-consuming, cannot be automated, and are not quantifiable.
- We must not reduce society’s use of resources for research, but rather use the available resources more efficiently. Indeed, one could come up with the idea of simply funding less research. This would allow us to reduce the output to such an extent that it could again be evaluated in terms of content. However, this would set the clock back more than 100 years and induce a scientific ice age. Not a good idea.
- Substantial changes in the evaluation and allocation system can only be achieved top-down, i.e. they must come from the (state) funding institutions, the universities, and the non-university scientific organizations. The scientists who seek their way into academia have no choice but to accept the conditions of competing for funding and professional positions. They are, after all, the objects of the evaluation mechanisms.
What would be a good starting point? In order to enforce a content- and quality-oriented evaluation of research performance, one would first have to ban, quite simply and completely, the use of abstract indicators (JIF, third-party funding, etc.) – and not just recommend their frugal and merely supportive use! In other words, we would need to ban the JIF, the h-index, and the like, and instead mandate the use of narratives. One’s own papers in CVs and applications should state only title, authors, and an identifier such as the PMID (PubMed identifier), but not the name of the journal. The referenced paper can thus be downloaded and read, but a mere scan of literature lists by journal name is no longer possible. Such short narratives also enforce a restriction to a few relevant references. After all, who would want to write more than ten of them?
A focus on first and last author positions would then also no longer be necessary and should be dropped altogether. The assignment of first and last authorship is in most cases a harmful fiction anyway: today, most relevant biomedical papers result from the collaboration of a large number of scientists making diverse contributions. These cannot be reduced to two positions on the author list, which are not even clearly defined. Shared co-authorships expose, but do not solve, the attribution problem. A related, and easily solvable, problem is the current practice of requiring a certain number of publications for a PhD, Habilitation, etc. This practice leads to the ‘slicing’ of studies into smaller units, to the inflation of publications that nobody needs or reads, to nonsensical and unnecessary discussions about author positions, and so on. Instead, a narrative should present the scientific contribution of the individual scientist. Whether this is worthy of a doctorate or Habilitation must then be decided by the relevant committees based on the actual research oeuvre of the candidates, not, as is currently the case, on the ranking of publications in a spreadsheet.
We could instead use alphabetical author lists, as has long been successfully practiced in multi-author collaborations in high-energy physics. There is already an excellent taxonomy for this ‘film credit roll’ practice (https://casrai.org/credit/), which is also suitable for the life sciences. The reputation and renown of scientists would then be based on their contributions and their standing in the community. Reviews and evaluations by peers should also be taken into account. Such reviews are increasingly found post-publication, either in the journal that published the article or on social media. In many fields, ‘Science Twitter’ is already much more transparent, comprehensible, and up to date than traditional formats of discourse (Letters to the Editor, etc.). Quality control of scientific publications takes place more efficiently on social media than in traditional peer review anyway: most high-profile retractions of papers were triggered by comments on Twitter or in blogs, after classical peer review had overlooked the problems in those articles.
The above measures would already lead to a massive reduction in the flood of articles, making it easier to engage with their content. Content and quality would determine scientific reputation and renown, not proxies such as the JIF and third-party funding. But something essential is still missing for this to work: career paths in academia must change. Eighty-three percent of academic staff in Germany are in temporary employment! The immense competitive pressure not to be kicked out of the system, or to advance in the academic hierarchy, selects for characteristics that are conducive neither to quality nor to cooperation in science. The pyramid must be re-formed into a trapezoid! The top must become flatter, and the base somewhat narrower. This also means, however, that fewer PhD students (as ‘cheap’ labor) will enter the system than before. Those who take the tough path into the academic world must have a real chance to make a living in the long run through good science (and not just through three ‘top publications’).
I have an additional suggestion: after introducing a purely content- and quality-based evaluation system and capping the academic career pyramid, one factor to consider is chance! Since true innovation is unpredictable, and any review process tends to favor the mainstream, some funding should be awarded by lottery! I have written about this in more detail in an earlier post (see here). Such an award scheme would also give everyone in the system more time to do research, because some of the proposal writing, and the subsequent peer reviewing of those proposals, would be eliminated. In such a scheme, many funded projects would be rather mediocre and would not deliver the promised breakthrough. But that is what happens already today. The probability that something genuinely groundbreaking would be funded, however, increases considerably.
Will we ever see substantial changes to this system in Germany? The Deutsche Forschungsgemeinschaft (DFG), the main funding body for scientific research in Germany, has so far not been known for its enthusiasm for experimentation. However, the DFG is currently working on a quite revolutionary position paper that contains a lucid analysis of the weaknesses of current research assessment and funding, including its own. The paper also proposes concrete solutions, close to the ones I sketched above (minus the lottery, though…). Germany’s most influential research funder is about to finally join the ranks of funding bodies and institutions worldwide (e.g. the Wellcome Trust, ZonMW) that take seriously the fact that the system needs a makeover! And although not at the DFG, even the funding lottery is being tried out in some places, for example at Germany’s Volkswagen Foundation. So maybe, even without a time machine, the future has already begun (a little bit)?
A German version of this post was published earlier as part of my monthly column in the Laborjournal: ‘Wie konnte es eigentlich soweit kommen?’