January 3, 2021

Judging science by proxies: A short and incomplete history

In this post I’ll be looking at the question of why scientific careers today depend so much on the Journal Impact Factor (JIF) and on the acquisition of as much third-party funding as possible. Or, more generally, why the content, originality and reliability of research results are often a secondary matter when committees talk their heads off about whom to admit to their own ranks and whom not, or which grant applications deserve to be funded. In short, follow me on a brief and incomplete history of how and why we ended up judging the quality of science through proxies such as the JIF and the amount of third-party funding. Perhaps a historical perspective will also yield clues as to how we can overcome this mess. But I am getting ahead of myself. Let’s start where it all began, with the founding fathers of modern science.

The early pioneers of modern scientific work, like Galileo, Hooke, Boyle and Newton, were ‘gentleman scientists’. Not only were they all men, they were all financially independent, either by birth or by patronage. Driven by curiosity about ‘how the world works’, they sought of course not only to advance knowledge, but also to gain personal fame and honor. The benefit of new knowledge about nature acquired through experimentation was not seen in creating the foundations for a more rational appropriation of nature by man. Far from it: these deeply religious gentlemen struggled to decipher the book of nature written by God, and thus the order of the world, and thereby to promote better faith and more God-fearing behavior. Science was worship. At that time it was not scientists but inventors and engineers who were courted by princes and kings, because only they promised help in subjugating the world through conquest and war.

The way Newton and his colleagues dealt with their competitors was often anything but gentlemanly. After all, it was a matter of primacy and posterity. The starting point of their ideas and hypotheses was what Lorraine Daston called ‘ground zero empiricism’. In other words, they were writing on an almost blank page. The community of researchers was small: perhaps a few hundred, at most a few thousand like-minded individuals worldwide. Loosely organized in academies, they presented and criticized each other’s theories and experiments. Besides books, they mostly published in the annals of their national scientific academies. The Royal Society of London led the way in speed and reach: copies were printed twice a year, e.g. 800 in 1829, and sent to corresponding academies and selected scientists. Often, no more than half a year passed between presentation or submission of a paper and its publication. At that time, of course, the competition was not for academic positions or research funding, but for reputation and membership in these academies, as well as access to their international correspondents. In addition to the originality and quality of the science, hierarchies, relationships, and power games were certainly already important in those days.

However, with the increasing understanding of what holds the world together at its core, sovereigns and society also began to take a greater interest in the practical usefulness of scientific knowledge. As bourgeois societies became established and industrialization blossomed in the 18th and 19th centuries, states began to organize science systematically, promoting it especially through universities. Maxwell, Pasteur, Virchow and others were university scientists who did research on a state-supported budget. For them, too, the quest was not about personal wealth, but still primarily about the advancement of knowledge and the recognition and fame to be gained through it.

At the same time, the sciences became more and more specialized; specialist journals appeared and became, alongside lectures, the most important media of scientific discourse. Most scientists in a field still knew each other personally. Scientific controversies were not fought out anonymously, but face to face. What was new, however, was the academic competition for employment as an assistant, or for appointment and tenure as a professor. To be successful, what counted most was reputation among colleagues, but also academic hierarchies and affiliation with ‘scientific schools’. In any case, quantitative bibliometric indicators and third-party funding did not play a role, because they did not yet exist. Even then, good scientific practice was in some places not taken too seriously when bending it served personal advancement. In 1830, Charles Babbage, the inventor of the calculating machine, described in his ‘Reflections on the Decline of Science in England, and on Some of Its Causes’ the main types of bad scientific practice and misconduct that are still practiced today. He distinguished ‘hoaxing’, ‘forging’, ‘trimming’ (data dredging and selective data analysis) and ‘cooking’ (torturing data with statistics).

In the early 20th century, third-party funding was born. Immediately after the lost First World War, the German universities, academies and the Kaiser-Wilhelm-Gesellschaft (today’s Max-Planck-Gesellschaft) came up with an idea for improving their financial situation, which had become precarious due to war and crisis. They founded the ‘Notgemeinschaft der deutschen Wissenschaft’ (‘Emergency Association of German Science’, whose legal successor is today’s Deutsche Forschungsgemeinschaft, the German Research Foundation or DFG) and were thus able to fund individual scientists on an application basis. But this was done quite differently than today. Otto Warburg’s application to the Notgemeinschaft has been preserved. It consisted of a single sentence: ‘Need 10,000 Reichsmark’ and below it ‘signed Otto Warburg’. It was presumably approved, but certainly not on the basis of a review, as there was no written proposal to review! The name Warburg guaranteed the originality and potential of the application. A few years later, party membership became an additional important evaluation criterion. During the era of a ‘German physics’, political orientation and party affiliation were relevant for employment or appointment at the university. The JIF and the amount of third-party funding, however, were still a long way off!

Only with the Second World War did this system change fundamentally, and worldwide. During the war, there was an unprecedented industrialization of research, most dramatically in the USA. Research programs, which provided the basis for the development of long-range missiles, radar, atomic bombs, computers, etc., were endowed with gigantic sums and executed on a massive scale. At the end of the war, most of academic science was in the service of the military. Usefulness of research, in this case to secure military superiority, had top priority. So much so that at that time states had to worry seriously about the survival of ‘blue skies’ basic research. Vannevar Bush’s report ‘Science, the Endless Frontier’ is still widely read and quoted. Commissioned by the American president and submitted in 1945, it is considered a manifesto of the government’s mandate to promote research even for its own sake. After all, basic research provides the knowledge for later, as yet unanticipated applications. Bush advised a ‘hands off policy’ and urged the government to train the next generation of scientists through university programs and attractive positions.

These developments catalyzed a steep rise in research output through ever-increasing specialization of the various disciplines and government spending on academic research. Nevertheless, it was still all quite manageable for researchers in their specialties and even beyond. Editors decided at their desks which manuscripts to publish. Peer review as we know it had not yet been born. There were only a few journals per research subject, published in the respective national languages. The exchange of information still took place mainly on a national level, where it was also decided who was ‘excellent’ and who was not.

At some point, around the 1980s, the exponential proliferation of knowledge, its specialization, and the sheer number of ‘knowledge producers’ reached a critical threshold. It became increasingly difficult to judge the quality and originality of researchers and to make funding and career decisions based on knowledge of the content of their work. Add to this a rebellion against hierarchies in society, beginning in the late 1960s and particularly strong among academics. The desire for objective quantification of performance, also in research, was born. Meanwhile, as a result of these developments, a hierarchy of journals had also been established, which became quantifiable through Eugene Garfield’s ingenious invention of the impact factor in 1955, and which was subsequently also massively commercialized by him (and the publishers).

The rest is history. According to UNESCO, there are now more than 400,000 full-time scientists in Germany alone, and many millions around the world. Welcome to the club! These scientists now publish millions of articles every year. Within a century, the average number of authors per paper has increased from 1 to 6. But in those hundred years, the productivity of science, defined as the ratio of output of knowledge to input into science, has also plummeted. Science is nevertheless progressing, because the number of scientists (input!) has increased in parallel by about the same factor, probably even disproportionately. After all, we already know quite a lot, good ideas have become scarcer, the low-hanging fruit has been picked, and everything is becoming more and more complex, with respect to content as well as to methods. In a classical Red Queen scenario, to keep moving forward, we need more and more scientists, and more and more complicated and expensive apparatus to wrest the secrets from nature.

The expanding universe of the academic industry also offered an excellent substrate for the perfection of objective, simple, and transparent criteria for the evaluation of researchers and research: JIF, h-index (Hirsch index), third-party funding. What is the point of reading the articles of applicants if you know that their impact factor is, on average, 20.162? Or ‘only’ 6.531? Note the remarkable accuracy of this indicator: in most CVs and applications it is given with 2, sometimes with 3 decimal places!

The trouble is that this ‘objective’ quantification of the quality of individual science rests on false premises: the JIF measures, if anything, the popularity of the journal and/or research subject in question. Moreover, 80% of the citations in Nature (and comparable journals) are generated by 20% of the articles (including reviews). The vast majority of articles in these journals, also known as ‘glam journals’, draw no more citations than those published in a good scholarly journal. Or, indeed, none at all. Even more corrosive than this ineptness of the metrics, however, was that two long-known mechanisms could now take effect. One was Goodhart’s Law, formulated in 1975, which predicts ‘that a measure that becomes a target ceases to be a good measure.’ And that’s exactly what happened. The mining of impact factor points began to corrupt epistemic interest. More and more papers must generate more and more impact points. Research results that promise such points are prioritized, with all the consequences, from clever selection and overinterpretation of results to fraud. Babbage sends his regards. The other is the Matthew effect, ‘To him who has, to him will be given’ (Matthew 25:29), applied to science for the first time by Robert Merton in 1968: third-party funds generate third-party funds, especially if the science does not venture too far off the mainstream. Science papers generate Nature papers, and vice versa. Of course, there is a lot of competition, as not everyone can benefit, because the ‘currency’ for which the impact points determine the exchange rate is controlled by the publishers via rejection rates. That’s their business model. The more than 10,000 Max Planck scientists, the German research elite, do not manage to publish more than 400 articles per year in Nature and Nature-brand journals!
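
To make the point about skewed citations concrete, here is a minimal Python sketch. The citation counts are invented purely for illustration (they are not data from Nature or any real journal), and the helper name top_share is hypothetical. The sketch only shows why a journal-level average, which is roughly what a JIF-style figure reports, can say very little about a typical article in that journal.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal.
# Invented for illustration only: two 'hits' and eight ordinary papers.
citations = [120, 40, 8, 7, 6, 6, 5, 4, 3, 1]


def top_share(counts, fraction=0.2):
    """Share of all citations collected by the most-cited `fraction` of articles."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of articles in the top group
    return sum(ranked[:k]) / sum(ranked)


journal_mean = mean(citations)    # journal-level average, in the spirit of a JIF
typical = median(citations)       # what the article in the middle actually receives
share = top_share(citations)      # concentration of citations in the top 20%

print(f"mean citations per article:   {journal_mean}")
print(f"median citations per article: {typical}")
print(f"share held by top 20%:        {share:.0%}")
```

In this toy example the top two of ten articles hold 80% of all citations, and the journal average is more than three times what the median article receives. Attaching that average to any single paper, let alone to its author, therefore tells us little about the paper itself.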

The special attractiveness, but also the toxicity, of these indicators lies in their apparent plausibility, transparency, simplicity and practicability, and in the fact that the obvious alternative, the examination of scientific content and its quality and originality, seems impracticable in view of the paper and scientist tsunami described above. That makes the indicators appear to be without alternative. Hence the worldwide triumph of the JIF, the h-index, and the like. At least one generation of scientists and administrators has already been socialized with them; they often cannot even imagine other mechanisms to judge the quality of research or scientists. Citation rates and journal reputation, or the accumulation of external funding, seem to them logical, or even natural, criteria.

So the question arises whether knowledge production in the 21st century, with its armada of scientists and the sheer mass of their outputs, needs other criteria of ‘performance evaluation’. And if so, what they would look like, and whether they would be practicable. Stay tuned, as in the next installment I will try to give a (quite personal) answer.

A German version of this post was published earlier as part of my monthly column in the Laborjournal: https://www.laborjournal.de/rubric/narr/narr/n_20_12.php

Some references:

Steven Shapin. The Scientific Revolution. University of Chicago Press, 2nd ed. 2018

Lorraine Daston. Ground Zero Empiricism. Critical Inquiry April 10, 2020 https://critinq.wordpress.com/2020/04/10/ground-zero-empiricism/

Vannevar Bush. Science The Endless Frontier. A Report to the President by Vannevar Bush, Director of the Office of Scientific Research and Development, July 1945. https://www.nsf.gov/od/lpa/nsf50/vbush1945.htm

Nicholas Bloom et al. Are Ideas Getting Harder to Find? NBER Working Paper 23782, National Bureau of Economic Research. http://www.nber.org/papers/w23782

Stephen J. Bensman. Garfield and the impact factor. Annual Review of Information Science and Technology 2008 https://doi.org/10.1002/aris.2007.1440410110

Ioannidis JP, Boyack KW, Klavans R. Estimates of the continuously publishing core in the scientific workforce. PLoS One. 2014 Jul 9;9(7):e101698. https://doi.org/10.1371/journal.pone.0101698

Robert Aboukhalil, The rising trend in authorship, The Winnower 7:e141832.26907 (2014). https://doi.org/10.15200/winn.141832.26907

Richard Van Noorden. Global scientific output doubles every nine years. May 2014 http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html

UNESCO Science Report 2015 https://en.unesco.org/unescosciencereport

Charles Babbage. Reflections on the Decline of Science in England, and on Some of Its Causes. London 1830 https://archive.org/details/reflectionsonde00mollgoog

Robert Merton. The Matthew Effect in Science. Science 159, 56-63 (1968) https://science.sciencemag.org/content/159/3810/56

Eugene Garfield. Citation indexes to science: a new dimension in documentation through association of ideas. Science. 1955;122:108-111 https://science.sciencemag.org/content/122/3159/108

4 comments

January 11, 2021 - 19:32 Othmar Ennemoser

Hello,
The dilemma of science is the orientation toward utility that it has acquired! The primary goal is no longer the truth that can probably be recognized, but technically feasible repeatability, which, in a ‘proof’ that has become academic, can already count as truth by virtue of functionality alone, even if countless important questions then remain unanswered, and indeed must remain so, given the seemingly reductionist level of insight.
The mind, as the productive vehicle of logic in the sense of an innovative idea, falls by the wayside, and its careless burial no longer seems far off.
It is in any case a fact that the important sciences, those that concern the deeply rooted emotional conditions of humankind, are struggling to survive in this slimmed-down perception of the world, since today’s science, oriented toward minimalist facts, can no longer bring forth any comprehensive, resonant and meaningful truth.
It is therefore urgently necessary and absolutely called for that the factory of knowledge, blocked in this way and acting complacently, be modernized from the outside, so that what should be its foremost product, namely scientifically established truth, can finally regain absolute priority. And this cannot happen through the pitfalls, massively enforced by economics in the first place, of those pursuing utility in the universities, but must be accomplished in a frank and free spirit by an authority free of ulterior purpose!

February 9, 2021 - 22:38 anonym

I am an avid reader of your column in LJ, and I must say that every time you simply speak from my heart. You always hit the nail right on the head, and that is what I like about your texts. But enough of the compliments.
The passage that impresses me most in the current issue is the one about me, as a PhD student, not wanting to be exploited and used up. The harmless joke from the professor, “for this project we can definitely sacrifice 2 doctoral students. (If nothing comes of it, no harm done)”, is then no longer so harmless, but pure reality. As a current master’s student, I therefore see this as the biggest hurdle, and also as a reason to skip the doctorate or at most to aim for an industry doctorate. Honestly, who wants to spend at least 3 years of their life on something that not even the professor really believes in?
I agree with you completely. Things cannot go on like this with the current system, and I really hope that any reforms will be effective and will be implemented as soon as possible.
