Fraud in science is rare. But is this actually true?

After a break of two years, here is again an English translation of one of the musings of the ‚Wissenschaftsnarr‘ (i.e. ‚science jester‘) – my alter ego, who writes a monthly column for the German periodical Laborjournal. My apologies: to save time I used DeepL, but the AI has some difficulty with the writing style of the science jester…

Wissenschaftsnarr # 53 (German version available here)

Almost weekly we read about cases of scientific misconduct, and often renowned journals and prominent scientists play a role in them. The website Retraction Watch by Ivan Oransky and Adam Marcus provides us with such news and its background in an incessant stream. The Laborjournal, too, carries a story in almost every issue about a lab where things were not going right. Most of the time, this came to light after an article with manipulated, falsified or even invented data was exposed. Whistleblowers, or attentive readers who anonymously publish their doubts about the integrity of figures on PubPeer, often bring this to the attention of the scientific public. Universities, funding agencies, or journals, on the other hand, conspicuously seldom uncover such malignant machinations.

The science pages of daily newspapers also quite frequently provide us with news from the moral lowlands of science – currently with reports about questionable papers from the stables of Nobel laureate Gregg Semenza, the ‘discoverer’ of hypoxia-inducible factor (HIF), or of neuroscientist and current Stanford University president Marc Tessier-Lavigne. This is not just about basic research papers: COVID has given us a veritable tsunami of article retractions, most notably the Lancet and New England Journal of Medicine papers that were based on completely fabricated data.

Reports about scientific misconduct are often greeted with a pleasant shudder, but we don’t get too excited about them. We file them under the heading ‘Every industry has its black sheep’, because that is human, all too human. Or we dismiss them – for example, when it comes to mass retractions of articles from so-called ‘paper mills’ (i.e. completely invented articles, ordered over the Internet and written for a fee) – as exotic phenomena of the scientific business in regions of the world far away from us. In our country, scientific fraud, i.e. plagiarism and falsification or fabrication of data, supposedly does not play an important role. Plagiarism, perhaps, but rather in non-scientific subjects – such copying is a nuisance, but a rather minor offense. In its press releases, the Deutsche Forschungsgemeinschaft (DFG) lists a total of six cases for the year 2022 in which scientists were reprimanded for scientific misconduct and excluded from funding for a few years. Doesn’t that show how rare fraud is in our scientific enterprise?

Even the ‚Wissenschaftsnarr‘ has indulged in this convenient illusion until recently, believing that it is almost exclusively the so-called ‘questionable research practices’ that deserve our attention. That is, the omission of findings that would make a story seem less than smooth. Or running multiple tests until you come across one that yields the statistical significance you crave, also known and popular as p-hacking. Or formulating hypotheses after you already know the results – but pretending the experiments were conducted to test those very hypotheses (‘HARKing’ – hypothesizing after the results are known). But editing bands on Western blots with Photoshop? Or manipulating results in spreadsheets? Using control results that were not even part of the current experiment? Not in our lab, and not in the neighboring lab!

But why are we so sure? There is much to suggest that scientific misconduct is far more frequent than we admit. A recent, elaborate and methodologically excellent study found that 8% of the roughly 7,000 Dutch scientists who responded to an anonymous survey on research practices had falsified and/or fabricated data at least once between 2017 and 2020! In medicine and the life sciences, it was even more than 10%. More than half also admitted to frequently (!) using questionable research practices. Do Dutch scientists have more criminal energy than German ones? Probably not, if only because in 2012 the Netherlands experienced a scientific fraud scandal that shook the entire nation to its core. It had far-reaching consequences for the Dutch scientific system: for example, a national plan involving universities and funding agencies to promote open science, and a reform of the academic career and reward system (‘Every talent counts’). We now envy our neighbors for both.

As shocking as the results of the Dutch survey are, they fit the picture. Scientific images can meanwhile be automatically analyzed for manipulation. The application of such techniques shows that more than 4% of all biomedical publications contain graphs and figures that are highly suggestive of malignant manipulation – for example, shifted bands, duplications, implausible error bars, etc. These figures are also confirmed by studies in which humans examined the images. At the same time, a race has begun between software that can produce barely detectable ‘deep fakes’ of scientific graphs, and software that is able to detect just that. The increasing number of retractions, as well as the growing reports of proven scientific fraud, can only show us the tip of the iceberg of actual misconduct, probably with a bias towards the more blatant violations. It is not possible to infer from this the size of the problem, i.e. the total mass of the iceberg. But one thing is clear – it must be much larger than what visibly sticks out of the water: after all, this is sanctionable, if not legally actionable, behavior. That is probably also why surveys, such as the Dutch one mentioned above, underestimate the true prevalence of violations.

Startled and unsettled by the Dutch figures, the Wissenschaftsnarr looked around in the literature and found a surprisingly large body of evidence – e.g. survey results, random samples, and systematic reviews – which in its totality allows only one conclusion: scientific misconduct beyond HARKing and p-hacking, i.e. plagiarism, falsification and fabrication of data, is much more common than we admit.

Incidentally, illuminating clues as to why this is so can be found in the autobiography of Diederik Stapel, the Dutch science fraudster mentioned above. He describes how easy it was for him to make the ‘stories’ of his papers more interesting through undisclosed selection of data and analysis procedures, and thus to publish in prestigious journals. So he began to make a name for himself in psychology; tenure was within reach. The transition to manipulating his data was then seamless. No one at the university, and no reviewer, asked for or wanted to see data. It all went so easily and smoothly that he gradually moved on to completely fabricating study results. His students conducted the surveys; he then prettied up the data in a big way. This made the results so spectacular that Science and Nature printed them. For example, he ‘invented’ data whose analysis showed that in a littered environment, survey participants tend to give more right-wing answers than in a clean one. Such results made it into the New York Times, and he quickly became a shooting star in psychology! At one point in his autobiography, he describes feeling like a child left alone in a candy store, the only instruction being not to steal any candy. What finally brought him down were his own students. At first, they were happy to be co-authors on great papers, but after a while they found it strange that they were never allowed to evaluate the data themselves, and that they only ever saw data that Stapel had already processed.

The case of Diederik Stapel is certainly extreme, and the way he casually blames his misconduct on a system that made it too easy for him is, of course, self-serving. Nevertheless, one can study the essential elements of modern scientific fraud in his career: an academic reward system based on a journal-reputation economy; journals that favor spectacular studies over solid ones; reviewers overburdened with quality control; appointment committees and university boards fooled by stories and self-promoters; questionable research practices that are considered normal and go unsanctioned, acting as a gateway drug; a poor discussion and leadership culture in the research group; and methodological incompetence among all involved. Several elements of this toxic mix can be found at any given time in most research institutions. But when they all come together, it’s only a matter of time before individual scientists succumb to the temptation to give their careers a boost by taking shortcuts. Only if they push it too far, like Mr. Stapel, do they have to expect to be exposed. And even then, the consequences, if there are any sanctions at all, are quite manageable.

So is the solution to impose tougher sanctions on scientific fraud? This would certainly do no harm, because cases in which penalties were actually imposed can be counted on one hand. So not only is scientific fraud rarely detected, it is even more rarely punished. Do we need more teaching and training in Good Scientific Practice? That’s a good idea, too, but it probably won’t change much. After all, there are no courses in which pupils and students are taught that bank robbery and document forgery violate social norms, are forbidden, and are consequently punished. Science fraudsters know what they are doing; they do not act out of ignorance of the rules. Do we perhaps need a science police force to carry out unannounced inspections of Western blots and search hard drives in laboratories? Certainly not – modern science is far too complex to be controlled by such visits. Not to mention that the resulting Big Brother atmosphere would be anything but conducive to good research.

A much more obvious approach is to address the core of the problem and reform the toxic career and evaluation system – that is, to evaluate researchers not on the basis of questionable metrics, but with a focus on research quality, content, and actual scientific or societal impact. This is indeed the silver bullet, and fortunately some of it is currently taking place. The Coalition for Advancing Research Assessment (CoARA), initiated by the European Union – which, by the way, the DFG has already joined – will play an important role here. However, all this is happening at a snail’s pace, so a faster fix would be desirable.

Here is a suggestion. Scientific fraud is only possible where individuals have monopolized the evaluation and analysis of research data, often combined with a lack of methodological competence in their immediate environment. Only if Western blots are produced and analyzed by a single person, with nobody of the necessary competence looking at them, can they be manipulated with Photoshop. The same applies to data series and the analysis methods applied to them: if only one person manages the databases or spreadsheets and runs self-written code over them, the results are not checked by others – which would be necessary anyway, because of the honest mistakes we all unfortunately make so often. If the group leader then sits alone in front of the computer and turns the results into a story, it can happen that an overzealous group member lays a ‘golden egg’ for him, or vice versa: he himself becomes ‘creative’.

Thus, what is needed is a functioning structure and work culture in research groups – then scientific misconduct becomes practically impossible. Problems arise when groups become too large, or expertise is too fragmented or even missing. Unfortunately, these conditions are not uncommon, especially in biomedical research. How can this be remedied? In any case, by addressing and fostering a good work culture and group leadership wherever possible – above all in training, where this is usually neglected. Many universities offer ‘leadership training’ as part of personnel development, but there should be a stronger focus on the importance of an open and collaborative way of working as a bulwark against scientific misconduct.

But we should also pay more attention to this issue in appointments and tenure decisions. The relevant committees could interview candidates about the size, structure, and interactions of their research groups. They could even conduct interviews with former (or still active) members of the group. In some places this is already practiced, for example at EU-LIFE, an alliance of renowned European research institutes. In clinical appointments, by the way, it is common practice for appointment committees to visit the applicants’ department and gain on-site insight into their ‘way of working’. One suggestion for universities to keep group dynamics in check would be, for example, to cap performance-based funding above a certain group size. Whether these suggestions are realistic and effective, and whether they could actually reduce scientific misconduct, must remain open. But a broad discourse about the work culture in scientific research groups, and how we can improve it, would already take us further.

References / Further reading

Bettencourt-Dias, M. (2021). Moving forward in research assessment. EU-LIFE Policy Webinar.

Bik, E. M., Casadevall, A., & Fang, F. C. (2016). The prevalence of inappropriate image duplication in biomedical research publications. mBio, 7(3).

Bik, E. M., Fang, F. C., Kullas, A. L., Davis, R. J., & Casadevall, A. (2018). Analysis and Correction of Inappropriate Image Duplication: the Molecular and Cellular Biology Experience. Molecular and Cellular Biology, 38(20).

Bonn, N. A., & Pinxten, W. (2021). Advancing science or advancing careers? Researchers’ opinions on success indicators. PLoS ONE, 16(2 February).

Bouter, L. M., Tijdink, J., Axelsen, N., Martinson, B. C., & Ter Riet, G. (2016). Ranking major and minor research misbehaviors: results from a survey among participants of four World Conferences on Research Integrity. Research Integrity and Peer Review, 1(1), 17.

Bucci, E. M. (2018). Automatic detection of image manipulations in the biomedical literature. Cell Death & Disease 2018 9:3, 9(3), 1–9.

CoARA. Coalition for Advancing Research Assessment.

Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE, 4(5), e5738.

Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., et al. (2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv, 2022.12.23.521610.

Gopalakrishna, G., ter Riet, G., Vink, G., Stoop, I., Wicherts, J. M., & Bouter, L. M. (2022). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands. PLOS ONE, 17(2), e0263023.

Levelt Committee, Noort Committee, & Drenth Committee. (2012). Flawed science: The fraudulent research practices of social psychologist Diederik Stapel (pp. 1–104). Netherlands: Commissie Levelt.

Mehra, M. R., Ruschitzka, F., & Patel, A. N. (2020). Retraction—“Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis” (The Lancet, (S0140673620311806), (10.1016/S0140-6736(20)31180-6)). The Lancet, 395(10240), 1820.

Mejlgaard, N., Bouter, L. M., Gaskell, G., Kavouras, P., Allum, N., Bendtsen, A. K., Charitidis, C. A., Claesen, N., Dierickx, K., Domaradzka, A., Reyes Elizondo, A., Foeger, N., Hiney, M., Kaltenbrunner, W., Labib, K., Marušić, A., Sørensen, M. P., Ravn, T., Ščepanović, R., … Veltri, G. A. (2020). Research integrity: nine ways to move from talk to walk. Nature, 586(7829), 358–360.

Nobel Prize winner Gregg Semenza retracts four papers. Retraction Watch. Retrieved December 27, 2022.

Piller, C. (2022). Blots on a field? Science, 377(6604), 358–363.

Shen, H. (2020). Meet this super-spotter of duplicated images in science papers. Nature, 581(7807), 132–136.

Singh Chawla, D. (2021). 8% of researchers in Dutch survey have falsified or fabricated data. Nature.

Stanford misconduct probe of president stumbles as new journal launches inquiry. Science / AAAS. (n.d.). Retrieved December 27, 2022.

Stanford President’s Research Draws Concern From Scientific Journals. The Wall Street Journal. (n.d.). Retrieved December 27, 2022.

Stapel, D. (2014). Faking Science: A True Story of Academic Fraud. Brown, J. L. (Translator).

The Economist. (2023). There is a worrying amount of fraud in medical research. Retrieved February 26, 2023.

Xie, Y., Wang, K., & Kong, Y. (2021). Prevalence of Research Misconduct and Questionable Research Practices: A Systematic Review and Meta-Analysis. Science and Engineering Ethics, 27(4), 41.
