Trust but verify: Institutions must do their part for reproducibility
The crisis in scientific reproducibility has crystallized as it has become increasingly clear that the majority of high-profile scientific reports rest on little solid foundation, and that the societal burden of low reproducibility is enormous. In today’s issue of Nature, C. Glenn Begley, Alastair Buchan, and I suggest measures by which academic institutions can improve the quality and value of their research. To read the article, click here.
Our main point is that research institutions that receive public funding should be required to demonstrate standards and behaviors that comply with “Good Institutional Practice”. Here is a selection of potential measures, implementation of which should be verified, certified, and approved by major funding agencies.
Compliance with agreed guidelines: Ensure compliance with established guidelines such as ARRIVE and MIAME, and with data-access requirements (as mandated by the National Science Foundation and National Institutes of Health, USA).
Full access to the institution’s research results: Foster open access and open data; preregistration of preclinical study designs.
Electronic laboratory notebooks: Provide electronic record keeping compliant with the FDA Code of Federal Regulations Title 21 (21 CFR Part 11). Electronic laboratory notebooks allow data and project sharing, supervision, time stamping, and version control, and directly link records to original data.
Institutional Standard for Experimental Research Conduct (ISERC): Establish an ISERC (e.g. blinding, inclusion of controls, replicates and repeats); ensure dissemination of, training in, and compliance with the ISERC.
Quality management: Organize regular and random audits of laboratories and departments with reviews of record keeping and measures to prevent bias (such as randomization and blinding).
Critical incident reporting: Implement a system to allow the anonymous reporting of critical incidents during research. Organize regular critical incident conferences in which such ‘never events’ are discussed to prevent them in the future and to create a culture of research rigor and accountability.
Incentives and disincentives: Develop and implement novel indices to appraise and reward research of high quality. Honor robustness and mentoring as well as originality of research. Define appropriate penalties for substandard research conduct or noncompliance with guidelines. These might include decreased laboratory space, restricted access to trainees, or reduced access to core facilities.
Training: Establish mandatory programs to train academic clinicians and basic researchers at all professional levels in experimental design, data analysis and interpretation, as well as reporting standards.
Research quality mainstreaming: Bundle established performance measures plus novel institution-unique measures to allow a flexible, institution-focused algorithm that can serve as the basis for competitive funding applications.
Research review meetings: Create a forum for routine assessment of institutional publications with a focus on robust methods: the process rather than the result.
Due to the page restrictions of a Nature commentary, a few of our thoughts did not make it into the published manuscript, in particular those on ‘trust’. In fact, our working title was: ‘Trust but Verify: Prevent Bias and Increase Robustness with Good Institutional Practice’.
What has been termed the ‘reproducibility crisis’ has led to an appropriate flurry of new guidelines for authors among journals and funding agencies. Many journals are updating their guidelines and introducing checklists in an attempt to improve data robustness and quality. They are to be congratulated for taking the lead on addressing this crucial issue. But a checklist that is completed at the end of an experiment, immediately prior to publication, cannot guarantee that the experiment was designed and executed properly from the outset. Instead, our focus remains on the exciting, outstanding, spectacular result rather than on the legitimacy of the research process by which that result was generated. Furthermore, most of the items on these lists (such as blinding, randomization, and sample size calculation) can simply be claimed by investigators without any attempt at validation. It is simple and convenient to check a box, but this does not absolve the investigator and their host institution of their primary responsibility to ensure that experiments were properly performed: certification of the appropriate, prospective application of good scientific methodology is based entirely on trust.
But is it reasonable to trust scientists to self-regulate in a way that we would not trust politicians, CEOs, or bankers? For those groups there are transparent review and regulatory processes. Not so for scientists, who are exempt from external scrutiny and accountability even though they are funded from the public purse. When modern science, and the trust it continues to enjoy, was born in 17th-century England, independently wealthy gentleman scientists were largely immune to the personal biases and vested interests that are endemic in today’s world of biomedicine. Today it is impossible to imagine Darwin thinking for decades instead of immediately rushing into print to ensure his promotion and next grant!
Although it is true that most scientists do not become wealthy from their research, they do pursue a personal agenda and have vested interests. They aspire to the next post-doctoral position, scientific recognition, tenure, and, for some, the power to decide the fate of fellow scientists as referees or committee members. Science is rife with conflicts of interest. The conflict that is most frequently highlighted is perhaps also the most trivial: the potential monetary reward that may ensue from membership of industrial advisory boards. Much more important, far more subversive, all-pervasive, and seldom even challenged is the universal conflict of interest resulting from the ‘currency’ with which scientists advance their careers: spectacular findings, high-level publications, invitations to present at scientific meetings, secured grant funding, prizes, promotion, and election to learned academies. These are all intimately interrelated and mutually self-reinforcing. Sufficient levels of accomplishment in these arenas are used as unequivocal evidence of scientific achievement. That accomplishment then serves as a surrogate for any need to evaluate the quality of the science itself. This ‘currency’ is easily computed (using, for example, impact factor, number of invited presentations, and awards), and facilitates the marginalization of the actual content, robustness, or reproducibility of a piece of scientific work. Although Western scientists and journal editors frown on the recently disclosed practice of Chinese universities rewarding their scientists with expensive gifts for high-profile publications, a similar but more sophisticated, less crass reward system exists in Western countries: publication in a top-tier journal can ensure an academic appointment, promotion, or election to a prestigious academic body.
Another all-pervasive bias is the urge to prove our own hypotheses, to prove that we were ‘right’. Although science is considered an objective pursuit of the data, it is difficult for any scientist to display the degree of objectivity that this demands. Often, instead of dispassionately reporting the results, we design and perform experiments with the intent to prove, not disprove, our hypotheses. This deeply rooted confidence in our scientific models is extremely arrogant: it is more realistic that our constructs and preconceptions will be challenged and overturned by the harsh realities of biology. Despite this, there is a very real temptation to ignore a result that does not conform, or to recast it so that it supports our prejudice.
Given these obvious, albeit often unspoken, conflicts, it is easy to understand that the motives of scientists are often self-serving. With these powerful incentives at play, it seems even more important that experiments are designed at the outset to conform to good scientific practice, with blinded evaluation of data, incorporation of positive and negative controls, repeated experiments, use of validated reagents, avoidance of data exclusion at the investigator’s whim, and appropriate analysis of data sets.
While there are some disincentives for outright fraud, our system currently lacks effective disincentives to ensure that sloppy, irreproducible, poor-quality work is spurned. In fact, it is commonly joked that an investigator’s retraction can result in two publications in Nature. Disincentives should serve, at least in part, to balance the powerful incentive to be first rather than to be right. But for disincentives to be effective requires a commitment to auditing and enforcement. We need a mechanism to ensure that good scientific practice is routinely upheld. Although editors can provide checklists, this cannot be the responsibility of journal editors or their reviewers; the work is essentially complete when they first see it. The primary responsibility rests with the investigator and their host institution. While some investigators are clearly able to self-censor, others are not. It is for those investigators that the research institution has a critical role to play.