From mouse to man through the valley of death?

‘Translation’ – from mouse to human and back – the mantra and eternal quest of university medicine! Where else but in academic medical centers can you find all this under one roof: basic biomedical and clinical research, the patients necessary for clinical trials, government funding, as well as motivated and excellently trained personnel! ‘Translation’ is as old as academic medicine itself – but the term was only coined in the 1980s, and since then it has adorned the websites and mission statements of university hospitals worldwide. Translation has certainly been a model of success – just think of the treatment of chronic neurological disorders such as epilepsy, Parkinson’s disease, or multiple sclerosis, of therapies for some forms of cancer, or of HIV.

However, not only notorious skeptics like myself, but even the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and the German Science Council (Wissenschaftsrat, WR) have been complaining for some time that bench-to-bedside translation is in a crisis. Too many therapies that are highly effective in experimental models fail dismally when tested in clinical trials. All kinds of poetic metaphors are used to describe this problem, such as the ‘translational roadblock’ or even the translational ‘valley of death’. In many areas of medicine, despite massive international research efforts, things are not really moving forward. In my field, stroke medicine, thousands of researchers, including myself, have been investigating the basic pathophysiology of stroke with great enthusiasm for decades – fantastic papers, spectacularly effective therapies for rodents, and with a little luck some of us even got tenure – but little to none of our findings have benefitted patients with stroke! Is stroke an exception – are stroke researchers perhaps simply incapable? But then what about Alzheimer’s researchers? Where are the stem cell therapies that have been promised for so long and are already highly effective in animal models? Where are the wonder treatments that were supposed to result from decoding the human genome?

Anyone who has not yet fully spun themselves into the protective cocoon of the ivory tower – measuring success in terms of the amount of third-party funding raised or the impact factor of publications – may already be puzzling over this. How successful are we in translation, especially when measured against the resources used and our own promises? For some time now, international soul-searching has been under way to pinpoint the causes of the disappointing results of translational research. And causes were found indeed, or so we think: it all happens in the valley of death, which must be crossed alive, and the scientists and clinicians meandering through the valley don’t have the right ‘mindset’ – in other words, we lack the right inner attitude.

But the very metaphor of the valley of death already leads us onto the wrong track. It suggests two antipodes: here basic research, there clinical research – in both, everything is going very well – but the inhospitable conditions in between are the problem. This line of thinking suggests a strategy for improving the success rate of the translation process: we need to take researchers and clinicians by the hand and explain to them how to do it right. We should always think of the patients when investigating disease mechanisms experimentally, and of the mice when treating humans. All we have to do is make sure we have the right ‘mindset’ – and provide the enlightened researchers with a few critical infrastructures to support them in their efforts. At least that is how the DFG sees it in its recently published ‘Recommendations for the Promotion of Translational Research in University Medicine’. I’m afraid it’s not that simple – and this approach misses the main reasons for the disappointing results of translation. This is tragic, because some of them could be eliminated quite easily.

Perhaps the most obvious translational obstacle is the incredible complexity of (patho)biology. Paradoxically, as our understanding of a disease mechanism increases, we usually move further away from a potential therapy rather than towards it. Interventions in signaling pathway A, which have the desired effect, often lead to opposing, undesirable effects on signaling pathway B. How to overcome complexity? By more research, of course, and mostly of the very basic sort. Related to complexity, and just as unpleasant, is the phenomenon of the ‘low-hanging fruit’, which we have already picked. The few disease mechanisms that can be targeted easily and with few side effects are already being targeted – think penicillin, insulin, dopamine, beta-blockers, proton pump inhibitors, cyclooxygenase inhibitors (and even then, a lot could still go wrong – just think of Vioxx). We can already treat many common diseases very successfully. Treating high blood pressure even better, or epilepsies, or multiple sclerosis, is very difficult. Much to the chagrin of the pharmaceutical industry, by the way, which does not thrive on Nature papers but on profitable drugs. Having ‘picked’ the blockbusters, and without great ideas for some time now, the industry mainly exploits me-too approaches derived from past successes.

And then there is the problem of low internal validity, especially in preclinical research. To put it more bluntly: low research quality. The majority of all experimental studies, whose results are the foundation of the clinical developments built on them, do not control for biases and are not conducted in a randomized or blinded fashion. In addition, the group sizes are almost always smaller than 10, which, given the biological variance of the results, is equivalent to rolling dice. And we are playing with loaded dice, because the lack of pre-registration of planned experiments and analyses gives us scientists a great deal of freedom in selecting desired results, or in omitting undesired ones. This selective use of data is supported by flawed statistics, in particular the popular ‘p-hacking’, i.e. running statistical tests until a significant result is obtained. Once our ‘story’ is nailed, the motto is: ‘Take the paper and run’. Under these circumstances it is better to refrain from repeating the experiments (replication), let alone having them repeated by independent examiners. We don’t do replications anyway, as this does not help our careers – especially since a great story no longer looks so clearly black and white in the biological penumbra of ambiguous results. And because null and negative results, where we didn’t get what we had hoped for, would only be published in F1000Research or PLOS One – and such papers might ‘contaminate’ our CVs – we would rather archive these results on our hard drives.
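How much this kind of ‘testing until significant’ inflates false positives is easy to demonstrate. Here is a minimal simulation sketch (my own illustration, not from the column; the group size of 8 and the five attempts are assumed for the example): two groups are drawn from the same distribution, so there is no true effect, yet repeated testing finds ‘significance’ far more often than the nominal 5%.

```python
# Sketch: false-positive inflation from repeated testing ('p-hacking').
# Both groups come from the SAME distribution, so any p < 0.05 is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def one_honest_test(n=8):
    # One experiment, one pre-specified test.
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    return stats.ttest_ind(a, b).pvalue

def one_hacked_test(n=8, max_tries=5):
    # 'Run tests until something is significant': repeat the experiment
    # (or try another outcome measure) up to 5 times, keep the best p-value.
    return min(one_honest_test(n) for _ in range(max_tries))

trials = 2000
honest = np.mean([one_honest_test() < 0.05 for _ in range(trials)])
hacked = np.mean([one_hacked_test() < 0.05 for _ in range(trials)])
print(f"false-positive rate, single pre-specified test: {honest:.2f}")
print(f"false-positive rate, best of 5 attempts:        {hacked:.2f}")
```

With no true effect, the single test stays near the nominal 5%, while the ‘best of five’ strategy lands near 1 − 0.95⁵ ≈ 23% – roughly every fourth ‘discovery’ is noise.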

So internal validity of our research is often low – what about external validity? The majority of preclinical models are quite remote from the patients with the disease under investigation, and not only because of the species mismatch. Here is another example from preclinical stroke research: the mice we use are genetically virtually identical (inbred), predominantly male and juvenile, fed a vitamin-rich muesli diet, and kept under clean-room, specific-pathogen-free (SPF) conditions. This means they have never had an infection or other disease, and thus have immature, almost neonatal immune systems even in adulthood. I will spare you the comparison of these mice to typical stroke patients. Why have we practiced this for decades, even though none of the therapies that were highly effective in rodent models have been successful in humans? Because we have become accustomed to it, everyone else does it, and it resulted in publications. These help to acquire third-party funding, which can in turn be used to publish prolifically.

At this point in the translational chain – the description of a new disease mechanism, or even of a new therapy that is effective in animal experiments – the horse is already out of the barn. In other words, a clinical development begins that may be based on an unsound preclinical foundation. If it really were the case that animal experiments with low internal and external validity, small sample sizes despite high variance, selective use of data, and problematic statistical analysis could justify successful therapies in patients, then there would be no need for animal experiments at all!

But let’s assume, just theoretically, that a really solid candidate for clinical testing has been found despite all the above-mentioned adversities. What are the chances that we will find an effective therapy in a randomized clinical trial? I’ll skip some intermediate steps that are required by law, all of which can also break the translational chain, such as pharmacology/toxicology and the study of absorption, distribution, metabolism and elimination (ADME) of the drug. On average, the success rate cannot be expected to exceed 50%! This is because of equipoise, which is required for ethical reasons in clinical studies: the possible benefits and risks for the patient in a clinical trial must be balanced before the study starts. There must be no clear evidence that the study medication will be superior to placebo – otherwise it would be unethical to withhold the new treatment from patients and give a placebo instead! This is another reason why we should not expect translation to be 100% effective. Clinical studies must be allowed to fail! But they must be designed in such a way – and this applies equally to preclinical experiments – that even a ‘negative’ result generates usable and relevant evidence: for example, knowledge about an ineffective dose, whereupon another dose can be tried, or about a side effect, and so on. And results must be published promptly. Here we have another reason for translational failure: 60% of all clinical studies at German university medical centers have not published any results two years after completion, and for 40% this is still the case after five years. This is not only unscientific but also unethical. Patients have participated in these studies because the knowledge generated benefits future generations of patients. Although patients may hope for a personal benefit, because of the placebo arm and equipoise their chances are equivalent to a coin toss, even if they receive the study medication.

Do clinical studies, which are regulated and controlled by various authorities and conducted under clinical quality management, produce more robust results than the preclinical studies on which they are often based? This is probably true in most cases, but flawed study design is another common cause of translational failure. Here again is a typical example from stroke research: if neuroprotective substances – i.e. substances that protect the brain from damage after a stroke – are effective in rodents only in the first few hours after vascular occlusion, one should not really be surprised if they are not effective in patients treated only after 12 hours. But this is what has happened in many (unsuccessful) acute stroke trials.

So the translational chain can break at very different points – I have mentioned only a few. It is enough for the weakest link in the chain to break to expose thousands of patients to unnecessary risks and to waste gigantic resources, because the entire process usually costs hundreds of millions of euros. Notwithstanding, it is paradoxically good news that translational success does not and cannot occur in 100% of cases. And here is more good news: a considerable proportion of the weak links can be repaired relatively easily. Increasing internal and external validity, using sufficient sample sizes and proper statistics, pre-registering studies, publishing null results, and replicating important findings could put translation on a solid foundation. Analogously, in the clinical arena: ensuring that robust preclinical evidence is available, sufficiently powered trials, study designs that are informative even if the hoped-for result is not achieved, and timely publication of results. Once we have achieved this, we can also take care of the ‘translational mindset’ of the researchers involved. But here is the bad news: none of this is included in the recommendations of the DFG. I’m afraid this is because many of the measures I mentioned do not really fit into our academic career and funding system. After all, improving the success of translation also means changing the standards on which professional advancement in university medicine depends!
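What ‘sufficient sample sizes’ means can be made concrete by simulation. A rough sketch (my own illustration; the effect size of half a standard deviation and the group sizes are assumptions for the example, not figures from the column): estimating the statistical power of a simple two-group comparison for the typical n = 8 per group versus a properly powered design.

```python
# Sketch: power of a two-sample t-test by simulation, assuming a modest
# true effect of d = 0.5 standard deviations between treated and control.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimate_power(n, d=0.5, sims=2000, alpha=0.05):
    # Fraction of simulated experiments that detect the (real) effect.
    hits = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(d, 1.0, n)   # true effect of size d
        if stats.ttest_ind(control, treated).pvalue < alpha:
            hits += 1
    return hits / sims

p_small = estimate_power(8)    # the usual preclinical group size
p_large = estimate_power(64)   # a properly powered design
print(f"power with n = 8 per group:  {p_small:.2f}")
print(f"power with n = 64 per group: {p_large:.2f}")
```

Under these assumptions, n = 8 per group detects a real, modest effect only in roughly one experiment out of six, while around 64 per group is needed to reach the conventional 80% power – which is exactly why so many small ‘negative’ (and ‘positive’) studies are uninformative.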


A German version of this post has been published as part of my monthly column in the Laborjournal.

