In preparing my lectures for this course each year, I review the current literature and remove or qualify the discussion of topics where the foundational studies for those topics were challenged by newly published data, or where the original studies failed to replicate. That the scientific literature contains errors, and that attempts to replicate the findings of published studies frequently fail, is surprising to many who are new to science. However, failures to replicate and challenges to the published literature are common in all branches of science. Indeed, it is rare that any single experiment is definitive and scientists often live with uncertainty about key findings until sufficient numbers of publications by independent groups have confirmed the original results.
In the ideal, scientific questions are posed as a contest between two outcomes, or a test between two hypotheses. Let’s use a simple example typical of biomedical research: testing an inactive placebo against an active drug in lowering blood pressure. One hypothesis is the null, which specifies that there is no difference in blood pressure when subjects are treated with either the placebo or the drug. The alternate hypothesis is that the active drug is more effective than the placebo in lowering blood pressure. The logic of the experiment is designed to evaluate the null hypothesis; i.e., that there is no difference between the active and inactive compounds. One rejects that null hypothesis when the observed difference in blood pressure would be very unlikely if the two compounds were in fact equivalent. This seems straightforward, so how do incorrect results get into the published literature?
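The logic of the blood-pressure example can be sketched in a few lines of code. This is a minimal illustration using Welch’s t statistic; the readings, group sizes, and rejection threshold below are invented for illustration, not data from any real study.

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5

# Hypothetical systolic readings (mmHg) -- invented for illustration.
placebo = [152.0, 148.0, 155.0, 149.0, 151.0, 147.0, 153.0, 150.0]
drug    = [141.0, 145.0, 138.0, 144.0, 140.0, 143.0, 139.0, 142.0]

t = welch_t(placebo, drug)
# A large positive t means the drug group's readings are much lower than
# the placebo group's, relative to the variability within each group --
# evidence against the null hypothesis of no difference.
print(f"t = {t:.2f}")
```

In practice, one would convert the statistic to a p-value against the appropriate t distribution (for example, with `scipy.stats.ttest_ind`) rather than eyeballing its size.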
There are many possibilities. One possibility is that blood pressure varies day-to-day for intrinsic reasons unrelated to the experiment, and this variation, or ‘noise’, is a major contributor to the measurement of blood pressure. Perhaps the real effect of the active drug was smaller than this day-to-day variability, and so the null hypothesis was accepted when it should have been rejected. Or, it could have been that too few subjects were tested, and a few subjects happened to have randomly low readings, unrelated to the active drug, when blood pressure was measured after active drug treatment. This may have caused the experimenter to erroneously reject the null hypothesis and conclude that the active drug was working. Small effect sizes (the size of the true effect relative to the intrinsic noise) and inadequate sample sizes are major contributors to incorrect results being published.
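The interplay between effect size, noise, and sample size can be made concrete with a small Monte Carlo sketch. Everything here is an assumption for illustration: the noise is standard normal, the ‘detection’ rule is a fixed t threshold, and the effect and group sizes are arbitrary.

```python
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5

def detection_rate(n, effect, trials=2000, threshold=2.0, seed=1):
    """Fraction of simulated experiments that 'detect' the drug
    (t above threshold).  Noise has sd 1, so `effect` is the true
    drop in blood pressure in units of the day-to-day noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        placebo = [rng.gauss(0.0, 1.0) for _ in range(n)]
        drug = [rng.gauss(-effect, 1.0) for _ in range(n)]
        if welch_t(placebo, drug) > threshold:
            hits += 1
    return hits / trials

# A true effect half the size of the noise is usually missed with
# 5 subjects per group (a false negative) ...
print("n=5,  effect=0.5:", detection_rate(5, 0.5))
# ... but usually found with 50 subjects per group.
print("n=50, effect=0.5:", detection_rate(50, 0.5))
# And even with NO true effect, a few percent of experiments still
# clear the threshold -- false positives that can end up published.
print("n=5,  effect=0.0:", detection_rate(5, 0.0))
```

Running experiments like these many times is exactly what the literature does in aggregate: underpowered studies both miss real effects and, occasionally, ‘find’ effects that are not there.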
Another set of reasons for publication of incorrect results is that the experiment used poor experimental design or included confounding variables. Perhaps blood pressure was measured in the morning following active drug treatment and in the evening following placebo treatment, thus introducing a systematic time-of-day confound. Perhaps all of the subjects in the active drug condition were younger than the subjects in the placebo condition. You get the idea.
Finally, it is painfully true that some published studies are biased towards particular outcomes. It has been found that experiments supported financially by pharmaceutical companies more often report positive effects of experimental drugs than do independently funded studies. This may reflect the selective publication of positive drug effects (positivity bias) and/or the absence of publication of negative results (the ‘file-drawer’ problem). Unfortunately, it can also result from experimental malfeasance.
Regardless of the reasons, science (again, in the ideal) should be self-correcting. That is, studies by independent groups that test the same drugs may, or may not, replicate the initial studies. Over time, errors become corrected by the accumulating weight of the evidence, and scientific progress continues. Or does it?
There are severe impediments to the self-correcting nature of science. One is the sheer volume of scientific papers. An NSF worldwide survey of science publications revealed that more than 2.5 million articles were published across all disciplines in 2018 alone. Another impediment is that scientific journals prioritize publishing positive results over negative results, reasoning that scientists don’t want to read about experiments that failed. This produces the aforementioned file-drawer problem, in which negative results (i.e., those that do not reject the null hypothesis) are never published but instead sit in the investigator’s file drawer. Finally, there is the (accurate, I believe) perception among scientists that publishing negative results does not lead to career advancement.
My own opinion is that science remains self-correcting, but that the pace of correction is slow and, perhaps, slowing. And once a positive result is published, it takes great effort to dislodge it. A single non-replication is often not enough.
One recent example of correction in science concerns the efficacy of hydroxychloroquine (an anti-malaria drug) in the prevention and treatment of Covid–19. The specific issue of hydroxychloroquine, and the general issue of self-correction in science, were reported at the online health news website Statnews. The following summary is taken from their reporting.
The original report of positive results of hydroxychloroquine for treating Covid was published in the International Journal of Antimicrobial Agents after first appearing in mid-March 2020 on a non-peer-reviewed preprint server. The peer review process was concluded in a single day (most journals take months for peer review). The study itself was a small (20 patients with Covid–19 who received hydroxychloroquine and 16 controls), non-randomized, open-label study. According to Statnews, independent researchers raised extremely serious concerns about the study’s methods and conclusions within days of its publication. A subsequent independent review concluded that “this study suffers from major methodological shortcomings which make it nearly if not completely uninformative.” Despite this, the paper excited great interest and millions of individuals subsequently received hydroxychloroquine as a treatment for Covid–19.
Subsequent large-scale and well-controlled studies of hydroxychloroquine showed that it was ineffective as a preventative measure or treatment for Covid–19. Indeed, a meta-analysis reported in Nature Communications, which considered data from more than 10,000 patients, demonstrated that patients with Covid–19 receiving hydroxychloroquine as a treatment were more likely to die than patients with Covid–19 who were not given this drug. Why this is so is not clear, but it may reflect bias in who received hydroxychloroquine treatment (for example, perhaps the drug was administered only to the most severe cases). So, ultimately, science self-corrected, but at great cost in time, money, and, perhaps, lives.
A telling coda to this story is that the journal article that first reported the value of hydroxychloroquine was cited 4000 times. The review paper that ‘corrected’ the science was cited only 38 times (both figures from Google Scholar, as reported by Statnews).
The case of hydroxychloroquine, though illustrative, is extreme. Most episodes of scientific self-correction do not involve matters of life and death. However, there are many instances where unsubstantiated scientific reports affect public policy. We will discuss some of those instances in the realm of neuroscience throughout the semester.