In “natural experiments,” the treatment or intervention is determined by variation not under the control of the researcher. These designs, used in economics and epidemiology to support inferences about causal relationships between interventions and outcomes, are useful tools to help improve the rigor of observational studies in health policy and medicine. Perhaps the first natural experiment in medicine was that of the English physician John Snow in the mid-nineteenth century. In 1854, a cholera outbreak struck Broad Street in London, killing hundreds. Studying case clusters, Snow discovered that neighborhoods supplied with water downstream of where sewage was discharged into the Thames River experienced high levels of disease, while neighborhoods receiving upstream water had low disease levels.1 Snow described the populations as similar in age, occupation, income, and social rank, divided into groups without choice, illustrating an essential component of natural experiments: similar but distinct populations that are exposed to a condition outside the researchers’ control, allowing for reasonable conclusions about the potential causal link between exposure and outcome.
Randomized clinical trials (RCTs) have traditionally been viewed as the primary method for establishing causality in health care, but they have important limitations: they are expensive; it is not always possible to randomize patients; and their findings may not be generalizable to different patient populations or nonexperimental settings. When RCTs are not possible, medical and health policy researchers have turned to observational studies. In observational studies, however, individuals are not assigned to the intervention independently of potential confounding factors that could also influence outcomes, making it difficult to separate the treatment effect from other factors that may be associated with receiving the treatment.
By contrast, natural experiments rely on variation in treatment exposure that may be unrelated to other factors associated with the outcomes. Suppose researchers are interested in examining the likelihood of long-term use and adverse outcomes for patients after an initial opioid prescription. An observational analysis might be confounded if the factors that influence a clinician’s decision to prescribe opioids (eg, cancer-related pain) also affect long-term outcomes (eg, opioid dependence). An RCT might resolve this issue but would be ethically and practically challenging. Instead, researchers could examine how long-term opioid use varies among opioid-naive individuals who, by chance, are exposed to physicians with a high propensity vs low propensity to prescribe opioids (eg, when assigned to the next available physician in an emergency department).2 In this scenario, the effect of an initial opioid prescription on long-term outcomes could be identified from variation in opioid use driven by prescriber propensity, which is plausibly unrelated to unobservable patient factors associated with both initial opioid use and long-term outcomes.
Natural experiments use quasi-randomization, a method of allocating individuals to study groups that is not truly random and is not assigned by a researcher but is instead determined by a factor such as a specific date, age, or event. These study designs have an important feature: the similarity of the groups can be measured. Treatment and control groups should be similar in sociodemographic characteristics, comorbidities, prior health care utilization, and any other factors that might be associated with outcomes; often this is not the case, and adjustments based on these observed variables are needed. Natural experiments attempt to control for unobserved variables. When well implemented, natural experiments may be more informative than traditional observational studies that do not control for unobservable confounders, but they are less informative than RCTs in establishing true cause and effect. With natural experiments, the more closely the study design resembles an RCT, the more confidence we may have in the validity of the findings.
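As an illustration of how the similarity of quasi-randomized groups might be assessed, the following is a minimal sketch in Python, assuming simulated data and hypothetical covariate names (not drawn from any cited study), that computes standardized mean differences between exposed and unexposed groups:

```python
# Sketch: measuring covariate balance between quasi-randomized groups with
# standardized mean differences (simulated data; hypothetical variable names).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "exposed": rng.integers(0, 2, n),       # quasi-random assignment indicator
    "age": rng.normal(55, 12, n),           # observed baseline covariates
    "n_comorbidities": rng.poisson(2, n),
    "prior_visits": rng.poisson(4, n),
})

def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

treated = df[df["exposed"] == 1]
control = df[df["exposed"] == 0]
for covariate in ["age", "n_comorbidities", "prior_visits"]:
    smd = standardized_mean_difference(treated[covariate], control[covariate])
    print(f"{covariate}: SMD = {smd:.3f}")  # values near 0 suggest balance
```

Standardized mean differences near 0 for observed covariates are consistent with, but do not prove, comparability on unobserved factors.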
Five types of natural experiments are particularly relevant for observational studies in health policy and medicine: regression discontinuity designs (RDD), instrumental variable designs, difference-in-differences (DID) analyses, event-study analyses, and interrupted time series analyses2-4 (Table). This overview presents these types of studies with health policy examples; it is not intended to provide a detailed assessment of these designs.
Regression discontinuity designs identify effect sizes associated with an intervention by studying individuals whose treatment assignment differs by position on either side of a specific, arbitrary cutoff (eg, a treatment threshold, a policy implementation date, an age threshold, or a geographic discontinuity).5 In this design, the probability of being exposed to the intervention changes discontinuously at the cutoff. Studies using RDD rely on the assumption that individuals on either side of the cutoff are similar, so their treatment assignment is nearly independent of their characteristics, both observed and unobserved. For example, a 2018 study evaluated the phased introduction of Medicare’s Value-Based Payment Modifier program.3 Researchers used the program’s practice size thresholds (eg, 100 or more clinicians) to evaluate whether the program was associated with practice performance, under the assumption that practices just above and below the cutoff did not differ in important ways. The study found that the program was not associated with improved practice performance and may have exacerbated health disparities.
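The basic RDD estimation step can be sketched as follows, assuming simulated data and a hypothetical practice-size cutoff rather than the actual Value-Based Payment Modifier data; the estimate is the jump in the outcome at the cutoff, with separate regression slopes on each side:

```python
# Sketch: a local linear regression discontinuity estimate around a
# practice-size cutoff (simulated data; not the cited study).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
practice_size = rng.integers(20, 181, n)          # running variable
cutoff = 100
treated = (practice_size >= cutoff).astype(int)   # exposure jumps at the cutoff
# Outcome varies smoothly with practice size, plus a true jump of 2.0 at the cutoff
quality_score = 60 + 0.05 * practice_size + 2.0 * treated + rng.normal(0, 5, n)

df = pd.DataFrame({
    "score": quality_score,
    "centered_size": practice_size - cutoff,
    "treated": treated,
})
# Restrict to a bandwidth around the cutoff and allow different slopes on each side
bandwidth = 30
local = df[df["centered_size"].abs() <= bandwidth]
model = smf.ols("score ~ treated + centered_size + treated:centered_size",
                data=local).fit()
print(model.params["treated"])  # estimated jump in the outcome at the cutoff
```

In practice, such estimates would be checked across several bandwidths and functional forms before being interpreted.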
Instrumental variable analyses using quasi-random variation in assignment to treatment or intervention have also been used to study clinical and health policy interventions.6 For example, health policy researchers have been interested in whether higher-spending hospitals achieve better outcomes, a relationship that is confounded by the fact that higher-spending hospitals may treat patients who are disproportionately sicker, which could spuriously suggest that higher hospital spending leads to worse outcomes. To address this issue, a study examined the association between hospital spending and mortality by using quasi-random variation in ambulance dispatching patterns as an instrumental variable.7 Ambulances may take patients to particular hospitals for reasons that are unrelated to patients’ clinical severity; this, in turn, may lead otherwise similar patients to be transported to (and treated at) higher- vs lower-spending hospitals.
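A minimal sketch of the two-stage logic, assuming simulated data and hypothetical variable names (not the cited ambulance study), is shown below: the first stage isolates the variation in spending driven by the quasi-random instrument, and the second stage uses only that variation to estimate the association with outcomes:

```python
# Sketch: two-stage least squares by hand, using a quasi-random referral pattern
# as an instrument for hospital spending (simulated data; hypothetical names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 10000
severity = rng.normal(0, 1, n)                  # unobserved confounder
instrument = rng.integers(0, 2, n)              # quasi-random high-spending referral
spending = 10 + 2 * instrument + 1.5 * severity + rng.normal(0, 1, n)
# True effect of spending on mortality risk is -0.3; severity raises risk
mortality_risk = 5 - 0.3 * spending + 2 * severity + rng.normal(0, 1, n)

df = pd.DataFrame({"y": mortality_risk, "x": spending, "z": instrument})

naive = smf.ols("y ~ x", data=df).fit()              # confounded by severity
first_stage = smf.ols("x ~ z", data=df).fit()        # stage 1: spending predicted by instrument
df["x_hat"] = first_stage.fittedvalues
second_stage = smf.ols("y ~ x_hat", data=df).fit()   # stage 2: instrument-driven variation only
print("naive estimate:", naive.params["x"])          # positive: spuriously suggests spending harms
print("2SLS estimate:", second_stage.params["x_hat"])  # close to the true -0.3
```

Substituting fitted values by hand yields the correct point estimate but not valid standard errors; in practice, a dedicated instrumental variable routine would be used for inference.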
Other natural experiments use different types of analyses to assess potential causal relationships. These include DID, event-study, and interrupted time series analyses. In DID analysis, researchers compare outcomes in 2 groups that were similar before an intervention (natural or otherwise) that affected only 1 of the groups.8 The DID analysis rests on the assumption that, if the treatment had no effect, the difference between the groups would be unchanged after the treatment (the parallel trends assumption). One such study found lower long-term mortality rates after Hurricane Katrina among people who had been living in New Orleans compared with those who had been living in other similar cities; the authors concluded that this represented the effect of migration because, after the hurricane, New Orleans residents migrated to areas with better socioeconomic conditions and lower baseline mortality.9 A randomized experiment on the effects of resettling a population on that scale would have been infeasible.
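A minimal sketch of a DID regression, assuming simulated data (not the Hurricane Katrina study), estimates the treatment effect as the coefficient on the interaction between group and period:

```python
# Sketch: difference-in-differences with a treated group and a comparison group
# observed before and after an event (simulated data; hypothetical setting).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 4000
df = pd.DataFrame({
    "treated_group": rng.integers(0, 2, n),   # 1 = group affected by the event
    "post": rng.integers(0, 2, n),            # 1 = observed after the event
})
# Outcome has a group difference, a common time trend, and a true effect of -1.5
# that applies only to the treated group after the event.
df["outcome"] = (
    10
    + 2.0 * df["treated_group"]
    - 0.5 * df["post"]
    - 1.5 * df["treated_group"] * df["post"]
    + rng.normal(0, 1, n)
)
model = smf.ols("outcome ~ treated_group * post", data=df).fit()
print(model.params["treated_group:post"])  # DID estimate of the treatment effect
```

The interaction coefficient captures how the between-group difference changed after the event, which is the DID estimate of the effect.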
In event-study analyses, researchers rely on exogenous and variable timing of interventions in exposed groups to study changes within groups over time (eg, estimating the effect size for the association between care continuity and outcomes by studying patients whose primary care physicians retired at different points in time). Although event-study analyses do not require control groups, control groups without any exposure are frequently incorporated into this approach.4 Interrupted time series analyses are similar, but typically focus on changes in outcomes before and after a single event that affects a population of interest (eg, a citywide soda excise tax).10
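An event-study regression with staggered event timing can be sketched as follows, assuming simulated data in which the retiring-physician setting is only a hypothetical illustration; outcome changes are estimated in each period relative to the year before each person's event:

```python
# Sketch: event-study regression with staggered event timing (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_people, n_years = 500, 8
panel = pd.DataFrame(
    [(i, t) for i in range(n_people) for t in range(n_years)],
    columns=["person", "year"],
)
event_year = pd.Series(rng.integers(2, 6, n_people))        # staggered event timing
panel["event_time"] = panel["year"] - panel["person"].map(event_year)
# Outcome drops by 1.0 in every period at or after the event
panel["outcome"] = 5 - 1.0 * (panel["event_time"] >= 0) + rng.normal(0, 1, len(panel))

# Keep an event window and build indicators for each event time,
# omitting event_time = -1 (the year before the event) as the reference period.
window = panel[panel["event_time"].between(-3, 3)].copy()
terms = []
for k in range(-3, 4):
    if k == -1:
        continue
    name = f"et_m{abs(k)}" if k < 0 else f"et_p{k}"
    window[name] = (window["event_time"] == k).astype(int)
    terms.append(name)
model = smf.ols("outcome ~ " + " + ".join(terms), data=window).fit()
print(model.params[terms])  # near 0 before the event, about -1.0 afterward
```

Coefficients near 0 in the pre-event periods support the assumption that outcomes were not already diverging before the event.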
Each of these study designs and analyses has important limitations that should be considered, including failure to control for unobserved or unmeasured differences between the groups, risk of selection bias due to allocation that cannot be concealed from the researchers, nonparallel trends that could affect comparisons between the groups, and spillover influences from 1 group to the other. For example, in RDD studies, assumptions must be tested to ensure that observed variables are continuous at the point where the treatment and outcome discontinuities occur, such that there are no abrupt changes in the relationship between the observed variables and the treatment or outcome except at the discontinuity cutoff. Similarly, studies that use instrumental variable analysis must ensure that an appropriate instrumental variable is selected and should acknowledge possible threats to validity from unmeasured confounding factors.
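One example of such an assumption check, sketched here with simulated data and a hypothetical baseline covariate, tests whether a characteristic that should be unaffected by treatment jumps at an RDD cutoff:

```python
# Sketch: falsification check for an RDD, testing whether a baseline covariate
# (which should be unaffected by treatment) is continuous at the cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 5000
running = rng.uniform(-50, 50, n)                 # running variable, centered at the cutoff
above_cutoff = (running >= 0).astype(int)
# A baseline covariate that varies smoothly with the running variable (no true jump)
baseline_age = 50 + 0.02 * running + rng.normal(0, 5, n)

df = pd.DataFrame({"age": baseline_age, "running": running, "above": above_cutoff})
local = df[df["running"].abs() <= 20]             # bandwidth around the cutoff
check = smf.ols("age ~ above + running + above:running", data=local).fit()
# A coefficient on `above` near 0 (and not statistically significant) is
# consistent with the covariate being continuous at the cutoff.
print(check.params["above"], check.pvalues["above"])
```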
Natural experiments offer an important approach for examining potential causal links between interventions and outcomes. Studies that appropriately use these methods could help provide data to inform questions affecting the health of patients that otherwise may remain unanswered.
Published: June 11, 2021. doi:10.1001/jamahealthforum.2021.0290
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Khullar D et al. JAMA Health Forum.
Corresponding Author: Anupam B. Jena, MD, PhD, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (jena@hcp.med.harvard.edu).
Conflict of Interest Disclosures: Dr Jena reports consulting fees from Pfizer, Bioverativ, Bristol Myers Squibb, Merck Sharp & Dohme, Janssen, Edwards Life Sciences, Novartis, Amgen, Eli Lilly, Vertex Pharmaceuticals, AstraZeneca, Celgene, Tesaro, Sanofi, Aventis, Precision Health Economics, and Analysis Group, all outside of the submitted work.