JAMA Health Forum – Health Policy, Health Care Reform, Health Affairs | JAMA Health Forum | JAMA Network
June 11, 2021

Exploring Potential Causal Inference Through Natural Experiments

Author Affiliations
  • 1 Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
  • 2 Statistical Editor, JAMA Health Forum
JAMA Health Forum. 2021;2(6):e210289. doi:10.1001/jamahealthforum.2021.0289

The potential outcomes framework, also called the Rubin causal model,1 is the foundation for inference in “natural experiments” like those discussed in the Viewpoint by Khullar and Jena in this issue of JAMA Health Forum.2 In that framework, the key assumption for causal inference from classical controlled experiments, ie, randomized clinical trials (RCTs), is that treatment assignment is conditionally independent of potential outcomes. In RCTs, a variety of mechanisms enable researchers to rely on this assumption of independence, commonly known as unconfounded treatment assignment.1 Randomization devices (eg, coin tosses and computer-generated pseudorandom numbers) avoid predictable patterns that could bias inference. Strong social and information controls (eg, blinding and placebo treatment) also discourage any actions that might compromise the independence of assignment. Finally, well-established analytic methods that follow directly from the design reduce the potential for selective reporting of conventionally significant results (“P-hacking”).
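To make unconfounded assignment concrete, the following Python sketch simulates a toy randomized trial in the potential outcomes framework. The covariate, the outcome model, and the constant effect of 1.5 are hypothetical choices made for illustration. A coin toss decides which of each unit's 2 potential outcomes is observed, so the simple difference in means should fall close to the true average treatment effect (ATE).

```python
import random
import statistics


def simulate_rct(n=20000, seed=1):
    """Toy randomized trial under the potential outcomes framework.

    Each unit has two potential outcomes, y0 (untreated) and y1 (treated);
    only one of them is observed. A fair coin decides which, so assignment
    is independent of (y0, y1) by construction.
    """
    rng = random.Random(seed)
    unit_effects, treated, control = [], [], []
    for _ in range(n):
        severity = rng.gauss(0, 1)                # hypothetical prognostic covariate
        y0 = 10 + 2 * severity + rng.gauss(0, 1)  # outcome without treatment
        y1 = y0 + 1.5                             # assumed constant effect of 1.5
        unit_effects.append(y1 - y0)              # never observable in real data
        if rng.random() < 0.5:                    # coin toss: treated
            treated.append(y1)
        else:                                     # coin toss: control
            control.append(y0)
    true_ate = statistics.fmean(unit_effects)
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    return true_ate, estimate


true_ate, estimate = simulate_rct()
print(f"true ATE {true_ate:.2f}, difference in means {estimate:.2f}")
```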

Natural experiments as described by Khullar and Jena2 differ from true experiments in essential features that stem primarily from the researchers’ lack of control over treatment assignment. When treatment assignment is determined by some combination of contextual variables and characteristics of the subjects, it is difficult to argue that assignment is entirely independent of these variables in the way that a coin toss is. At best, one might argue that the effect of confounding on estimates is minimal, either from the start or after statistical control. Common control methods either modify the distributions of covariates in the comparison groups through statistical matching or weighting, or adjust for covariate effects by regression modeling of outcomes. In either case, the argument relies on some combination of empirically based theory about the importance of these effects and exploratory modeling.
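The sketch below contrasts a naive comparison with one common statistical control, inverse probability weighting, for a case where the chance of treatment rises with an observed covariate. The model and all parameters are hypothetical. The sketch also assumes the treatment probabilities are known exactly. That is never true in a real natural experiment, where they must be estimated and can include only measured confounders.

```python
import math
import random


def simulate_confounded(n=50000, seed=2):
    """Hypothetical natural experiment: sicker units are more likely to be treated."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)                          # observed confounder, eg, severity
        p = 1 / (1 + math.exp(-x))                   # treatment probability rises with x
        t = 1 if rng.random() < p else 0
        y = 10 + 2 * x + 1.5 * t + rng.gauss(0, 1)   # true effect is 1.5
        rows.append((p, t, y))
    return rows


def naive_difference(rows):
    """Difference in mean outcomes, ignoring how treatment was assigned."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_difference(rows):
    """Normalized (Hajek) inverse probability weighting.

    Uses the known propensity p; in practice p would have to be estimated
    and could account only for observed confounders.
    """
    w1 = sum(t / p for p, t, _ in rows)
    w0 = sum((1 - t) / (1 - p) for p, t, _ in rows)
    m1 = sum(t * y / p for p, t, y in rows) / w1
    m0 = sum((1 - t) * y / (1 - p) for p, t, y in rows) / w0
    return m1 - m0


rows = simulate_confounded()
print(f"naive {naive_difference(rows):.2f}, weighted {ipw_difference(rows):.2f}")
```

Here the naive difference absorbs the higher severity of treated units, while the weighted comparison rebalances that covariate. No weighting scheme can rebalance a confounder that was never measured.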

Despite these caveats, natural experiments are worthy of attention, especially when the structures of data and potential causal relationships approach those of an experiment and when relevant scientific theory supports a possible causal claim. In this sense, natural experiments are distributed along a continuum with RCTs at one end and purely descriptive observational studies at the other. With that in mind, I offer a brief and partial checklist to encourage the appropriate mix of creativity in and skepticism about studies using natural experiments as reported in JAMA Health Forum and other journals.

Identify the Intervention

Holland and Rubin coined the aphorism, “No causation without manipulation.”3 Rather than conferring a distinct metaphysical status on human agency, this asserts that an empirical study is interpretable causally only if it compares the effects of alternative interventions that could possibly be applied exogenously. For example, a study comparing mortality in the same hospital in 2000 and 2010 cannot be interpreted as causal even though its data structure resembles that of an experiment: hospitals and patients cannot be assigned to a different year. The study could, however, adjust the 2010 distribution of measured patient characteristics to resemble that observed in 2000. The resulting noncausal controlled comparison might provide evidence on the correlates of changes in hospital resources, care practices, and patient populations over the decade, without establishing a causal effect of any intervention.
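In its simplest form, the adjustment described for the 2000 vs 2010 comparison is direct standardization. In this sketch the strata, counts, and rates are invented for illustration. The standardized 2010 rate answers a descriptive question: what would 2010 mortality look like with the 2000 case mix? It does not answer a causal one.

```python
def standardized_rate(stratum_rates, stratum_counts):
    """Weighted mean of stratum-specific rates, weighted by a chosen case mix.

    With a period's own counts this is its crude rate; with another period's
    counts it is a directly standardized rate.
    """
    total = sum(stratum_counts.values())
    return sum(stratum_rates[s] * n / total for s, n in stratum_counts.items())


# Hypothetical patient counts by risk stratum
patients_2000 = {"low risk": 600, "high risk": 400}
patients_2010 = {"low risk": 300, "high risk": 700}
# Hypothetical stratum-specific mortality rates
mortality_2000 = {"low risk": 0.03, "high risk": 0.12}
mortality_2010 = {"low risk": 0.02, "high risk": 0.10}

crude_2000 = standardized_rate(mortality_2000, patients_2000)
crude_2010 = standardized_rate(mortality_2010, patients_2010)
adjusted_2010 = standardized_rate(mortality_2010, patients_2000)  # 2000 case mix
print(f"{crude_2000:.3f} {crude_2010:.3f} {adjusted_2010:.3f}")  # prints 0.066 0.076 0.052
```

With these hypothetical numbers, crude mortality rises from 6.6% to 7.6%, yet the 2010 rate standardized to the 2000 case mix is 5.2%. Whichever direction the comparison points, it still describes correlates of change, not the effect of an intervention.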

Another problematic hypothetical intervention is decreasing students’ body mass index (BMI), with the aim of measuring its “effect” on their future health status. There are many possible interventions that could decrease BMI. Were students invited to receive intensive personal training or given a cash subsidy to buy healthy food? Or did a budgetary impasse lead to loss of their free lunches? Each of these interventions might reduce BMI, but their subsequent effects on health could diverge. Thus, the intervention would be the assignment to 1 of the weight-reduction programs or events, not the decrease in BMI itself. The fallacy of treating BMI as a potential causal factor is that it cannot be manipulated directly; thus it is not, in this framework, an intervention. Rather, BMI is an intermediate outcome of several interventions and does not have a uniquely defined causal effect.4
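A toy calculation shows why “the effect of BMI” is not uniquely defined. The 3 interventions below echo the examples above, and all numbers are hypothetical. Each intervention lowers mean BMI by the same amount. Dividing the health change by the BMI change nonetheless yields a different implied “effect per BMI unit” for each one.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    bmi_change: float     # hypothetical mean change in BMI
    health_change: float  # hypothetical mean change in a later health score (higher is better)


def implied_effect_per_bmi_unit(iv):
    """Naively attribute the whole health change to the BMI change."""
    return iv.health_change / iv.bmi_change


interventions = [
    Intervention("intensive personal training", bmi_change=-2.0, health_change=4.0),
    Intervention("cash subsidy for healthy food", bmi_change=-2.0, health_change=2.0),
    Intervention("loss of free lunches", bmi_change=-2.0, health_change=-3.0),
]

for iv in interventions:
    print(f"{iv.name}: {implied_effect_per_bmi_unit(iv):+.1f} health points per BMI unit")
```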

Identify Plausible Nearby Controls

Difference-in-differences analysis controls for differences in baseline levels of a key variable by comparing the changes from baseline to posttreatment in the treatment and control groups. If these changes are equal, they might be explained as effects of common processes (contextual trends, maturation, etc) operating on similar samples, while differences in these differences (interactions) are interpreted as treatment effects. However, if the baseline distributions of characteristics in the 2 groups are very different, the interaction might be nonzero even when there is no treatment effect, introducing a bias. Surprisingly, attempting to correct this bias by matching units from the treatment and control groups can instead exacerbate it. Suppose a new treatment becomes available for a condition that presents with varying severity, and the probability of receiving the new treatment increases with severity. Aiming to correct the resulting bias in the difference-in-differences comparisons, the analyst matches pairs of treated and control patients on the baseline severity measure; most matched pairs then come from the region between the 2 group means, where the distributions overlap. A single baseline measurement reflects transient fluctuations as well as each patient’s underlying severity. As a result, matched treated patients (who tend to be less severe than the average treated patient) and matched control patients (who tend to be more severe than the average control) regress toward their respective, different group means in posttreatment data, regardless of the treatment’s efficacy. This biases estimates of treatment effects.
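A small simulation can check the regression-to-the-mean mechanism. This Python sketch rests on hypothetical assumptions. Each patient has a stable latent severity, and baseline and follow-up measurements each add independent noise. The treated group is sicker on average, which stands in for severity-dependent uptake, and the treatment has no effect at all. Under these assumptions the unmatched difference-in-differences estimate is close to 0, whereas matching on baseline severity produces a spurious positive “effect”.

```python
import bisect
import random
import statistics


def simulate_group(n, latent_mean, rng, effect=0.0):
    """(baseline, post) severity pairs for one group.

    Each patient has a stable latent severity; each measurement adds
    independent noise of the same size, so extreme baselines tend to
    regress toward the group's own mean at follow-up.
    """
    units = []
    for _ in range(n):
        latent = rng.gauss(latent_mean, 1.0)
        baseline = latent + rng.gauss(0, 1.0)
        post = latent + rng.gauss(0, 1.0) + effect
        units.append((baseline, post))
    return units


def did(treated, control):
    """Difference in mean change from baseline, treated minus control."""
    change_t = statistics.fmean(post - pre for pre, post in treated)
    change_c = statistics.fmean(post - pre for pre, post in control)
    return change_t - change_c


def match_on_baseline(treated, control, caliper=0.05):
    """1:1 nearest-neighbor matching on baseline, with replacement."""
    control_sorted = sorted(control)
    keys = [pre for pre, _ in control_sorted]
    matched_t, matched_c = [], []
    for unit in treated:
        i = bisect.bisect_left(keys, unit[0])
        candidates = [j for j in (i - 1, i) if 0 <= j < len(keys)]
        j = min(candidates, key=lambda k: abs(keys[k] - unit[0]))
        if abs(keys[j] - unit[0]) <= caliper:
            matched_t.append(unit)
            matched_c.append(control_sorted[j])
    return matched_t, matched_c


rng = random.Random(3)
treated = simulate_group(20000, latent_mean=1.0, rng=rng)  # sicker, more often treated
control = simulate_group(20000, latent_mean=0.0, rng=rng)
matched_t, matched_c = match_on_baseline(treated, control)
print(f"unmatched DiD {did(treated, control):.2f}")    # near 0: no true effect
print(f"matched DiD {did(matched_t, matched_c):.2f}")  # spurious, about 0.5 in expectation
```

With equal latent and measurement variances, a treated patient with baseline b is expected to change by 0.5(1 − b), drifting up toward the treated mean of 1. A control with the same baseline is expected to change by −0.5b, drifting down toward 0. Every matched pair therefore contributes about 0.5 to the estimate, even though the true effect is 0.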

Biases like these can take many forms and cannot be identified or corrected by statistical methods alone.5 Interrupted time series analyses are vulnerable to similar problems, which the common advice to test for parallel trends in the pretreatment period generally does not resolve.6
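A “nonsignificant” pretrend test is weak reassurance, and comparing it with an equivalence-style check shows why. The check below is loosely in the spirit of the non-inferiority approaches cited above. It is a simplified sketch, not the procedure of that reference. The 5 pretreatment gaps between treated and control means and the tolerance margin of 0.1 per period are hypothetical. The normal critical value of 1.96 is an approximation that is too small for so few periods.

```python
import math


def pretrend_slope(gaps):
    """OLS slope (and its standard error) of the treated-minus-control gap
    regressed on period index 0, 1, ..., n - 1."""
    n = len(gaps)
    periods = range(n)
    t_bar = (n - 1) / 2
    g_bar = sum(gaps) / n
    sxx = sum((t - t_bar) ** 2 for t in periods)
    slope = sum((t - t_bar) * (g - g_bar) for t, g in zip(periods, gaps)) / sxx
    intercept = g_bar - slope * t_bar
    ssr = sum((g - intercept - slope * t) ** 2 for t, g in zip(periods, gaps))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se


def assess_parallel_trends(gaps, margin, crit=1.96):
    """Contrast a conventional pretrend test with an equivalence-style check.

    crit=1.96 is a normal approximation; with few periods a t critical value
    would be more appropriate and would widen the interval further.
    """
    slope, se = pretrend_slope(gaps)
    lower, upper = slope - crit * se, slope + crit * se
    return {
        "slope": slope,
        "interval": (lower, upper),
        # Conventional pretest: "passes" whenever 0 lies inside the interval
        "fails_to_reject_parallel": lower <= 0 <= upper,
        # Equivalence-style check: the whole interval must lie inside +/- margin
        "within_margin": -margin < lower and upper < margin,
    }


# Hypothetical treated-minus-control gaps in 5 pretreatment periods
result = assess_parallel_trends([0.0, 0.5, -0.2, 0.9, 0.4], margin=0.1)
print(result)
```

Here the estimated slope is 0.12 per period and the conventional test does not reject parallel trends. The interval, roughly −0.16 to 0.40, still extends well beyond the margin, so the data cannot rule out a meaningful divergence.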

Establish a Null Distribution Through Falsification

In statistical hypothesis testing of an experimental effect, a test statistic that signals detection of an effect is referred (compared) to its distribution under a null hypothesis representing no experimental effect. In a natural experiment, uncontrolled and unmodeled processes unrelated to the potential causal effect of interest might generate background noise that increases the probability of a false positive but is difficult to model. Positive findings of a regression discontinuity, interrupted time series, or event-study analysis may reflect a real effect or only background noise. To assess which, the test statistic can be computed at hypothesized thresholds, change points, or event times, respectively, at which the researcher believes there are no effects of the kind under examination; the resulting values approximate the null distribution. If such falsification tests produce evidence of comparable effects where none are expected, the original finding may also be a false positive. For example, Roberts and colleagues7 reanalyzed data from a regression discontinuity study showing a discontinuity in behavior as a function of physician group size at the cutoff of 100 physicians, above which a quality reporting requirement became compulsory. Similar tests at cutoffs ranging from 50 to 150 physicians detected equal or larger discontinuities at other group sizes for which there was no programmatic explanation. Analysis of background noise may also reveal statistical patterns, such as seasonal cycles, or mechanisms that are detectable and predictable, identifying confounders and suggesting methods for better control.
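For a regression discontinuity design, the falsification idea can be sketched as follows. Everything here is hypothetical: simulated practices at group sizes 20 to 200, a smooth trend, an optional jump at 100 physicians, local linear fits with a bandwidth of 10, and placebo cutoffs from 50 to 150. The function reports the share of placebo estimates that are at least as large in magnitude as the estimate at the real cutoff. A large share would suggest that the apparent effect cannot be distinguished from background noise.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def discontinuity_at(data, cutoff, bandwidth):
    """Local linear estimate of the jump in mean outcome at `cutoff`."""
    left = [(x, y) for x, y in data if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in data if cutoff <= x < cutoff + bandwidth]
    a_left, b_left = fit_line(*zip(*left))
    a_right, b_right = fit_line(*zip(*right))
    return (a_right + b_right * cutoff) - (a_left + b_left * cutoff)


def placebo_scan(data, true_cutoff, candidate_cutoffs, bandwidth):
    """Estimate at the real cutoff, placebo estimates, and the share of
    placebo estimates at least as large in magnitude. Placebo cutoffs whose
    window would straddle the real cutoff are skipped."""
    observed = discontinuity_at(data, true_cutoff, bandwidth)
    placebo = [discontinuity_at(data, c, bandwidth) for c in candidate_cutoffs
               if abs(c - true_cutoff) >= bandwidth]
    share = sum(abs(p) >= abs(observed) for p in placebo) / len(placebo)
    return observed, placebo, share


def simulate_practices(jump, seed=4):
    """Hypothetical data: 20 practices at each group size from 20 to 200."""
    rng = random.Random(seed)
    data = []
    for size in range(20, 201):
        smooth = 0.5 + 0.01 * size + 0.3 * math.sin(size / 15)
        for _ in range(20):
            y = smooth + (jump if size >= 100 else 0.0) + rng.gauss(0, 0.3)
            data.append((size, y))
    return data


observed, placebo, share = placebo_scan(simulate_practices(jump=1.0), 100,
                                        range(50, 151), bandwidth=10)
print(f"estimate at 100: {observed:.2f}; share of placebos as large: {share:.2f}")
```

With a true jump of 1.0, the estimate at 100 stands well outside the placebo estimates. Rerunning with `jump=0.0` makes the real cutoff just one more draw from the same noise distribution, which is the pattern the reanalysis described above uncovered.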

Pay Attention to Spillover and Social Effects

The standard conditions for experimental design in the Rubin causal model1 define the potential causal effects as purely individual, excluding spillovers from one unit’s treatment to another unit’s outcomes. This is appropriate for studies of effects of clinical interventions that will be assigned to individuals. That some naturally assigned interventions are actually assigned to clusters of connected patients is therefore a limitation of natural experiments that can confound estimates of individual effects. In some cases, however, this type of assignment more realistically represents the most effective implementation of the intervention. This is particularly true of public health interventions, whose effects across connected populations are a fundamental concern of epidemiology. When the units for assignment are families, neighborhoods, or patients of a clinical practice, the way in which benefits of an intervention spread across social networks becomes an appropriate object of study, and natural experiments may be a suitable context for analyzing such effects.8
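When assignment is by cluster, one common decomposition separates 2 quantities. This decomposition is an assumption of the sketch, not a method taken from the article. The direct effect compares treated and untreated members of assigned clusters. The spillover effect compares untreated members of assigned clusters with members of unassigned clusters. The simulation below uses hypothetical households, with a direct effect of 2.0 and a spillover of 1.0 that reaches everyone in an assigned cluster.

```python
import random
import statistics


def simulate_clusters(n_clusters=2000, size=5, coverage=0.4,
                      direct=2.0, spillover=1.0, seed=5):
    """Hypothetical cluster-assigned intervention with spillover.

    About half of the clusters (eg, households) are assigned; within an
    assigned cluster each member takes up the intervention with probability
    `coverage`. Every member of an assigned cluster gains `spillover`, and
    members who take it up gain `direct` on top of that.
    """
    rng = random.Random(seed)
    records = []  # (cluster_assigned, individually_treated, outcome)
    for _ in range(n_clusters):
        assigned = rng.random() < 0.5
        cluster_effect = rng.gauss(0, 1)  # shared by members of the cluster
        for _ in range(size):
            treated = assigned and rng.random() < coverage
            y = 10 + cluster_effect + rng.gauss(0, 1)
            y += spillover * assigned + direct * treated
            records.append((assigned, treated, y))
    return records


def mean_where(records, keep):
    """Mean outcome among records satisfying keep(assigned, treated)."""
    return statistics.fmean(y for a, t, y in records if keep(a, t))


records = simulate_clusters()
untreated_in_assigned = mean_where(records, lambda a, t: a and not t)
direct_hat = mean_where(records, lambda a, t: a and t) - untreated_in_assigned
spill_hat = untreated_in_assigned - mean_where(records, lambda a, t: not a)
print(f"direct {direct_hat:.2f}, spillover {spill_hat:.2f}")
```

These simple contrasts recover the assumed effects only because the simulation randomizes both cluster assignment and individual uptake. In a natural experiment, uptake within clusters is rarely random.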

Discussion of natural experiments has focused on the extent to which they can replicate causal effects that would be estimated with designed experiments. From that perspective, the potential for confounding is unavoidable. As Khullar and Jena point out, the effects of observed confounders can be estimated, as long as assignment is not entirely predictable.2 However, the influence of unobserved confounders on estimates of potential causal effects remains a concern and cannot be dismissed merely because the estimated effects of observed confounders are small. Heterogeneity of treatments or of their effects across settings may also limit the consistency of results. Nonetheless, natural experiments may play an important part in a more comprehensive approach to causal inference that combines unconfounded estimates from particular populations with information from less controlled studies.

Article Information

Published: June 11, 2021. doi:10.1001/jamahealthforum.2021.0289

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Zaslavsky AM. JAMA Health Forum.

Corresponding Author: Alan M. Zaslavsky, PhD, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (zaslavsk@hcp.med.harvard.edu).

Conflict of Interest Disclosures: None reported.

References

1. Imbens G, Rubin D. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press; 2015. doi:10.1017/CBO9781139025751
2. Khullar D, Jena AB. “Natural experiments” in health care research. JAMA Health Forum. 2021;2(6):e210290. doi:10.1001/jamahealthforum.2021.0290
3. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945-960. doi:10.1080/01621459.1986.10478354
4. Hernán MA, Taubman SL. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. Int J Obes (Lond). 2008;32(suppl 3):S8-S14. doi:10.1038/ijo.2008.82
5. Daw JR, Hatfield LA. Matching in difference-in-differences: between a rock and a hard place. Health Serv Res. 2018;53(6):4111-4117. doi:10.1111/1475-6773.13017
6. Bilinski A, Hatfield LA. Nothing to see here? Non-inferiority approaches to parallel trends and other model assumptions. Accessed May 5, 2021. https://arxiv.org/pdf/1805.03273.pdf
7. Roberts ET, Zaslavsky AM, McWilliams JM. The value-based payment modifier: program outcomes and implications for disparities. Ann Intern Med. 2018;168(4):255-265. doi:10.7326/M17-1740
8. Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Stat Med. 2013;32(4):556-577. doi:10.1002/sim.5408