Randomized clinical trials (RCTs) are considered the “gold standard” for providing actionable evidence to guide clinical decision making. However, they cannot always address important questions. For instance, statistically significant results for low-frequency outcomes like mortality sometimes require longer follow-up times or larger studies than can be practically undertaken.
In such cases, we have a choice: we can either go without evidence or turn to observational studies. Such studies can often be much larger and accommodate longer follow-up. But because participants are not explicitly randomly assigned to treatment and control groups, observational studies can produce biased results. There are, however, methodological advances that can minimize that bias and increase our confidence in findings.
Consider a recent observational study comparing 2 classes of drugs that are used to treat type 2 diabetes when initial treatment fails to control blood-sugar level. The investigators found that use of sulfonylureas (SUs) is associated with greater mortality and more avoidable hospitalizations than the use of thiazolidinediones (TZDs). Both types of medications are frequently prescribed as second-line treatments, but concerns about the cardiovascular safety of TZDs have been a focus of controversy. Given the new findings, should clinicians now favor TZDs? That depends on an assessment of the study’s sample and methods.
An advantage of the study, published in Value in Health, is that it was 20 times larger and had a much longer follow-up than any prior comparative-effectiveness RCT of second-line diabetes medications. This permitted the investigators to obtain outcomes on low-frequency events like mortality; prior RCTs were principally powered only to examine blood glucose control. The study, based on merged Veterans Health Administration (VA) and Medicare data, examined more than 80 000 patients for up to 10 years. It found that, relative to TZD use, SU use was associated with a 68% increase in the risk of avoidable hospitalization and a 50% increase in the risk of death.
(Full disclosure: the study was led by my colleague Julia Prentice, PhD, and I work closely with another coauthor, Steven Pizer, PhD.)
That the study by Prentice and colleagues is not an RCT is, in my view, only a mild weakness, because the instrumental variable methods it employed were rigorously tested and found to be very strong. Admittedly, some would disagree that this design is only a mild weakness.
The key threat to any observational study is bias from factors that cannot be observed and controlled. Randomized clinical trials address this issue with randomization, the proverbial flip of a coin. As its source of random variation, the Prentice et al study instead used physician prescribing patterns. Patients were effectively randomized to receive an SU or TZD according to how frequently their physician had prescribed one over the other in the prior year. Patients cared for by a clinician more likely to prescribe an SU can be thought of as more likely to be randomized to receive an SU, and similarly for TZDs. Supporting this approach, prescribing pattern has been applied as an instrument in prior work.
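To make the design concrete, here is a minimal two-stage least squares (2SLS) sketch of how a prescribing-pattern instrument is used. The simulated data and all variable names are illustrative only; the study’s actual models and variables are richer than this:

```python
import numpy as np
import statsmodels.api as sm

# Simulated, illustrative data: one row per patient.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=(n, 3))                    # stand-in patient covariates
u = rng.standard_normal(n)                     # unobserved confounder (e.g., frailty)
z = rng.uniform(size=n)                        # physician's prior-year SU prescribing rate (instrument)
d = ((z + 0.3 * u) > rng.uniform(size=n)).astype(float)  # 1 = received an SU, 0 = a TZD
y = 0.5 * d + 0.5 * u + x @ np.array([0.2, -0.1, 0.3]) + rng.standard_normal(n)

# Naive OLS of y on d is biased because u drives both treatment and outcome.
# Stage 1: predict treatment from the instrument (plus covariates).
d_hat = sm.OLS(d, sm.add_constant(np.column_stack([z, x]))).fit().fittedvalues

# Stage 2: regress the outcome on the predicted treatment (plus covariates).
stage2 = sm.OLS(y, sm.add_constant(np.column_stack([d_hat, x]))).fit()
print("IV estimate of the SU effect:", stage2.params[1])  # close to the true 0.5

# Note: manual 2SLS gives the right point estimate but understates the
# standard errors; dedicated IV routines correct for that.
```

The intuition is the same as a coin flip: the physician’s habitual prescribing rate nudges the patient toward one drug or the other for reasons unrelated to the patient’s own prognosis.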
Such prescribing patterns are only a valid source of randomization if, like an RCT’s coin flip, they are not correlated with any unobservable factor that also affects outcomes (for example, general quality of care). This is the key assumption and the one many people find hard to swallow for instrumental variable studies. How do we increase our confidence this assumption holds in this case?
If prescribing pattern is a good randomizer, there should be balance in observable factors, like demographics or prevalence of other diagnoses, among patients more commonly prescribed SUs or TZDs. If we observe such balance, it should increase our confidence that there is balance in unobservable factors too, just as in an RCT. Checking balance on observables is a falsification test, and one that is standard in RCT reporting. Prentice’s team conducted this falsification test and showed balance in demographic, diagnosis, and provider quality variables. That’s all we expect to see to convince us of an RCT’s validity, but this is an instrumental variable study, so Prentice’s team did more.
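In code, the balance check could look something like the following sketch, which splits patients by their physician’s prescribing rate and computes standardized differences, as in an RCT’s Table 1. The column names are hypothetical, not the study’s:

```python
import numpy as np
import pandas as pd

def standardized_diff(a: pd.Series, b: pd.Series) -> float:
    """Difference in means, scaled by the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var() + b.var()) / 2)
    return float((a.mean() - b.mean()) / pooled_sd)

def balance_table(df: pd.DataFrame, instrument: str, covariates: list) -> pd.Series:
    """Standardized differences for each covariate between patients above
    and below the median of the instrument."""
    high = df[df[instrument] >= df[instrument].median()]
    low = df[df[instrument] < df[instrument].median()]
    return pd.Series({c: standardized_diff(high[c], low[c]) for c in covariates})

# Usage (hypothetical column names):
# balance_table(patients, "su_prescribing_rate", ["age", "charlson_score", "provider_quality"])
# Values near 0 (a common rule of thumb is |d| < 0.1) suggest balance.
```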
Imagine if an RCT’s coin flip were found to affect outcomes even in a population that never received the treatment under study (that is, those assigned to treatment didn’t receive it). This would suggest the coin flip that was thought to be random really was not, invalidating the RCT. This can occur if, for example, there is a breakdown in procedure and people assigned to the treatment group are systematically different from those assigned to the control group.
The same logic applies to instrumental variable studies, and Prentice and coauthors looked for evidence of such a problem. They examined 2 populations that bracketed the study population in disease severity and did not receive the treatment under study: a healthier population taking metformin (typically the first-line treatment for type 2 diabetes) but not receiving a second-line treatment, and a sicker population that had been prescribed metformin and then insulin without any other diabetes drug. Any potential bias from causal factors unobservable to the researchers is as likely to apply to these 2 populations as to the primary sample. So if prescribing patterns were correlated with outcomes in these populations, that would invalidate prescribing pattern as a randomizer. (A JAMA Viewpoint by Vinay Prasad, MD, and Anupam B. Jena, MD, PhD, also describes the application of falsification tests, but to postmarketing surveillance of medications.)
For neither of these “bracketing” populations in the study by Prentice et al was prescribing pattern related to outcomes. This provides strong support for prescribing pattern as a randomizer. It’s consistent with the assumption that prescribing pattern only affects outcomes through its effect on treatment with SU or TZD, just as one would desire of an RCT’s coin flip.
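A sketch of this bracketing falsification test: in each untreated subsample, regress the outcome on the instrument and check that the instrument’s coefficient is indistinguishable from zero. Again, the names are illustrative, not the study’s:

```python
import numpy as np
import statsmodels.api as sm

def falsification_pvalue(y, z, x):
    """p-value on instrument z in a regression of outcome y on z and
    covariates x, within a population that never received an SU or TZD."""
    X = sm.add_constant(np.column_stack([z, x]))
    return sm.OLS(y, X).fit().pvalues[1]

# Usage with hypothetical subsamples and column names:
# for name, sub in [("metformin only", metformin_only), ("metformin then insulin", insulin_sample)]:
#     p = falsification_pvalue(sub["died"], sub["su_prescribing_rate"], sub[covariate_cols])
#     print(name, p)  # a large p-value is consistent with a valid instrument
```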
In summary, there is considerable evidence that, in this study, prescribing pattern randomizes patients to SU or TZD in a manner that one would expect of an RCT. Consequently, we can have some confidence that the findings are valid. Should they be replicated with an RCT before they can be trusted? Maybe the more important question is, “Could an RCT be fielded?”
It would be unprecedented. Neither a New England Comparative Effectiveness Public Advisory Council literature review nor one by the Canadian Agency for Drugs and Technologies in Health found an RCT of second-line treatments adequately powered to examine mortality or the development of diabetes-related complications. In principle, one could be undertaken. In practice, amassing enough participants and maintaining study protocols for long enough is a big challenge, perhaps an insurmountable one.
If that’s true, and clinicians want to make evidence-based decisions based on mortality for second-line treatment of type 2 diabetes (or in similar clinical situations), they have no choice but to rely on observational studies.
Is the study by Prentice et al strong enough? I’m interested in your answer to this question. You may submit comments to me at afrakt@gmail.com.