September 14, 2020

The Case for Algorithmic Stewardship for Artificial Intelligence and Machine Learning Technologies

Author Affiliations
  • 1Berkeley Institute for Data Science, University of California, Berkeley
  • 2Bakar Computational Health Sciences Institute, University of California, San Francisco
  • 3University of California, Berkeley School of Public Health
  • 4Center for Data-Driven Insights and Innovation, University of California Health, Oakland
JAMA. Published online September 14, 2020. doi:10.1001/jama.2020.9371

The first manual on hospital administration, published in 1808, described a hospital steward as “an individual who [is] honest and above reproach,” with duties including the purchasing and management of hospital materials.1 Today, a steward’s job can be seen as ensuring the safe and effective use of clinical resources. The Joint Commission, for instance, requires antimicrobial stewardship programs to support appropriate antimicrobial use, including by monitoring antibiotic prescribing and resistance patterns.

A similar approach to “algorithmic stewardship” is now warranted. Algorithms, or computer-implementable instructions to perform specific tasks, are available for clinical use, including complex artificial intelligence (AI) and machine learning (ML) algorithms and simple rule-based algorithms. More than 50 AI/ML algorithms have been cleared by the US Food and Drug Administration2 for uses that include identifying intracranial hemorrhage from brain computed tomographic scans3 and detecting seizures in real time.4 Algorithms are also used to inform clinical operations, such as predicting which patients will “no show” for scheduled appointments.5 More recently, algorithms that predict in-hospital mortality have been proposed to inform ventilator allocation during the coronavirus disease 2019 pandemic.6

Although the use of algorithms in health care is not new, emerging algorithms are increasingly complex. Historically, many simple rule-based algorithms and clinical calculators could be clearly communicated, calculated, and checked by a single person. However, many new algorithms, including predictive and AI/ML algorithms, incorporate far more data and require more complicated logic than a single person could possibly calculate by hand. The complexity of these algorithms requires a new level of discipline in quality control.

When used appropriately, some algorithms can improve the diagnosis and management of disease. For example, algorithms that detect diabetic retinopathy from retinal images7 hold promise for improving the diagnosis of diabetic retinopathy, a leading cause of vision loss. However, algorithms also have the potential to exacerbate existing systems of structural inequality, as highlighted by recent research that detected racial bias in an algorithm that could potentially affect millions of patients.8

As the US Food and Drug Administration reassesses its regulatory framework for AI/ML algorithms, health systems must also develop oversight frameworks to ensure that algorithms are used safely, effectively, and fairly. Such efforts should focus particularly on complex and predictive algorithms that necessitate additional layers of quality control. Health systems that use predictive algorithms to provide clinical care or support operations should designate a person or group responsible for algorithmic stewardship. This group should be advised by clinicians who are familiar with the language of data, as well as by patients, bioethicists, scientists, and safety and regulatory organizations. In this Viewpoint, drawing on best practices from other areas of clinical practice, several key considerations for emerging algorithmic stewardship programs are identified.

Create and Maintain an Algorithm Inventory

Health systems should inventory all predictive algorithms currently in use, with a particular emphasis on understanding the exact outcome being predicted and the decisions made on the basis of those predictions. This is particularly important because recent work has shown that algorithms can reach enormous scale and potentially affect millions of individuals while major problems go undetected.8 Developing and maintaining algorithm inventories will require active upkeep. Similar to hospital drug formularies, algorithm inventories should be overseen by a centralized group analogous to existing pharmacy and therapeutics committees.
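As an illustration only, one entry in such an inventory could be represented as a small record that captures the predicted outcome, the decisions it informs, and the date of the last review. The sketch below is a minimal example under stated assumptions: the class `AlgorithmRecord`, its field names, the helper `overdue_for_review`, and the 365-day review interval are all hypothetical and are not drawn from any existing standard or from the authors' institutions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class AlgorithmRecord:
    """One entry in a hypothetical algorithm inventory."""
    name: str
    predicted_outcome: str         # the exact outcome being predicted
    decisions_informed: List[str]  # decisions made on the basis of predictions
    owner: str                     # person or group responsible for oversight
    last_reviewed: date            # date of the most recent stewardship review


def overdue_for_review(inventory, today, max_age_days=365):
    """Return the names of records last reviewed more than max_age_days ago.

    The 365-day default is an assumed interval, not a recommendation.
    """
    return [
        record.name
        for record in inventory
        if (today - record.last_reviewed).days > max_age_days
    ]


# Illustrative entries only; names and dates are invented.
inventory = [
    AlgorithmRecord(
        name="no_show_model",
        predicted_outcome="patient misses a scheduled appointment",
        decisions_informed=["appointment overbooking"],
        owner="algorithmic stewardship committee",
        last_reviewed=date(2019, 1, 15),
    ),
    AlgorithmRecord(
        name="length_of_stay_model",
        predicted_outcome="hospital length of stay",
        decisions_informed=["case manager assignment"],
        owner="algorithmic stewardship committee",
        last_reviewed=date(2020, 6, 1),
    ),
]

overdue = overdue_for_review(inventory, today=date(2020, 9, 14))
```

Recording the predicted outcome and the downstream decisions as separate fields keeps the two questions emphasized above visible for every algorithm, and a simple review-date check is one way to make the required upkeep routine.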

Audit Safety and Fairness

Before predictive algorithms are used to provide clinical care or support operations, they should be audited for safety and fairness in diverse patient populations. Such audits should be conducted when new algorithms are proposed or updated and whenever context, populations, or hospital conditions change significantly. At a minimum, this process should include screenings for bias and assessments of potential safety concerns. As with clinical trials of new drugs, safety and, in this case, fairness should be assessed first, prior to subsequent clinical investigation.

Algorithmic audits have helped to ensure that problematic algorithms are not deployed. For example, in 2017, researchers developed an algorithm to predict hospital length of stay, with the idea that case managers would be assigned to patients likely to be discharged soon, helping to remove any barriers to discharge. However, the algorithm “learned” that patients from less wealthy zip codes were more likely to be hospitalized for longer stays, which could have resulted in the unintended consequence of prioritizing finite case management resources to a “predominantly white, more educated, more affluent population to get them out of the hospital sooner.”9 Instead, the analysts who audited the algorithm before its deployment identified this issue and ensured that it never negatively affected patient care.
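A basic bias screening of the kind described in this section might begin by comparing simple performance measures across patient groups. The following is a minimal sketch under stated assumptions, not a complete audit: the function names `rates_by_group` and `max_gap`, the choice of selection rate and false-negative rate as measures, and the toy records are all hypothetical. Which measures and thresholds are appropriate depends on the algorithm and the decision it informs.

```python
from collections import defaultdict


def rates_by_group(records):
    """Summarize binary predictions per group.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    Returns {group: {"selection_rate": float,
                     "false_negative_rate": float or None}}.
    """
    counts = defaultdict(lambda: {"n": 0, "selected": 0, "pos": 0, "missed": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["selected"] += y_pred           # flagged by the algorithm
        if y_true == 1:
            c["pos"] += 1
            c["missed"] += int(y_pred == 0)  # true case the algorithm missed
    return {
        group: {
            "selection_rate": c["selected"] / c["n"],
            "false_negative_rate": c["missed"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


def max_gap(rates, metric):
    """Largest between-group difference for one metric, ignoring missing values."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)


# Toy data for illustration only: (group, true outcome, predicted outcome).
records = [
    ("A", 1, 1), ("A", 0, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 1),
]

rates = rates_by_group(records)
fnr_gap = max_gap(rates, "false_negative_rate")
selection_gap = max_gap(rates, "selection_rate")
```

In this toy data, group B's true cases are missed more often than group A's. A large gap on its own does not establish unfairness, but it is the kind of signal that would prompt closer review before deployment.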

Monitor Ongoing Clinical Use and Performance

Routine evaluations should be conducted to monitor the ongoing use and performance of predictive algorithms in actual clinical settings. Similar to the medication use evaluations required by The Joint Commission for hospital accreditation, algorithm use evaluations will require periodic review by a designated group with oversight responsibility. This review is critical given that many clinical algorithms rely on input data captured via electronic health record systems or medical imaging technologies. As these systems evolve, even subtle variations in data capture may adversely influence performance. For example, researchers demonstrated that algorithms to detect hip fractures in medical images can “learn” to detect the scanner model used to collect the image,10 which could affect algorithm performance if a hospital updates or replaces existing scanners. Similarly, the performance of diagnostic algorithms in clinical settings depends on the prevalence of the underlying condition. As a result, algorithm performance may change over time as public health interventions successfully reduce the prevalence of preventable diseases or as emerging infectious diseases spread to affect new populations.
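The dependence on prevalence can be made concrete with Bayes' theorem: for a test with fixed sensitivity and specificity, the positive predictive value (the probability that a positive result is a true case) falls as the condition becomes rarer. The short sketch below illustrates this relationship; the sensitivity, specificity, and prevalence values are hypothetical and chosen only for arithmetic clarity.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive prediction is a true case (Bayes' theorem)."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical algorithm with 90% sensitivity and 90% specificity.
ppv_at_10_percent = positive_predictive_value(0.90, 0.90, prevalence=0.10)
ppv_at_1_percent = positive_predictive_value(0.90, 0.90, prevalence=0.01)

print(f"PPV at 10% prevalence: {ppv_at_10_percent:.2f}")  # about 0.50
print(f"PPV at 1% prevalence:  {ppv_at_1_percent:.2f}")   # about 0.08
```

With identical sensitivity and specificity, roughly half of positive results are true cases at 10% prevalence, but only about 1 in 12 at 1% prevalence. This is one reason periodic re-evaluation is needed even when the algorithm itself has not changed.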

Adapt Existing Tools and Best Practices

Although the specific role of algorithmic stewardship may be new, many of the challenges posed by emerging AI/ML technologies have parallels in existing clinical practices that ensure the appropriate use of drugs and other hospital resources. Algorithmic stewardship programs should aim to identify and adapt existing best practices that are already working well within their institutions (Figure).

Figure.  Existing and Proposed Processes and Tools to Ensure Appropriate Use of Drugs for Algorithmic Stewardship Efforts

A Path Forward

Structures of oversight and regulation have contributed to advances in the safety and efficacy of modern health systems. Algorithmic stewardship efforts will not automatically ensure that new AI/ML technology has no unintended harms. However, these activities will help to ensure that these new technologies are used safely, effectively, and fairly, and to the benefit of diverse patient communities.

Article Information

Corresponding Author: Atul Butte, MD, PhD, Bakar Computational Health Sciences Institute, University of California, San Francisco, 550 16th St, Mission Hall, Fourth Floor, PO Box 0110, San Francisco, CA 94158 (atul.butte@ucsf.edu).

Published Online: September 14, 2020. doi:10.1001/jama.2020.9371

Conflict of Interest Disclosures: Ms Eaneff was supported by the Innovate for Health Data Science Health Innovation program, including support from the UCSF Bakar Computational Health Sciences Institute, the UC Berkeley Institute for Data Science, and Johnson & Johnson. Dr Obermeyer reported receiving equity from Berkeley Data Ventures and LookDeep Health outside the submitted work. Dr Butte reported receiving grants from Janssen during the conduct of the study and personal fees from and being a shareholder in Personalis and NuMedii; personal fees from Samsung, Mango Tree Corporation, 10x Genomics, Helix, Pathway Genomics, Verinata, Geisinger Health, Regenstrief Institute, Gerson Lehman Group, Roche, Merck, Genentech, Novartis, Covance, and AlphaSights; being a minor shareholder in Sutro, Vet24seven, Assay Depot, Nuna Health, Snap, Illumina, CVS, Biogen, Amazon, 10x Genomics, Sarepta, Microsoft, Google, Facebook, Regeneron, Moderna, and Sanofi; receiving honoraria and travel reimbursement for invited talks from Genentech, Takeda, Varian, Roche, Pfizer, Merck, Lilly, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Johnson & Johnson, Westat, and academic institutions, state or national agencies, medical or disease-specific foundations and associations, and health systems; receiving royalty payments through Stanford University for patents and other disclosures licensed to NuMedii and Personalis; and having research funded by the National Institutes of Health, Robert Wood Johnson Foundation, Northrup Grumman, Genentech, Johnson & Johnson, the US Food and Drug Administration, the Leon Lowenstein Foundation, the Intervalien Foundation, Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, the March of Dimes, Juvenile Diabetes Research Foundation, California Governor's Office of Planning and Research, California Institute for Regenerative Medicine, L'Oreal, and Progenity outside the submitted work.

Additional Information: Johnson & Johnson had the right to review the publication before submission and could delay submission for up to 60 days if necessary to make new patent applications but could not mandate any revision of the manuscript or prevent submission for publication, and had no role in the preparation or approval of the manuscript.

Additional Contributions: We thank Karla Lindquist, PhD (Johnson & Johnson), and Emma Huang, PhD (University of California, San Francisco), for proofreading the manuscript.

1. AMEDD/NCO enlisted soldier history. US Army. Updated June 21, 2011. Accessed February 20, 2020. https://history.amedd.army.mil/corps/nco/historynco.html
2. FDA cleared AI algorithms. Data Science Institute. Accessed May 8, 2020. https://www.acrdsi.org/DSI-Services/FDA-cleared-ai-algorithms
3. 510(k) premarket notification: K190896. US Food & Drug Administration. Updated August 31, 2020. Accessed September 4, 2020. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMN/pmn.cfm?ID=K190896
4. 510(k) premarket notification: K181861. US Food & Drug Administration. Updated August 31, 2020. Accessed September 4, 2020. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMN/pmn.cfm?ID=K181861
5. Murray SG, Wachter RM, Cucina RJ. Discrimination by artificial intelligence in a commercial electronic health record—a case study. Health Affairs. January 31, 2020. Accessed September 4, 2020. https://www.healthaffairs.org/do/10.1377/hblog20200128.626576/full/
6. White DB, Lo B. A framework for rationing ventilators and critical care beds during the COVID-19 pandemic. JAMA. 2020. Published online March 27, 2020. doi:10.1001/jama.2020.5046
7. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216
8. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi:10.1126/science.aax2342
9. Nordling L. A fairer way forward for AI in health care. Nature. 2019;573(7775):S103-S105. doi:10.1038/d41586-019-02872-2
10. Badgeley MA, Zech JR, Oakden-Rayner L, et al. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit Med. 2019;2(1):31. doi:10.1038/s41746-019-0105-1