Key Points
Question
How are racial and ethnic biases associated with health care algorithms and efforts to address these biases perceived?
Findings
In this qualitative study about views regarding health care algorithms, responses from 42 respondents suggested algorithms are in widespread use and may be biased whether or not they include race; there is no standardization in how race is defined; bias can be introduced at all stages of algorithm development and implementation; and algorithms’ use and bias should be discussed between clinicians and patients, who are often unaware of their use and potential for bias.
Meaning
Findings suggest that standardized and rigorous approaches for algorithm development and implementation are needed to mitigate racial and ethnic biases from algorithms and reduce health inequities.
Importance
Algorithms are commonly incorporated into health care decision tools used by health systems and payers and thus affect quality of care, access, and health outcomes. Some algorithms include a patient’s race or ethnicity among their inputs and can lead clinicians and decision-makers to make choices that vary by race and potentially affect inequities.
Objective
To inform an evidence review on the use of race- and ethnicity-based algorithms in health care by gathering public and stakeholder perspectives about the repercussions of and efforts to address algorithm-related bias.
Design, Setting, and Participants
This qualitative study was conducted from May 4, 2021, through December 7, 2022. Forty-two organization representatives (eg, clinical professional societies, universities, government agencies, payers, and health technology organizations) and individuals responded to the request for information. Qualitative methods were used to analyze responses: responses were initially open coded and then consolidated to create a codebook, with themes and subthemes identified and finalized by consensus.
Main Outcomes and Measures
Identification of algorithms with the potential for race- and ethnicity-based biases and qualitative themes.
Results
Forty-two respondents identified 18 algorithms currently in use with the potential for bias, including, for example, the Simple Calculated Osteoporosis Risk Estimation risk prediction tool and the risk calculator for vaginal birth after cesarean section. The 7 qualitative themes, with 31 subthemes, included the following: (1) algorithms are in widespread use and have significant repercussions, (2) bias can result from algorithms whether or not they explicitly include race, (3) clinicians and patients are often unaware of the use of algorithms and potential for bias, (4) race is a social construct used as a proxy for clinical variables, (5) there is a lack of standardization in how race and social determinants of health are collected and defined, (6) bias can be introduced at all stages of algorithm development, and (7) algorithms should be discussed as part of shared decision-making between the patient and clinician.
Conclusions and Relevance
This qualitative study found that participants perceived widespread and increasing use of algorithms in health care and lack of oversight, potentially exacerbating racial and ethnic inequities. Increasing awareness for clinicians and patients and standardized, transparent approaches for algorithm development and implementation may be needed to address racial and ethnic biases related to algorithms.
Commonly incorporated into electronic health records, clinical guidelines, and health care decision tools used by health systems and payers,1,2 algorithms are associated with quality of care, access, and patient outcomes.1-4 Some algorithms are developed from biased data sets or rely on incorrect assumptions, resulting in care disparities for racial and ethnic minority groups.
There is significant concern that algorithms may perpetuate bias and inequities in care. The algorithm used to estimate kidney function, the estimated glomerular filtration rate, includes an adjustment implying that Black people have healthier kidneys compared with White people when the individuals are otherwise similar. Such an adjustment could restrict care for Black people, including access to kidney transplants.5 An algorithm used to identify patients with complex medical needs who might benefit from additional services underestimated need among Black patients because health care use was misconstrued as a proxy for illness severity.2 As a result, some Black people appeared ineligible for additional services despite having worse health.
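To make the race adjustment concrete, here is a minimal sketch of the 2006 Modification of Diet in Renal Disease (MDRD) study equation,16 whose fixed 1.212 multiplier raises the kidney-function estimate for patients recorded as Black; the function name and the example patient are illustrative, not from the study.

```python
def egfr_mdrd(scr_mg_dl: float, age_years: float, female: bool, black: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) per the 2006 IDMS-traceable MDRD study
    equation (reference 16). The fixed 1.212 race multiplier raises the
    estimate for patients recorded as Black."""
    egfr = 175.0 * scr_mg_dl**-1.154 * age_years**-0.203
    if female:
        egfr *= 0.742  # fixed sex coefficient from the same equation
    if black:
        egfr *= 1.212  # the race adjustment discussed in the text
    return egfr

# Identical creatinine, age, and sex; only the recorded race differs.
print(egfr_mdrd(1.4, 60, female=False, black=False))  # ~51.7: below the CKD threshold of 60
print(egfr_mdrd(1.4, 60, female=False, black=True))   # ~62.7: crosses the threshold
```

Crossing the threshold of 60 mL/min/1.73 m^2 can change a chronic kidney disease diagnosis and downstream decisions such as nephrology referral or transplant evaluation, illustrating how a single coefficient can restrict care.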
In 2020, Congress requested that the Agency for Healthcare Research and Quality conduct a review on the use of race- and ethnicity-based algorithms in health care. To help inform the evidence review, the agency invited input from public stakeholders via a request for information (RFI). The RFI solicited information about the repercussions of and efforts to address algorithm-related bias, awareness and perspectives on the topic, and identification of important areas for future efforts, including research. We qualitatively analyzed responses to the RFI. To our knowledge, no prior study has evaluated information from stakeholders on racial and ethnic bias related to health care algorithms.
Methods
The RFI was posted from March 5, 2021, to May 4, 2021, in the Federal Register and included 11 open-ended questions we developed (Table 1). The Agency for Healthcare Research and Quality also emailed approximately 170 000 professionals and organizations via the agency’s listserv about the opportunity. We used a modified rapid thematic analysis approach, a qualitative methodology6 consistent with our objective of broadening understanding of a phenomenon (ie, race- and ethnicity-based algorithms) rather than generating new theory.6,7 First, the research team agreed on an approach to open coding with manual extraction of excerpts from RFI responses to identify codes and emerging themes. Three team members (A.J., J.R.B., and C.C.A.) independently open coded all responses line by line in separate Microsoft Excel workbooks, documenting codes and themes. The team then met for five 1.5-hour sessions, first to create a single codebook of themes and subthemes by naming common themes, discussing discrepancies, refining themes iteratively, and reaching consensus. Excerpts were then organized into codes, subthemes, and themes, and illustrative quotes were selected. Final themes, subthemes, and representative quotes were achieved by consensus with the entire team. This qualitative study, conducted from May 4, 2021, through December 7, 2022, was internally submitted to the Agency for Healthcare Research and Quality institutional review board and classified as exempt because there was no risk to human subjects and the study was not considered research under the Common Rule. The study followed the Standards for Reporting Qualitative Research (SRQR) reporting guideline.8
Results
Forty-two respondents included representatives from 16 professional organizations, 9 digital health or health technology organizations, 7 academic organizations, 4 federal and state agencies, and 1 payer organization, as well as 5 individuals; responses totaled 485 pages of text. The questions answered and the length of responses varied considerably across respondents. In response to question 1 about algorithms in use, 18 algorithms were identified, many by more than 1 respondent. Qualitative analysis of the responses yielded 7 themes and 31 subthemes (Table 2).
The Results section is organized by theme and includes summaries of the content supporting each theme and its subthemes. Illustrative quotes corresponding to themes are presented in Table 3.
Theme 1: Importance of Addressing Racial and Ethnic Bias in Health Care Algorithms
Respondents endorsed efforts to address racial and ethnic bias in algorithms associated with health inequities, often emphasizing the national conversation about racism and the urgency of addressing long-standing inequities. The widespread and increasing use of algorithms in health care and lack of oversight could perpetuate racial and ethnic inequities in care. Algorithms developed via artificial intelligence or machine learning in which included variables are not transparent are particularly worrisome because bias could be introduced unknowingly. Broader understanding of algorithms, their use in health care, and their association with inequities in care is essential (Table 3, quotes 1.1-1.2).
Theme 2: Algorithm Uses, Harms, and Benefits
Respondents identified algorithms in use with a range of purposes, including predicting hospitalization or mortality, making diagnoses, determining disease severity, and allocating resources for high-risk groups (Table 4).2-4,9-23 Algorithms incorporated into an electronic health record or recommended within a clinical practice guideline could have significant repercussions, including perpetuating biases. Although algorithms may be useful, harms remain a concern because algorithms might be associated with decreased access to resources, increased misdiagnoses, biases in care, and misrepresentation of the needs of minority patients. One respondent suggested (algorithmic) risk models could potentially “[adjust] away inequities,” thereby creating a different standard of care for racial and ethnic minority groups (Table 3, quotes 2.1-2.2).
Continued use of race and ethnicity within algorithms may perpetuate stigma and discrimination against racial and ethnic minority groups. Flawed research studies were implicated in promoting racial and ethnic inequities in health care, such as studies attributing increased muscle mass to Black people, which were used to justify the estimated glomerular filtration rate adjustment. Use of unreliable research could reinforce notions that racial and ethnic minority groups are biologically predisposed to worse health outcomes, possibly removing motivation to tackle structural racism and other causative factors for illness. With “race-based medicine” deeply entrenched in clinical education and practice (Table 3, quotes 2.3-2.4), redressing past harms is imperative.
The consequences of algorithms may extend beyond health care, such as access to disability compensation: the Fitzpatrick phototype scale, which estimates skin cancer risk from skin color, could limit compensation for darker-skinned individuals. Algorithms can also be used to address health inequities: 1 respondent described an algorithm that improved outcomes for Black patients in their system (Table 3, quotes 2.5-2.6).
Theme 3: Awareness and Repercussions Among Clinicians and Patients
Clinicians are likely unaware of the ubiquity of algorithms in health care and potential for bias. Possibly owing to the burden of clinical duties and lack of algorithm expertise, clinicians entrust professional organizations and algorithm developers to vet algorithms for clinical use. Clinicians may be incentivized to use specific algorithms, perpetuating bias, especially if used beyond their clinical purpose. Some algorithms require clinicians to input variables (such as race) without precise instructions.
Few relevant educational resources exist for clinicians. Although published research on algorithms may exist, clinicians are not trained to appraise algorithms or their validity. However, understanding the uses and pitfalls of algorithms is critical to engaging in shared decision-making with patients. Shared decision-making about bias and the use of algorithms is understudied, with little guidance about how to talk to patients about race, bias, or algorithms. Ideally, discussing algorithms could facilitate shared decision-making conversations in which some degree of imprecision in predictions is tolerated. For example, algorithm results with or without a race- and ethnicity-based “correction” can be considered and patient preferences respected.
Patients are mostly unaware of how algorithms affect their care, whether race information is included, or the potential for bias. Patients may be generally uninformed about how their data are used. If patients are unaware, they may not be able to meaningfully consent to and participate in care involving algorithms. Low health literacy, poor clinician awareness, and the technical complexity of many algorithms likely all contribute to patients’ low level of familiarity. Increased research and communication to enhance patient health literacy regarding the use of algorithms and their potential for bias are needed. The complexity and sensitivity of the topic also call for prudence when communicating with patients and the public (Table 3, quotes 3.1-3.9).
Theme 4: Race as a Social Construct
Rather than being biological or genetic, race is socially constructed, often with more variation within a racial and ethnic group than between groups. Racial disparities are associated with socioeconomic and environmental factors; therefore, some algorithms include race or ethnicity as a proxy for a combination of biological factors and social determinants of health (SDOH). Unequal treatment resulting from discrimination and structural racism has health consequences that may be interpreted as biological or clinical (eg, more advanced disease owing to delayed diagnosis and treatment), making it impossible to completely distinguish between social or environmental and biological and genetic phenomena (Table 3, quotes 4.1-4.3).
Theme 5: Inclusion of Race and SDOH Information in Algorithms
Whether race and ethnicity information should be included in algorithms depends on the algorithm’s purpose. When use of race or ethnicity results in bias, it should be replaced with indicators that accurately represent the link between a risk factor and outcome. Region rather than race might explain an association between geography and a genetic risk factor, for example. When algorithms are used to identify risk groups, race modifiers could improve an algorithm’s accuracy, and failure to include race might result in biased care. Alternatively, including race might improve an algorithm’s predictive accuracy without addressing inequities.
Race is not defined in a single or consistent way in the United States, although the definitions used in the US Census survey may come closest to serving as a standard. Black in the United States often means any proportion of Black heritage, however small, and thus is a cultural label rather than an objective entity. Furthermore, available racial and ethnic categories in algorithms rarely reflect the evolving diversity among individuals, such as those with multiple races or ethnicities in their ancestry. Although self-report of race is ideal, clinicians may document a patient’s race as self-reported without input from the patient. Transparency in how race is defined and by whom is needed (Table 3, quotes 5.1-5.5).
Variables for SDOH may be used instead of or in addition to race in algorithms, including zip code, income, educational attainment, housing status, or food security. Concerns about using SDOH information include its inconsistent collection, particularly in acute care settings, and the typically low quality and missingness of SDOH data. Furthermore, SDOH variables can also lead to biases similar to those of race and ethnicity variables by, for example, decreasing access to services among individuals who are socioeconomically disadvantaged. Research is needed to understand and measure specific SDOH, their consequences on health, and the interaction between race and SDOH in health care decisions (Table 3, quotes 5.6-5.8).
Theme 6: Life Cycle of Algorithms
Algorithms can become biased during development, validation, implementation, or optimization. Because the data underlying algorithms reflect an unequal health system, their predictions may reproduce those inequities rather than reduce them for vulnerable groups. Algorithms could instead be designed to reduce inequities and better address community needs as part of design and implementation.
Algorithms are developed from data sets that often do not include diverse populations. Algorithm data typically originate from electronic health records and claims but could come from laboratory, clinical or biomedical, consumer, and digital data; patient assessments; customer service and vendor records; engagement data; plans and benefit information; or patient registries. Each source could be static, dynamic, or a combination. Proprietary data sets and algorithms hamper transparency and could obscure algorithmic bias.
Underrepresentation of racial and ethnic minority groups in data sets used to develop algorithms is rampant and not easily overcome by analytic approaches such as oversampling, undersampling, and weighting. Normal variation in characteristics such as sex, age, and severity of illness may not be adequately captured or well represented for racial and ethnic minority groups owing to both inadequate reporting and small numbers.
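As a minimal sketch of why analytic rebalancing only partially compensates for underrepresentation, the example below assigns inverse-frequency weights across demographic groups; the cohort counts and group labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(groups: list[str]) -> dict[str, float]:
    """Weight each demographic group inversely to its frequency so that
    underrepresented groups contribute equally to a training loss. This
    rebalances counts only; it cannot supply clinical variation (eg, in
    age or illness severity) that the data set never captured."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {group: n / (k * count) for group, count in counts.items()}

# Hypothetical cohort with one heavily underrepresented group.
cohort = ["white"] * 900 + ["black"] * 80 + ["asian"] * 20
print(inverse_frequency_weights(cohort))
# {'white': ~0.37, 'black': ~4.17, 'asian': ~16.67}: each record in the
# smallest group is upweighted ~17-fold, amplifying any noise or
# misclassification in those few rows.
```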
Algorithms may be validated and tested against a portion of the original data set or by “back-testing” the algorithm against historical data. Ideally, algorithms are tested with data distinct from the original data set and in all populations in which the algorithm will be used. Recommendations to mitigate bias included comparing algorithm predictions with actual patient outcomes, conducting stratified analyses to confirm performance across and within all relevant demographic groups, using sensitivity analyses to assess the robustness of predictions, and conducting more research to understand how algorithms are used in practice.
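A minimal sketch of the stratified analysis recommended above might compare discrimination and calibration within each demographic group; the function name is illustrative, and scikit-learn and NumPy are assumed to be available.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def stratified_performance(y_true: np.ndarray, y_prob: np.ndarray,
                           groups: np.ndarray) -> dict:
    """Compare predictions with observed outcomes within each demographic
    group: AUROC for discrimination and mean(predicted) - mean(observed)
    for calibration-in-the-large. A large gap for any one group flags
    potential bias before deployment."""
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[str(g)] = {
            "n": int(mask.sum()),
            "auroc": float(roc_auc_score(y_true[mask], y_prob[mask])),
            "calibration_gap": float(y_prob[mask].mean() - y_true[mask].mean()),
        }
    return report
```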
Algorithms require maintenance and updating after initial development and implementation. Algorithms developed and updated via automated machine learning processes could eventually become inadvertently biased. Alternatively, algorithmic models could be designed to automatically monitor performance and self-correct, detecting biases and other quality issues as they arise. Continued advances in scholarship and methods can be incorporated into algorithm development to reduce bias and inequities (Table 3, quotes 6.1-6.5).
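What such automated performance monitoring might look like after deployment, as a minimal sketch; the data layout and the 0.05 alert threshold are assumptions for illustration, not a published method.

```python
def monitor_subgroup_drift(batches, threshold: float = 0.05) -> list:
    """Flag periods in which any group's mean predicted risk drifts from its
    observed outcome rate by more than `threshold`. `batches` is an iterable
    of (period, {group: (mean_predicted, observed_rate)}); in practice these
    summaries would come from live scoring logs."""
    alerts = []
    for period, by_group in batches:
        for group, (predicted, observed) in by_group.items():
            gap = abs(predicted - observed)
            if gap > threshold:
                alerts.append((period, group, round(gap, 3)))
    return alerts

# Hypothetical monthly snapshots (all values invented):
history = [
    ("2021-06", {"white": (0.12, 0.11), "black": (0.10, 0.12)}),
    ("2021-07", {"white": (0.12, 0.11), "black": (0.09, 0.16)}),  # drifting
]
print(monitor_subgroup_drift(history))  # [('2021-07', 'black', 0.07)]
```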
Federal- and system-level initiatives and policies are needed to identify and address potential racial and ethnic biases in algorithms that affect health inequities. Specific solutions include standardizing approaches for variable definitions and data collection; standardizing risk-adjustment models used in health care algorithms; endorsing systematic and rigorous methods for algorithm development, testing, and implementation; and independent monitoring of algorithm implementation and outcomes.
Some organizations have standardized how race and SDOH information is gathered and incorporated into algorithms, particularly for underserved populations. Developers and users could determine whether including race or SDOH information exacerbates or reduces inequities among disadvantaged populations before deciding whether to include it. Initiatives to measure specific risk factors or biomarkers to include in algorithms instead of race (eg, cystatin C–based calculations for kidney disease) require increased support.
Government could establish national standards for algorithm development, testing, and reporting to create standard frameworks for risk adjustments and to audit algorithms in use. In addition, transparency at all stages of algorithm development and implementation is critical. Improving algorithm vendors’ and developers’ understanding of clinical contexts would enhance communication with users (eg, policy makers, health systems, and clinicians) and, in turn, enhance algorithm design and clinical performance (Table 3, quotes 6.1-6.5).
Discussion
In this analysis of 42 responses from a mix of clinical, professional, payer, and technology organizations; academics; federal and state agencies; and individuals, respondents recognized the broad use of algorithms in health care and affirmed the importance of addressing potential racial and ethnic biases in algorithms that affect health inequities. Including race and ethnicity or SDOH in algorithms can perpetuate bias and inequities or, instead, be used to identify and address racial and ethnic inequities. The lack of consistency and precision in how race and ethnicity and SDOH are defined, measured, and documented exacerbates the problem. Potential solutions include using a fairness metric as part of algorithm development and testing algorithm outcomes during implementation. Government and national organizations could call for standardized, rigorous, and transparent approaches to algorithm development and implementation. Education for both clinicians and patients is needed so that algorithms can be deployed as part of shared decision-making.
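As one example of the fairness metrics mentioned above, a minimal sketch computing the equal-opportunity difference (the gap in true-positive rates relative to a reference group) follows; this is one of several possible metrics, and the implementation is illustrative rather than a method proposed by respondents.

```python
import numpy as np

def equal_opportunity_difference(y_true: np.ndarray, y_pred: np.ndarray,
                                 groups: np.ndarray, reference: str) -> dict:
    """Gap in true-positive rates between each group and a reference group.
    A negative value means the algorithm more often misses patients in that
    group who actually needed the intervention: the failure mode reported
    for the care-management algorithm in reference 2."""
    def tpr(mask: np.ndarray) -> float:
        positives = (y_true == 1) & mask
        if positives.sum() == 0:
            return float("nan")  # no observed events in this group
        return float((y_pred[positives] == 1).mean())

    ref_tpr = tpr(groups == reference)
    return {str(g): tpr(groups == g) - ref_tpr
            for g in np.unique(groups) if g != reference}
```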
To our knowledge, this is the first report exploring stakeholder and public awareness of and experience with racial and ethnic bias affecting the use of health care algorithms. A recent, nationally representative survey24 indicated that patients had mostly positive views about the use of artificial intelligence in health care but wanted to know when it was involved in their care. Patients expressed significant concerns, however, about privacy breaches, increased costs, and the potential for artificial intelligence to misdiagnose and to reduce time with clinicians, with racial and ethnic minority individuals expressing greater concern than White people. Other authors25 have discussed algorithms in health care, including those developed via artificial intelligence, noting the potential for significant repercussions and the need to identify the large number of algorithms currently in use, many of which have not been assessed for bias. One systematic review,26 for example, found 45 prediction tools used for cardiovascular disease among patients with type 2 diabetes, 12 of which were specifically developed for such patients.
Beyond health care, concerns about algorithmic bias emerged as early as the 1990s. Algorithms used in criminal justice, education, and business were found to be biased against disadvantaged groups, especially women and racial and ethnic minority groups.27 A few studies have explored algorithm users’ awareness of bias and subsequent responses. One study28 of Airbnb hosts found decreased use of the algorithm once users were aware of bias, even if they benefited from the bias. Thus, perception of bias, regardless of accuracy, could lead to differences in algorithm adoption, itself a potential source of bias. Similarly, in a study29 of hotel ratings, evidence of algorithmic bias increased user mistrust of the system that developed the algorithm. Another study30 described outrage among users who discovered algorithmic bias, although less anger than when the bias was thought to have come from a person rather than a machine. The authors found that users assume greater impartiality and lack of intent from machine-based algorithms than from humans. Consequently, biased results from an algorithm, although they originate from human partiality like other biases, can be particularly reinforcing of negative stereotypes.
In health care, users of algorithms include payers, clinical teams, and patients. Health systems may also buy or license algorithms from developers. Questions deserving attention include how algorithmic bias, and awareness of that bias, affect users and the trust between patients and professionals. Efforts to increase awareness need to be coupled with significant efforts to mitigate bias and improve outcomes. Research is needed to detect and mitigate biased algorithms and to educate both clinicians and patients on the benefits and harms of algorithm use. Ideally, algorithms may be used to foster trust among patients, clinicians, and systems with the shared goal of improving health. Advancing trust is especially important among racial and ethnic minority groups, who are already less likely to have confidence in the health care system in light of persistent inequities.
Limitations
This study has several limitations. Although we publicized the RFI through a wide variety of channels and received a robust set of responses, the perspectives presented here may not be representative of the public or of those most affected by racial and ethnic bias and may instead reflect those familiar with algorithms, such as large and technologically savvy health systems or professional associations. Similarly, not everyone monitors Federal Register notices or has the resources to respond, which could have limited responses to those already familiar with the Agency for Healthcare Research and Quality and government processes. The limited period for responses may also have curtailed them. Also, responses from individuals (clinicians, patients, or unidentified individuals) were few and generally brief, and we were unable to assess the diversity or representativeness of the respondents.
This was an exploratory analysis of respondents to a targeted RFI; few respondents answered all 11 questions. Although reasons for responding were unknown and the accuracy of the submitted responses cannot be verified, submission of intentionally misleading information is unlikely because responding to the RFI was optional and responses could be linked to respondents in most cases (via email address, etc). The RFI questions were not designed for research purposes and consistent comprehension of questions among respondents was not guaranteed. Instead, the questions were written to best inform and guide the evidence review.
Conclusions
The algorithms identified by respondents, their perspectives on race and racism, thoughts about algorithm development and implementation, and ideas about how to mitigate bias and improve inequities demonstrate a commitment among stakeholders to address bias. Respondents called for guidance and standardization from government and others, a hopeful indicator that stakeholders believe algorithms can be held to a higher standard and harmful biases can be identified and eliminated. Algorithms are useful for combining complex information and multiple variables more quickly and consistently than individuals can, making them valuable or even essential in health care. Depending on design and purpose, algorithms may have the potential to help reduce inequities instead of worsening them.
Accepted for Publication: April 3, 2023.
Published: June 2, 2023. doi:10.1001/jamahealthforum.2023.1197
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2023 Jain A et al. JAMA Health Forum.
Corresponding Author: Anjali Jain, MD, Evidence-based Practice Center Division, Center for Evidence and Practice Improvement, Agency for Healthcare Research and Quality, 5600 Fishers Ln, Rockville, MD 20857 (anjali.jain@ahrq.hhs.gov).
Author Contributions: Dr Jain had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Jain, Chang, Umscheid, Bierman.
Acquisition, analysis, or interpretation of data: Jain, Brooks, Alford, Mueller, Bierman.
Drafting of the manuscript: Jain, Brooks, Alford, Chang, Bierman.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Brooks.
Administrative, technical, or material support: Jain, Alford, Mueller.
Supervision: Jain, Chang, Umscheid, Bierman.
Conflict of Interest Disclosures: None reported.
Data Sharing Statement: See the Supplement.
Additional Contributions: We thank Patrick O’Malley, MD, MPH (New England Journal of Medicine, Massachusetts Medical Society, Waltham, Massachusetts; National Center for Excellence in Primary Care Research, Center for Evidence and Practice Improvement, Agency for Healthcare Research and Quality, Rockville, Maryland) for editing and providing valuable feedback on the draft manuscript. He did not receive financial compensation for his contributions.
References
4. Goff DC Jr, Lloyd-Jones DM, Bennett G, et al; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129(25 suppl 2):S49-S73. doi:10.1161/01.cir.0000437741.48606.98
5. Diao JA, Wu GJ, Taylor HA, et al. Clinical implications of removing race from estimates of kidney function. JAMA. 2021;325(2):184-186.
13. Peterson PN, Rumsfeld JS, Liang L, et al; American Heart Association Get With the Guidelines-Heart Failure Program. A validated risk score for in-hospital mortality in patients with heart failure from the American Heart Association Get With the Guidelines program. Circ Cardiovasc Qual Outcomes. 2010;3(1):25-32. doi:10.1161/CIRCOUTCOMES.109.854877
16. Levey AS, Coresh J, Greene T, et al; Chronic Kidney Disease Epidemiology Collaboration. Using standardized serum creatinine values in the Modification of Diet in Renal Disease study equation for estimating glomerular filtration rate. Ann Intern Med. 2006;145(4):247-254. doi:10.7326/0003-4819-145-4-200608150-00004
19. Lydick E, Cook K, Turpin J, Melton M, Stine R, Byrnes C. Development and validation of a simple questionnaire to facilitate identification of women likely to have low bone density. Am J Manag Care. 1998;4(1):37-48.
23. Grobman WA, Sandoval G, Rice MM, et al; Eunice Kennedy Shriver National Institute of Child Health and Human Development Maternal-Fetal Medicine Units Network. Prediction of vaginal birth after cesarean delivery in term gestations: a calculator without race and ethnicity. Am J Obstet Gynecol. 2021;225(6):664.e1-664.e7. doi:10.1016/j.ajog.2021.05.021
29. Eslami M, Vaccaro K, Karahalios K, Hamilton K. “Be careful; things can be worse than they appear”: understanding biased algorithms and users’ behavior around them in rating platforms. Paper presented at: Eleventh International Association for the Advancement of Artificial Intelligence (AAAI) Conference on Web and Social Media; May 15-18, 2017; Montreal, Canada.