Artificial intelligence (AI) in medical imaging shows promise in improving health care efficiency and outcomes. However, automated AI disease detection is not a new concept. Pressure to adopt such emerging technologies is reminiscent of the rapid rise of adjunct computer-aided detection (CAD) tools in mammography 2 decades ago. The CAD cues in mammography mark image features suspicious for breast cancer, while new deep learning AI algorithms similarly mark suspicious image features and provide scores for cancer risk. Despite the intended outcomes of these tools, their use in clinical practice often generates unexpected results. Unless we take a more robust approach to the evaluation and implementation of AI, the unabated adoption of emergent technology in clinical practice will show that we have not learned from our past mistakes in the field of mammography.
In 1998, CAD received US Food & Drug Administration (FDA) clearance as an adjunct tool for mammography. Within a few years, the Centers for Medicare & Medicaid Services approved its reimbursement. Nearly a decade after FDA clearance, a seminal article found that CAD did not improve mammography accuracy after dissemination into routine clinical practice.1 Results of initial CAD reader studies were summarized in simple receiver operating characteristic (ROC) curve figures comparing results with CAD vs control. While early reader studies showed improved accuracy with CAD, the results reversed after FDA clearance and clinical adoption, with accuracy lower with CAD than without it.1
By 2016, more than 92% of US imaging facilities used CAD for mammography interpretation despite further research confirming that CAD did not improve radiologist accuracy over 2 decades of use in clinical practice.2,3 The CAD tools are associated with increased false-positive rates, leading to overdiagnosis of ductal carcinoma in situ and unnecessary diagnostic testing. In 2018, Medicare ceased add-on payments for CAD but not before the widespread embrace of CAD had resulted in more than $400 million per year in unnecessary health care expenditures.2 The premature adoption of CAD is consistent with the embrace of emergent technology before its association with patient outcomes is fully understood. As AI algorithms are increasingly receiving FDA clearance and becoming commercially available with ROC curves similar to what we observed prior to CAD clearance and adoption, how can we prevent history from repeating itself?
First, we must remember that there are complex interactions between a computer algorithm output and the interpreting physician. While much research is being done in the development of the AI algorithms and tools, the extent to which physicians may be influenced by the many types and timings of computer cues when interpreting remains unknown. Automation bias, or the tendency of humans to defer to a presumably more accurate computer algorithm, likely affects physician judgment negatively when algorithm output is presented before a physician’s independent assessment. In the case of CAD, 2 to 4 markings were shown to radiologists per screening mammogram. Because breast cancer is present in about 5 per 1000 screening mammograms, almost all of these markings are false positives. Yet, radiologists do not want to miss cancer, which leads to higher rates of additional testing, false-positive results, and benign biopsies.4 Before widespread adoption of new AI tools for medical imaging, we need to evaluate the different user interfaces between AI and human interpreters and better understand how and when AI outputs should be presented. Ideally, we need prospective studies incorporating AI into routine clinical workflow.
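The base-rate arithmetic behind the CAD markings described above can be made concrete with a short calculation. The following is a minimal sketch, not a published analysis: the function name `false_mark_fraction` is hypothetical, and it assumes, optimistically, that every cancer present is marked exactly once, so the result is a lower bound on the share of markings that are false positives.

```python
def false_mark_fraction(marks_per_exam: float, cancers_per_1000: float = 5.0) -> float:
    """Lower bound on the fraction of CAD marks that are false positives.

    Assumption (not from the article): each cancer present is marked
    exactly once, which is the best case for the tool.
    """
    exams = 1000
    total_marks = marks_per_exam * exams
    # At most one true mark per cancer-containing examination.
    true_marks = min(cancers_per_1000, total_marks)
    return 1 - true_marks / total_marks


# The article cites 2 to 4 markings per screening mammogram
# and about 5 cancers per 1000 screening mammograms.
for marks in (2, 4):
    share = false_mark_fraction(marks)
    print(f"{marks} marks per exam: at least {share:.3%} of marks are false positives")
```

Under these assumptions, more than 99% of markings point at tissue without cancer, even when the tool never misses a cancer.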
Second, reimbursement of AI technologies needs to be contingent on improved patient outcomes, not just improved technical performance in artificial settings. Currently, FDA clearance requires small reader studies and a demonstration of noninferiority to existing technologies (eg, CAD). Newer AI technologies need to demonstrate disease detection that matters. For example, the use of AI in mammography should correspond with increased detection of invasive breast cancers with poor prognostic markers and decreased interval cancer rates. To demonstrate improved patient outcomes, AI technologies need to be evaluated in large population-based, real-world screening settings with longitudinal data collection and linkage to regional cancer registries. If more benefits than harms are identified, then we need to confirm that these results are consistent across diverse populations and settings to ensure health equity. Given the rapid pace of innovation and the many years often needed to adequately study important outcomes, we suggest coverage with evidence development, whereby payment is contingent on evidence generation and outcomes are reviewed on a periodic basis.
Third, we need to embrace revisions to the FDA clearance process for AI algorithms to encourage continued technological improvement. The benefit of deep learning is the ability of machines to continuously improve their algorithms over time. Unfortunately, current FDA review for AI tools in medical imaging is only provided for static, unchanging software tools. The consequence is a loss of incentive for continuously improving deep learning algorithms. The FDA is drafting regulatory frameworks for AI-based software as a medical device that pivot from approving static algorithms to oversight of the total product lifecycle, including postmarket evaluation.5 The details are yet to be finalized, but will likely require the development of robust data sharing infrastructure necessary for continuous monitoring. One potential avenue for more vigorous, continuous evaluation of AI algorithms is to create prospectively collected imaging data sets that keep up with other temporal trends in medical imaging and are representative of target populations. For instance, the Breast Cancer Surveillance Consortium collects longitudinal data linked to long-term cancer outcomes data, permitting prospective image collection, including from the most up-to-date manufacturers and digital breast tomosynthesis (3-dimensional mammograms) technologies. Updated algorithms can then be independently and continuously validated on these representative, population-based imaging examination cohorts.
Fourth, we need to address the impact of AI on medical-legal risk and responsibility in medical imaging interpretation. A great promise of AI is that sophisticated algorithms could eventually interpret images by themselves and free some of physicians’ time to concentrate on more complex tasks. However, truly independent AI is not currently possible because radiologists continue to be the responsible legal parties for accurate imaging interpretation. For example, the recall rate of mammography screening is heavily influenced by medical-legal risk, with missed breast cancer as one of the leading causes of medical malpractice cases in the US.6 Fear of being sued is likely a major reason that American radiologists call back twice as many women after screening mammograms as radiologists in other countries, even though the cancer detection rate is the same in American and European populations.7,8 Physicians will remain unwilling to bypass inspecting images unless malpractice concerns are addressed. With the Mammography Quality Standards Act now requiring direct patient disclosure of additional risk information (eg, breast density), the appropriate use of supplemental technologies will likely make missed breast cancer even more of a malpractice lawsuit target. One potential solution is to amend the Mammography Quality Standards Act regarding cancer screening liability to better define standards for who can interpret mammograms and how AI can be used for interpretation, and to provide guidance on capping AI-related malpractice payouts. Without better guidance on individual party responsibilities for missed cancer, AI creates a new network of who, or what, is legally liable in complex and prolonged, multiparty malpractice lawsuits. Without national legislation addressing the medical and legal aspects of using AI, adoption will slow and we risk opportunities for predatory legal action.
We stand at the precipice of widespread adoption of AI-directed tools in many areas of medicine beyond mammography, and the harms vs benefits hang in the balance for patients and physicians. We need to learn from our past embrace of emerging computer support tools and make conscientious changes to our approaches in reimbursement, regulatory review, malpractice mitigation, and surveillance after FDA clearance. Inaction now risks repeating past mistakes.
Published: February 25, 2022. doi:10.1001/jamahealthforum.2021.5207
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Elmore JG et al. JAMA Health Forum.
Corresponding Author: Joann G. Elmore, MD, MPH, David Geffen School of Medicine at UCLA, 1100 Glendon Ave, Ste 900, Los Angeles, CA 90024 (email@example.com).
Conflict of Interest Disclosures: Drs Elmore and Lee reported receiving grants from the National Cancer Institute, National Institutes of Health during the conduct of the study. Dr Elmore reported serving as editor-in-chief for adult primary care topics at UpToDate. Dr Lee reported receiving personal fees from GRAIL, Inc. for service on a data safety monitoring board; textbook royalties from McGraw Hill, Inc., Oxford University Press, and Wolters Kluwer; and personal fees from American College of Radiology for journal editorial board work all outside the submitted work.
Funding/Support: Drs Elmore and Lee are both supported by NIH/NCI grant R37 CA240403. Dr Elmore is also supported by NIH/NCI grant R01 CA 200690.
Role of the Funder/Sponsor: The funding organization had no role in the preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
DF. International variation in screening mammography interpretations in community-based programs. J Natl Cancer Inst. 2003;95(18):1384-1393. doi:10.1093/jnci/djg048