Invited Commentary
Health Informatics
May 6, 2021

Deep Convolutional Neural Networks as a Diagnostic Aid—A Step Toward Minimizing Undetected Scaphoid Fractures on Initial Hand Radiographs

Author Affiliations
  • 1Section of Trauma, Surgical Critical Care and Acute Care Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, California
  • 2The Buncke Clinic, San Francisco, California
  • 3Department of Biomedical Data Science, Stanford University, Stanford, California
JAMA Netw Open. 2021;4(5):e216393. doi:10.1001/jamanetworkopen.2021.6393

Scaphoid fractures are the most common carpal bone fracture. If missed and untreated at initial evaluation, they can lead to a progressive pattern of debilitating wrist arthritis that may ultimately require salvage procedures, including wrist fusion. The concern over missing scaphoid fractures that are radiographically occult on initial plain radiographs has been an inspiration for many prior studies.1 However, overtreating scaphoid fractures because of these concerns can lead to costly advanced imaging or unnecessary immobilization in a cast or splint.

It is this challenge that Yoon et al2 addressed in their study. Advances in machine learning and computer vision have allowed for an increasing number of investigations into the potential of computer vision to augment human interpretation of radiographs. The work by Yoon et al2 adds to this growing body of knowledge. They trained a deep convolutional neural network model that can successfully identify scaphoid fractures in plain radiographs. To highlight the potential future clinical impact, they chose to focus their algorithm not only on radiographically apparent scaphoid fractures but specifically on detecting radiographically occult scaphoid fractures, ie, those missed by human interpretation.

For radiographs run through their entire algorithm pipeline, 20 of 22 radiographically occult fractures were detected. These results indicate the potential of computer vision algorithms to eventually become a clinically meaningful tool for assessing possible scaphoid fractures in initial radiographs. In a fully realized form, they could facilitate prompt identification of these fractures, expeditious treatment, and decreased reliance on costly advanced imaging.

The details of the study show where additional algorithm development could further this goal. The underlying rate of occult scaphoid fractures was 3.3% (161 of 4917 total fracture radiographs), with occult fracture radiographs representing 1.4% (161 of 11 838 total radiographs) of all radiographs included in the study. Given the relatively low frequency of occult fractures, their data set was necessarily imbalanced. This is a common challenge for many clinical scenarios in which we hope to use computer vision techniques to augment human diagnostic performance. When a data set is imbalanced, with few positives, and those few positives are the class of interest (as is the case with occult scaphoid fractures), particular performance metrics should be examined.
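To make the imbalance concrete, a short back-of-envelope sketch (using the 161 occult-fracture and 11 838 total-radiograph counts reported in the study, and treating occult fractures as the only positive class purely for illustration) shows why raw accuracy is uninformative here: a degenerate model that labels every radiograph negative still looks highly accurate while detecting nothing of interest.

```python
# Illustrative sketch, not an analysis from the study itself.
# Counts are taken from the prevalence figures quoted above:
n_occult = 161            # occult-fracture radiographs (positive class)
n_total = 11_838          # all radiographs in the data set
n_other = n_total - n_occult

# A degenerate "classifier" that calls every radiograph negative
# still achieves high accuracy on this imbalanced data set...
accuracy = n_other / n_total       # about 0.986

# ...while detecting none of the class of interest.
sensitivity = 0 / n_occult         # 0.0

print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.1f}")
```

This is why sensitivity and PPV, rather than accuracy, carry the most information in settings like this one.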

In general, when evaluating computer vision classification algorithms, it is important to consider the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Understanding model performance for each of these big 4 evaluation metrics provides meaningful insight into applicability. When there is an imbalanced data set, it is particularly important to look at sensitivity and PPV, as they give the most information on model performance in those cases. Yoon et al2 report on their model performance metrics in Table 1. Notably, the overall pipeline had a 97.2% sensitivity for detecting scaphoid fractures and a 91% sensitivity (20 of 22) for detecting scaphoid fractures that were radiographically occult to human evaluators on initial films. In this way, the overall pipeline achieved what it was trained to do—identify scaphoid fractures, including those missed by human interpretation.
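The 4 metrics can be reproduced from a confusion matrix back-calculated from the counts quoted in this commentary (20 of 22 occult fractures detected, 24 of 904 apparent fractures missed, and 469 of 1379 normal radiographs falsely flagged). This is an illustrative reconstruction, not the authors' published table:

```python
# Confusion-matrix counts back-calculated from the figures quoted
# in the commentary (illustrative, not the authors' published table):
tp = (904 - 24) + 20     # 900 true positives: apparent minus misses, plus occult
fn = 24 + 2              # 26 false negatives: 24 apparent + 2 occult missed
fp = 469                 # normal radiographs falsely flagged as fractured
tn = 1379 - 469          # 910 normal radiographs correctly cleared

sensitivity = tp / (tp + fn)   # about 0.972
specificity = tn / (tn + fp)   # about 0.660
ppv = tp / (tp + fp)           # about 0.657, dragged down by false positives
npv = tn / (tn + fn)           # about 0.972

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.3f}")
```

These reconstructed values match the reported 97.2% sensitivity, 66.0% specificity, 65.7% PPV, and 97.2% NPV, and make the imbalance effect visible: the same 469 false positives that barely dent specificity cut the PPV to roughly two-thirds.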

This impressive aspect of algorithm performance is balanced against a PPV of 65.7%, which reflects a high rate of false-positive results. Put another way, the pipeline falsely detected a scaphoid fracture in 469 of 1379 normal radiographs. On balance, the high sensitivity coupled with the relatively low PPV suggests that the entire pipeline in its current state could eventually serve as a good screening tool: it detects occult fractures on initial radiographs that would otherwise be missed, at the cost of a high false-positive rate.

What are the implications of this trade-off? Were an algorithm like this deployed in clinical practice, a human observer might not be able to distinguish an occult fracture correctly identified by the algorithm from a false positive; in theory, both would be imperceptible to the eye. Pragmatically, clinicians would then need to immobilize these patients and/or order magnetic resonance imaging. This may be neither cost-effective nor clinically efficient, recreating the very problem that prompted this study.

It should also be noted that the pipeline missed fractures that were apparent to humans at a rate of 2.7% (24 of 904). Thus, we can say that there were 3 related sets of true fractures in this study: those apparent only to humans, those apparent only to the algorithm, and those apparent to both. To obtain the most possible benefit from a computer vision algorithm that detects scaphoid fractures, it is likely not a matter of algorithm performance vs human performance but rather how to optimize synergistic human plus algorithm performance.3 A successful implementation could play to the strengths of both, leading to a higher number of true fractures being detected.4
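The potential of this synergy can be quantified with a back-of-envelope calculation from the counts quoted above (again an illustration, not a published analysis): humans see all 904 apparent fractures, the algorithm catches 900 fractures including 20 occult ones, and a combined reader that flags a fracture when either does would cover nearly all 926 true fractures.

```python
# Back-of-envelope union of human and algorithm detections, using the
# counts quoted above (an illustration, not a published analysis):
total_fractures = 904 + 22           # 926: apparent plus occult fractures
human_detected = 904                 # humans see all "apparent" fractures
algo_detected = (904 - 24) + 20      # 900: apparent minus misses, plus occult

# If either reader flags a fracture, it is caught: humans cover all
# 904 apparent fractures, and the algorithm adds 20 occult ones.
combined = 904 + 20

print(f"human alone:     {human_detected / total_fractures:.3f}")
print(f"algorithm alone: {algo_detected / total_fractures:.3f}")
print(f"combined:        {combined / total_fractures:.3f}")
```

Under these assumptions, human-alone detection is about 97.6%, algorithm-alone about 97.2%, and the combined reader about 99.8%, which is the synergistic gain the commentary describes.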

Yoon et al2 achieved promising results in their application of neural networks to scaphoid fracture detection. They trained a deep convolutional neural network that detected scaphoid fractures with 97.2% sensitivity and 97.2% NPV. This study set a great foundation for further work in this area, research that could improve on the specificity of 66.0% and PPV of 65.7%.

Future research that seeks to achieve high overall performance in scaphoid fracture detection, with both high sensitivity and high specificity, will likely require a larger volume of occult fractures in the data set, both for model development and for meaningful evaluation. It is also important to better understand interrater reliability and concordance among human raters in the detection of occult scaphoid fractures; individual raters have different performance profiles.5 Future work could incorporate other advanced computational methods, including radiomics, which converts images into minable data for use in clinical decision support.6 Ultimately, the authors should be commended for making inroads on an important clinical problem. They have demonstrated that computer vision techniques hold the potential to augment human diagnostic capabilities in the detection of scaphoid fractures.

Article Information

Published: May 6, 2021. doi:10.1001/jamanetworkopen.2021.6393

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Jopling JK et al. JAMA Network Open.

Corresponding Author: Jeffrey K. Jopling, MD, MSHS, Section of Trauma, Surgical Critical Care and Acute Care Surgery, Department of Surgery, Stanford University School of Medicine, 300 Pasteur Dr, H3691, Stanford, CA 94305 (jjopling@stanford.edu).

Conflict of Interest Disclosures: None reported.

References
1. Suh N, Grewal R. Controversies and best practices for acute scaphoid fracture management. J Hand Surg Eur Vol. 2018;43(1):4-12. doi:10.1177/1753193417735973
2. Yoon AP, Lee YL, Kane RL, Kuo CF, Lin C, Chung KC. Development and validation of a deep learning model using convolutional neural networks to identify scaphoid fractures in radiographs. JAMA Netw Open. 2021;4(5):e216096. doi:10.1001/jamanetworkopen.2021.6096
3. Goldstein IM, Lawrence J, Miner AS. Human-machine collaboration in cancer and beyond: the Centaur Care model. JAMA Oncol. 2017;3(10):1303-1304. doi:10.1001/jamaoncol.2016.6413
4. Lindsey R, Daluiski A, Chopra S, et al. Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A. 2018;115(45):11591-11596. doi:10.1073/pnas.1806905115
5. Ozkaya E, Topal FE, Bulut T, Gursoy M, Ozuysal M, Karakaya Z. Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur J Trauma Emerg Surg. 2020. doi:10.1007/s00068-020-01468-0
6. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2016;278(2):563-577. doi:10.1148/radiol.2015151169