Original Investigation
December 12, 2017

Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer

Author Affiliations
  • 1Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
  • 2Medical Image Analysis Group, Eindhoven University of Technology, Eindhoven, the Netherlands
  • 3Department of Pathology, University Medical Center Utrecht, Utrecht, the Netherlands
  • 4Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands
JAMA. 2017;318(22):2199-2210. doi:10.1001/jama.2017.14585
Key Points

Question  What is the discriminative accuracy of deep learning algorithms compared with the diagnoses of pathologists in detecting lymph node metastases in tissue sections of women with breast cancer?

Finding  In cross-sectional analyses that evaluated 32 algorithms submitted as part of a challenge competition, 7 deep learning algorithms showed greater discrimination than a panel of 11 pathologists in a simulated time-constrained diagnostic setting, with an area under the curve of 0.994 (best algorithm) vs 0.884 (best pathologist).

Meaning  These findings suggest the potential utility of deep learning algorithms for pathological diagnosis, but require assessment in a clinical setting.


Importance  Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.

Objective  Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin–stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists’ diagnoses in a diagnostic setting.

Design, Setting, and Participants  Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining was provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The glass slides corresponding to the same test set were also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands, who ascertained the likelihood of nodal metastases for each slide in a flexible 2-hour session simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).

Exposures  Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation.

Main Outcomes and Measures  The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.
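As an illustration of how a 5-level confidence rating can feed a receiver operating characteristic analysis, the sketch below computes an empirical AUC with the Mann-Whitney statistic. The numeric mapping in `RATING_SCORE`, the function name `auc_from_ratings`, and the six example slides are assumptions made for illustration; they are not the study's data or its analysis software.

```python
# Assumed ordinal mapping for the five confidence levels (illustrative only).
RATING_SCORE = {
    "definitely normal": 1,
    "probably normal": 2,
    "equivocal": 3,
    "probably tumor": 4,
    "definitely tumor": 5,
}


def auc_from_ratings(ratings, has_metastasis):
    """Empirical AUC: the probability that a randomly chosen slide with
    metastasis is rated higher than a randomly chosen slide without one.
    Tied ratings count as half a win (Mann-Whitney statistic)."""
    scores = [RATING_SCORE[r] for r in ratings]
    pos = [s for s, y in zip(scores, has_metastasis) if y]
    neg = [s for s, y in zip(scores, has_metastasis) if not y]
    if not pos or not neg:
        raise ValueError("need at least one slide of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up ratings for six hypothetical slides (not study data).
example_ratings = [
    "definitely tumor", "probably tumor", "equivocal",
    "probably normal", "definitely normal", "equivocal",
]
example_truth = [True, True, True, False, False, False]
example_auc = auc_from_ratings(example_ratings, example_truth)
print(round(example_auc, 3))
```

Because the scale has only 5 levels, ties between positive and negative slides are common, which is why the half-credit rule for ties matters here.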

Results  The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).
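One common way to attach a 95% CI to a slide-level AUC such as those reported above is a percentile bootstrap over slides. The abstract does not state how its intervals were computed, so the following is a minimal sketch under that assumption. The scores are synthetic; only the class sizes (49 and 80) are borrowed from the test set description, and the names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(scores, labels):
    """Empirical AUC (Mann-Whitney); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample slides with replacement,
    recompute the AUC each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC is undefined when a resample has one class only
        stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic algorithm scores (not study data): 49 positive, 80 negative slides.
data_rng = random.Random(42)
labels = [True] * 49 + [False] * 80
scores = [data_rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

point_estimate = auc(scores, labels)
ci_low, ci_high = bootstrap_auc_ci(scores, labels)
print(f"AUC {point_estimate:.3f} (95% CI, {ci_low:.3f}-{ci_high:.3f})")
```

Resampling slides alone treats the reader or algorithm as fixed; comparisons that also generalize across readers would need a multireader, multicase method instead.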

Conclusions and Relevance  In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.