The convolutional neural network (CNN) determines the position of the pupil together with the classified phase. The appropriate guidance tool is activated according to the surgical phase being performed. OR indicates operating room.
A, Surgical guidance in capsulorhexis. The convolutional neural network (CNN) is used to track the location of the pupil in real time (outer yellow circle), and computer vision is used to provide a template for the capsulorhexis (inner yellow circle [black arrowhead]), the diameter of which can be adjusted by the user (instructions on screen). Histogram equalization is activated within the pupil area to improve visualization of the capsulorhexis and lens capsule. B, As nuclear disassembly and lens removal are performed, a safe zone (inner thin yellow circle) is defined for the surgeon to facilitate a target zone for activation of the phacoemulsification instrument. The color of the outer circle, defining the pupillary margin, varies according to the relative area of the exposed lens capsular bag (green = no capsular bag exposed; yellow = partial capsular bag exposed [situation shown in the image]; red = all capsular bag exposed), and a thicker outer circle is displayed when turbulence is identified. The user can adjust the sensitivity of motion estimation (instructions on screen). C, Histogram equalization enhances visualization and identification of anatomical structures in cases with limited visualization (eg, during creation of the capsulorhexis and removal of cortical lens fibers). D, During idle, surgical guidance and enhanced visualization tools are disabled.
A template to aid the surgeon in executing a symmetric, centered, and circular rhexis is activated when capsulorhexis is identified as the current surgical phase.
With the use of optical flow, the acceleration of instruments, lens fragments (Video 2), and cortical fibers (Video 3) is estimated. The digital tool is automatically activated when phacoemulsification or cortex removal surgical phases are identified.
The idle phase is triggered as soon as the surgical tools are removed from the pupil area.
A warning sound is played when the centroid of the pupil is moved beyond the predefined safe zone area.
eMethods 1. Data Extraction and Annotation Methods for the Training of the Network
eMethods 2. Data Extraction and Evaluation Methods
eMethods 3. Faster R-CNN Implementation
eMethods 4. Parameters for Computer Vision Tools
eFigure 1. Pupil Tracking and Phase Classification With Faster R-CNNs
eFigure 2. Receiver Operating Characteristic (ROC) and Precision-recall (PR) Curves for Each Surgical Phase Classification
eFigure 3. Timeline With the Predictions for the Local Test Data Set
eFigure 4. Timeline With the Predictions for the External Test Data Set
eFigure 5. Tool for Annotation of the Pupil Area
eFigure 6. Automated Tissue Segmentation During Cataract Surgery
eFigure 7. Tool for the Annotation of the Amount of Capsular Bag Exposed
eFigure 8. Surgical Instrument and Tissue Tracking
eFigure 9. Motion Estimation of Tools and Lens Fragments
eFigure 10. Acceleration of Tools and Lens Fragments
eFigure 11. Performance of Each Set of Tools During Visualization
eFigure 12. Contrast Limited Adaptive Histogram Equalization (CLAHE) During the Cataract Procedure
eTable 1. Intergrader Agreement for Phase Classification
eTable 2. Metrics for Network–Based Pupil Area Segmentation
eTable 3. Agreement Between Annotators Regarding Pupil Area for Both Data Sets
eTable 4. Comparison Between Algorithm Output and Annotator’s Data for Capsular Bag Exposed
eTable 5. Agreement Between Annotators Regarding the Amount of Capsular Bag Exposed in Randomly Selected Frames for Both Data Sets
Garcia Nespolo R, Yi D, Cole E, Valikodath N, Luciano C, Leiderman YI. Evaluation of Artificial Intelligence–Based Intraoperative Guidance Tools for Phacoemulsification Cataract Surgery. JAMA Ophthalmol. 2022;140(2):170–177. doi:10.1001/jamaophthalmol.2021.5742
Can real-time surgical guidance for phacoemulsification cataract surgery be achieved using a deep learning detection network combined with computer vision tools?
In this cross-sectional study, a region-based convolutional neural network was able to track the pupil and to identify the current surgical phase being performed with a mean area under the receiver operating characteristic curve greater than 95%, triggering surgical guidance tools developed with computer vision.
These findings suggest that a platform that can be integrated into surgical ophthalmic microscopes can provide real-time audiovisual feedback to the surgeon during cataract surgery, combining deep learning neural networks with custom surgical guidance tools built with computer vision techniques.
Complications that arise from phacoemulsification procedures can lead to worse visual outcomes. Real-time image processing with artificial intelligence tools can extract data to deliver surgical guidance, potentially enhancing the surgical environment.
To evaluate the ability of a deep neural network to track the pupil, identify the surgical phase, and activate specific computer vision tools to aid the surgeon during phacoemulsification cataract surgery by providing visual feedback in real time.
Design, Setting, and Participants
This cross-sectional study evaluated deidentified surgical videos of phacoemulsification cataract operations performed by faculty and trainee surgeons in a university-based ophthalmology department between July 1, 2020, and January 1, 2021, in a population-based cohort of patients.
A region-based convolutional neural network was used to receive frames from the video source and, in real time, locate the pupil and in parallel identify the surgical phase being performed. Computer vision–based algorithms were applied according to the phase identified, providing visual feedback to the surgeon.
Main Outcomes and Measures
Outcomes were area under the receiver operating characteristic curve and area under the precision-recall curve for surgical phase classification and Dice score (harmonic mean of the precision and recall [sensitivity]) for detection of the pupil boundary. Network performance was assessed as video output in frames per second. A usability survey was administered to volunteer cataract surgeons previously unfamiliar with the platform.
The region-based convolutional neural network model achieved area under the receiver operating characteristic curve values of 0.996 for capsulorhexis, 0.972 for phacoemulsification, 0.997 for cortex removal, and 0.880 for idle phase recognition. The final algorithm reached a Dice score of 90.23% for pupil segmentation and a mean (SD) processing speed of 97 (34) frames per second. Among the 11 cataract surgeons surveyed, 8 (72%) were mostly or extremely likely to use the current platform during surgery for complex cataract.
Conclusions and Relevance
A computer vision approach using deep neural networks was able to track the pupil, identify the surgical phase being executed, and activate surgical guidance tools. These results suggest that an artificial intelligence–based surgical guidance platform has the potential to enhance the surgeon experience in phacoemulsification cataract surgery. This proof-of-concept investigation suggests that a pipeline from a surgical microscope could be integrated with neural networks and computer vision tools to provide surgical guidance in real time.
Despite advances in instrumentation and surgical technique in phacoemulsification, surgical complications are associated with worse visual outcomes.1 During phacoemulsification, multiple variables affect the performance and safety of the procedure, including rapid changes in intracameral flow and associated fluidic turbulence, instrument positioning and proximity to tissues such as the posterior lens capsule, and visualization of intraocular tissues. Intraoperative surgical guidance—the provision of information or feedback to the surgeon for the purpose of facilitating effective and safe surgery—must be applied in real time to be of benefit during the surgical procedure.
Computer vision algorithms and deep neural networks (DNNs) for instrument detection and surgical-phase classification have been applied post hoc for retrospective analysis of cataract surgical procedures. Previous work has demonstrated the ability to segment the pupil and surgical instruments2 and acquire features to analyze trainees’ surgical performance.3 Surgical phase identification has been evaluated in prerecorded videos, with different convolutional neural networks (CNNs) achieving up to 95% accuracy.4-7 An important subfield of computer vision is detection, which involves the localization and classification of objects within an image. Detection is especially important in the fields of autonomous vehicles and security. One family of DNNs that has achieved high success in this task is the region-based CNN (Faster R-CNN).8-10
In this work, we developed a surgical guidance platform that can be integrated with existing surgical microscopes that uses tracking and segmentation of the pupil together with automated surgical-phase identification. These functions serve as the foundation that enables the system to formulate functional assessments of phacoemulsification surgery in real time and provide audiovisual feedback to the surgeon. After pupil tracking and surgical-phase recognition using a DNN, surgical guidance tools developed using computer vision process the frame and return visual feedback to the camera stream. The following tools were developed for surgical guidance according to the cataract surgical phase being performed: (1) capsulorhexis guidance for improved symmetry and intended size of the rhexis; (2) feedback on decentration of the eye, erratic tool movement, and turbulent flow conditions (turbulent flow is assessed by lens fragment motion patterns during phacoemulsification and removal of cortical fibers); and (3) enhanced visualization of anatomical structures via contrast equalization, including visualization of the rhexis, remaining lens fragments, and cortical fibers.
This cross-sectional study was approved by the University of Illinois at Chicago Institutional Review Board, which determined that informed consent was not required because deidentified videos were captured for this purpose. The participants received an electronic behavioral consent form for the evaluation survey. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
Ten stereoscopic videos of phacoemulsification cataract surgery, performed by attending physicians and surgical trainees at the University of Illinois Hospital and Health Sciences Center, were captured using a stereoscopic surgical microscope (NGENUITY 3D Visualization System; Alcon). Six surgical procedures were selected at random (600 frames extracted) and used for training of our DNN, with 2 eyes (n = 200 frames) for validation and another 2 eyes (n = 23 640 frames) as a holdout subset dedicated to the final evaluation. Surgical cases exhibited variability in multiple image parameters, including surgical lighting settings and ocular characteristics, such as iris color, pupil size, and lens color and density. A total of 101 phacoemulsification cataract procedures (comprising 10 100 frames) from the publicly available Cataract-101 data set11 were used for comparative evaluation and to assess the generalizability of the platform. Details of how surgical phases and pupil area were annotated are available in eMethods 1 and 2 in the Supplement.
A Faster R-CNN built on ResNet-5012 (Microsoft Inc) was trained with the local data set for the classification of phases and pupil detection. Details of network implementation are available in eFigure 1 and eMethods 3 in the Supplement.
The data pipeline is depicted in Figure 1. Each frame of the surgical video was acquired by our surgical guidance platform, with each fifth frame being sent to a Faster R-CNN model for pupil detection and surgical-phase classification. The model output served to activate the phase-appropriate surgical guidance tool for capsulorhexis, phacoemulsification, and cortex removal. Assessment of our network was performed analogously to implementation because frame acquisition can be performed from a video file for testing purposes in the same way that the video feed of a surgical microscope in real time can be used to achieve surgical guidance. After object identification and phase classification with activation of the appropriate image-guidance tools, postprocessed information was overlaid onto the original stereo frame and displayed for surgical visualization.
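The frame routing described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `run_pipeline`, `detect`, and `tools` are hypothetical, the detector and the guidance tools are passed in as plain callables, and reusing the most recent detection for the frames between inferences is an assumption (the text states only that each fifth frame is sent to the network).

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical types for illustration only.
Pupil = Tuple[float, float, float]          # (center_x, center_y, radius) in pixels
Detection = Tuple[str, Optional[Pupil]]     # (surgical phase, pupil location)
Tool = Callable[[object, Optional[Pupil]], object]

INFERENCE_STRIDE = 5  # the text states that each fifth frame goes to the network


def run_pipeline(
    frames: Iterable[object],
    detect: Callable[[object], Detection],
    tools: Dict[str, Tool],
    stride: int = INFERENCE_STRIDE,
) -> List[Tuple[str, object]]:
    """Return (phase, displayed_frame) for every input frame."""
    # Assumed starting state before the first inference.
    phase, pupil = "idle", None
    output = []
    for index, frame in enumerate(frames):
        if index % stride == 0:
            # Refresh the phase and pupil location from the detector.
            phase, pupil = detect(frame)
        # Phases without a registered tool (eg, idle) pass the frame through.
        tool = tools.get(phase)
        shown = tool(frame, pupil) if tool else frame
        output.append((phase, shown))
    return output
```

Because the same loop accepts any iterable of frames, a recorded video file and a live microscope feed can be handled identically, which mirrors the evaluation strategy described in the text.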
A k-means clustering technique13 was used to segment lens fragments and to estimate the amount of capsular bag exposed, grouping pixels by their proximity in 2 defined characteristics: hue and saturation features of instruments and lens fragments. Optical flow14 tracking was used to estimate the relative acceleration and speed of surgical instruments and lens fragments, to provide feedback in the event of potentially harmful instrument movement, and for turbulent flow conditions during phacoemulsification and cortical fiber removal. The user was able to adjust the sensitivity of the motion estimation feedback. A circle describing the boundaries of the pupil detected by the Faster R-CNN model was displayed with the aid of drawing functions of the OpenCV Computer Vision library,15,16 and an audio alert was activated when the pupil was decentered from the microscope frame, warning the surgeon of an abrupt eye movement. The color of the outer circle bounding the pupil area was altered to indicate the relative area of the capsular bag exposed during lens fragment removal via tissue segmentation. Contrast-limited adaptive histogram equalization17 was applied frame by frame within and around the limbus to enhance the visualization of features during different phases of cataract extraction. Using limitations in the contrast values, the frame distribution of pixels was equalized through the entire color spectrum. On the basis of segmentation of the pupil, a capsulorhexis guidance template was created showing the ideal path of the rhexis. The diameter may be customized by the surgeon before execution of the capsulorhexis or during the procedure. Details of the implementation are available in eMethods 4 in the Supplement.
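As an illustration of grouping pixels by hue and saturation, the sketch below runs a plain k-means (Lloyd's algorithm) on a handful of RGB pixels. It is a simplified example under stated assumptions, not the configuration used in the study: the function names, the sample pixel values, and the initial cluster centers are hypothetical, and hue is treated as a linear rather than circular quantity.

```python
import colorsys
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (hue, saturation), each in [0, 1]


def hue_saturation(rgb: Tuple[int, int, int]) -> Point:
    """Convert an 8-bit RGB pixel to its (hue, saturation) pair."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, _ = colorsys.rgb_to_hsv(r, g, b)
    return (hue, saturation)


def kmeans(points: Sequence[Point], initial_centers: Sequence[Point],
           iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns the cluster index of each point."""
    centers = list(initial_centers)  # copy so the caller's list is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


# Hypothetical pixels: three saturated reddish and two pale, low-saturation ones.
pixels = [(200, 40, 30), (210, 50, 35), (190, 45, 40), (200, 200, 190), (220, 215, 210)]
features = [hue_saturation(p) for p in pixels]
labels = kmeans(features, initial_centers=[(0.0, 1.0), (0.0, 0.0)])
```

Which cluster corresponds to lens fragments or to exposed capsular bag depends on the scene and is not specified here; the relative size of a cluster could then serve as a rough estimate of the exposed area.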
To evaluate the DNN performance for phase classification, we calculated the area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR) for the local and external test sets. The pupil segmentation metrics used were intersection over union, precision, recall (sensitivity), Dice score (harmonic mean of the precision and recall [sensitivity]) of the segmented area vs the annotator’s area of the pupil, and the percentage of capsular bag exposed compared with the expert’s opinion. The processing speed of the computer vision tools was calculated based on the frames per second (FPS) achieved during network runtime. A detailed description of the evaluation methods is given in eMethods 2 in the Supplement.
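The overlap metrics named above can be computed from pixel counts of a predicted mask and an annotated mask. The sketch below uses the standard definitions; the function name `segmentation_metrics` and the tiny example masks are hypothetical, and masks are represented as nested lists of 0 and 1 for simplicity.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[Sequence[int]],
                         truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Intersection over union, precision, recall, and Dice for binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # pixel in both masks
            elif p:
                fp += 1      # predicted only
            elif t:
                fn += 1      # annotated only
    union = tp + fp + fn
    return {
        "iou": tp / union if union else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Dice equals the harmonic mean of precision and recall.
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 0.0,
    }


# Hypothetical 2 x 3 masks: 2 shared pixels, 1 predicted only, 1 annotated only.
predicted = [[1, 1, 0], [1, 0, 0]]
annotated = [[1, 1, 1], [0, 0, 0]]
metrics = segmentation_metrics(predicted, annotated)
```

In this example the intersection over union is 2/4 and the Dice score is 4/6, which shows that the Dice score is never lower than the intersection over union for the same pair of masks.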
After a brief description of the surgical guidance platform, phacoemulsification cataract surgeons watched a compilation of different surgical phases of the 2 videos from the test data set via a 3-dimensional video monitor at the Illinois Eye and Ear Infirmary Cless Laboratory. Contributors were then asked to complete a nonvalidated postassessment survey because, to our knowledge, a structured evaluation tool for assessing surgical guidance tools in ophthalmic microsurgery is not widely available.
The Faster R-CNN model achieved AUROC values of 0.996 for capsulorhexis, 0.972 for phacoemulsification, 0.997 for cortex removal, and 0.880 for idle phase recognition and exhibited a mean (SD) performance decrease in the AUROC of 6.8% (4.53%) when applied to the external data set (Table 1; eFigure 2 in the Supplement). The final predictions timeline revealed a significant performance decrease during the idle phase compared with the classification of other phases (eFigures 3 and 4 in the Supplement). Comparison of the real size of the pupil with the pupil area detected by the algorithm yielded mean (SD) Dice scores of 90.2% (0.11%) in the local data set and 85.4% (0.61%) in the external data set. eTables 1, 2, and 3 in the Supplement give the data in detail, including interrater agreement of annotators for phase classification and pupil segmentation. eFigure 5 in the Supplement shows how the annotators segmented the ground-truth pupil area.
The segmentation of tissues that used k-means clustering improved discrimination among lens fragments and the capsular bag even in low-light and low-contrast situations (eFigure 6 in the Supplement). The estimation of the amount of capsular bag exposed was in near or complete agreement with annotated frames in 81.6% of frames. eTables 4 and 5 in the Supplement give detailed data and indicate agreement of the annotators for the relative proportion of capsular bag exposed. eFigure 7 in the Supplement shows the script used to extract the expert’s point of view on the proportion of capsular bag exposed. Optical flow estimated the movement of instruments and lens fragments during phacoemulsification and cortex removal by generating a set of tracked points for each element (eFigures 8, 9, and 10 in the Supplement).
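Once optical flow has produced a set of tracked point positions, speed and acceleration can be approximated by finite differences between consecutive frames. The sketch below assumes a single tracked point sampled at a constant frame rate and reports the change in scalar speed as acceleration; the function name, the units (pixels per second), and the sample track are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position of one tracked point, in pixels


def speeds_and_accelerations(track: Sequence[Point],
                             fps: float) -> Tuple[List[float], List[float]]:
    """Finite-difference speed (px/s) and acceleration (px/s^2) along a track."""
    # Displacement between consecutive frames, scaled by the frame rate.
    speeds = [
        math.hypot(x1 - x0, y1 - y0) * fps
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    # Change in speed between consecutive frame pairs, scaled by the frame rate.
    accelerations = [(s1 - s0) * fps for s0, s1 in zip(speeds, speeds[1:])]
    return speeds, accelerations


# Hypothetical track: the point moves 5 px, then 10 px, at 10 frames per second.
speeds, accelerations = speeds_and_accelerations([(0, 0), (3, 4), (9, 12)], fps=10.0)
```

A guidance tool could compare such values against a user-adjustable threshold, as the platform does when it thickens the outer circle, although the thresholds used in the study are not reproduced here.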
Figure 2 shows an example of the surgical guidance image format. The surgical guidance platform displayed a circular rhexis template with a customizable diameter throughout the execution of the capsulorhexis (Figure 2A and Video 1). Optical flow functionality calculated the acceleration of instruments and movement of lens fragments, increasing the outer circle thickness when acceleration values reached a threshold set by the surgeon (Figure 2B and Video 2). The platform applied the same optical flow tracking tools when the DNN model identified the removal of cortex fibers (Figure 2C and Video 3). Assignment of the idle phase deactivated all surgical guidance tools when no instrument was detected in the pupil (Figure 2D and Video 4). The audio alert for pupil decentration was activated when the centroid of the pupil moved beyond predefined limits (Video 5).
Our surgical guidance platform achieved a mean (SD) processing speed of 137 (26) FPS during the execution of capsulorhexis guidance, 80 (20) FPS during activation of the tools for lens material removal during phacoemulsification, and 92 (18) FPS for the cortex removal set of tools (eFigure 11 in the Supplement).
Eleven phacoemulsification cataract surgeons (4 resident physicians, 3 clinical fellows, and 4 attending physician cataract instructors) performed a post hoc evaluation of the surgical guidance platform. Five of the experts had 5 or more years of experience with phacoemulsification cataract surgery. The survey (Table 2) showed that 8 respondents (72%) were mostly or extremely likely to use the guidance tool for complex cataract procedures, whereas 5 (45%) considered it useful for noncomplex cataract procedures. A total of 10 participants (91%) considered the pupil tracking and phase classification tools mostly or extremely accurate. All participants judged the platform beneficial for real-time surgical guidance in phacoemulsification procedures.
This cross-sectional study presents a conceptual platform that provides real-time surgical guidance for cataract surgery. The key findings were as follows: (1) Faster R-CNN was able to perform precise pupil tracking and segmentation as well as surgical-phase identification in real time during phacoemulsification cataract procedures; (2) computer vision tools were able to take advantage of the information retrieved by neural networks with the potential to provide surgical guidance to try to improve rhexis symmetry, provide feedback for harmful turbulence and instrument movements, and improve tissue visualization; and (3) a surgical guidance platform was able to use video output from existing commercially available surgical systems. This surgical guidance platform combined surgical-phase identification and element tracking within the same neural network in real time, also providing a framework for the development of additional guidance tools. We have demonstrated the potential utility of various surgical guidance parameters via feedback to the surgeon.
Yu et al7 published the most comprehensive work on surgical-phase classification in cataract surgery using a broad set of deep learning approaches, also resulting in precision and recall (sensitivity) values greater than 90%, using more surgical phases from a data set of 100 procedures. Others4,6,18 have explored surgical-phase recognition with neural networks but without taking into consideration real-time inference. Existing solutions to track the pupil with computer vision tools have limitations; predefined color patterns are used to identify the pupil area, limiting the effectiveness of these solutions during surgery.19 Our results appeared comparable to those attained by Yu et al7 for phase classification on local and external (Cataract-10111) data sets, while introducing an alternate method to perform phase classification in combination with pupil tracking in a single inference and in real time, enabling the immediate feedback that surgical guidance requires. These results were achieved by using a Faster R-CNN model, which tracks the pupil and performs classification of the current surgical phase by identifying the instruments located within the pupil area (eFigure 1 in the Supplement).
We performed a usability survey to assess the potential utility of our guidance tools and the usability of the graphic user interface by practicing cataract surgeons. The survey suggested that surgeons would be willing to use the platform during routine and complex cataract procedures. The final experience was not negatively affected by frames misclassified or omitted by the DNN model. All participants rated contrast enhancement using contrast-limited adaptive histogram equalization as a valuable tool during cataract surgery. The substantial improvement in the distinction between instruments and unique anatomical eye elements, without compromising the image with noise or deceptive elements during capsulorhexis, phacoemulsification, and cortex removal (eFigure 12 in the Supplement), likely accounted for this uniform acceptance and endorsement by experts. This finding suggests that further advancements in enhancing the surgeon’s view of critical anatomical structures and surgical tactics will be an area for additional technology development. All participants considered the rhexis template helpful, with 55% labeling the tool as extremely helpful (Table 2). Within the subset of ophthalmic attending physicians, all considered pupil tracking, phase identification, rhexis guidance, and contrast enhancement mostly or extremely accurate and helpful.
This study has several limitations. One potential limitation of a supervised learning approach is the need in some instances for a large and labor-intensive library of annotated images for training: our training data set was composed of a relatively small number of frames (600), whereas the final test was performed frame by frame (n = 23 640) on a heterogeneous set of phacoemulsification cases and for comparison on a publicly available data set of 101 cataract cases (10 100 frames).11 The decrease in AUROC and AUPR values for phase classification during idle periods resulted in a guidance tool being activated when no instrument was present within the pupil, but these false-positive predictions may not be clinically relevant in preventing surgical complications or affecting the surgical environment because surgical tactics are not being actively performed when instruments are not present within the eye. In addition, there was a modest reduction in network performance when evaluating our model with an external data set, where AUROC and AUPR values decreased by a mean of 6.8% (Table 1), and the Dice score for pupil segmentation decreased by 4.8%. Of note, eFigures 3 and 4 in the Supplement show that misclassified frames could be removed via data smoothing and outlier detection.20 Additional variables that may affect network performance include heterogeneity in image features and patient demographic characteristics in test data not well represented in training data and differences in training and test image features inherent to differential acquisition methods, such as optical and digital image capture modalities. Study design may benefit from the inclusion of a diverse array of images to improve the heterogeneity of data and minimize potential sources of bias related to patient demographic characteristics, surgeon handedness, and type of surgical instrumentation. 
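One simple form of the data smoothing mentioned above is a sliding-window majority vote over the per-frame phase predictions. The sketch below is offered only as an illustration of that idea and is not the method of the cited reference; the function name `smooth_phases`, the window length, and the sample labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def smooth_phases(phases: Sequence[str], window: int = 5) -> List[str]:
    """Replace each label with the most frequent label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(phases)):
        # The window is truncated at the start and end of the sequence.
        neighborhood = phases[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed


# Hypothetical prediction timeline with one isolated misclassified frame.
raw = ["phacoemulsification"] * 4 + ["idle"] + ["phacoemulsification"] * 4
cleaned = smooth_phases(raw)
```

A centered window relies on frames that follow the current one, so in a live setting it would either add a short delay or need to be replaced by a window that looks only at past frames.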
Diverse representation of patient-centric features and surgical parameters should be a primary goal of all potential artificial intelligence–based developments that involve image features and other forms of clinical data. In the ideal case, a phase classification platform would be extended to identify all possible cataract surgical phases and tactics, enabling the development of additional decision-making surgical guidance tools specific to a wider array of surgical steps and complications. Our assessment survey has not been validated; to our knowledge, no system to evaluate the usability of ophthalmic microsurgical guidance in real time has been formulated and standardized. As additional surgical guidance tools become available, it will be important to develop validated systematic usability metrics to assess these tools.
These results were achieved with a relatively small number of cases and frames used for training our network, as well as the fast-paced learning processes of Faster R-CNN models for phase classification, suggesting that this approach could be successfully applied to a broad array of surgical steps or techniques. A deeper understanding of turbulent conditions during cataract surgery and stronger predictive tools are needed to minimize the risk of complications. Advanced quantitative models using the data extracted from measures of turbulent movements of lens fragments may be of value. In addition, integration of guidance system output with the surgical instrumentation platform directly via control systems to modulate surgical fluidics parameters may have the potential to enhance surgical efficiency and safety (eg, by modulating aspiration of lens fragments during conditions of high turbulence or with progressive exposure of the lens capsule or by halting aspiration of the lens capsule before posterior capsular rupture). Analogous strategies for collision avoidance using computer vision have been implemented in other image-based fields, such as autonomous driving vehicles.21-23 Future work might strive to develop further the user experience and graphic interface. In addition, feedback might be provided via means other than visual information, such as haptics-mediated force and vibrotactile feedback.
The platform described in this cross-sectional study may contribute to the foundations for a surgical guidance application for phacoemulsification cataract surgery using DNNs. The approach used in this study demonstrated the feasibility of integration between surgical microscopes and artificial intelligence–based platforms to provide surgical guidance in real time. Feedback to the surgeon on the turbulent flow of lens fragments and/or abrupt movements of microsurgical instruments may enhance the surgeon’s experience during cataract surgery, and adaptive contrast enhancement improved visualization of the capsulorhexis, lens fragments, and cortical lens fibers. Additional studies are warranted to assess the feasibility of implementation into this current surgical paradigm.
Accepted for Publication: November 6, 2021.
Published Online: January 13, 2022. doi:10.1001/jamaophthalmol.2021.5742
Corresponding Author: Yannek Leiderman, MD, PhD, Department of Ophthalmology and Visual Sciences, University of Illinois at Chicago, 1855 W Taylor St, Mail Code 648, Chicago, IL 60612 (email@example.com).
Author Contributions: Mr Garcia Nespolo had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Garcia Nespolo, Luciano, Leiderman.
Acquisition, analysis, or interpretation of data: Garcia Nespolo, Yi, Cole, Valikodath, Leiderman.
Drafting of the manuscript: Garcia Nespolo.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Garcia Nespolo, Yi.
Obtained funding: Leiderman.
Administrative, technical, or material support: Garcia Nespolo, Cole, Valikodath, Leiderman.
Supervision: Garcia Nespolo, Yi, Luciano, Leiderman.
Conflict of Interest Disclosures: Mr Garcia Nespolo reported receiving grants from Research to Prevent Blindness during the conduct of the study, having an equity stake in Microsurgical Guidance Solutions LLC outside the submitted work, having a patent for USSN: 63/183424 pending, and receiving support from the Louis and Dolores Jedd Research Award from the Department of Ophthalmology and Visual Sciences, University of Illinois at Chicago. Dr Luciano reported having a patent for WO2020163845. Dr Leiderman reported receiving grants from Research to Prevent Blindness during the conduct of the study, receiving personal fees and nonfinancial support from Alcon, having an equity stake in Microsurgical Guidance Solutions outside the submitted work, and having a patent for WO2020163845 pending and a patent for USSN: 63/183424 pending. No other disclosures were reported.