Vicinanzo MG, McGwin G, Allamneni C, Long JA. Interreader Variability of Computed Tomography for Orbital Floor Fracture. JAMA Ophthalmol. 2015;133(12):1393-1397. doi:10.1001/jamaophthalmol.2015.3501
The timing and indications for repair of orbital floor fractures have been controversial. Current practice dictates that fractures involving more than 50% of the orbital floor should be repaired. Early management is initiated in such situations to prevent long-term sequelae of enophthalmos and diplopia. Because fracture size as measured by computed tomography (CT) is one of the criteria to determine the need for surgical repair, there is a need to know the reliability of this parameter.
To assess the variability of CT measurements of orbital floor fractures.
Design, Setting, and Participants
This study took place between January 1, 2005, and June 1, 2007, at an urban academic medical center. Patients with isolated orbital floor fractures were evaluated by 1 oculoplastic surgeon, and their orbital CT images were subsequently read by 3 neuroradiologists blinded to demographic information and the other readers’ measurements. Separately, each was asked to determine the maximal anterior to posterior length and transverse width if a floor fracture existed.
Main Outcomes and Measures
Intraclass correlation coefficients were calculated for length and width using a 2-way mixed-effects model to evaluate the agreement between radiologists.
Twenty-three patients met criteria for inclusion in this study (isolated orbital fracture thought to be in need of repair, with diplopia within 30° of primary gaze, and/or enophthalmos >2 mm, and/or >50% of the floor area involved in the fracture). The mean (SD) age of the patients was 31.5 (17.6) years (range, 8-73 years). The magnitude of agreement between the readers as measured by the intraclass correlation coefficient was 0.66 (95% CI, 0.46-0.82) for anterior to posterior length and 0.44 (95% CI, 0.22-0.69) for transverse width, indicating only a moderate degree of concordance.
Conclusions and Relevance
Although the literature has long held that a floor fracture seen radiographically to involve 50% of the orbital floor has a high likelihood of enophthalmia and should be repaired, this study shows how variable CT measurements of orbital floor fractures can be in a clinical setting, even in trained hands. We question the dependence on such a criterion and reemphasize the importance of making surgical decisions based on clinical findings rather than radiological interpretations.
Since the term blow-out fracture was popularized in the seminal work by Smith and Regan1 in 1957, there has been an ongoing discussion on when and how to treat it. Early intervention was the standard at the time,2 but Putterman et al3 put forth an opposing approach 2 decades later, stating that observation and later intervention were preferable. They noted that surgery was not without its risks, occasionally resulting in blindness, implant infection, extrusion, and worsening diplopia. Moreover, Putterman and colleagues noted that radiographic imaging had its limitations, pointing out that some presumed orbital floor fractures seen on radiographic evaluations were in fact misinterpreted.
A decade later, Hawes and Dortzbach4 put together a set of criteria needed for repair that still remains accepted for most blow-out fractures, including the need for surgical intervention when large fractures greater than half the orbital floor are observed. Yet, a debate continues as to whether the actual fracture size as seen on computed tomography (CT) can be repeatedly related to long-term enophthalmia and other poor outcomes.5-7 Orbital expansion, loss of ligament support, fat atrophy, scar contracture, and displacement and change in the shape of the orbital soft tissue may serve as concurrent or alternate explanations.8 Because of this, many authors have tried to find ways to analyze CT measurements to predict long-term enophthalmia and diplopia and thus create a more evidence-based rationale for reparative surgery.9-12
Unfortunately, it is our experience that orbital CT examinations are not yet standardized enough to make specific recommendations in the evaluation of the orbital floor fracture. Also, it is our experience that radiographic readers vary widely in ability, which further confuses this issue. In a review of the literature, we noted that all volumetric analyses of orbital fractures involved the use of only 1 reader and a highly specific and exact CT format (eg, machine manufacturer, cut thickness, window settings, width, and center).6,10,13,14 Further, highly sophisticated software was often used to calculate the results of these measurements.15 However, these studies do not seem to address the issue of how these highly controlled experiments can be related to everyday situations in which these standards are not used or not yet available, nor do they address the issue of reader variability.
This study attempts to assess the accuracy of CT image readings in an everyday setting by measuring the variability among different radiographic readers.
This study evaluates the variability of computed tomographic measurements of orbital floor fractures.
The magnitude of agreement between the radiographic readers of computed tomographic images of orbital floor fractures as measured by the intraclass correlation coefficient was 0.66 (95% CI, 0.46-0.82) for anterior to posterior length and 0.44 (95% CI, 0.22-0.69) for transverse width, indicating only a moderate degree of concordance.
The authors suggest that surgical decisions regarding treatment of orbital floor fractures be based on clinical findings.
While limited to 1 physician’s experience from 2005 to 2007, the data suggest that the 50% floor fracture criterion on radiological interpretation should not be used in decision making given reader variability, at least in the setting evaluated.
Between January 1, 2005, and June 1, 2007, 1 oculoplastic surgeon (M.G.V.) evaluated 45 patients. Twenty-three fit the criteria as put forth by Hawes and Dortzbach4 of isolated orbital floor fractures thought to be in need of repair (ie, diplopia within 30° of primary gaze, and/or enophthalmos >2 mm, and/or >50% of the floor area involved in the fracture). All patients were referred to the surgeon by an outside physician and came to their evaluation with hard-copy findings of the CT examinations previously ordered by the referring physician. Thus, it was not possible to control for the brand of the CT machine, the software, the level of the windows, the slice thickness, the timing of CT after the injury, the size of the images on the film, or the overall quality. All sets included coronal images. All patients were given a clinical examination including motility, forced ductions, and exophthalmometry when possible. All ages and both sexes were accepted and recorded, as were the cause and side of the injury. The University of Alabama at Birmingham Institutional Review Board approved this study, and the study adhered to the tenets of the Declaration of Helsinki. Informed consent was not required, as there were no identifiers of any kind that any participant saw and no risk to participants.
All 23 patients underwent orbital exploration and repair by the single oculoplastic surgeon. In each case, an inferior forniceal, transconjunctival incision was made and deepened to the infraorbital rim. The periosteum was then elevated and all prolapsed tissue was gently removed from the fracture (thus confirming the CT examination findings). This allowed for visualization of the full extent of the orbital floor fracture.
The orbital CT images were then given to 3 different, fellowship-trained neuroradiologists. Two were from level 1 tertiary care centers, one of which was a university-based medical center. All 3 were either chairman of their department or chief of their service. Two were full-time teachers in the university-based residency.
In each case, the neuroradiologist was given the hard-copy CT images with no access to any information other than that they were from a patient who presented with a possible orbital floor fracture. The neuroradiologists were blinded to all other demographic information, including the eye in question, the surgical findings, patient complaints, and physical signs noted on examination. Separately, each neuroradiologist was asked to determine the maximal anterior to posterior length and transverse width if a floor fracture existed. Each was blinded to the other readers’ measurements. Each reader was given unlimited time to read each image. All measurements were then recorded.
To evaluate the agreement between radiologists, intraclass correlation coefficients (also referred to as the reliability coefficient) were calculated for length and width using a 2-way mixed-effects model. The intraclass correlation coefficient represents the proportion of the total variability in a given measure that can be attributed to the true variability among individuals. It assumes values from 0.0 to 1.0, with values of 0.00 or lower considered poor; greater than 0.00 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; and 0.81 to 0.99, almost perfect agreement.
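As an illustration of the calculation described above, the following Python sketch computes a single-measure intraclass correlation coefficient from a 2-way analysis-of-variance decomposition. The text specifies only a 2-way mixed-effects model; the consistency, single-reader form used here (often written ICC(3,1)) is an assumption, as is the function name `icc_3_1`. The example measurements are invented for illustration and are not study data.

```python
from statistics import mean


def icc_3_1(ratings):
    """Single-reader, consistency ICC from a two-way ANOVA decomposition.

    ratings: list of rows, one row per patient, one value per reader.
    ASSUMPTION: the ICC(3,1) form is chosen for illustration; the article
    does not state which two-way mixed-effects variant was used.
    """
    n = len(ratings)                      # number of patients
    k = len(ratings[0]) if n else 0       # number of readers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 patients and >= 2 readers, no missing values")

    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares: between patients, between readers, total, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical anterior-to-posterior lengths in mm (4 patients x 3 readers).
example_lengths = [
    [24.0, 26.5, 22.0],
    [30.0, 28.0, 33.5],
    [18.0, 21.0, 15.5],
    [27.0, 25.0, 29.0],
]
print(round(icc_3_1(example_lengths), 2))
```

In this form, a constant offset between readers does not lower the coefficient, because reader effects are removed before the residual is computed; an absolute-agreement form would penalize such offsets.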
The mean (SD) age of the patients was 31.5 (17.6) years (range, 8-73 years); there were 14 males and 9 females. The 23 sets of hard-copy images came from 18 outside hospitals between January 1, 2005, and June 1, 2007. Slice thickness varied from 0.6 to 3.8 mm with an average of 2.58 mm and a mode of 3 mm. Twenty-one sets had bone windows included. Repair was done on 16 left eyes and 7 right eyes.
For anterior to posterior length measurements, the intraclass correlation coefficient for the 3 radiologists was 0.66 (95% CI, 0.46-0.82), indicating substantial agreement. The intraclass correlation coefficient was a function of fracture size: the intraclass correlation coefficient was 0.75 (95% CI, 0.54-0.88) for fractures 24 mm or longer compared with 0.37 (95% CI, 0.06-0.84) for those shorter than 24 mm.
For transverse width measurements, the intraclass correlation coefficient for the 3 radiologists was 0.44 (95% CI, 0.22-0.69), indicating moderate agreement. As with length, agreement between raters was a function of fracture size: the intraclass correlation coefficients for larger (≥13 mm) vs smaller (<13 mm) fractures were 0.56 (95% CI, 0.18-0.88) vs 0.22 (95% CI, 0.04-0.65), respectively (Table).
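The interpretation of the coefficients reported above follows the bands given in the statistical methods. A minimal Python sketch of that mapping is shown here; the function name `agreement_band` is hypothetical, and treating each band as an inclusive upper bound (so that values between the listed ranges, such as 0.205, and values above 0.99 still receive a label) is an assumption made for the illustration.

```python
def agreement_band(icc):
    """Map an ICC to the descriptive bands listed in the statistical methods.

    ASSUMPTION: bands are treated as contiguous, with inclusive upper bounds.
    """
    upper_bounds = [
        (0.00, "poor"),
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in upper_bounds:
        if icc <= upper:
            return label
    return "almost perfect"


# Point estimates as reported in the Results.
reported = {
    "length, all fractures": 0.66,
    "length, >=24 mm": 0.75,
    "length, <24 mm": 0.37,
    "width, all fractures": 0.44,
    "width, >=13 mm": 0.56,
    "width, <13 mm": 0.22,
}

for name, value in reported.items():
    print(f"{name}: ICC {value:.2f} -> {agreement_band(value)}")
```

Applied to the point estimates, this yields substantial agreement for length overall and for fractures 24 mm or longer, moderate agreement for width overall and for fractures 13 mm or wider, and fair agreement for the smaller fractures in both dimensions.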
Of note, 1 patient was thought to have no fracture by all 3 readers, when in fact a significant fracture with an entrapped muscle was noted intraoperatively. Also, 1 reader did not identify a fracture in another patient, while the other 2 readers did.
Relying on radiologic measurements to determine fracture size and thus surgical intervention may be complicated less by how well the readers agree with the surgeon’s measurements and more by the variability that the readers show among themselves. Statistical analysis found only a moderate degree of agreement between the readers for their width measurements. In essence, this means that even if the readers’ averages were close, their measurements for any given patient may be very different. Moreover, although there appears to be substantial agreement in their length measurements, this is only true for fractures 24 mm or longer. Shorter fractures showed only a fair degree of agreement between the readers. This is somewhat disconcerting, especially in light of studies13,16-18 showing that smaller, less displaced fractures may in fact be more in need of urgent repair and thus of accurate CT readings when available.
By choosing neuroradiologists from large academic and metropolitan medical centers, we were able to find the most experienced readers available. Christiansen et al14 noted that neuroradiologists had the lowest variability among readers measuring in vitro, dried human mandible fractures and frozen cadaver mandible fractures when compared with lesser trained readers. However, they found significant discrepancies in their ability to measure condyles that had structural changes; angular measurements were particularly difficult. This occurred despite standardized CT conditions for all readers (eg, slice thickness, software, window levels). In this study, we presented a more difficult task to the neuroradiologists: find, evaluate, and measure an orbital floor fracture in more common, less controlled conditions. The variability among them is significant and troublesome.
Furthermore, the 50% floor fracture criterion itself may be dated. By 1978, Grove et al19 had noted that orbital CT examination gave a superior visualization to Waters views and tomograms, including the relationship between soft tissue and bone. This was confirmed by Hammerschlag et al20 in 1982 in their cadaveric CT studies of blow-out fractures. However, in 1983 when Hawes and Dortzbach4 created the standard by which most blow-out fractures are still repaired today, the 50% floor fracture criterion was based on Waters views and tomograms. This criterion carried over to the CT scan as if it had been based on this updated technology. The possibility that CT may overestimate or underestimate the size of the fracture when compared with the earlier technology was not thoroughly investigated.
Also, studies suggest that fracture size does not directly correlate with long-term complications including diplopia and enophthalmos. By 1986, Manson et al8 had shown that for a large floor fracture to result in enophthalmia, other necessary injuries have to occur with it. More specifically, damage to the ligamentous and periosteal supports with prolapse of the intraconal fat and soft tissue need to be present. This would take into account that it is not just the size of the isolated fracture but also the overall change of the orbital anatomy including soft tissue. In 2013, Shah et al12 found that small and medium-sized fractures with soft-tissue herniation were more likely to cause diplopia than large fractures and thus recommended early repair or closer observation of small and medium-sized orbital floor fractures with soft-tissue herniation.
Orbital trapdoor fractures, also known as white-eyed blow-out fractures, are another example of fracture size not correlating with clinical outcomes. The trapdoor fracture presents most often in pediatric patients. It allows herniation of orbital contents through the fracture and then entraps these herniated contents.18 In these fractures, there is often minimal evidence of floor disruption on radiologic examination, but urgent surgical intervention within the first few days after injury is required to prevent long-term diplopia and motility restriction.16
It should also be noted that there was a time gap from data collection (January 2005 to June 2007) to presentation of findings at a meeting (October 2009) to submission of this report for publication (May 2015). However, this gap was only due to time constraints. To our knowledge, there have been no major developments in orbital floor fracture management in that interval, with the criteria by Hawes and Dortzbach still widely accepted.
Limitations of this study include that it reflects 1 physician’s experience from 1 clinical site between 2005 and 2007. However, we believe that this physician’s experience can be generalized, as the study attempted to assess the accuracy of CT readings in an everyday setting. As described in the Methods, there were no standardized CT conditions, readers were blinded to demographic information and the other readers’ measurements, and images came from a variety of medical centers.
This study shows how variable CT measurements of orbital floor fractures can be in a clinical setting. Surgical decisions regarding treatment of orbital floor fractures should be based on clinical findings rather than radiological interpretations. Specifically, the 50% floor fracture criterion should not be used in decision making for a number of reasons. First, radiologists’ variability in assessing anterior to posterior length and transverse width suggests that accurately measuring such parameters from orbital CT scans is impractical. Second, the size of the floor fracture is not necessarily predictive of the risk of complications, with smaller fractures with herniation more likely to cause diplopia. Instead, clinical signs including the degree of diplopia and enophthalmia should be of primary importance when assessing the need for surgical intervention after an orbital floor fracture; CT should still be used as an adjunct study to assess for soft-tissue herniation and rectus involvement. Finally, we encourage emergency department physicians, or any physician who first evaluates a patient with a possible orbital floor fracture, to consult an oculoplastic surgeon for a second opinion if there are any doubts or if there is significant discordance between radiological and clinical findings.
Corresponding Author: Chaitanya Allamneni, BA, 1000 19th St S, Birmingham, AL 35205 (firstname.lastname@example.org).
Submitted for Publication: May 2, 2015; final revision received July 15, 2015; accepted July 28, 2015.
Published Online: October 8, 2015. doi:10.1001/jamaophthalmol.2015.3501.
Author Contributions: Dr McGwin and Mr Allamneni had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Vicinanzo, Allamneni, Long.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: McGwin, Allamneni.
Administrative, technical, or material support: Vicinanzo, Long.
Study supervision: Vicinanzo, Allamneni, Long.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Previous Presentation: This work was presented at the 40th Annual Fall Scientific Symposium of the American Society of Ophthalmic Plastic and Reconstructive Surgery; October 22, 2009; San Francisco, California.