Frequency of individual computed tomographic findings in acute appendicitis, tabulated by severity level. All findings were graded on a 0 (absent) through 3 (severe) scale, except for appendiceal diameter, which was separated into the following 3 groups: 5 to 9, 10 to 14, and 15 to 19 mm.
Hansen AJ, Young SW, De Petris G, Tessier DJ, Hernandez JL, Johnson DJ. Histologic Severity of Appendicitis Can Be Predicted by Computed Tomography. Arch Surg. 2004;139(12):1304-1308. doi:10.1001/archsurg.139.12.1304
A regression model based on computed tomographic (CT) findings alone can accurately predict the histologic severity of acute appendicitis in patients who have a high disease likelihood.
Mayo Clinic in Scottsdale, Ariz.
Consecutive sample of 105 patients (50 women and 55 men, aged 15-89 years) undergoing nonincidental appendectomy within 3 days of nonfocused abdominal CT.
Computed tomographic scans and histologic features were retrospectively reinterpreted. Each patient’s histologic and CT findings were scored by standardized criteria. An ordinal logistic regression model was constructed with a subset of CT findings that statistically correlated best with the final histologic features. Predicted severity values were then generated from the model.
Main Outcome Measure
Agreement between predicted and actual histologic severity, using weighted κ measurement.
Computed tomography variables used in the model were fat stranding, appendix diameter, dependent fluid, appendolithiasis, extraluminal air, and the radiologist’s overall confidence score. The weighted κ measurement of agreement between predicted and actual histologic severity was 0.75, with a 95% confidence interval between the values of 0.59 and 0.90.
Computed tomographic findings, when used with the regression model developed from this pilot study, can accurately predict the histologic severity of acute appendicitis in patients initially seen with a high clinical suspicion of the disease. These findings provide a platform from which to prospectively test the model.
Appendectomy is the most common abdominal surgical procedure performed on an emergent basis worldwide.1 Early diagnosis is crucial in preventing perforation and resultant morbidity. Historically, the appendix is normal in approximately 20% of suspected cases of appendicitis that proceed to surgery, although modern negative laparotomy rates are significantly lower.2-5 Computed tomography (CT) has been shown to aid in the diagnosis of acute appendicitis, with up to 98% sensitivity, 98% specificity, and an overall accuracy of 98%.2,3,6-10 Computed tomography has contributed greatly to reducing the number of negative laparotomies. Although the accuracy of CT has been clearly demonstrated, areas of diagnostic uncertainty still exist. Uncertainty in the diagnosis of acute appendicitis, even in the presence of a high clinical suspicion, may result in unnecessary morbidity (4.60%) and mortality (0.14%) through negative laparotomy or delay in surgical therapy.11,12
Prior studies have identified common radiologic findings in acute appendicitis.10,13-16 Identification of a set of these findings may lead an interpreter to accept or to reject the radiologic diagnosis of acute appendicitis. Clinicians interpreting CT scans and clinical information must ultimately rely on their subjective experience to establish a diagnosis of acute appendicitis. A wealth of literature has been published citing experience with and methods of accurately diagnosing or ruling out the disease. Previous publications have described scoring systems that grade appendicitis clinically, based on various combinations of CT, surgical, and pathologic findings.17-20 To our knowledge, no formal, predictive model has been created to ascertain the pathologic severity of the disease, given a set of radiologic findings. Such a system could provide a surgical consultant with a powerful tool to aid in clinical decision making. The purpose of this study is to devise a system that may accurately predict the histologic severity of acute appendicitis, as proven by final pathologic findings, based on CT findings. Such a scoring system could potentially facilitate improved judgment in operative timing and potential nonoperative management of patients initially seen with a highly probable clinical picture of acute appendicitis.
After obtaining approval of the Mayo Clinic in Scottsdale, Ariz, institutional review board, the medical records of all patients (Table 1) evaluated at our institution during the study period (January 1, 2001-April 30, 2003) with suspected acute appendicitis were reviewed. Only those patients who proceeded to laparotomy or laparoscopy for nonincidental appendectomy are included in this study.
The number of initial cases was 148. On initial medical record review, some variability in CT technique was observed. To establish uniformity of analysis, patients were selected who had undergone a preoperative, nonfocused abdominal CT scan using enteric and intravenous contrast media. These factors were chosen based on their demonstrated results reported in recent literature.3,6,9,21 Rectal contrast or air enema was present in most but was not required as an inclusion criterion. The selected group also proceeded to appendectomy, via laparoscopy or laparotomy, within 3 days of the CT. One patient was eliminated because of the inability to visualize the appendix on CT, which precluded the accurate diagnosis of appendicitis. Another patient was the only one in the study group to have the finding of diffuse cecal wall thickening. This patient was excluded from the final statistical analysis with the intent to use findings that are commonly seen on CT performed for suspected appendicitis. The final number of patients totaled 105.
A board-certified staff radiologist (S.W.Y.), fellowship trained in body imaging, interpreted all CT scans. The radiologist was provided with the list of selected patients, in addition to the following predetermined list of possible CT findings in acute appendicitis that was based on current literature13: abscess, adenopathy, phlegmon, cecal bar, fat stranding, enlarged appendix, focal cecal apical thickening, appendolithiasis, arrowhead sign, dependent fluid, extraluminal air, terminal ileal wall thickening, sigmoid wall thickening, focal cecal wall thickening, and diffuse cecal wall thickening. The radiologist was blinded to the outcome of each individual case, although, by necessity of the electronic viewing system’s retrieval function, the radiologist was provided with the patients’ identification numbers. The original radiology interpretation was not accessed. Instead, the CT scans were reinterpreted using the standardized list of potential findings. A value of 0 through 3 was assigned to each item (0, absent; 1, mild; 2, moderate; and 3, severe). Appendix diameter was not graded, but instead measured in millimeters. For comparison purposes to usual clinical practice, an overall confidence of diagnosis score was assigned to each CT scan by the radiologist (0, appendicitis absent/unlikely; 1, equivocal; 2, appendicitis likely; and 3, appendicitis strongly suspected).
Next, a board-certified staff pathologist (G.D.), fellowship trained in both gastrointestinal and oncologic pathology, was provided with the histologic slides from the cases corresponding to the aforementioned CT scans. The pathologist was also blinded to the outcome of each individual case and the slides were assigned research numbers for anonymity. The original pathologic interpretations were not accessed, due to lack of uniformity of diagnostic criteria. Each patient’s slides were then reinterpreted, assigning a score from 0 through 3, based on the following criteria: 0 indicates no acute inflammation; 1, mucosal infiltrate of neutrophils (acute mucosal appendicitis); 2, inflammation into submucosa and/or muscle (acute suppurative appendicitis); and 3, extensive necrosis of appendiceal wall (gangrenous appendicitis).
The relationship between radiologic findings (predictor variables) and pathologic findings (outcome variables) was statistically analyzed. Each radiologic finding, including the overall confidence of diagnosis score, was correlated with the actual severity score of appendicitis as determined by pathologic review. The Spearman correlation for ranks was used to assess statistical significance. The predictor variables with the strongest correlation coefficients were then selected for inclusion in an ordinal logistic regression model. A predicted histologic severity value was obtained from the model for each patient and the predicted value was compared with the corresponding actual histologic severity score. The weighted κ measurement of agreement was used to analyze the comparison, providing a measure of accuracy of the regression model.
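The screening step described above can be illustrated with a minimal sketch of the Spearman rank correlation, computed as the Pearson correlation of tie-averaged ranks. The function names and the small data set are hypothetical and are not taken from the study; they only show how a graded CT finding might be correlated with a histologic score.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(vx * vy)


# Hypothetical example data (NOT study data): fat-stranding grade (0-3)
# and histologic severity score (0-3) for eight imaginary patients.
fat_stranding = [0, 1, 1, 2, 2, 3, 3, 3]
histology = [0, 1, 2, 2, 2, 2, 3, 3]
rho = spearman_rho(fat_stranding, histology)
```

Because both variables are ordinal grades with many ties, averaging the ranks of tied values matters; the shortcut formula based on squared rank differences is exact only when there are no ties.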
The Figure graphically displays the frequency of each radiologic finding, tabulated by severity levels. All findings were graded on a 0- through 3-point scale, except for appendiceal diameter, which was separated into the following 3 groups: 5 to 9, 10 to 14, and 15 to 19 mm.
Table 2 lists the Spearman correlations, which correlate the presence of each radiologic finding to the actual histologic severity. The variables that were included in the ordinal logistic regression model, owing to their relatively higher correlative values, include the following: (1) extraluminal air, (2) the radiologist’s overall confidence score, (3) appendix diameter, (4) fat stranding, (5) appendolithiasis, and (6) dependent fluid.
Table 3 gives the comparison between the predicted histologic severities from the regression model and the actual pathologic values. The weighted κ value derived from the data was 0.75, with a 95% confidence interval between the values of 0.59 and 0.90.
A model that accurately predicts the histologic status of patients who had a CT scan consistent with acute appendicitis could prove to be a useful clinical tool, especially in cases of uncertain patient disposition. Given the common presentation of this surgical entity, many patients could potentially be affected if even a small percentage were triaged with greater accuracy, based on mathematical prediction of the histologic severity of their disease state. A numeric score falling within the range indicating definite acute appendicitis would support the decision to proceed immediately to surgery. However, patients with scores at either end of the spectrum would benefit from more specific predictive classification of their illness. It has been shown that some patients with symptoms consistent with, but uncertain for, acute appendicitis may avoid surgery altogether, instead being observed, then discharged if their symptoms resolve.18,19 It is plausible that these patients, indeed, have early or mild appendicitis, although this would be impossible to confirm without appendectomy. Some patients with the diagnosis of appendicitis by CT scan have been treated nonoperatively with reasonable results.22 Additionally, nonoperative treatment of perforated appendicitis has been documented with good success.23,24 The traditional practice of interval appendectomy has been called into question by some, indicating that patients who do not have recurrent episodes of appendicitis within 3 to 6 months may never need an appendectomy.23 Any improvement to the current process of diagnosis and determination of patient disposition by an accurate predictive model could help streamline and potentially decrease the costs and morbidity associated with this significant health problem.
This study was designed as a pilot study to lay the foundation for a future randomized controlled trial. Although the results are encouraging, further research will be required before they may be used in clinical practice.
A strength of the current study was its strict use of objective, standardized radiologic and pathologic data evaluated retrospectively by independent observers. Although the outcome of appendectomy was known for every patient, the actual outcome being measured was histologic severity, which allowed the opportunity to blind the radiologist and the pathologist interpreting the CT scans and slides. The study was designed with the specific intent of avoiding potential sources of error, such as reporting bias, to strengthen the assumptions made regarding the validity and applicability of the regression model. One radiologist interpreted all of the CT scans, and one pathologist interpreted all of the histologic specimens. This maximized precision of interpretation. However, no specific attempt was made to determine the reproducibility of the individual interpreters’ findings, which potentially introduces decreased accuracy of interpretation.
A weakness of the study was selection of a group of patients who had the known clinical outcome of appendectomy. As discussed by Raptopoulos et al17 in a related study, a very high probability of acute appendicitis introduces test review bias. This may in turn lead to overestimation of the sensitivity of CT and the regression model in predicting the actual histologic severity of acute appendicitis. However, the study was intended to determine a useful regression model that would predict histologic severity based on CT findings, not to determine how accurately CT scanning can distinguish the presence or absence of acute appendicitis. Additionally, the radiologist and pathologist were blinded to the pathologic findings to decrease test review bias.
A further weakness of the study was its inability to include all patients with appendicitis. Our patient study group was compiled based on performance of appendectomy shortly after having an abdominopelvic CT scan. It is likely that there were patients seen in the study period who had acute appendicitis but were treated for other conditions and recovered without appendectomy. The patients in the study group with low predicted and actual histologic severity scores were too few to state firmly that the model would be truly useful for predicting histologic severity in patients with early or mild appendicitis. Although some patients may have been unintentionally excluded, no patients were intentionally treated nonoperatively for acute appendicitis during the study period. At the other extreme, patients undergoing interval appendectomy would not have been included in the study group, owing to the inclusion requirement of having a CT scan within 3 days of appendectomy. This only amounted to one patient in the study period. These exclusions, although minimized, could be an important source of bias, and could weaken the assumption that the devised model can accurately predict histologic severity, based on exclusion of some patients at either end of the spectrum of severity.
A thorough description of the statistical methods used in the current study may be obtained elsewhere.25,26 However, certain points warrant mention. Spearman correlations associate predictor variables with outcome measures. With respect to the study population collectively, a high value would indicate the absence of a particular finding when appendicitis is absent or mild, and the high-grade presence of the same finding in severe appendicitis. Conversely, a low or negative value would indicate weak or inverse correlation, respectively. As expected, the relationships between each individual radiologic finding and the actual histologic severity were relatively weak. These data show that no single finding on CT scan can reliably and specifically predict the severity of appendicitis, which substantiates the findings of previous studies.14,15 Instead, multiple concurrent findings contribute to a common diagnosis. Although all of the listed radiologic findings correlated to some extent with the final histologic diagnoses in the study patients, the ones that were shown to correlate most strongly were (1) extraluminal air, (2) the radiologist’s overall confidence score, (3) appendix diameter, (4) fat stranding, (5) appendolithiasis, and (6) dependent fluid. Thus, these findings were used in the predictive model. The radiologist’s overall confidence score is not a specific, objective finding on CT, but it was shown to be a highly correlative factor in predicting final histology. As may be noted from the Figure, inclusion in the regression model did not directly depend on the frequency of occurrence. In fact, the findings of dependent fluid, appendolithiasis, and extraluminal air were relatively infrequent, which suggests that their occurrence, in the presence of the more common findings of fat stranding and appendiceal dilation, should help to solidify the diagnosis of appendicitis.
The ordinal logistic regression model was chosen specifically to relate multiple independent radiologic variables to one dependent variable (predicted histologic severity of appendicitis). Since the dependent outcome variable was ranked on a scale of 0 through 3, rather than a simple binary outcome variable, simple logistic regression could not be used. Rather, the order of the ranked scale was considered and used in the ordinal logistic regression.
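As an illustration of how an ordinal logistic model maps CT findings to a severity grade, the sketch below uses the proportional-odds form, in which P(Y ≤ k | x) = logistic(θ_k − β·x) for each cut point θ_k. The study does not report its coefficients, its cut points, or the rule used to turn model output into a predicted grade, so every number here (`BETA`, `THRESHOLDS`) is hypothetical, and choosing the most probable category is an assumption.

```python
from math import exp


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + exp(-z))


def category_probabilities(x, beta, thresholds):
    """Proportional-odds model: return P(Y = k | x) for k = 0..len(thresholds).

    thresholds must be strictly increasing cut points (theta_0 < theta_1 < ...).
    """
    eta = sum(b * v for b, v in zip(beta, x))  # linear predictor beta . x
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def predicted_severity(x, beta, thresholds):
    """Predicted grade = most probable category (an assumed decision rule)."""
    probs = category_probabilities(x, beta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


# HYPOTHETICAL coefficients, one per predictor, in this order:
# fat stranding (0-3), appendix diameter (mm), dependent fluid (0-3),
# appendolithiasis (0-3), extraluminal air (0-3), confidence score (0-3).
BETA = [0.8, 0.15, 0.5, 0.6, 1.2, 0.7]
THRESHOLDS = [1.0, 3.0, 6.0]  # hypothetical cut points between grades 0|1|2|3

mild_case = [0, 6, 0, 0, 0, 0]
severe_case = [3, 16, 2, 2, 2, 3]
mild_grade = predicted_severity(mild_case, BETA, THRESHOLDS)
severe_grade = predicted_severity(severe_case, BETA, THRESHOLDS)
```

A single coefficient vector shared across all cut points is what distinguishes this model from fitting separate binary regressions, and it is how the ordering of the 0 through 3 scale enters the model.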
The weighted κ measurement of agreement was used in this study to measure the regression model’s overall suitability in predicting actual histologic severity of acute appendicitis. A weighted κ measurement considers complete and partial agreement, and assigns a weight related to the degree of disagreement.26 The weighted κ value derived from the data was 0.75, with a 95% confidence interval between the values of 0.59 and 0.90. A κ value of this magnitude is generally interpreted as strong and suggests a useful predictive value of the proposed regression model. A study designed to rely on analysis such as a weighted κ measurement has the inherent problem of generalizability.25 Therefore, application of the model should be reserved for the type of population in which it was developed, namely, patients having a working diagnosis of acute appendicitis.
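The weighted κ statistic can be sketched as follows, computed as 1 minus the ratio of observed to chance-expected weighted disagreement. The source does not state which weighting scheme was used; linear disagreement weights, |i − j| / (k − 1), are assumed here, and the function name and example scores are hypothetical.

```python
def weighted_kappa(actual, predicted, n_categories=4):
    """Weighted kappa with linear disagreement weights (assumed scheme).

    actual, predicted: equal-length sequences of integer grades in
    range(n_categories).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    k, n = n_categories, len(actual)

    # Cross-tabulate actual (rows) against predicted (columns).
    observed = [[0] * k for _ in range(k)]
    for a, p in zip(actual, predicted):
        observed[a][p] += 1
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            # Count expected by chance if the two ratings were independent.
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical scores (NOT study data): actual vs model-predicted grades.
actual_scores = [0, 1, 2, 2, 3, 3, 2, 1]
predicted_scores = [0, 1, 2, 3, 3, 2, 2, 2]
kappa = weighted_kappa(actual_scores, predicted_scores)
```

With these weights, a prediction that misses by one grade is penalized one third as much as one that misses by three grades, which is how partial agreement is credited.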
Computed tomography is a commonly used clinical tool that has been clearly demonstrated to contribute to the accurate diagnosis of acute appendicitis. Once a strong clinical suspicion of acute appendicitis has been affirmed, deciding on the most advantageous treatment is key. This study accomplished the goal of devising a predictive model that in the near future may help stratify such patients based on prediction of the histologic severity of their disease. It provides a foundation for a subsequent prospective trial that will use predictive stratification to determine the need for and timing of appropriate operative intervention. Such prospective study is necessary before the model can be applied to widespread clinical practice.
Correspondence: Daniel J. Johnson, MD, Department of Surgery, Division of General Surgery, Mayo Clinic Scottsdale, 13400 E Shea Blvd, Scottsdale, AZ 85259 (email@example.com).
Accepted for Publication: May 20, 2004.