Ohmann C, Franke C, Yang Q, and the German Study Group of Acute Abdominal Pain. Clinical Benefit of a Diagnostic Score for Appendicitis: Results of a Prospective Interventional Study. Arch Surg. 1999;134(9):993-996. doi:10.1001/archsurg.134.9.993
Clinical use of a diagnostic score improves decision making in acute appendicitis.
A before-and-after trial comparing a group of patients undergoing standard diagnostic workup with no additional diagnostic support (phase 1) with a group of patients undergoing additional diagnostic support with a score (phase 2).
Eight departments of surgery in Germany and Austria.
Eight hundred seventy patients with acute abdominal pain in phase 1 (October 1, 1994, to April 30, 1995) and 614 patients in phase 2 (February 1, 1995, to August 15, 1995).
Structured and standardized history and clinical investigation in all patients with computer-based documentation; introduction of the diagnostic score after phase 1 and computer-supported use of the score in phase 2.
The 2 groups were comparable with respect to signs, symptoms, and investigations related to acute appendicitis. Diagnostic performance of the final examiner decreased with the score (without vs with the score: specificity, 86% vs 78%; positive predictive value, 67% vs 50%; and accuracy, 88% vs 81%). There were no differences in the rates of perforated appendix, appendectomy with normal findings, and complications; however, the delayed appendectomy rate (2% with the score vs 8% without) and the delayed discharge rate (11% with the score vs 22% without) were significantly lower with diagnostic support by the score (P=.02).
Integration of a score into the diagnostic process may have unforeseen clinical effects. The tested score cannot be recommended as a standard tool for diagnostic decision making in acute appendicitis.
The early and accurate diagnosis of acute appendicitis is still a difficult problem.1 Despite introduction of ultrasound and special laboratory investigations (eg, C-reactive protein), high diagnostic error rates are observed.2 As a consequence, perforation rates and rates of appendectomy with normal findings of 15% and more occur.3
In the last few years, several scoring systems have been developed for supporting the diagnosis of acute appendicitis.4-12 Initial evaluation studies have reported excellent results, indicating that scoring systems would be ideal as diagnostic aids because they perform well, require no special equipment, and are user-friendly and comprehensible to the clinician.1,7,10-12 However, the clinical benefit of a diagnostic score integrated into the diagnostic process has not yet been investigated in a prospective study with adequate methods. We therefore performed such a study with the use of a diagnostic score developed and evaluated in Germany.
The investigation was performed as a multicenter prospective interventional study with 8 German or Austrian surgical hospitals, including 3 university hospitals. Included were all patients with acute abdominal pain within 1 week before hospital admission. Excluded were patients with postoperative acute abdominal pain, trauma, or hernia; children younger than 6 years; patients who did not give informed consent; and patients with no definite final diagnosis. Acute appendicitis was diagnosed only on histopathological grounds according to the following criteria. Macroscopic signs: intravascular injection of the serosa; fibrinous, purulent film; edematous, hemorrhagic, necrotic changes of the wall; and blood (not sufficient) or pus on opening of the appendix. Microscopic signs: focal or expanded erosion, ulceration, abscess, fistula, necrosis, or perforation.
Not sufficient were fibrosis taken as evidence of subsided inflammation, intravascular injection of the serosa as the only finding, and description of few granulocytes.
Perforation had to be proved on histopathological grounds. There was no option for diagnosing "chronic appendicitis" or "subacute appendicitis." In the case of outpatients, a follow-up was performed after 30 days (telephone interview).
In all patients, a structured and standardized history and clinical investigation were performed according to international standards. Data were documented with a user-friendly computer program and form-based data entry.13 In case of computer breakdown, forms were available for data collection.
The study was performed in 2 consecutive phases: phase 1, no additional diagnostic support (4 months); and phase 2, diagnostic support with a score based on history, clinical examination, and basic laboratory data (4 months) (Table 1).
The diagnostic score was introduced after phase 1 into the hospitals in several ways: distribution of a publication, presentation in training sessions and clinical conferences, and posting in the outpatient ward. The score was integrated into the computer program and automatically presented after data input of the history, clinical examination results, and basic laboratory data. After special laboratory investigations, ultrasound, and x-ray, the diagnosis of the final examiner after all investigations (in the majority of cases, a senior surgeon), the final diagnosis at discharge, and the outcome of disease were documented prospectively with the computer program. Comparability of the study groups was investigated for signs and symptoms related to acute appendicitis, the distribution of the final diagnoses, and the diagnostic investigations performed.
The outcome criteria were the diagnostic accuracy of the final examiner with respect to appendicitis (sensitivity, specificity, positive and negative predictive value, and accuracy), the perforated appendix rate, the rate of appendectomy with normal findings, the rate of laparotomy with normal findings, the delayed appendectomy rate, the complication rate, and the delayed discharge rate. For the outcome criteria, the following definitions were used: perforated appendix rate, proportion of patients with acute appendicitis who had a histologically proved perforation; negative appendectomy rate, proportion of patients with appendectomy in whom no appendicitis was found; negative laparotomy rate, proportion of laparotomies that were unnecessary (no intraoperative or histological diagnosis); delayed appendectomy rate, proportion of patients with appendicitis in whom the appendectomy was performed the second day or later after admission; and delayed discharge rate, proportion of patients with appendicitis who were discharged 10 days or later after admission.
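The accuracy measures and the delayed discharge rate defined above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: the class and function names are hypothetical, and the counts and lengths of stay in the example are invented for demonstration and are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Examiner's diagnosis of appendicitis vs the final diagnosis."""
    tp: int  # appendicitis diagnosed and confirmed
    fp: int  # appendicitis diagnosed, not confirmed
    fn: int  # appendicitis missed by the examiner
    tn: int  # appendicitis correctly ruled out


def diagnostic_measures(c: ConfusionCounts) -> dict:
    """Return the five accuracy measures named in the outcome criteria."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


def delayed_discharge_rate(stay_days: list) -> float:
    """Proportion of appendicitis patients discharged 10 days or later
    after admission (the threshold stated in the text)."""
    return sum(1 for d in stay_days if d >= 10) / len(stay_days)


# Hypothetical counts, chosen only to exercise the formulas.
example = ConfusionCounts(tp=80, fp=40, fn=20, tn=260)
measures = diagnostic_measures(example)

# Hypothetical lengths of stay (days) for five appendicitis patients.
example_rate = delayed_discharge_rate([4, 6, 7, 10, 15])
```

The positive predictive value depends on how often the disease occurs among those examined, which matters when two phases with different appendicitis frequencies are compared.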
Statistical comparisons between the 2 phases were performed with the χ2 test excluding missing data.
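As an illustration of such a comparison, the following sketch computes a Pearson χ2 statistic for a 2 × 2 table, using the appendicitis counts reported for the 2 phases (201 of 870 and 114 of 614). It assumes no continuity correction, which the article does not specify, and the function name is hypothetical. The article does not report a test for this particular comparison, so the example shows only the mechanics.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a margin of the table is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: phase 1, phase 2. Columns: appendicitis, no appendicitis.
chi2, p_value = chi_square_2x2(201, 870 - 201, 114, 614 - 114)
```

Patients with missing values for a variable would be dropped before the table is built, in line with the exclusion of missing data described above.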
There are no general guidelines and rules in Germany for the performance of studies with formal decision aids based on routinely assessed clinical variables. We decided to give an information brochure to the patients explaining the study and to give them the option not to take part in the study.
Overall, 1484 patients could be enrolled in the study: 870 patients in phase 1, with no additional diagnostic support, and 614 patients in phase 2, with diagnostic support by the score (Table 2). The starting date of the study varied between centers; phase 1 began between October 1, 1994, and April 30, 1995, and phase 2 between February 1, 1995, and August 15, 1995. The frequency of appendicitis in phase 1 was 23.1% (n=201) compared with 18.6% (n=114) in phase 2. Major diagnoses were no specific abdominal pain (phase 1, 25%; phase 2, 27%), acute dyspepsia (8%, 10%), acute biliary disease (8%, 9%), ileus (4%, 5%), urolithiasis (3%, 5%), urinary tract infection (3%, 4%), and acute diverticulitis (3%, 4%). There were no significant differences between the 2 phases with respect to signs and symptoms related to appendicitis (Table 3). Study groups were comparable with respect to ultrasound of the abdomen (phase 1, 65%; phase 2, 64%) and ultrasound of the appendix (11%, 9%). Leukocyte counts were determined significantly more often in phase 2 as a component of the score (88%, 95%; P<.001).
Clinicians' diagnosis of appendicitis changed after introduction of the score (Table 4). Specificity, positive predictive value, and accuracy were significantly lower with diagnostic support by the score. Before introduction of the score, appendicitis was diagnosed less often by the final examiner (31%) than after introduction of the score (36%) (P=.10), contrary to the frequency of appendicitis (23% vs 19%). There were no significant differences with respect to the perforation, appendectomy with normal findings, and complication rates. The delayed appendectomy and delayed discharge rates were significantly lower with diagnostic support. However, timing of appendectomy was not significantly associated with the complication rate (24% in delayed appendectomy vs 10% in nondelayed appendectomy; P<.09; not differentiated between the study phases because of the small sample size). As expected, a higher complication rate was found in patients with delayed discharge than in those without delayed discharge (36% vs 5% in the total study population; P<.001).
There was a linear relationship between the score values and frequency of appendicitis: less than 4.0 points, 3% (phase 1), 0% (phase 2); 4.0 to 5.5 points, 5%, 3%; 6.0 to 7.5 points, 11%, 10%; 8.0 to 9.5 points, 24%, 15%; 10.0 to 11.5 points, 32%, 24%; 12.0 to 13.5 points, 55%, 38%; and 14.0 points or more, 68%, 74%.
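The banded frequencies above can be held in a small lookup, sketched here. The band limits and percentages are taken from the text; the function names are hypothetical. The handling of values that fall between listed bands (eg, 5.75, assigned to the lower band) is an assumption, since the score appears to take half-point steps.

```python
from bisect import bisect_right

# Lower bounds of the second through seventh score bands, from the text.
BAND_LOWER_BOUNDS = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
BAND_LABELS = ["<4.0", "4.0-5.5", "6.0-7.5", "8.0-9.5",
               "10.0-11.5", "12.0-13.5", ">=14.0"]

# Frequency of appendicitis (%) per band, as reported for each phase.
FREQ_PHASE1 = [3, 5, 11, 24, 32, 55, 68]
FREQ_PHASE2 = [0, 3, 10, 15, 24, 38, 74]


def band_index(score: float) -> int:
    """Index of the band that contains the given score value."""
    return bisect_right(BAND_LOWER_BOUNDS, score)


def is_non_decreasing(values: list) -> bool:
    """True if each value is at least as large as the one before it."""
    return all(x <= y for x, y in zip(values, values[1:]))


# Example lookup: a score of 9.0 falls in the band 8.0 to 9.5.
idx = band_index(9.0)
label, phase1_pct, phase2_pct = BAND_LABELS[idx], FREQ_PHASE1[idx], FREQ_PHASE2[idx]
```

In both phases the reported frequency rises with every band, which `is_non_decreasing` confirms for the 2 lists.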
Despite all improvements (ultrasound, special laboratory values), routine diagnosis in acute appendicitis still poses a challenging problem. Major areas of concern are perforations (rate of up to 20%), negative appendectomies (rate of up to 30%), delayed operations, complications after operation, and late discharge.3,14 Therefore, several diagnostic scoring systems have been developed, characterized as noninvasive, understandable, user-friendly, and cost-effective.2,4-8,10-12 Evaluation studies have demonstrated a good performance for some of these scores, indicating their potential for diagnostic decision making.6-8,10,12 Testing of these scores on a prospective database of German cases revealed disappointing results.15 None of the scores fulfilled any of the given quality criteria. The lack of separate testing in a prospective study, small sample size, differences in the target population, and geographic variation of the incidence and presentation of the diseases were discussed as major factors.16 For that reason, a new score was developed on the basis of German data, which gave promising results in a first evaluation study.9
Unfortunately, the clinical benefit of these scores has not been tested in any adequately controlled study comparing the diagnostic performance of the clinician with and without the score. Some reports indicate improvement in the negative appendectomy rate or the perforation rate when compared with historical data. In one study, 2 different surgical units were compared: the unit that used the score had a negative appendectomy rate of 7%, and the unit not using the score had a rate of 17%.12 These studies cannot be taken as evidence of the clinical benefit of diagnostic scores in acute appendicitis.17 The optimal approach in clinical research is the randomized controlled clinical trial. In evaluating scores, this design has several pitfalls. Randomization of patients may result in carryover effects, since the physician may be influenced when deciding to treat control patients. A possible solution is to randomize physicians, but previous studies have shown that randomization to the intervention group may motivate physicians more than randomization to the control group.18 An alternative design is to perform a prospective intervention study with a before-and-after design, an approach used in our study. This design may be undermined by secular trends or sudden changes, either in the outcomes to be measured or in characteristics of the study population that influence these outcomes. This type of bias can never be excluded with this design, but it is probably low in our study for the following reasons: uniform data collection according to standard definitions in both phases, no differences between the study populations in the 2 phases (Table 3), and the short duration of each phase (4 months).
Systematic reviews have shown that the effectiveness of clinical guidelines and decision support is critically dependent on 3 factors: development, dissemination, and implementation strategy.19 The probability of being effective is highest if guidelines are developed internally, disseminated by specific educational initiatives, and implemented as patient-specific reminders at the time of consultation. In our study, the majority of participating centers were involved in the development of the score.9 The score was disseminated by specific training sessions or during clinical conferences, and it was applied during the consultation. The score did change clinical practice, although the accuracy of the score as a diagnostic aid was not convincing. Which factors may have biased the results in our study? In a previous multicenter study we showed that standardized and structured data collection did not change clinical performance in 6 German hospitals, so a checklist effect can be discounted. Because of the study design, with 2 consecutive phases and introduction of the score in phase 2, no carryover effects could occur. Systematic feedback was not provided in the study.
From the results of the study, it can be hypothesized that the diagnostic behavior of the clinician was changed in a systematic way. Although acute appendicitis occurred less often in the test phase, it was suspected more often, and the diagnostic decision was false positive in every second patient (positive predictive value, 50%). Although this did not influence the decision to operate (no difference in the negative appendectomy rate), it helped to avoid delayed but necessary operations. In Germany, the average hospital stay for acute appendicitis is rather long, as was demonstrated in our study. Nonperforated appendicitis in Germany is reimbursed per case (Fallpauschale). The calculation of reimbursement is based on an average hospital stay of 7.16 days for an open operation and 6.04 days for a laparoscopic operation. Only if hospital stay exceeds 14 days (open operation) or 13 days (laparoscopic operation) is additional reimbursement of costs possible (Grenzverweildauer). In our study, we defined a hospital stay of 10 days or longer as delayed discharge and could demonstrate that this outcome criterion improved with scoring. In summary, scoring did not result in an improvement of the classic outcome criteria (negative appendectomy, perforated appendix, and complication rate). Even worse, scoring degraded diagnostic decision making of the final examiner, especially with respect to overprediction of acute appendicitis. However, decreased diagnostic performance did not result in poorer management and outcome; instead, positive effects on the timing of operation and duration of hospital stay were measured.
Two general conclusions and 1 specific conclusion can be drawn from this study. Testing of a score in new clinical environments is necessary before widespread application can be recommended. Integration of a score into the diagnostic process may have unforeseen clinical effects. The existing score cannot be recommended as a standard tool for diagnostic decision making in acute appendicitis.
This work was supported by a grant (project number 01 EI 9606/0) from the German Ministry of Education, Science, Research, and Technology, Bonn, Germany, within the Medizinische Wissensbasen (MEDWIS) program.
Joachim Walenzyk, MD, Georg Federmann, MD, Clinic of General Surgery, Kreiskrankenhaus Goslar, Goslar, Germany; Jörg Krenzien, MD, Gabiele Hansdorfer, MD, Surgical Clinic, Klinikum Ernst von Bergmann, Potsdam, Germany; Cornelia Berner, MD, Joachim Eibner, MD, Department of General and Trauma Surgery, Robert-Bosch-Krankenhaus Stuttgart, Stuttgart, Germany; Matthias Kraemer, MD, Klaus Kremer, MD, Surgical Clinic and Policlinic, University of Würzburg, Würzburg, Germany; Heinrich Böhner, MD, Surgical Clinic, Elisabeth-Krankenhaus Essen, Essen, Germany; Martin Labus, MD, Surgical Clinic, Bürgerhospital Frankfurt, Frankfurt, Germany; and Anton Klingler, PhD, Theoretical Surgery Unit, Surgical Clinic, University of Innsbruck, Innsbruck, Austria.
Reprints: Christian Ohmann, PhD, Funktionsbereich Theoretische Chirurgie, Klinik für Allgemein und Unfallchirurgie, Heinrich-Heine-Universität, Moorenstr 5, 40225 Düsseldorf, Germany (e-mail: firstname.lastname@example.org).