April 2005

Interexaminer Reliability in Physical Examination of Pediatric Patients With Abdominal Pain

Author Affiliations: Department of Pediatrics, Section of Pediatric Emergency Medicine (Drs Yen, Karpas, and Gorelick), Department of Surgery, Section of Pediatric Surgery (Dr Pinkerton), Medical College of Wisconsin, Milwaukee; Children’s Hospital Research Institute, Children’s Hospital of Wisconsin, Milwaukee (Drs Yen, Karpas, and Gorelick).


Copyright 2005 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.

Arch Pediatr Adolesc Med. 2005;159(4):373-376. doi:10.1001/archpedi.159.4.373

Objective  To test the interexaminer reliability of abdominal examinations performed by pediatric emergency medicine physicians and surgeons in an emergency department.

Methods  A prospective cross-sectional study in which 3 different types of physicians (pediatric emergency department residents, pediatric emergency department attending physicians, and pediatric surgeons in training) independently examined a convenience sample of children (aged 3-19 years) with initial complaint of abdominal pain. The interexaminer reliability of 6 components of the abdominal examination (the presence or absence of abdominal distension, abdominal tenderness to percussion, abdominal tenderness to palpation, abdominal guarding, rebound tenderness, and bowel sounds) and the clinical diagnosis of peritonitis was tested.

Results  Sixty-eight patients were examined by pediatric emergency department residents and pediatric emergency department attending physicians. All 3 physician types examined 46 of these 68 patients. When comparing residents and attending physicians, the components of the abdominal examination showed less than moderate chance-adjusted agreement (κ range, −0.04 to 0.38). When comparing attending physicians and surgeons, the presence of rebound tenderness showed moderate agreement (κ = 0.54). The rest of the components demonstrated less than moderate chance-adjusted agreement (κ range, −0.04 to 0.34).

Conclusions  The components of the abdominal examination are poorly reliable between physician types. Only the “rebound tenderness” component of the abdominal examination shows moderate agreement between the pediatric emergency department attending physicians and the surgeon. No component of the abdominal examination appears to be consistently reliable. Interexaminer agreement must be considered when developing management strategies for acute abdomen. Interventions to improve reliability should be developed.

The evaluation of the child with abdominal pain is an important and challenging task.1 The abdominal examination is a critical element in this evaluation. In the interest of cost containment and patient safety, limiting unnecessary tests by performing an accurate “initial test” (the history and physical examination) is paramount.2 Studies have compared the diagnostic accuracy of the abdominal clinical examination with other laboratory and imaging tests.3 Studies have also evaluated the reliability of the history and physical examination of the abdomen in adults.4 However, we did not find any studies that evaluated interphysician agreement on the components of the clinical abdominal examination in children with acute abdominal pain initially seen in emergency departments.

It is essential for any diagnostic test, including the physical examination, to be reliable (findings reproducible when repeated by the same or a different examiner). For example, when the presence or absence of a physical examination finding is involved in the determination of the next step along a practice pathway, unacceptable practice variation can result from differences between examiners. Emergency medicine physicians make clinical decisions about diagnostic evaluations and consults based on the history and physical examination results of patients with acute abdominal pain. Inconsistencies in the physical examination findings between emergency department physicians (eg, residents, attending physicians) and other physicians (eg, surgeons) can potentially delay the correct diagnosis or lead to unnecessary delays in administration of analgesics.5

The primary objective of this study was to test the interexaminer reliability of abdominal examinations performed by pediatric emergency medicine physicians and surgeons in an emergency department in the evaluation of children with abdominal pain.


This was a cross-sectional study conducted at a university-affiliated, urban pediatric emergency department with an annual emergency department census of 43 000. During the study period (May 1, 2002, to April 1, 2003), 3- to 19-year-old patients seen in the pediatric emergency department with a chief complaint of abdominal pain of less than 72 hours’ duration were eligible for enrollment and prospectively identified. Patients with prior abdominal surgery and/or developmental delay were excluded. Study subjects were a convenience sample of eligible patients who were seen when a research assistant was present to enroll the patients and when a pediatric surgeon was available to examine the patient during the study time frame. Formal surgical consultation was not required. Informed consent and assent were obtained prior to study commencement. The study was approved by the hospital’s institutional review board.

Three different physician types examined the patient: residents rotating in the emergency department (including pediatric, emergency medicine, or family practice residents), pediatric emergency medicine physicians (attending physician or fellow), and pediatric surgeons in training (senior surgical resident or fellow).

The examiners were allowed to take the patient’s medical history. The 3 examiners independently performed the physical examination (including the abdominal examination). Each examination was performed with only 1 examiner in the room with the patient at a time. Each of the examiners was blinded to the findings of the other examiners. The interval between examinations was less than 30 minutes. The order of examiners was resident, pediatric emergency medicine physician, and then pediatric surgeon. Each physician completed an abdominal examination form eliciting information about the presence of the clinical findings shown in the Table. If any part of the examination could not be completed, the physician had the option of marking “unable to assess.” During the session, the patient was not told the results of the examination.

Table. Percentage Overall Agreement and κ Statistic for Each Clinical Examination Finding and Diagnosis of Peritonitis (table image not available)

The prevalence of each of the clinical findings for each of the physician types was calculated. For each clinical finding, 2 × 2 tables comparing the rating (present or absent) by each of 2 observers were constructed. Two pairwise comparisons were evaluated: pediatric emergency medicine physician vs resident and pediatric emergency medicine physician vs pediatric surgeon. For each 2 × 2 table, the total overall agreement and a κ statistic (chance-adjusted agreement) were calculated.6 By convention, a κ value of 0.6 or greater is considered to indicate good to excellent agreement, while a κ of less than 0.4 is considered poor agreement.7 For the components of the abdominal examination, we considered moderate agreement (κ range, 0.4-0.6) a reasonable goal. Bootstrapping was used to calculate 95% confidence intervals.8 All statistical computations were performed with Intercooled Stata 7 for Windows (STATA Corp, College Station, Tex).
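
As an illustration of the calculation described above, the following Python sketch computes overall agreement and Cohen's κ from a 2 × 2 table and estimates a percentile bootstrap 95% confidence interval. It is not the Stata code used in the study: the function names, the example counts, the number of bootstrap replicates, and the choice of the percentile method are all assumptions made for illustration.

```python
import random


def kappa_2x2(a, b, c, d):
    """Return (overall agreement, Cohen's kappa) for a 2x2 table.

    a: both raters record "present"
    b: rater 1 "present", rater 2 "absent"
    c: rater 1 "absent", rater 2 "present"
    d: both raters record "absent"
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed (raw) agreement
    # Agreement expected by chance, from the marginal totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_e == 1.0:
        # Both raters used a single category; kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


def table_from_pairs(pairs):
    """Count the four cells from a list of (rater1, rater2) booleans."""
    a = sum(1 for r1, r2 in pairs if r1 and r2)
    b = sum(1 for r1, r2 in pairs if r1 and not r2)
    c = sum(1 for r1, r2 in pairs if not r1 and r2)
    d = sum(1 for r1, r2 in pairs if not r1 and not r2)
    return a, b, c, d


def bootstrap_kappa_ci(pairs, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for kappa, resampling patients."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        _, k = kappa_2x2(*table_from_pairs(sample))
        if k == k:  # skip resamples where kappa is undefined (NaN)
            stats.append(k)
    if not stats:
        raise ValueError("kappa was undefined in every resample")
    stats.sort()
    low = stats[int(0.025 * (len(stats) - 1))]
    high = stats[int(0.975 * (len(stats) - 1))]
    return low, high


# Hypothetical ratings for 50 patients (not study data).
pairs = (
    [(True, True)] * 10
    + [(True, False)] * 5
    + [(False, True)] * 5
    + [(False, False)] * 30
)
p_o, kappa = kappa_2x2(*table_from_pairs(pairs))
ci_low, ci_high = bootstrap_kappa_ci(pairs)
print(f"agreement={p_o:.2f} kappa={kappa:.2f} CI=({ci_low:.2f}, {ci_high:.2f})")
```

For these hypothetical counts the raw agreement is 0.80 and κ is about 0.52, which would fall in the moderate range under the convention cited above. Resampling is done at the patient level so that each pair of ratings stays together.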


A total of 68 patients had examinations performed by both a pediatric emergency medicine resident and the pediatric emergency medicine attending physician. Forty-six patients had examinations performed by both the pediatric emergency medicine attending physician and a pediatric surgeon. The pediatric surgeon did not complete the abdominal examination within the half-hour window for 22 of the patients. The mean age of subjects was 10 years, and 68% of the subjects were female. Six patients were admitted for surgical reasons (observation for possible appendicitis or actual appendicitis), 5 patients were admitted for medical reasons (eg, rehydration), and the rest were discharged home.

The prevalence of positive examination findings for each of the components is presented in the Figure. Overall agreement between pediatric emergency medicine attending physicians and pediatric emergency department residents is presented in the Table. For each of the examination components, raw agreement was relatively high (ranging from 69.1% for abdominal tenderness with palpation to 95.5% for clinical diagnosis of peritonitis).

Figure. Percentage prevalence of positive clinical examination findings by physician type. ED indicates emergency department. (Figure image not available.)

A similar pattern of agreement exists between pediatric emergency medicine attending physicians and pediatric surgeons. Overall agreement between these groups ranged from 65.2% for abdominal tenderness with palpation to 95.5% for clinical diagnosis of peritonitis.

However, chance-adjusted agreement was generally fair to poor, as presented in the Table. The κ statistic for the agreement between pediatric emergency medicine physicians and residents on examination components ranged from −0.04 for absence of bowel sounds to 0.38 for the diagnosis of peritonitis. Similar values for κ were found between pediatric emergency medicine physicians and pediatric surgeons. The only examination component with a κ higher than 0.4 was rebound tenderness, with a κ of 0.54, indicating moderate agreement.


We found poor interexaminer reliability among a broad range of examiners in the clinical examination of acute abdominal pain. Only the presence of rebound tenderness showed moderate agreement between the pediatric surgeons in training and the pediatric emergency medicine attending physicians.

These differences may be the result of differences in training and levels of experience. Because the study did not define what is considered a positive finding, each of the physician types may have a different definition of what constitutes a true positive finding.

A possible explanation for the difference between pediatric emergency medicine physicians and surgeons is that these groups may, in effect, calibrate their examinations differently because of different intrinsic thresholds. The prevalence of positive clinical examination findings tended to be higher for the pediatric emergency medicine attending physicians than for the pediatric surgeons. This may be because a positive clinical examination result means different things to different physician types. For the pediatric emergency medicine physician, a positive examination finding would mean a pediatric surgery consultation; a false-positive result bears relatively little cost compared with a false-negative result, which could lead to a missed diagnosis. For a pediatric surgeon, in contrast, a positive examination result may mean surgery. Surgeons would be expected to want to minimize false-positive results. Thus, pediatric emergency medicine physicians may interpret findings in a way that maximizes sensitivity, while surgeons may aim for greater specificity.
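
The calibration argument can be made concrete with a small Python sketch. It assumes, purely for illustration, that both examiners perceive the same underlying degree of tenderness on a hypothetical 0-10 scale but call a finding positive at different cutoffs. The scores, the cutoffs, and the function name are invented and are not study data.

```python
def kappa_from_ratings(r1, r2):
    """Cohen's kappa for two equal-length lists of boolean ratings."""
    n = len(r1)
    a = sum(1 for x, y in zip(r1, r2) if x and y)          # both positive
    b = sum(1 for x, y in zip(r1, r2) if x and not y)      # only rater 1
    c = sum(1 for x, y in zip(r1, r2) if not x and y)      # only rater 2
    d = sum(1 for x, y in zip(r1, r2) if not x and not y)  # both negative
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tenderness scores for 20 patients (0 = none, 10 = severe).
scores = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9]

# Assumed cutoffs: a lower one favors sensitivity, a higher one specificity.
ED_CUTOFF = 3
SURGEON_CUTOFF = 6

ed_positive = [s >= ED_CUTOFF for s in scores]
surgeon_positive = [s >= SURGEON_CUTOFF for s in scores]

ed_prevalence = sum(ed_positive) / len(scores)
surgeon_prevalence = sum(surgeon_positive) / len(scores)
kappa = kappa_from_ratings(ed_positive, surgeon_positive)

print(f"ED prevalence={ed_prevalence:.2f}")
print(f"surgeon prevalence={surgeon_prevalence:.2f}")
print(f"kappa={kappa:.2f}")
```

In this toy example the two examiners never disagree about how tender a patient is, yet the examiner with the lower cutoff reports positive findings in 50% of patients versus 20% for the other, and κ is only 0.40. Different thresholds alone are therefore enough to reduce chance-adjusted agreement.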

There are other limitations to this study. Most of the patients had normal findings for many of the clinical examination components, and when abnormal findings are rare, raw agreement is high largely because both examiners record a normal result. Under such circumstances, the κ statistic may appear artificially low.9,10
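
The effect of low prevalence on κ can be seen in a short worked example. The Python sketch below compares two hypothetical 2 × 2 tables of 100 patients each with identical raw agreement, one in which the finding is rare and one in which it is present in about half of patients. The counts and the function name are assumptions chosen for illustration, not study data.

```python
def agreement_and_kappa(a, b, c, d):
    """Return (raw agreement, Cohen's kappa) for a 2x2 table.

    a: both positive; b and c: the two discordant cells; d: both negative.
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement computed from each rater's marginal totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# Rare finding: 1 joint positive, 5 discordant, 94 joint negatives.
rare = agreement_and_kappa(1, 3, 2, 94)

# Common finding: same 5 discordant pairs, positives and negatives balanced.
balanced = agreement_and_kappa(47, 3, 2, 48)

print(f"rare:     agreement={rare[0]:.2f} kappa={rare[1]:.2f}")
print(f"balanced: agreement={balanced[0]:.2f} kappa={balanced[1]:.2f}")
```

Both tables have 95% raw agreement, but κ is about 0.26 for the rare finding and 0.90 for the balanced one. With a rare finding, chance agreement is already above 93%, so there is little room left for κ to credit the examiners.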

Furthermore, our study did not look at the severity or final outcome of the patient. For example, we did not look at who went to surgery or whether patients who were discharged from the hospital returned with an acute abdomen. Agreement may improve if the abdominal pain is more severe. Thomas et al11 demonstrated fairly good agreement for the need for opioid analgesia between patients and physicians. Also, physicians may agree more frequently in patients with more serious diagnoses, in whom findings may be less equivocal. Relatively few patients in this study were thought to have peritoneal signs by any of the examiners.

These findings imply that interexaminer agreement must be considered when developing management strategies for acute abdominal pain. The recent increase in the development and use of clinical guidelines and practice pathways is a prime example of why determining the interexaminer reliability of physical examination components may be important. Developers of clinical guidelines that include the abdominal physical examination in their decision trees must be aware of the poor reliability of the examination components between physician types.

More research on the reliability of abdominal and other physical examination components needs to be performed to understand the limitations of the clinical examination. Interventions that improve abdominal examination reliability should be developed and investigated. As stated earlier, our study did not define what was a normal or abnormal examination finding. Providing definitions would probably be an essential first step in reducing variability, after which standardizing the method of examination should also be investigated and considered.

Article Information

Correspondence: Kenneth Yen, MD, Department of Pediatrics, Medical College of Wisconsin, 9000 W Wisconsin Ave, Milwaukee, WI 53224 (

Accepted for Publication: December 7, 2004.

Acknowledgment: We thank Jo Bergholte, MS, and her staff of research assistants for assistance with patient enrollment, data collection, data entry, and review of the manuscript.

1. Irish MS, Pearl RH, Caty MG, Glick PL. The approach to common abdominal diagnosis in infants and children. Pediatr Clin North Am. 1998;45:729-772.
2. Khushf G. The aesthetics of clinical judgment: exploring the link between diagnostic elegance and effective resource utilization. Med Health Care Philos. 1999;2:141-159.
3. Andersson RE. Meta-analysis of the clinical and laboratory diagnosis of appendicitis. Br J Surg. 2004;91:28-37.
4. Bjerregaard B, Brynitz S, Holst-Christensen J, et al. The reliability of medical history and physical examination in patients with acute abdominal pain. Methods Inf Med. 1983;22:15-18.
5. Kim MK, Strait RT, Sato TT, Hennes HM. A randomized clinical trial of analgesia in children with acute abdominal pain. Acad Emerg Med. 2002;9:281-287.
6. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
7. Anthony D. Validity and Reliability: Understanding Advanced Statistics. London, England: Harcourt Brace; 1999:29-44.
8. StataCorp. bstrap: Bootstrap sampling and estimation. Stata Reference Manual Release 7. Vol 1, A-G. College Station, Tex: Stata Press; 2001:164-174.
9. Cicchetti DV, Feinstein AR. High agreement but low kappa, II: resolving the paradoxes. J Clin Epidemiol. 1990;43:551-558.
10. Feinstein AR, Cicchetti DV. High agreement but low kappa, I: the problems of two paradoxes. J Clin Epidemiol. 1990;43:543-549.
11. Thomas SH, Borczuk P, Shackelford J, et al. Patient and physician agreement on abdominal pain severity and need for opioid analgesia. Am J Emerg Med. 1999;17:586-590.