Expert Agreement in Current Procedural Terminology Evaluation and Management Coding | Guidelines | JAMA Internal Medicine | JAMA Network
Table 1. Coding Survey Cases*
Table 2. Characteristics of 136 Coding Specialists
Table 3. Coding Specialist Current Procedural Terminology (CPT) Evaluation and Management Coding of 6 Hypothetical Cases*
Table 4. Undercoding and Overcoding of Cases by Coding Specialists
Original Investigation
February 11, 2002

Expert Agreement in Current Procedural Terminology Evaluation and Management Coding

Author Affiliations

From the Department of Family Medicine, Northwestern University Medical School, Chicago, Ill.

Arch Intern Med. 2002;162(3):316-320. doi:10.1001/archinte.162.3.316

Background  Available data suggest that physicians are accurate in approximately 55% of Current Procedural Terminology (CPT) evaluation and management (E/M) coding for their services. This accuracy is relative to observers' or auditors' assigned codes for these services, a group that has not been studied for their consistency in application of the CPT E/M coding guidelines. The purpose of this study was to determine the level of agreement of certified coding specialists in their application of CPT E/M coding guidelines.

Methods  Three hundred certified professional coding specialists randomly selected from the active membership of the American Health Information Management Association were sent 6 hypothetical progress notes of office visits along with a demographic survey. The study group assigned CPT E/M codes to each of the progress notes and completed the demographic survey.

Results  Coding specialists agreed on the CPT E/M codes for 57% of these 6 cases. The level of agreement for the individual cases ranged from 50% to 71%. Relative to the most common or consensus code, undercoding of established patients occurred more commonly than overcoding. In contrast, for new patient progress notes, overcoding relative to the consensus code was more common than undercoding.

Conclusions  There is substantial disagreement among coding specialists in application of the CPT E/M coding guidelines. The results of this study are similar to results of prior studies assessing physician coding accuracy, suggesting that the CPT coding guidelines are too complex and subjective to be applied consistently by coding specialists or physicians.

During the past decade, the Health Care Financing Administration (HCFA) has revised the Current Procedural Terminology (CPT) coding guidelines in an effort to clarify the work of physicians. Prior to 1992, fee schedules for physicians' services were determined by a customary and reasonable charge method.1 In 1992, this fee schedule was replaced by a system based on relative-value units and conversion factors. To implement this system, a new CPT coding system was used, and in 1995, HCFA developed guidelines for use of this new CPT coding system. Use of CPT evaluation and management (E/M) codes for a patient visit requires determining the level of history taking, physical examination, and medical decision making for the patient and matching the combination of these 3 elements to the proper CPT E/M code. These guidelines provide physicians and insurance carriers a format for determining the proper coding level based on medical record documentation. Continued efforts to refine and standardize the guidelines across specialties led to development of new guidelines in 1997 and plans for additional guidelines in 1999, which are still in development.

In today's climate of health care regulation, the accuracy of how physicians use CPT coding to define their E/M services is receiving more attention. Since coding is connected to reimbursement, there is concern that financial incentives might lead to coding inaccuracies. However, inaccuracies in coding might also stem from the complexity of the revised coding systems rather than a financial motivation to overcode.

Current data suggest that physicians code improperly, with conflicting data on the net economic impact of this inaccuracy. Information from HCFA and the American Academy of Family Physicians indicates that family practitioners undercode for services, resulting in a loss of potential revenues.1 Conversely, the Office of the Inspector General2 recently issued a release citing $20 billion of Medicare overpayments with 29% owing to improper coding for physician's services. A recent study using trained observers and current CPT guidelines found that physicians agreed with the observers' codes for established patients 55% of the time.3 Errors were almost equally divided between undercoding and overcoding. Kikano and colleagues4 noted similar results for established patient visits but found that for new patient visits, physicians tended to overcode. Using medical record auditors, Zuber and associates5 found a higher level of undercoding for established patient visits than seen in prior studies. However, in this study, the 3 auditors (physician faculty, resident, and professional coder) agreed with each other only 31% (1995 guidelines) and 44% (1998 guidelines) of the time, similar to the interrater reliability findings in the study by Kikano et al.4

These studies suggest that despite revisions in the current coding system, physicians continue to have trouble using the CPT E/M coding guidelines correctly. One explanation for inaccurate coding may be a system that is too complex and subjective to be applied uniformly.6,7 Ultimately, a physician's coding accuracy is judged by experts who audit physician medical charts and examine if the coding level reflects the documented services provided. This assumes that the experts can apply these codes uniformly. However, despite the financial and legal implications of the assumption that the coding system can be applied consistently by coding specialists, there is little research examining the agreement among expert coders in their interpretation of HCFA guidelines.

In this study, we examined the consistency and variability of the current CPT coding guidelines when applied by certified coding specialists to medical records for outpatient visits. In addition, we sought to see if characteristics such as years of experience in coding, time per week spent coding, number of records coded per week, and type or location of practice are associated with coding accuracy. The results might help to define the complexities of the coding guidelines as well as assist in determining a natural background error rate for coding. This may help distinguish fraudulent billing practices from the difficulty of applying a complex system with perfect accuracy.


The study group consisted of 300 certified coding specialists–physician based selected randomly by the American Health Information Management Association (AHIMA) from active members. The AHIMA is 1 of 2 major professional organizations that provide education and certification programs in medical coding. Coding specialists with the certified coding specialists–physician based status were chosen because this certification indicates training and certified competency through testing in physician office–based CPT and International Classification of Diseases coding. The membership of AHIMA was chosen because they represent a heterogeneous group, including coding specialists from urban, suburban, and rural settings, as well as from different practice models. In addition, AHIMA endorsed the study and provided a mailing list of the 300 randomly selected active members.

Six cases presented as hypothetical progress notes were developed representing different levels of service as well as new and established patient visits. The following 6 problems were chosen for these progress notes: pneumonia, leg cramps/hypertension, deep vein thrombosis (follow-up), exercise-induced asthma, gastroenteritis, and sinusitis/hypertension. These were selected because they represent common problems encountered by family physicians. A sample note is presented in Table 1 (copies of the other notes available from the authors on request). The patient cases were labeled as "new" or "established," and only the appropriate CPT codes were provided as choices for selection. For example, codes 99201 through 99205 were provided for cases of new patients and codes 99211 through 99215 were provided for cases of established patients.

These cases were then peer-reviewed by family physician faculty at Northwestern University Medical School for completeness and to assess the authenticity in representing actual patient cases.

In addition, a brief survey was developed with demographic and practice characteristics that might be associated with coding ability. Items were generated using information derived from the literature and expert opinion. For example, practice location was included in the survey since a prior study indicated that practice location influences physician coding.8 The survey instrument was piloted among coding specialists from AHIMA for content validity and reliability. Feedback from the coders was then incorporated into a final survey instrument.

The survey instrument and cases were mailed to the study participants with a self-addressed return envelope and a cover letter. The cover letter briefly described the project and contained the endorsement of AHIMA. Because of the potentially sensitive nature of coding errors, complete anonymity was ensured. Instructions were provided to complete the survey and to code the office visit cases with a CPT E/M code based on the documentation found in the sample progress notes, using the 1997 CPT E/M coding guidelines. Participants were allowed to use whatever resources they might typically use in their own practice (eg, books, articles) to code the sample notes. An incentive of $25 was provided for individuals who completed the survey. After 1 month, nonresponders received a second mailing. Two additional mailings were sent to nonresponders.

The "correct" or consensus CPT E/M code was defined as the coding level most commonly agreed on for each case. Coding accuracy was defined as the number of cases coded correctly, or in agreement with the consensus codes for the 6 cases. To compare the coding specialists' responses on the new cases vs the established cases, a frequency count of the cases coded correctly, overcoded, and undercoded was completed across the 3 new cases and across the 3 established cases. Individual performances were evaluated by summing the number of the 6 cases coded correctly.
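As an illustration only, the definitions above can be sketched in a few lines of Python. The responses below are hypothetical, not study data, and the function names are ours. The sketch assumes that a numerically higher code within one code family (eg, 99211 through 99215) means a higher level of service, and that ties for the most common code are broken by first occurrence, a rule the study does not specify.

```python
from collections import Counter

def consensus_code(codes):
    """Return the most frequently assigned code (the consensus code).

    Assumption: ties are broken by first occurrence in the list.
    """
    return Counter(codes).most_common(1)[0][0]

def classify(code, consensus):
    """Label one response as correct, overcoded, or undercoded."""
    if code == consensus:
        return "correct"
    # Assumption: within one code family, a larger number is a higher level.
    return "overcoded" if code > consensus else "undercoded"

# Hypothetical responses from 5 coders for one established-patient case.
responses = [99213, 99213, 99212, 99214, 99213]

consensus = consensus_code(responses)
labels = Counter(classify(code, consensus) for code in responses)
agreement = labels["correct"] / len(responses)

print(consensus, dict(labels), agreement)
```

Summing the "correct" labels for one coder across all 6 cases would give that coder's accuracy score as defined above.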

Descriptive statistics were used to summarize the sample characteristics. An analysis of variance was used to compare groups when appropriate. Scoring on the new cases vs the established cases was analyzed using the nonparametric Wilcoxon matched pairs test.
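For readers unfamiliar with the Wilcoxon matched pairs test, the following is a minimal sketch of how its signed-rank sums are computed. The paired scores are hypothetical, and the choice of what is paired (each coder's count of correctly coded new cases vs established cases) is our assumption about the analysis. The sketch stops at the rank sums and does not compute a P value.

```python
def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped, and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical per-coder scores: cases coded correctly out of 3.
new_scores = [3, 1, 3, 2, 0, 2]
established_scores = [1, 2, 1, 2, 1, 3]

w_plus, w_minus = signed_rank_sums(new_scores, established_scores)
print(w_plus, w_minus)
```

The smaller of the two sums is the usual test statistic; a statistics package would convert it to a P value.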


Of the 300 mailing labels provided, 294 had complete information needed for mailing. Of the 294 surveys sent, 2 were returned as undeliverable, leaving a study group of 292. A total of 136 of 292 eligible for study returned the survey for a response rate of 46%. Thirty-six responders gave incomplete demographic information; however, coding was completed for the cases. In addition, 5 individuals completed coding for all but 1 of the cases. The results of these individuals are included in the analysis of CPT coding accuracy.

Table 2 summarizes the characteristics of the study group. As given in Table 2, the group averaged 10.9 years of coding experience, with an average of 8.3 years of experience coding in physicians' offices. The mean number of hours per week spent coding was 24.9 hours. The average number of records coded per week was 278. Fifteen percent of the coders coded only for primary care physicians, 42% only for specialist physicians, and 43% for primary care and specialist physicians. All the coders were certified as certified coding specialists–physician based, and 48.5% had an additional coding certification status. Thirty-five percent of coders were located in urban practices, 29% in suburban practices, and 16% in rural practices.

Coding results of the 6 cases are given in Table 3. The agreement among the coders in assigning CPT codes ranged from 50% to 71% across the cases. The level of overall agreement for all of the cases was 58.7%. The frequency of overcoding, undercoding, and correct coding is presented in Table 4. New patient progress notes were overcoded in 33% of cases, which is 4 times the rate of undercoding for new patients and twice the rate of overcoding of established patients. Established patient progress notes were undercoded in 25% of cases, which represents 3 times the rate of undercoding for new patient cases. Thus, undercoding occurred significantly more often for established patients, and overcoding occurred significantly more often for new patients (P = .001).

The coders' overall score relative to the consensus coding response is as follows:

Seven percent of the coders agreed with the consensus code for all of the cases, and 26% agreed for 5 or more of the 6 cases. Twenty-eight percent of the certified coders were in agreement with the consensus code for less than 50% of the cases; however, 97.7% and 92.5% of responses were within 1 coding level of the consensus response for established patients and new patients, respectively.

Coding accuracy (ie, number of cases coded correctly) was not significantly correlated with years of coding experience, years coding in physicians' offices, practice type, or location. Coding accuracy was negatively correlated with hours spent coding per week (−0.31; P<.001) and with number of records coded per week (−0.36; P<.001).
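To make the direction of these correlations concrete, the sketch below computes a correlation coefficient between weekly coding hours and number of cases coded correctly. The data are hypothetical, and the use of Pearson's coefficient is an assumption, since the type of correlation is not stated here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical coders: hours spent coding per week and cases coded correctly.
hours_per_week = [10, 20, 30, 40]
cases_correct = [5, 4, 4, 2]

r = pearson_r(hours_per_week, cases_correct)
print(round(r, 2))  # a negative value: more hours, fewer cases coded correctly
```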


The results of this study suggest that certified coding specialists do not agree on codes using current CPT guidelines. This is a particularly troublesome finding given that the coding specialists involved in this study were all certified by at least 1 professional coding organization and 25% of the coders held certificates from both professional coding organizations.

The experts' codes for established patients were in agreement for 58% of the cases coded, a finding similar to that of a recent study in which physicians' codes for established visits agreed with those of a trained observer 55% of the time.3 The level of agreement among the coders for new patients was also 58%, slightly better than the 48% found for physicians.4 Taken together, these data would suggest that physicians' coding accuracy is not much different from that of trained certified coding specialists.

In addition, the patterns of errors in our study with expert coders were similar to prior physician studies.3-5,9 In our study and others, undercoding is more common in cases of established patients whereas overcoding is more common with new patients. One reason for this discrepancy could be a tendency to apply the same guidelines to all patients, not recognizing or applying the different criteria for new patients. Coding criteria are stricter for new patients, requiring more documentation to establish the same service level. In addition, physicians and coding specialists alike may recognize that caring for new patients requires more effort and that there is more uncertainty in providing this care than for established patients. Thus, physicians and coders may feel that new patients are more difficult and coding levels may reflect this rationale.

Although one might predict that experience would improve coding accuracy, this study found no such association. In addition, no associations between coding accuracy and type of practice or practice location were found. Coding accuracy was negatively correlated with both the number of hours spent coding and the number of records coded per week. This suggests that excessive time spent coding and higher volumes of coding may actually compromise accuracy. Another explanation may be that individuals who spend many hours per week coding might tend to work for larger practice groups. In today's climate of potential audits, these larger organizations could create a conservative coding atmosphere that promotes a tendency toward undercoding.

Although coding errors might conceivably relate to financial incentives or potential legal penalties, the format of the study was designed to test the coding specialists' accuracy in coding using hypothetical cases. This design removes any financial or legal incentives for incorrect coding. All coding specialists coded from the same typewritten progress notes, thus removing the discrepancies that arise from attempting to interpret handwritten progress notes or from applying the guidelines to different cases of the same coding level. Despite removing these potential sources of coding inaccuracy, the error rate was still high, with 44% of coding specialists agreeing with the consensus response on 3 or fewer cases, and 8% agreeing with the consensus code on 1 or none of the cases. However, only 3% of established patient codes and 8% of new patient codes were more than 1 coding level different from the consensus code. Thus, although there seems to be a high background error rate for CPT coding among coding specialists, most errors are within 1 level of the correct code. This finding is consistent with findings from the physician coding study by Kikano et al,4 who found that physicians' codes differed from reviewers' codes by more than 1 level in fewer than 4% of cases. Unless this intrinsic coding error rate is accounted for, identification of fraudulent coding practices would be extremely difficult.

From our results, it seems that the error rate with CPT coding is substantial for coding specialists as well as physicians. This would suggest that the guidelines themselves are overly complex and open to subjective interpretation, which then creates a high inherent error rate. Having separate sets of guidelines for new and established patients may be a contributory factor. One possible solution to minimizing the error rate with CPT coding would be to standardize the coding criteria into 1 set of guidelines for all patients. In addition, decreasing the number of potential codes for each office visit as well as the number of steps required to arrive at a code would limit the potential for error and subjective interpretations. Another proposed solution6 involves using time and new vs established patient status as the deciding factors in arriving at the level of service provided. Finally, given the complexity of the current CPT guidelines, another potential solution is to accept an inherent error rate. Clearly, further study of the CPT coding guidelines is warranted.

Accepted for publication May 8, 2001.

This study was funded by a research grant from the American Academy of Family Physicians Foundation, Leadwood, Kan.

We thank the American Academy of Family Physicians Foundation for their generous grant support and the American Health Information Management Association, Chicago, Ill, for their assistance.

Corresponding author and reprints: Mitchell S. King, MD, Glenbrook Family Care Center, 2050 Pfingsten Rd, Room 200, Glenview, IL 60025 (e-mail:

1. Sgammato J. HCFA answers questions about its new documentation guidelines. Fam Pract Manag. 1995;2:60-67.
2. Martin S. OIG: $20 billion in "improper" Medicare payments. American Medical News. May 11, 1998:11.
3. Chao J, Gillanders WG, Flocke SA, Goodwin MA, Kikano GE, Stange KC. Billing for physician services: a comparison of actual billing with CPT codes assigned by direct observation. J Fam Pract. 1998;47:28-32.
4. Kikano GE, Goodwin MA, Stange KC. Evaluation and management services: a comparison of medical record documentation with actual billing in a community family practice. Arch Fam Med. 2000;9:68-71.
5. Zuber TJ, Rhody CE, Muday TA, et al. Variability in code selection using the 1995 and 1998 HCFA documentation guidelines for office services. J Fam Pract. 2000;49:642-645.
6. Lasker RD, Marquis MS. The intensity of physicians' work in patient visits: implications for coding of patient evaluation and management services. N Engl J Med. 1999;341:337-341.
7. Iezzoni LI. The demand for documentation for Medicare payment. N Engl J Med. 1999;341:365-367.
8. Purvis JR, Horner RD. Billing practices of North Carolina family physicians. J Fam Pract. 1991;32:487-491.
9. Horner RD, Paris JA, Purvis JR, Lawler FH. Accuracy of patient encounter and billing information in ambulatory care. J Fam Pract. 1991;33:593-598.