Original Investigation
October 2016

Application of Recursive Partitioning to Derive and Validate a Claims-Based Algorithm for Identifying Keratinocyte Carcinoma (Nonmelanoma Skin Cancer)

Author Affiliations
  • 1. Women’s College Research Institute, Women’s College Hospital, Toronto, Ontario, Canada
  • 2. Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
  • 3. Division of Dermatology, Department of Medicine, University of Toronto, Ontario, Canada
  • 4. Department of Dermatology, Brown University, Providence, Rhode Island
  • 5. Department of Epidemiology, Brown University, Providence, Rhode Island
JAMA Dermatol. 2016;152(10):1122-1127. doi:10.1001/jamadermatol.2016.2609
Key Points

Question  Can a valid algorithm be derived to identify keratinocyte carcinoma at a population level using health insurance claims data?

Findings  By applying recursive partitioning to a data set of 602 371 community laboratory pathology episodes linked to health insurance claims, an algorithm was derived with 82.6% sensitivity, 93.0% specificity, 76.7% positive predictive value, and 95.0% negative predictive value. The derived algorithm also performed well when validated using an independent hospital clinic data set.

Meaning  The derived algorithm can reliably identify keratinocyte carcinoma for epidemiological research in the absence of cancer registry data. Recursive partitioning is an effective tool for deriving valid claims-based algorithms.


Importance  Keratinocyte carcinoma (nonmelanoma skin cancer) imposes a substantial burden through its high incidence and health care costs but is excluded by most cancer registries in North America. Administrative health insurance claims databases offer an opportunity to identify these cancers using the diagnosis and procedure codes submitted for reimbursement.

Objective  To apply recursive partitioning to derive and validate a claims-based algorithm for identifying keratinocyte carcinoma with high sensitivity and specificity.

Design, Setting, and Participants  Retrospective study using population-based administrative databases linked to 602 371 pathology episodes from a community laboratory for adults residing in Ontario, Canada, from January 1, 1992, to December 31, 2009. The final analysis was completed in January 2016. We used recursive partitioning (classification trees) to derive an algorithm based on health insurance claims. The performance of the derived algorithm was compared with 5 prespecified algorithms and validated using an independent academic hospital clinic data set of 2082 patients seen in May and June 2011.
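For readers unfamiliar with recursive partitioning, the following is a minimal sketch of how a classification tree can be grown from binary claims indicators. It is not the authors' implementation: the feature names (dx_code, excision_claim, derm_visit), the toy records, the Gini impurity criterion, and the depth limit are all assumptions chosen for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Return the binary feature with the lowest weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        no = [y for r, y in zip(rows, labels) if not r[f]]
        yes = [y for r, y in zip(rows, labels) if r[f]]
        if not no or not yes:
            continue  # this feature does not separate the records
        score = (len(no) * gini(no) + len(yes) * gini(yes)) / len(labels)
        if score < best_score - 1e-12:
            best, best_score = f, score
    return best

def build_tree(rows, labels, features, max_depth=3):
    """Recursively partition the records; each leaf holds the majority class."""
    majority = int(2 * sum(labels) >= len(labels))
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    f = best_split(rows, labels, features)
    if f is None:
        return majority
    no = [(r, y) for r, y in zip(rows, labels) if not r[f]]
    yes = [(r, y) for r, y in zip(rows, labels) if r[f]]
    rest = [g for g in features if g != f]
    return {
        "feature": f,
        "no": build_tree([r for r, _ in no], [y for _, y in no], rest, max_depth - 1),
        "yes": build_tree([r for r, _ in yes], [y for _, y in yes], rest, max_depth - 1),
    }

def predict(tree, row):
    """Follow the splits down to a leaf and return its class (1 = carcinoma)."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["feature"]] else tree["no"]
    return tree

# Hypothetical toy records: claims indicators paired with a pathology label.
FEATURES = ["dx_code", "excision_claim", "derm_visit"]
records = [
    ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 0, 1), 0), ((0, 1, 1), 0),
    ((0, 0, 1), 0), ((0, 0, 0), 0), ((1, 1, 1), 1), ((0, 1, 0), 0),
]
rows = [dict(zip(FEATURES, values)) for values, _ in records]
labels = [label for _, label in records]

tree = build_tree(rows, labels, FEATURES)
print(tree)
```

On these toy records the tree splits first on the diagnosis code and then on the excision claim, which illustrates how a tree can be read off as a rule over claims. A real derivation would also need pruning or cross-validation to limit overfitting.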

Main Outcomes and Measures  Sensitivity, specificity, positive predictive value, and negative predictive value using the histopathological diagnosis as the criterion standard. We aimed to maximize specificity while maintaining sensitivity greater than 80%.
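The four measures and the selection goal can be made concrete with a short sketch. The 2x2 counts and the candidate names A, B, and C below are hypothetical and are not taken from the study. The selection rule encodes the stated aim: among candidates with sensitivity above 80%, prefer the one with the highest specificity.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four measures from counts against the criterion standard."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with disease
        "specificity": tn / (tn + fp),  # true negatives among all without disease
        "ppv": tp / (tp + fp),          # true positives among all flagged
        "npv": tn / (tn + fn),          # true negatives among all not flagged
    }

def select_algorithm(candidates, min_sensitivity=0.80):
    """Pick the candidate with maximal specificity whose sensitivity exceeds the floor."""
    eligible = {
        name: m for name, m in candidates.items()
        if m["sensitivity"] > min_sensitivity
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["specificity"])

# Hypothetical counts for three candidate algorithms (100 cases, 400 non-cases each).
candidates = {
    "A": diagnostic_metrics(tp=85, fp=40, fn=15, tn=360),
    "B": diagnostic_metrics(tp=82, fp=20, fn=18, tn=380),
    "C": diagnostic_metrics(tp=70, fp=5, fn=30, tn=395),
}
chosen = select_algorithm(candidates)
print(chosen, candidates[chosen])
```

Candidate C has the highest specificity here but is excluded because its sensitivity falls below the floor, so the rule selects B.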

Results  Among 602 371 pathology episodes, 131 562 (21.8%) had a diagnosis of keratinocyte carcinoma. Our final derived algorithm outperformed the 5 simple prespecified algorithms and performed well in both community and hospital data sets in terms of sensitivity (82.6% and 84.9%, respectively), specificity (93.0% and 99.0%, respectively), positive predictive value (76.7% and 69.2%, respectively), and negative predictive value (95.0% and 99.6%, respectively). Algorithm performance did not vary substantially during the 18-year period.
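As a consistency check on the community data set, the predictive values can be recovered from the reported sensitivity, specificity, and prevalence using Bayes' theorem. The sketch below uses only figures stated above; the function names are illustrative, and small differences are expected because the reported rates are rounded.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv_from_rates(sensitivity, specificity, prevalence):
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Reported community figures: 131 562 of 602 371 episodes were keratinocyte carcinoma.
prevalence = 131562 / 602371
ppv = ppv_from_rates(0.826, 0.930, prevalence)
npv = npv_from_rates(0.826, 0.930, prevalence)
print(f"prevalence={prevalence:.3f} ppv={ppv:.3f} npv={npv:.3f}")
```

The computed values agree with the reported 76.7% positive predictive value and 95.0% negative predictive value to within rounding. Because predictive values depend on prevalence, they are not directly comparable between the community and hospital data sets.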

Conclusions and Relevance  This algorithm offers a reliable mechanism for ascertaining keratinocyte carcinoma for epidemiological research in the absence of cancer registry data. Our findings also demonstrate the value of recursive partitioning in deriving valid claims-based algorithms.