Assessment of Extractability and Accuracy of Electronic Health Record Data for Joint Implant Registries

Key Points
Question Are the data in a large US electronic health record (EHR) complete and accurate enough to track trends in implant use and to assess the durability of implants (hereafter referred to as implant survivorship)?
Findings In this cohort study, the EHRs of patients who had total hip arthroplasty in any Veterans Health Administration hospital since 2000 were automatically reviewed using novel software; 80% to 95% of hip replacement components used since 2014 were accurately identified, trends in implant use matched known national trends, and known poor implants were found to be negative outliers.
Meaning Automated analysis of the EHR provides a low-cost, low-overhead method to assess implant use and performance.


Introduction
Joint replacement registries serve to monitor implant use, complications, and failure and to support recalls and advisories. [1][2][3][4][5][6][7][8][9][10] Implant registries are costly to establish and maintain. 6 To our knowledge, no existing orthopedic implant registry relies entirely on electronic health record (EHR) data. Existing registries require at least some dedicated data entry in addition to infrastructure, security, space, and staff. Because registry data must be tracked for decades, costs are ongoing, and years of investment are commonly needed before the value of a registry can be realized.
The Veterans Health Administration (VHA) is the largest integrated health care system in the United States. Implants placed in the VHA are not followed up in a formal registry. The VHA was an early adopter of a national EHR. 11 The infrastructure of a national EHR is maintained as a requirement for clinical care. However, as with other US health care systems, the VHA does not care for a captured patient population. To our knowledge, it is unknown whether the data contained in the VHA EHR are extractable and interpretable by automated means or whether extracted data would be accurate and complete enough to provide clinically meaningful information similar to what can be provided by a formal implant registry.
In this cohort study, we sought (1) to quantify the extractability and accuracy of registry-relevant procedural information from the VHA EHR when using automated means and assess sources of missing data and error and (2) to assess the ability of these data to track trends in implant use and determine the durability of implants (hereafter referred to as implant survivorship). As tests of data utility, we hypothesized that, by using the VHA EHR, we would find that ceramic prosthetic femoral head use surpassed metal femoral head use at a similar time as reported by the American Joint Replacement Registry (AJRR) and that 2 recalled implants in the VHA EHR would be obvious negative outliers in Kaplan-Meier survivorship.

Methods
This retrospective cohort study of 37 205 patients receiving total hip arthroplasty (THA) at any VHA medical center from 2000 to 2017 was approved by the institutional review board of Stanford University. A waiver of informed consent was granted for 4 reasons: the research involved no more than minimal risk to the participants because it used materials that had been previously collected; the rights and welfare of the participants were not adversely affected because procedures were in place to protect confidentiality; the research would not affect the treatment of patients; and the research could not practicably be carried out without the requested waiver. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline was followed. 12

Identifying the Side of Surgery and Part Numbers
Surgery side was identified from free text populating the Corporate Data Warehouse records entitled "ScheduledProcedure" or "PrincipalPostOpDiagnosis" using custom regular expressions. Prosthesis part numbers occupied a free-text field entitled "ProsthesisModel." Extraneous characters were removed from the contents of this record to create a "cleaned" part number.
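As an illustration of this step, the following minimal sketch shows how a side could be read from free text and how a part number could be cleaned. The patterns and the function names `surgery_side` and `clean_part_number` are assumptions made for illustration; the study's actual regular expressions and cleaning rules are not given in the text.

```python
import re
from typing import Optional

# Hypothetical patterns: the study's real regular expressions are not published here.
_LEFT = re.compile(r"\b(left|lt)\b", re.IGNORECASE)
_RIGHT = re.compile(r"\b(right|rt)\b", re.IGNORECASE)


def surgery_side(free_text: str) -> Optional[str]:
    """Return 'left' or 'right' if exactly one side is mentioned, else None."""
    has_left = bool(_LEFT.search(free_text))
    has_right = bool(_RIGHT.search(free_text))
    if has_left and not has_right:
        return "left"
    if has_right and not has_left:
        return "right"
    return None  # side missing or ambiguous (both sides mentioned)


def clean_part_number(raw: str) -> str:
    """Drop every character that is not a letter or digit and upper-case the rest.

    Assumption: 'extraneous characters' means spaces and punctuation.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()
```

Returning None for text that names both sides is a conservative choice in this sketch; such records could be routed to manual review rather than guessed.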

Identifying the Prosthesis From the Part Number
The cleaned part number was mapped onto the US Food and Drug Administration Global Unique Device Identification Database (GUDID), which has information on implants marketed after 2013. We enriched the GUDID database by adding part numbers of old implants that were commonly used or of particular interest.
Mapping the part number in GUDID, we identified company name, model, size, and whether the part was a major component of the THA (shell, liner, stem, or prosthetic femoral head) or not (such as screw or hole plug). For one manufacturer, the information in the GUDID did not identify model, so we hard-coded this information.

Aggregating Parts
The first 6 digits of the part number were used to aggregate different-sized parts of the same model. This worked in many cases but not all. Adjustments were made and hard-coded as needed.
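The lookup and aggregation steps can be sketched as follows. The catalog entries, part numbers, company name, and override table are all invented placeholders standing in for the augmented GUDID lookup and the hard-coded adjustments; only the 6-character prefix rule and the list of major part types come from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

MAJOR_TYPES = {"shell", "liner", "stem", "head"}

# Hypothetical stand-in for the augmented GUDID lookup; every entry is made up.
CATALOG: Dict[str, Dict[str, str]] = {
    "111111050": {"company": "ExampleCo", "model": "Shell A", "type": "shell"},
    "111111052": {"company": "ExampleCo", "model": "Shell A", "type": "shell"},
    "222222001": {"company": "ExampleCo", "model": "Screw", "type": "screw"},
}

# Hypothetical hard-coded adjustments for models the prefix rule would split.
PREFIX_OVERRIDES: Dict[str, str] = {"333333": "111111"}


def lookup(part_number: str) -> Optional[Dict[str, str]]:
    """Return the catalog record for a cleaned part number, or None if unmapped."""
    return CATALOG.get(part_number)


def model_key(part_number: str) -> str:
    """Group different sizes of one model by the first 6 characters."""
    prefix = part_number[:6]
    return PREFIX_OVERRIDES.get(prefix, prefix)


def count_major_models(part_numbers: Iterable[str]) -> Counter:
    """Count uses of each major-component model; skip minor and unmapped parts."""
    counts: Counter = Counter()
    for pn in part_numbers:
        record = lookup(pn)
        if record is not None and record["type"] in MAJOR_TYPES:
            counts[model_key(pn)] += 1
    return counts
```

Keeping the overrides in a separate table, rather than inside the grouping function, makes the manual adjustments easy to audit.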

Identifying When Implants Were Removed
Knowing the patient, surgery date, side of surgery, and THA components placed, we determined whether patients had a subsequent THA or revision anywhere in the VHA system on the same side.
We identified new THA parts placed at the time of revision. We assumed that if a part, such as a stem, was placed at the time of revision, the prior stem must have been removed. Revisions of revisions were similarly tracked. The Current Procedural Terminology codes for removal of THA (ie, codes 27090 and 27091) identified operations in which implants were removed.
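The removal logic described above can be sketched as follows. The `Surgery` record and the `removal_dates` function are hypothetical names; the sketch also assumes that an operation carrying a removal code (27090 or 27091) removes every component then in place on that side, which is a simplification for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set, Tuple

REMOVAL_CPT = {"27090", "27091"}  # CPT codes for removal of THA, per the text


@dataclass
class Surgery:
    patient_id: str
    when: date
    side: str                      # "left" or "right"
    parts_placed: Set[str]         # e.g. {"shell", "liner", "stem", "head"}
    cpt_codes: Set[str] = field(default_factory=set)


Key = Tuple[str, str, date, str]   # (patient, side, implant date, part type)


def removal_dates(surgeries: List[Surgery]) -> Dict[Key, Optional[date]]:
    """Map each implanted part to its removal date, or None if never removed."""
    by_hip: Dict[Tuple[str, str], List[Surgery]] = {}
    for s in surgeries:
        by_hip.setdefault((s.patient_id, s.side), []).append(s)

    result: Dict[Key, Optional[date]] = {}
    for (pid, side), ops in by_hip.items():
        ops.sort(key=lambda s: s.when)
        for i, op in enumerate(ops):
            for part in op.parts_placed:
                removed = None
                for later in ops[i + 1:]:
                    # A new part of the same type implies the old one came out;
                    # a removal code is assumed to take out everything.
                    if part in later.parts_placed or later.cpt_codes & REMOVAL_CPT:
                        removed = later.when
                        break
                result[(pid, side, op.when, part)] = removed
    return result
```

Because parts placed at a revision are keyed by their own implant date, revisions of revisions are tracked by the same loop.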

Calculating Kaplan-Meier Curves
Kaplan-Meier curves were calculated for prostheses used at least 100 times. Time to failure was the interval from implantation to removal for each THA component. Time to censor was the interval from implantation to last follow-up in any VHA clinic or to death. The VHA uses the Social Security Death Index, which identifies death occurring in a VHA facility or elsewhere.
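For readers unfamiliar with the estimator, the following is a minimal sketch of the standard Kaplan-Meier product-limit calculation, written in plain Python. It is not the study's code; the function name and input layout are assumptions. Each component contributes a duration and a flag that is true if the component was removed (failure) and false if follow-up ended or the patient died first (censored).

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], removed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct failure time."""
    pairs = sorted(zip(durations, removed))
    at_risk = len(pairs)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = 0
        leaving = 0
        # Gather every component whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            failures += int(pairs[i][1])
            leaving += 1
            i += 1
        if failures:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failed and censored components both leave the risk set
    return curve
```

For example, with durations of 1, 2, 3, 4, and 5 years and removals at years 1 and 3, survival drops to 0.8 at year 1 and to 0.8 × (1 − 1/3) ≈ 0.53 at year 3, because the component censored at year 2 has already left the risk set.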

Assessment of Data Extractability and Accuracy
We determined the annual number of primary and revision THA procedures and the number and percentage of major parts identified by part number each year. To assess accuracy, a THA surgeon (N.J.G.) reviewed 100 randomly selected THA operations. Surgery date, side of surgery, and part numbers were compared with data collected via automated means. Sources of error were identified. Accuracy, precision, recall, and F1 metrics for the complete process of implant part number extraction, GUDID mapping, and major THA part identification were calculated.

Assessment of Data Utility
To determine whether the trends in implant use were realistic, we tracked metal and ceramic femoral head use and identified when ceramic femoral head use surpassed metal femoral head use. This information was compared with the information reported by the AJRR. 13 We then determined whether Kaplan-Meier curves for recalled implants used in the VHA system could be identified as negative outliers compared with other implants. For this information, we compared the survivorship (ie, durability) of each shell and stem with at least 100 uses with the survivorship of other shells or other stems over the entire time interval. Finally, as an initial evaluation of relative implant survivorship, we categorized shells and stems as having survivorship that was better than, similar to, or worse than all other stems or shells in our data set.
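The crossover check can be illustrated with a short sketch. The function name and the input layout (year mapped to metal and ceramic percentages) are assumptions; the two example years use the percentages reported in the Results.

```python
from typing import Dict, Optional, Tuple


def first_crossover_year(shares: Dict[int, Tuple[float, float]]) -> Optional[int]:
    """Return the first year in which ceramic share exceeds metal share.

    shares maps year -> (metal %, ceramic %).
    """
    for year in sorted(shares):
        metal, ceramic = shares[year]
        if ceramic > metal:
            return year
    return None  # ceramic never surpassed metal in the supplied years


# Percentages reported in the Results for the 2 years around the crossover.
vha_shares = {2015: (55.0, 44.0), 2016: (45.0, 54.0)}
print(first_crossover_year(vha_shares))  # prints 2016
```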

Statistical Analysis
The 95% CIs of a proportion were calculated using the Wilson score interval. 14 Kaplan-Meier curves were deemed to be significantly different when their 95% CIs ceased to overlap.
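As a reference for the interval named above, the following sketch implements the standard Wilson score interval for a proportion, together with a simple non-overlap check of the kind used to compare curves. The function names are assumptions, and this is not the study's code.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives a 95% CI)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Unlike the simpler normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves sensibly when the observed proportion is near either bound, which matters for survivorship values close to 100%.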

Results
We identified 45 351 primary and revision THA procedures performed on 37 205 patients; 94.7% of patients were male, reflecting the overwhelmingly male VHA patient population. The mean (SD) age at surgery was 63.7 (10.1) years (range, 21-99 years).

Assessment of Data Extractability and Quality
The manual review of 100 randomly selected surgical procedures served to assess accuracy and completeness of automated data retrieval and sources of error (Table). Surgery side was identified correctly in all cases. Primary and revision procedures were differentiated correctly in 98 cases. All major implant part numbers were correctly identified in 75 surgical procedures, and they were correctly mapped to the GUDID database and fully identified in 51 surgical procedures.
The 100 cases had 391 major parts owing to partial revisions, monoblock shells, and a miscoded hemiarthroplasty. The reviewer found part numbers in the medical records for 352 major parts. The computer identified 322 correct part numbers; 308 part numbers existed in the augmented GUDID database, 288 part numbers were mapped successfully in the augmented GUDID, and 283 parts were ultimately fully described and correctly identified via automated means as one of the major parts of THA. Reasons for data loss and error are shown in the Table. There were 517 major and minor parts in the 100 reviewed cases. We defined a true positive as a part that was fully and correctly identified as a major part, a true negative as a part that was correctly identified as not a major part, a false positive as a part that was wrongly identified as a major part, and a false negative as a major part that was not identified as such. With these definitions, the manual review revealed 291 true positives, 116 true negatives, 5 false positives, and 105 false negatives in part identification. Accuracy was thus 79%, precision was 98%, recall was 73%, and F1 was 84%.
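The reported metrics follow directly from the 4 counts using the standard definitions. The short sketch below recomputes them; the function name is an assumption, and the counts are those stated above.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts from the 100-case manual review (517 parts in total).
metrics = classification_metrics(tp=291, tn=116, fp=5, fn=105)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to 2 decimal places, this gives 0.79, 0.98, 0.73, and 0.84, matching the reported values. The gap between precision and recall shows that parts the software did identify were almost always correct, whereas about a quarter of major parts were missed.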

Assessment of Data Utility
Metal and ceramic prosthetic femoral head use was tracked from 2000 to 2017 (Figure 3). Ceramic femoral head use surpassed metal femoral head use between 2015 (55% metal and 44% ceramic femoral heads) and 2016 (45% metal and 54% ceramic femoral heads). In the AJRR, metal was more common before 2015, and ceramic was more common after 2016. The AJRR reported roughly equal rates in 2015 and 2016. 13
We classified an individual implant as performing better or worse than other implants of the same type if its survivorship 95% CI diverged from theirs and remained separated at all time points after diverging; otherwise, it was classified as similar (eTables 1 and 2 in the Supplement). Implant-years of follow-up for each prosthesis were also calculated. We identified 6 high- and 3 low-survivorship acetabular shells. ... about 98%. The Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR), in a captured patient population, reports a 10-year cumulative percent revision for primary THA for arthritis of 5% and a 10-year cumulative percent revision of all primary hip replacement (partial and total) for all diagnoses of about 7% (93%-95% 10-year survivorship). 2
Figure 4 shows an example of a well-performing acetabular shell (Figure 4A) and the 2 worst-performing acetabular shells in our data set, the Zimmer Durom (Figure 4B) and the DePuy ASR (Figure 4C). Both of these implants have been found to have very poor survivorship in multiple registries when used as part of a THA rather than a hip resurfacing (ASR survivorship of 54%-57% at 10 years; Durom survivorship of 84%-85% at 10 years). 2,5 Our quantitative 10-year survivorship results for the ASR and Durom of approximately 85% and 90%, respectively, are high compared with foreign national registries, which is consistent with the rest of our quantitative results. The Durom was withdrawn in 2008. 17 The ASR was recalled in 2010. 18 The third poorly performing shell had only 124 shells in the data set and is no longer marketed. No specific concerns regarding that shell had been previously raised in the literature.
We identified 6 high-survivorship and 2 low-survivorship femoral stems (eTable 2 in the Supplement). Concern has been raised about 1 low-survivorship stem owing to high initial migration. 19 The other low-survivorship stem is used for revision surgery, which explains its low observed survivorship.
Figure 4. A, This Kaplan-Meier curve shows an implant that was found to have better implant survivorship (ie, durability) than other shells in our cohort. The numbers "at risk" indicate the numbers of implants that were still being tracked at each time point for the shell of interest and all the shells identified in the Veterans Health Administration (VHA) electronic health record (EHR). B and C, These Kaplan-Meier curves show the 2 worst implants that we could identify in the VHA EHR using automated techniques. Both implants were either recalled or withdrawn from the market and are no longer available.

Discussion
Electronic health records are designed to record individual episodes of patient care and facilitate billing. Data are entered for these purposes. It is thus reasonable to ask whether data in a large EHR are complete and accurate enough to be extracted and repurposed to track implant use and performance and whether this can be done on a large scale by automated means.
In our 100-case manual review, we quantified the extractability and accuracy of registry-relevant data from the VHA EHR and identified areas for improvement. Accuracy, precision, recall, and F1 for fully identifying major implant parts among the 517 total implant parts were 0.79, 0.98, 0.73, and 0.84, respectively. Because these metrics were based on a random sample of cases spanning the full study period, they represent a lower bound on quality metrics that could be expected in later years when data were more complete. Future data capture may improve as bar code scanner use increases and legacy implant use decreases. Developing code to extract part numbers from other locations in the EHR and adding other implant libraries to cross-reference part numbers may further improve performance.
Data extraction and quality were sufficient to identify the demographic characteristics of the patients and the trends in ceramic and metal femoral head use in the United States. Using a novel approach to identify when prosthesis components were explanted, we used 18 years of EHR data to ... shells, and 2 DePuy Summit stems) are recognized in the AOANJRR as "superior" when paired with other good implants, but the fifth (Zimmer Continuum shell) is identified in the AOANJRR as having a higher than anticipated rate of revision. It is not necessarily expected that survivorship results will be consistent across registries given differences in practice patterns and patient demographic characteristics. Various factors, not simply implant design, may be associated with survivorship.
Implants that we identify as high or low survivorship merit further investigation. Identifying reasons for implant failure will be important for future development. Current Procedural Terminology codes for removal of all THA components (ie, codes 27090 and 27091) may identify failure due to periprosthetic infection. Other revisions should be due to aseptic failure. Natural language processing of the operative report may ultimately distinguish revision due to aseptic loosening, recurrent instability, or other aseptic processes.
Although this work begins the process of establishing a mature implant registry for the VHA, it is important to recognize the opportunity to further leverage the rich and diverse EHR data to track implant performance in novel ways. "Level 2" data, such as comorbidities, are difficult to capture in traditional registries 9 but are easily captured with the EHR as the data source. This will facilitate analyses of implant failure that account for confounding factors, such as patient demographics, comorbidities, and socioeconomic status. Furthermore, different metrics of implant failure using novel inputs that are in the EHR but not routinely collected in other implant registries can be investigated. 21 Analysis of unstructured notes with natural language processing or radiographs with machine learning 22-27 may augment traditional means of implant surveillance and facilitate earlier and more sensitive detection of poor implant performance. These are innovations that we believe will contribute to quality enhancement efforts and improve the science of implant assessment and surveillance.

Limitations
There were some limitations to this study. First, we were unable to identify patients who left the VHA system for revision surgery. Although it will be difficult to completely rectify this problem without extensive data sharing agreements among US health care professionals, data sources from Medicare and VHA community care are becoming available and may be used to improve overall data capture.
Second, information to notify veterans of implant recalls or advisories is not available if the primary surgery was outside of the VHA. Third, the veteran patient population is overwhelmingly male. This will normalize slightly with time as younger female veterans age, but the imbalance will remain substantial. Information gained will be useful for internal VHA quality initiatives, but it is not known whether relative implant survivorship results observed in this work are applicable beyond the VHA.
Fourth, we used the VHA EHR, which will be phased out as the VHA converts to a commercial EHR. Commercial EHRs, such as Epic and Cerner, have records for all of the relevant data fields that were drawn on to perform this analysis, including part number. With equivalent data entry in a commercial EHR, our overall approach should remain applicable.
Finally, this work was performed specifically for THA implants. However, all implant types have searchable part numbers in the GUDID database. Implantation information should thus be trackable for other implant types. If failure of another implant type is treated by replacement with a similar implant, then failure of these implantable prostheses should be similarly trackable. Obviously, the specifics of the method would need to be tailored to the implant of interest and validated.

Conclusions
In this cohort study, extraction of registry-relevant information from the EHR of the largest integrated health care system in the United States was possible using automated means. The quantity and quality of the extracted data were sufficient to track trends in 18-year implant use and identify recalled implants as negative outliers. This approach was low cost and leveraged, in a novel way, the computational infrastructure of the EHR without adding reporting burden to hospital staff.
The general approach that we describe may be applicable to the analysis of other implants in other large EHRs. To facilitate further development, our computer code is freely available on GitHub. 28