Derivation and Validation of a 4-Level Clinical Pretest Probability Score for Suspected Pulmonary Embolism to Safely Decrease Imaging Testing

Key Points
Question: Can a pretest probability score make it possible to rule out pulmonary embolism solely on clinical criteria and optimized D-dimer measurement to safely decrease imaging testing?
Findings: In this study, the 4-Level Pulmonary Embolism Clinical Probability Score (4PEPS) was derived and validated using databases from 3 merged management studies. The safety and efficacy of the 4PEPS strategy were confirmed in 2 external validation cohorts (false-negative rates: 0.71% and 0.89%; absolute reductions in imaging testing: −19% and −22%, respectively).
Meaning: The 4PEPS strategy may lead to a substantial and safe reduction in imaging testing for patients with suspected pulmonary embolism.

Despite the significant progress of the last decades, diagnosing pulmonary embolism (PE) remains a clinical challenge. The standard diagnostic strategy, based on clinical probability assessment, D-dimer testing, and computed tomography pulmonary angiography (CTPA), has been shown to have a very low rate of diagnostic failure. 1 However, the use of CTPA for suspected PE has increased considerably. 2,3 The reasons are likely multifactorial. The signs and symptoms of PE are very common and nonspecific, so clinicians, fearing they might miss a life-threatening condition, readily initiate a diagnostic workup. Because D-dimer testing lacks specificity, a large proportion of patients have a false-positive result and require imaging to rule out PE. Finally, CTPA is readily available, fast, minimally invasive, and more sensitive than ventilation/perfusion (V/Q) scanning. As a result, a slight increase in PE diagnoses has been observed, but with no clear benefit in terms of outcome, especially PE-related mortality. 4,5 One explanation is that as more CTPAs are performed, the risk of false-positive results and clinically irrelevant diagnoses increases. Moreover, CTPA exposes patients to the risks of allergic reactions, kidney failure, and cumulative radiation-induced cancer. 6,7 Several strategies have therefore been proposed to reduce PE overtesting and overdiagnosis (Table 1). 4,8-14 These have proved satisfactory in terms of safety and efficacy, but they rely on different methods of assessing clinical pretest probability (CPP; eg, Wells 9 or revised Geneva 8 scores for PE, the Pulmonary Embolism Rule-out Criteria [PERC] strategy, 10 or the YEARS strategy 12 ), which makes them difficult to combine and increases the risk of misuse in clinical practice.
Our primary aim was to develop and validate a pretest probability score to safely reduce imaging testing by integrating all the previously proposed strategies: the 4-Level Pulmonary Embolism Clinical Probability Score (4PEPS). Our secondary goal was to retrospectively assess the safety of a diagnostic strategy based on this new score and its efficacy in reducing imaging testing.

Study Design
Four levels of CPP for 4PEPS were defined a priori:
• Very low CPP, allowing exclusion of PE on clinical criteria only.
• Low CPP, allowing exclusion of PE with a high-sensitivity D-dimer level less than 1.0 μg/mL (to convert to nanomoles per liter, multiply by 5.476).
• Moderate CPP, allowing exclusion of PE with a D-dimer level less than 0.5 μg/mL or less than the age-adjusted cutoff value (calculated as age × 0.01 μg/mL for patients older than 50 years).
• High CPP, not allowing a safe exclusion of PE with D-dimer testing and requiring imaging testing (CTPA or V/Q scan).
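The four CPP levels above amount to a small decision rule. The Python sketch below encodes it under stated assumptions: the category labels ("very_low", "low", "moderate", "high") and the function names are hypothetical, D-dimer values are in μg/mL from a high-sensitivity assay, and the age-adjusted cutoff applies above age 50 as written here (at exactly 50 years, age × 0.01 and 0.5 μg/mL coincide, so the boundary wording does not change the result). It is an illustration of the logic, not a validated clinical tool.

```python
from typing import Optional


def age_adjusted_cutoff(age_years: float) -> float:
    """Age-adjusted D-dimer cutoff in μg/mL: 0.5 up to age 50, age x 0.01 above."""
    return age_years * 0.01 if age_years > 50 else 0.5


def needs_imaging(cpp: str, age_years: float, d_dimer: Optional[float] = None) -> bool:
    """Return True if imaging (CTPA or V/Q scan) is still required.

    Hypothetical encoding of the four 4PEPS CPP levels described in the text.
    """
    if cpp == "very_low":
        return False  # PE excluded on clinical criteria alone; no D-dimer needed
    if cpp == "high":
        return True  # D-dimer cannot safely exclude PE; go straight to imaging
    if d_dimer is None:
        raise ValueError("a D-dimer result is required for low or moderate CPP")
    if cpp == "low":
        return d_dimer >= 1.0  # PE excluded if D-dimer < 1.0 μg/mL
    if cpp == "moderate":
        # PE excluded if D-dimer is below 0.5 μg/mL or the age-adjusted cutoff
        return d_dimer >= age_adjusted_cutoff(age_years)
    raise ValueError(f"unknown CPP level: {cpp!r}")
```

For example, needs_imaging("moderate", 75, 0.7) returns False, because 0.7 μg/mL is below the age-adjusted cutoff of 0.75 μg/mL for a 75-year-old patient, whereas the same D-dimer level in a 45-year-old patient (cutoff 0.5 μg/mL) would still require imaging.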
To derive the score, we predefined the upper limit of PE prevalence in each CPP category using a bayesian approach, considering 2% as the safety threshold for PE. 13,15,16 The negative likelihood ratios of D-dimer testing with a cutoff value of 1.0 μg/mL and with an age-adjusted cutoff value were established from the results of the YEARS study 12 and the Age-Adjusted D-Dimer Cutoff Levels to Rule Out Pulmonary Embolism (ADJUST-PE) study, 11 respectively, and were found to be 0.08 and 0.01. Accordingly, to achieve a posttest probability less than 2%, the upper limit of PE prevalence was set at 20% for low CPP and at 65% for moderate CPP. The present study was a retrospective analysis of data prospectively collected in 5 studies, all approved by an ethics committee and performed with the informed consent of the participating patients. Under current European legislation, ethics committee approval was not required for the present study.
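The prevalence limits follow from Bayes' theorem in odds form: posttest odds = pretest odds × negative likelihood ratio. The sketch below (function names are hypothetical) checks the two limits against the 2% safety threshold and solves for the largest pretest probability compatible with it. With the likelihood ratios quoted above, that maximum is about 20.3% for 0.08 and about 67.1% for 0.01, so the chosen limits of 20% and 65% sit at or just below it.

```python
def posttest_probability(pretest: float, negative_lr: float) -> float:
    """Probability of PE after a negative test, via Bayes' theorem in odds form."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * negative_lr
    return posttest_odds / (1.0 + posttest_odds)


def max_pretest_probability(threshold: float, negative_lr: float) -> float:
    """Largest pretest probability whose posttest probability equals the threshold."""
    threshold_odds = threshold / (1.0 - threshold)
    pretest_odds = threshold_odds / negative_lr  # invert the odds-form update
    return pretest_odds / (1.0 + pretest_odds)


# Low CPP: 20% prevalence limit, D-dimer cutoff 1.0 μg/mL (negative LR 0.08)
print(round(posttest_probability(0.20, 0.08), 4))  # 0.0196
# Moderate CPP: 65% prevalence limit, age-adjusted cutoff (negative LR 0.01)
print(round(posttest_probability(0.65, 0.01), 4))  # 0.0182
```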

Source of Data
For the derivation and internal validation, we merged 3 prospectively collected databases from patients with suspected PE (n = 11 114). The first study was performed in 117 emergency departments (EDs) in France and Belgium (n = 1529; enrolled in 2003) 17 ; the second study was performed in 20 French EDs (n = 1645; enrolled in 2005 to 2006) 18 ; and the third study was performed in 12 EDs in the US (n = 7940; enrolled in 2003 to 2006). 15 Each database was randomly split into 2 groups: 60% of patients for the derivation cohort and 40% for the internal validation cohort.
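A 60/40 split performed within each database, then pooled, can be sketched as follows. This is an assumption-laden illustration: the function name, the fixed seed, the database labels, and the synthetic patient identifiers are all hypothetical, and the counts apply to the merged databases before any exclusion for missing data.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_each_database(
    databases: Dict[str, Sequence[str]],
    derivation_fraction: float = 0.6,
    seed: int = 12345,
) -> Tuple[List[str], List[str]]:
    """Randomly split every source database 60/40 and pool the parts.

    Splitting within each database (rather than after pooling) keeps every
    study represented in both cohorts in roughly the same proportion.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    derivation: List[str] = []
    validation: List[str] = []
    for patient_ids in databases.values():
        ids = list(patient_ids)
        rng.shuffle(ids)
        n_derivation = round(len(ids) * derivation_fraction)
        derivation.extend(ids[:n_derivation])
        validation.extend(ids[n_derivation:])
    return derivation, validation


# Hypothetical identifiers, using the database sizes reported above
sizes = {"study_2003": 1529, "study_2005_2006": 1645, "us_study": 7940}
dbs = {name: [f"{name}-{i}" for i in range(n)] for name, n in sizes.items()}
deriv, valid = split_each_database(dbs)
print(len(deriv), len(valid))  # 6668 4446
```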
Two other databases were used for external validation. The first study was performed in 6 EDs in France, Belgium, and Switzerland (n = 1819; enrolled in 2005 to 2006) 19 and the second in 12 EDs in France and Belgium (n = 1757; enrolled in 2015 to 2016). 20

Outcome
The outcome was a PE diagnosed on CTPA or high-probability V/Q scan during the initial diagnostic workup, or a venous thromboembolism (VTE) occurring during follow-up (3 months for the 4 European studies and 45 days for the US study) in a patient in whom PE was initially ruled out. In all studies, the following were considered VTE: symptomatic PE objectively confirmed with CTPA or high-probability V/Q scan, deep vein thrombosis on compression ultrasonography, and sudden unexpected death potentially related to PE according to an independent adjudication committee.

4PEPS Derivation
We evaluated all clinical variables known to be potentially associated with PE and available in the databases. 21 Because patients were suspected of PE on the basis of dyspnea or chest pain, these variables were not included individually; however, we included the combination of dyspnea and chest pain when both were present in a given patient. Variables with more than 2% missing data were excluded, except those included in other prediction rules (PERC strategy, 10 revised Geneva score, 8 and Wells score 9 ). The excluded variables were history of hypertension, diabetes, dyslipidemia, coronary disease, long travel, chronic kidney failure, smoking, family history of VTE, body weight, respiratory rate, and antiplatelet treatment. We categorized continuous variables according to the cutoff values used in other scoring systems and to their clinical relevance: 4 categories for age (younger than 50 years, 50 to 64 years, 65 to 80 years, and older than 80 years); 3 categories for heart rate (less than 80, 80 to 100, and more than 100 beats per minute) and for temperature (less than 38°C, 38 to 39°C, and greater than 39°C); and 2 categories for systolic blood pressure (less than 90 mm Hg and 90 mm Hg or greater) and pulse oximetry (SpO2; less than 95% and 95% or greater).
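The categorization of continuous variables can be written as a set of small binning functions. This sketch assumes ages in completed years and uses hypothetical function names and label strings; the cut points are those listed above, with "80 to 100" and "38 to 39°C" treated as inclusive on both ends.

```python
def age_category(age_years: int) -> str:
    """Age bins: <50, 50-64, 65-80, >80 (age in completed years assumed)."""
    if age_years < 50:
        return "<50"
    if age_years < 65:
        return "50-64"
    if age_years <= 80:
        return "65-80"
    return ">80"


def heart_rate_category(bpm: float) -> str:
    """Heart rate bins in beats per minute: <80, 80-100, >100."""
    if bpm < 80:
        return "<80"
    if bpm <= 100:
        return "80-100"
    return ">100"


def temperature_category(celsius: float) -> str:
    """Temperature bins in °C: <38, 38-39, >39."""
    if celsius < 38:
        return "<38"
    if celsius <= 39:
        return "38-39"
    return ">39"


def sbp_category(mm_hg: float) -> str:
    """Systolic blood pressure: <90 vs >=90 mm Hg."""
    return "<90" if mm_hg < 90 else ">=90"


def spo2_category(percent: float) -> str:
    """Room-air pulse oximetry: <95% vs >=95%."""
    return "<95" if percent < 95 else ">=95"
```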
To select the predictor variables associated with PE, we performed a univariate analysis using the χ2 test. 22 All variables with a 2-tailed P value less than .20, as well as the nonsignificant variables included in other prediction rules, were included in a multivariate logistic regression model.
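For a binary candidate variable, the univariate screen reduces to a Pearson χ2 test on a 2 × 2 table of variable presence by PE status. The sketch below is a stdlib-only illustration with hypothetical function names; it omits the continuity correction (the source does not say whether one was used) and relies on the identity that, for 1 degree of freedom, P(χ2 > x) = erfc(√(x/2)). Variables forced in from other prediction rules are kept regardless of their P value, mirroring the rule described above.

```python
import math
from typing import Dict, Iterable, List, Tuple


def chi2_p_value_2x2(a: int, b: int, c: int, d: int) -> float:
    """P value of Pearson's chi-square test (1 df, no continuity correction).

    Table layout [[a, b], [c, d]]:
    [[variable present & PE, variable present & no PE],
     [variable absent & PE,  variable absent & no PE]]
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def screen_candidates(
    tables: Dict[str, Tuple[int, int, int, int]],
    threshold: float = 0.20,
    forced: Iterable[str] = (),
) -> List[str]:
    """Keep variables with P < threshold, plus variables forced in from other rules."""
    forced = set(forced)
    return [name for name, table in tables.items()
            if chi2_p_value_2x2(*table) < threshold or name in forced]
```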
We performed a backward stepwise analysis, including at most 1 candidate variable for every 10 VTE events. 23,24 We then removed the nonsignificant variables, considering a 2-tailed P value less than .05 as significant, so that only significant variables remained in the final score. We assigned points for the score according to the regression coefficients. Finally, we chose the cutoff values to achieve the predefined levels of PE prevalence in each CPP category. 25
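Two mechanical steps in this paragraph can be sketched: the events-per-variable limit on candidate predictors, and the conversion of logistic regression coefficients into integer points. The source does not state how coefficients were scaled, so the points_per_unit scaling below is a hypothetical choice, not the authors' method, and the coefficient values in the example are made up.

```python
from typing import Dict


def max_candidate_predictors(n_events: int, events_per_variable: int = 10) -> int:
    """Events-per-variable rule: at most 1 candidate predictor per 10 outcome events."""
    return n_events // events_per_variable


def coefficients_to_points(coefficients: Dict[str, float],
                           points_per_unit: float) -> Dict[str, int]:
    """Hypothetical scaling: multiply each log-odds coefficient and round to an integer.

    Python's round() sends exact halves to the nearest even integer.
    """
    return {name: round(beta * points_per_unit) for name, beta in coefficients.items()}


# Made-up coefficients, scaled so that a coefficient of 0.35 is worth 1 point
print(coefficients_to_points({"a": 0.69, "b": -0.35}, points_per_unit=1 / 0.35))
```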

4PEPS Validation
The accuracy (discrimination) of the score was assessed by plotting the receiver operating characteristic curve and calculating the area under the curve (AUC). The 95% CI of the AUC was computed with the DeLong method. 26 Calibration was assessed with the Hosmer-Lemeshow goodness-of-fit statistic. 23 The Brier score was also reported; it summarizes the magnitude of error in the probability forecasts on a scale from 0.0 to 1.0, where a perfect model would score 0.0.
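The AUC equals the Mann-Whitney probability that a randomly chosen patient with PE scores higher than one without PE, and the DeLong method estimates its variance from per-patient "structural components." The sketch below computes the AUC, a Wald-type 95% CI using the DeLong variance, and the Brier score with the standard library only. Function names are hypothetical, the O(n × m) loops are for clarity rather than speed, and this is not a claim about how SPSS computes these quantities.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator


def _psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the PE case scores higher, 0.5 for a tie."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def auc_with_delong_ci(scores, outcomes, z=1.959964):
    """AUC with a DeLong-variance Wald 95% CI (outcomes coded 1 = PE, 0 = no PE)."""
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least 2 patients with and 2 without PE")
    # DeLong structural components: one value per patient in each group
    v10 = [mean(_psi(x, y) for y in neg) for x in pos]
    v01 = [mean(_psi(x, y) for x in pos) for y in neg]
    auc = mean(v10)
    var_auc = variance(v10) / len(pos) + variance(v01) / len(neg)
    half_width = z * math.sqrt(var_auc)
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)


def brier_score(predicted_probs, outcomes) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return mean((p - o) ** 2 for p, o in zip(predicted_probs, outcomes))
```

For instance, with scores [0.9, 0.8, 0.4] for 3 patients with PE and [0.3, 0.5, 0.1] for 3 without, 8 of the 9 case/noncase pairs are correctly ordered, so the AUC is 8/9.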

4PEPS Strategy Safety and Efficacy Assessment
The safety of the 4PEPS strategy was retrospectively assessed by the false-negative rate that would have been observed had the strategy been applied in the 2 external validation cohorts. This is the rate of PE diagnosed during the initial diagnostic process or VTE found during the 3-month follow-up among patients with a very low CPP, a low CPP and a D-dimer level less than 1.0 μg/mL, a moderate CPP and a D-dimer level less than the age-adjusted cutoff value, or a negative CTPA or V/Q scan. We defined the safety threshold of the 4PEPS strategy as a function of PE prevalence, applying the recommendations of the International Society of Thrombosis and Hemostasis (acceptable upper limit of the 95% CI of the false-negative rate, in percent = 1.82 + [0.00528 × prevalence, in percent]). 16 The PE prevalences in the first and second external validation cohorts were 21.4% and 11.7%, respectively. Thus, the acceptable upper limits of the 95% CI of false-negative rates were predefined at 1.93% and 1.88%, respectively. 16
Finally, the efficacy of the 4PEPS strategy was assessed by the rate of D-dimer and imaging testing, mainly CTPA, that could have been avoided if the 4PEPS strategy had been applied, compared with the standard strategy, the PERC strategy, 10 the ADJUST-PE strategy, 11 the YEARS strategy, 12 and the Pulmonary Embolism Graduated D-Dimer (PEGeD) strategy 13 (Table 1).

Table 1 footnotes (excerpt):
• Wells score for PE: … beats per minute (+1.5), clinical signs of deep venous thrombosis (+3), PE is the most likely diagnosis (+3). 9
• PERC strategy: age of 50 years or older (+1), heart rate of 100 beats per minute or greater (+1), room air pulse oximetry less than 95% (+1), unilateral leg edema (+1), hemoptysis (+1), recent surgery or trauma in the past 4 weeks (+1). 10
• ADJUST-PE strategy: age-adjusted D-dimer cutoff value of 0.5 μg/mL for patients younger than 50 years and calculated as age × 0.01 μg/mL for patients 50 years or older. 11
• YEARS strategy: 3-factor clinical rule derived from the revised Wells score for PE, including clinical signs of deep vein thrombosis (+1), hemoptysis (+1), and PE is the most likely diagnosis (+1). 12
• PEGeD strategy: strategy using the 3-level revised Wells score for PE. 13
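The ISTH-based safety threshold is a linear function of prevalence. The sketch below reproduces the two predefined limits, assuming (as the worked values 21.4% → 1.93% and 11.7% → 1.88% imply) that both prevalence and the resulting limit are expressed in percent. The function names are hypothetical.

```python
def isth_max_upper_ci(prevalence_percent: float) -> float:
    """Acceptable upper 95% CI limit of the false-negative rate, in percent."""
    return 1.82 + 0.00528 * prevalence_percent


def meets_safety_threshold(upper_ci_percent: float, prevalence_percent: float) -> bool:
    """True if the observed upper CI limit is below the prevalence-specific threshold."""
    return upper_ci_percent < isth_max_upper_ci(prevalence_percent)


for prevalence in (21.4, 11.7):
    print(f"{prevalence}% prevalence -> limit {isth_max_upper_ci(prevalence):.2f}%")
# 21.4% prevalence -> limit 1.93%
# 11.7% prevalence -> limit 1.88%
```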

Missing Data
Analyses were performed including all analyzable patients. Patients with missing data were excluded, and no imputation was performed. However, a sensitivity analysis was carried out in the 2 external validation cohorts, treating missing 4PEPS variables as negative, ie, resulting in the lowest score and thus representing the highest risk of a false-negative finding with the 4PEPS strategy.

Statistical Analysis
We calculated 95% CIs with the mid-P exact method, using OpenEpi version 2, an open-source calculator. All other statistical analyses were performed using SPSS version 25.0 (IBM Corp).
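The mid-P exact interval for a proportion x/n counts only half the probability of the observed count in each tail. The lower limit p_L solves P(X > x) + 0.5 P(X = x) = α/2, and the upper limit p_U solves P(X < x) + 0.5 P(X = x) = α/2, with X ~ Binomial(n, p). The stdlib-only sketch below solves both equations by bisection. It is an independent illustration with hypothetical function names, not OpenEpi's code; small differences from OpenEpi in the last decimal are possible.

```python
import math


def _binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability, computed in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def _mid_p_lower_tail(x: int, n: int, p: float) -> float:
    """P(X < x) + 0.5 * P(X = x); this decreases as p increases."""
    return sum(_binom_pmf(k, n, p) for k in range(x)) + 0.5 * _binom_pmf(x, n, p)


def _solve_decreasing(f, target: float, iterations: int = 100) -> float:
    """Bisection on (0, 1) for a function that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still too large: the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def mid_p_ci(x: int, n: int, alpha: float = 0.05):
    """Mid-P exact (1 - alpha) confidence interval for the proportion x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n and n > 0")

    def f(p: float) -> float:
        return _mid_p_lower_tail(x, n, p)

    # Lower limit: 1 - f(p) = alpha / 2; upper limit: f(p) = alpha / 2
    lower = 0.0 if x == 0 else _solve_decreasing(f, 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(f, alpha / 2)
    return lower, upper
```

Applied to the first external validation cohort (11 false-negative findings among 1548 patients), mid_p_ci(11, 1548) gives an interval of roughly 0.37% to 1.23%.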

4PEPS Derivation
A univariate analysis found a statistical association with a PE diagnosis for 21 variables, all of which were included in the multivariate regression. In addition, we included estrogenic treatment because this criterion is part of the PERC strategy. 10 In the multivariate model, age of 65 to 80 years or older than 80 years, heart rate of 80 to 100 beats per minute, systolic blood pressure, hemoptysis, cancer, chronic cardiac failure, and pregnancy or postpartum status were not independently associated with PE. The remaining 13 variables were included in the final model, and points were assigned to each according to its regression coefficient. Table 3 presents the final model (4PEPS). The PE prevalence by 4PEPS and the distribution of 4PEPS in the derivation cohort are presented in the Figure and the eTable in the Supplement. According to the predefined cutoff values, a 4PEPS less than 0 corresponds to a very low CPP (PE prevalence less than 2%), a 4PEPS of 0 to 5 to a low CPP (less than 20%), a 4PEPS of 6 to 12 to a moderate CPP (less than 65%), and a 4PEPS greater than 12 to a high CPP (65% or greater).
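Mapping a 4PEPS total to its CPP level is a direct translation of these cutoffs. The function name and the label strings in this sketch are hypothetical; the point values of the 13 individual items (Table 3) are not reproduced here, so the input is assumed to be an already computed integer total.

```python
def cpp_category(score: int) -> str:
    """Map a 4PEPS total to its CPP level using the reported cutoffs."""
    if score < 0:
        return "very_low"  # PE prevalence < 2%
    if score <= 5:
        return "low"       # PE prevalence < 20%
    if score <= 12:
        return "moderate"  # PE prevalence < 65%
    return "high"          # PE prevalence >= 65%
```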

4PEPS Validation
For the 3 validation cohorts, the PE prevalence by 4PEPS and the distribution of the 4PEPS are presented in the Figure and the eTable in the Supplement. In the internal validation cohort, the AUC was 0.83 (95% CI, 0.81-0.85). In the first and second external validation cohorts, the AUCs were 0.79 (95% CI, 0.76-0.82) and 0.78 (95% CI, 0.74-0.81), respectively. The AUCs and the degree of concordance between observed and predicted prevalence are presented in the eFigure in the Supplement.

4PEPS Strategy Validation
When the 4PEPS strategy was retrospectively applied in the first and second external validation cohorts, the false-negative rates were 11 of 1548 (0.71%; 95% CI, 0.37-1.23) and 14 of 1570 (0.89%; 95% CI, 0.53-1.49), respectively. No fatal PE or high-risk (hemodynamically unstable) PE was observed, and 3 of 11 false-negative VTEs in the high-prevalence cohort and 3 of 14 in the moderate-prevalence cohort were subsegmental PEs. In both the first and second external validation cohorts, the upper limit of the 95% CI of the false-negative rate was below the predefined safety threshold (1.93% and 1.88%, respectively). Similar results were observed in the sensitivity analyses treating missing 4PEPS variables as negative (high-prevalence cohort: 11 of 1687; false-negative rate, 0.65%; 95% CI, 0.34-1.13; moderate-prevalence cohort: 14 of 1655; false-negative rate, 0.85%; 95% CI, 0.50-1.61).

Discussion
Using 5 multicenter cohorts including more than 12 000 patients with suspected PE, we derived and validated a new clinical probability score to help physicians diagnose PE and safely decrease diagnostic imaging. When the 4PEPS diagnostic strategy was applied retrospectively to 2 external validation cohorts, the false-negative rate was below 1%, and the 4PEPS strategy performed better than all previously proposed strategies in reducing imaging testing.
Overuse of CTPA for suspected PE is an important concern. 3 There is increasing evidence that CTPA is frequently used inappropriately in patients for whom the benefits (probability of PE diagnosis and of avoiding a PE complication) are outweighed by the risks (probability of a false-positive result, complications of anticoagulation, short-term or long-term adverse …).
The first strategy developed to address overtesting was the PERC strategy. 10,15 It can be used in patients for whom the clinician has already established a low clinical probability of PE based on an implicit gestalt impression. A negative PERC finding defines a subgroup of these patients with a very low PE prevalence (less than 2%), allowing PE to be ruled out without any testing. 15 However, when applied alone or in combination with the revised Geneva score, the PERC strategy appears insufficiently reliable. 28,29 The 4PEPS strategy may not have this limitation.
Another means of limiting CTPA overuse is to optimize D-dimer testing. The ADJUST-PE study 11 prospectively confirmed the safety and utility of an age-adjusted cutoff value for patients 50 years or older (Table 1). However, the effect of the ADJUST-PE strategy on imaging testing rates remains limited (−10.8% and −5.2% in our high-prevalence and moderate-prevalence external validation cohorts, respectively), particularly in young patients. A further proposal, based on the Bayes theorem, is to adjust the D-dimer cutoff value to the pretest probability. 30 This principle was assessed in 2 recent studies, the YEARS study 12 and the PEGeD study. 13 Both used 1.0 μg/mL as the D-dimer cutoff value for patients with a low CPP, and both achieved a very low overall rate of false-negative testing. Of note, the PEGeD study is the most recent and had the lowest PE prevalence (7.4%), with 87% of patients having a low CPP. 13 The PEGeD strategy should therefore be used with caution in populations with a higher PE prevalence.
Indeed, recent external validation data for the PEGeD and YEARS strategies in cohorts of European patients suggest a higher failure rate. 31 Moreover, because the PERC strategy assesses CPP differently from the other strategies aimed at reducing overtesting, it is difficult to combine them. 10-13 For example, to combine the PERC and PEGeD strategies, the physician may have to first assess implicit clinical probability (gestalt); second, if it is low, apply the PERC strategy; and third, if the PERC finding is positive, calculate the revised Wells score. 10,13 The risk of misuse in clinical practice appears substantial and may have an important effect on safety. For example, although combining clinical gestalt and the PERC strategy has proven safe, the failure rate when combining a low revised Geneva score and a negative PERC finding is higher than 5%. 28 Here lies the main benefit of 4PEPS: a single rule guides the diagnostic strategy and results in a substantial reduction in testing, especially imaging testing.
Most of the 4PEPS criteria are included in other rules or scores for CPP assessment. Nevertheless, in our study, some potentially relevant criteria were not statistically associated with a PE diagnosis (pregnancy, history of cancer, chronic cardiac failure, hemoptysis). As the derivation database was large (n = 5588), we do not think that this is caused by a lack of power. More probably, this result reflects the fact that physicians suspect PE at a very low threshold in patients with these characteristics. 32 The first stage of the diagnostic process is deciding whether to investigate PE. This is why the PERC strategy needs to be combined with gestalt and why 4PEPS integrates the item "PE is the most likely diagnosis." This criterion is sometimes criticized for a lack of objectivity and reproducibility. Nevertheless, it is included in the Wells score 9 and the YEARS strategy, 12 is well known to ED physicians, and is easier to explain and use than gestalt. Including factors that decrease the probability of PE as well as factors that increase it allowed us to derive a 4-level score that rules out PE when negative. The calibration and accuracy of 4PEPS appear to be at least similar to those of previous CPP scores for PE (Table 4). 8,33 To facilitate implementation in clinical practice, a web application for smartphones and computers has been developed (https://peps.shinyapps.io/PEPS/). 4PEPS will also be incorporated in the new version of the decision-support software SPEED (Suspected Pulmonary Embolism in Emergency Departments; http://www.thrombus.fr/). We have previously shown that, compared with posters and pocket cards, such smartphone-based decision-support systems improve diagnostic decision-making and reduce the number of tests needed to reach a validated diagnostic decision. 18 4PEPS could also be integrated into the electronic medical record for automated calculation.
Using such setups, we believe that 4PEPS will be embraced by ED physicians and will lead to a substantial and safe decrease in imaging testing.

Strengths and Limitations
Our study has several strengths. We used a bayesian evidence-based medicine approach to define the prevalence limit in each CPP category, based on the predefined safety threshold and on the negative likelihood ratio of D-dimer testing. 34 We followed a well-validated method to derive and validate the score, and we applied the recent recommendations of the International Society of Thrombosis and Hemostasis to assess the safety of the 4PEPS strategy in ruling out PE. 16,22 The 5 databases from prospective multicenter international studies made it possible to define a large derivation cohort, an internal validation cohort, and 2 external validation cohorts. Calibration and accuracy were very similar across cohorts, with AUCs around 0.80. Finally, the safety of the 4PEPS strategy was confirmed in an external validation cohort with a moderate PE prevalence (11.7%) as well as in one with a high PE prevalence (21.5%). This reinforces the generalizability of our results.
Nevertheless, our study has some limitations. The studies used to derive and validate 4PEPS were all performed in ED settings, so 4PEPS may not be suitable for inpatients. Some variables were not systematically collected in these studies and could not be included in our analyses. We also excluded patients with missing variables. However, each cohort remained large, and similar results were obtained in the sensitivity analyses treating missing 4PEPS variables as negative. The score comprises 13 criteria that may be difficult to memorize, which reinforces the usefulness of an application for computers or handheld devices. Additionally, although we used clinical data from several prospective studies, we calculated this new score retrospectively. The 4PEPS strategy needs to be formally validated in a prospective implementation study.

Conclusions
In conclusion, using a bayesian approach, we derived a new 4-level clinical probability score (4PEPS) to help ED physicians make decisions regarding patients suspected of PE. The accuracy, safety, and efficacy of the 4PEPS strategy were confirmed in 2 independent external validation cohorts, one with a moderate PE prevalence and the other with a high PE prevalence. For both cohorts, applying 4PEPS resulted in a very low rate of diagnostic failure and a substantial reduction in imaging testing. It should now be tested in a formal outcome study.