Comparison of Magnetic Resonance Imaging–Based Risk Calculators to Predict Prostate Cancer Risk

Key Points

Question: How well do magnetic resonance imaging (MRI)-based risk calculators predict prostate cancer risk among adults in Europe and North America?

Findings: In this diagnostic external validation study of 2181 patients from 3 unique cohorts, all 4 MRI-based risk calculators (Prospective Loyola University Multiparametric MRI [PLUM], UCLA [University of California, Los Angeles]-Cornell, Van Leeuwen, and Rotterdam Prostate Cancer Risk Calculator–MRI [RPCRC-MRI]) demonstrated good discrimination. The RPCRC-MRI and PLUM models had somewhat better calibration in the European and North American cohorts, while all models were prone to underestimate cancer risk in the advanced serum biomarker cohort.

Meaning: The results support the use of RPCRC-MRI and PLUM in MRI-based screening pathways, but risk calculators incorporating advanced biomarkers are needed.


Introduction
Magnetic resonance imaging (MRI)-based risk calculators have emerged to replace or augment traditional prostate cancer (PCa) risk prediction models. The use of MRI changes PCa detection through both risk stratification and sampling of the prostate. 2,3 Two recent population-based randomized clinical trials, the STHLM3MR-2 (Prostate Cancer Detection Using the Stockholm3 Test and MR/Fusion Biopsies) 4 and GÖTEBORG-2 (GÖTEBORG Prostate Cancer Screening 2), 5 have provided level 1 evidence on MRI to augment PCa screening, making the evaluation of MRI-based PCa risk calculators increasingly relevant.
While MRI-based models could be used to improve selection for prostate biopsy, numerous risk calculators exist, and few data are available to determine which models make the most accurate and consistent predictions across different populations and countries. 7,8 Last, PCa prevalence varies depending on the underlying risk factors in different populations, which may lead to inconsistent estimates. The exact prevalence among men evaluated for clinical suspicion of harboring PCa may not be known, or it may change over time with modifications to screening practices within a particular institution, city, or country. 11-13 Therefore, we compared diagnostic discrimination and calibration directly for the most promising MRI-based PCa risk calculators in the literature within independent external cohorts from Europe and North America. In addition, we compared the performance of the same models within a separate cohort with high utilization of an advanced serum biomarker, Prostate Health Index (PHI), as a reflex test in the PSA screening pathway.

Study Cohorts
This multi-institutional, retrospective diagnostic study was conducted among consecutive patients. The MRI scans were generally multiparametric and 3T in all cohorts, except for very rare instances in the North American cohort in which contraindications to 3T led to 1.5T MRI. Patients receiving prostate biopsy (transrectal or transperineal) after MRI were included in the present study. The MRIs were graded using the Prostate Imaging-Reporting and Data System (PI-RADS), version 2.0 (scores range from 1 to 5, with higher scores indicating greater suspicion for PCa). 14 In the PHI cohort, PHI was used in more than 80% of patients during the decision to pursue MRI and/or biopsy, and biopsy was avoided in more than 30% of men who underwent MRI. 9 Men who did not receive MRI and/or biopsy or who had a prior diagnosis of PCa were excluded from the present study. Complete case analysis was performed. Additional details specific to each cohort have been published previously. 3,9,10 All cohorts were external to the models selected below.

JAMA Network Open | Urology

PCa Risk Calculator Model Selection
The North American models evaluated were the PLUM and UCLA (University of California, Los Angeles)-Cornell models. 3,11 Other North American models were excluded based on prior evidence of lower performance on external validation or lack of agreement on sharing coefficients. 3,12,13 Models were evaluated without recalibration, and predictions were calculated using published or author-provided coefficients. Validation cohorts represented actual clinical practice in each setting and differed from the original model development populations in the proportion of biopsy-naive patients and in MRI characteristics.

Study Variables
Variables included were in line with requirements to obtain estimates from the

Statistical Analysis
The primary outcome was diagnosis of grade group 2 or higher PCa, which was defined as clinically significant PCa. The areas under the curve (AUCs) and calibration plots were evaluated to compare the 4 selected models.
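As an illustration of the discrimination metric, the AUC can be read as the probability that a randomly chosen patient with clinically significant PCa receives a higher predicted risk than a randomly chosen patient without it. The Python sketch below computes it by pairwise comparison. The function name `auc` and the example data are hypothetical and are not taken from the study, whose analyses were run in R and Stata.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    y_true  : list of 0/1 outcomes (1 = clinically significant PCa)
    y_score : list of predicted risks from a risk calculator
    Ties between a case and a non-case count as half a concordant pair.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical outcomes and predicted risks for 8 patients (illustration only).
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0.05, 0.20, 0.35, 0.30, 0.60, 0.80, 0.40, 0.25]

# 13 of the 16 case/non-case pairs are concordant in this toy example.
print(auc(outcomes, predicted))
```

The pairwise form is quadratic in sample size, which is acceptable for a sketch; statistical packages typically use a rank-based equivalent.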
A decision curve analysis was conducted to estimate the clinical utility of each model based on net benefit across a broad predefined threshold range of 0% to 40% for harboring clinically significant PCa. 15 All statistics and modeling were performed using R, version 4.

The RPCRC-MRI model was best calibrated in the European cohort, followed by the PLUM model, which underpredicted risk within the predicted probability range of approximately 27% to 45% (Figure 1A-B). The UCLA-Cornell and Van Leeuwen models were prone to overprediction in the European cohort (Figure 1C-D). In the North American cohort, the PLUM and RPCRC-MRI models were reasonably calibrated with minor overprediction for PLUM (approximately 27% to 57% predicted probability range) and underprediction for RPCRC-MRI (approximately 50% to 80% predicted probability range) (Figure 1E-F). The UCLA-Cornell and Van Leeuwen models were prone to broader overprediction in the North American cohort (Figure 1G-H).
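To make the calibration terms concrete, the following Python sketch shows two common summaries: the observed-to-expected (O/E) ratio and a calibration intercept and slope obtained by regressing the outcome on the logit of the predicted risk. This is a minimal sketch under stated assumptions, not the study's code: the function names and data are hypothetical, and the intercept and slope are fitted jointly by Newton-Raphson, whereas some analyses estimate the intercept with the slope fixed at 1.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def observed_expected_ratio(y_true, y_prob):
    """O/E > 1 suggests underprediction; O/E < 1 suggests overprediction."""
    return sum(y_true) / sum(y_prob)


def calibration_intercept_slope(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = a + b * logit(predicted risk) by Newton-Raphson.

    Returns (a, b). Perfect calibration corresponds to a = 0 and b = 1.
    Assumes predicted risks lie strictly between 0 and 1 and that the
    outcomes are not perfectly separated by the predictions.
    """
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical cohort: 10 patients at each of 3 true risk levels.
true_risk = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
outcomes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

# A hypothetical model that underpredicts: its log-odds are 0.5 too low.
low_pred = [1.0 / (1.0 + math.exp(-(logit(p) - 0.5))) for p in true_risk]

print(observed_expected_ratio(outcomes, low_pred))      # greater than 1
print(calibration_intercept_slope(outcomes, low_pred))  # intercept above 0
```

In this toy setting, an intercept above 0 with a slope near 1 corresponds to predictions that are uniformly too low, which is the pattern described in the text as underprediction.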

On decision curve analysis (eFigure in Supplement 1), all models provided similar net benefit in the European cohort, with the lowest values across the 10% to 30% threshold range for the UCLA-Cornell model. The net benefit for all models relative to a biopsy-all approach was somewhat lower in the North American cohort; the PLUM and RPCRC-MRI models had a higher net benefit than the UCLA-Cornell and Van Leeuwen models at thresholds greater than 22%, with similar benefit for all models at threshold probabilities of 10% to 22%.
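The net benefit reported in a decision curve analysis can be sketched as follows. At a threshold probability, patients whose predicted risk meets the threshold are treated as biopsied; true positives count in favor, and false positives are penalized by the odds of the threshold. The Python below is an illustrative sketch with hypothetical function names and data, assuming the standard net benefit definition rather than reproducing the study's code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of biopsying patients with predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_biopsy_all(y_true, threshold):
    """Net benefit of the reference strategy of biopsying every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and predicted risks for 8 patients (illustration only).
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0.05, 0.20, 0.35, 0.30, 0.60, 0.80, 0.40, 0.25]

# Evaluate thresholds from 0% to 40%, the range stated in the Statistical Analysis section.
for pct in range(0, 45, 5):
    t = pct / 100.0
    model_nb = net_benefit(outcomes, predicted, t)
    all_nb = net_benefit_biopsy_all(outcomes, t)
    print(f"threshold {pct:>2}%: model {model_nb:.3f}, biopsy-all {all_nb:.3f}")
```

A model adds clinical value at a given threshold when its net benefit exceeds both the biopsy-all value and zero (the biopsy-none strategy).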

Advanced Serum Biomarker Screening Cohort
In the PHI cohort, AUCs for clinically significant PCa were slightly higher for the UCLA-Cornell model at 0.83 (95% CI, 0.81-0.85) and the PLUM model at 0.82 (95% CI, 0.80-0.84) compared with the European models, with AUCs of 0.80 (95% CI, 0.78-0.82) for the Van Leeuwen and 0.79 (95% CI, 0.77-0.81) for the RPCRC-MRI models (Table 2). For calibration, all models were prone to underprediction, with the UCLA-Cornell model followed by the PLUM model exhibiting the best calibration (Figure 2A-C). Notably, the Van Leeuwen model was very poorly calibrated, with a negative slope and inability to calculate an intercept to obtain a calibration plot. On decision curve analysis, the UCLA-Cornell model demonstrated the highest net benefit (Figure 2D). The PLUM and Van Leeuwen models exhibited similar net benefit at threshold probabilities of 15% to 40%, although Van Leeuwen was the lowest-performing model at threshold probabilities less than 15%.

Discussion
More recently, the STHLM3-MRI and GÖTEBORG-2 screening trials provided level 1 evidence on the use of MRI for PCa screening, and while each study evaluated some unique parameters, a primary population of interest was based on a PSA level cutoff of 3 ng/mL or greater and a PI-RADS cutoff of 3 or greater. 4,5,19 The MRI-based PCa risk calculators can provide more nuanced risk estimates for PCa screening by considering continuous values for PSA levels and PI-RADS rather than strict cutoffs and by incorporating other "free" clinical risk factors such as age, family history, race and ethnicity, and prostate volume.
The present study externally validated and compared the performance of 4 promising MRI-based PCa risk calculators in independent cohorts from Europe and North America as well as a cohort that frequently used an advanced biomarker. Overall, we found all models performed well in the European and North American cohorts, with better calibration for the RPCRC-MRI and PLUM models over the UCLA-Cornell and Van Leeuwen models. However, the UCLA-Cornell and PLUM models performed slightly better than the others in the PHI cohort, with the UCLA-Cornell model demonstrating the best calibration.
Our results suggest that the RPCRC-MRI and PLUM models may be optimal choices for PCa risk prediction when MRI is used without additional advanced biomarkers to guide the decision to perform prostate biopsy. The differences in AUC or calibration between the RPCRC-MRI and PLUM models in the European or North American settings were minimal. However, all models in the present study were developed without the consideration of advanced biomarkers beyond PSA level in the screening paradigm.
In the PHI cohort, where PHI and MRI were both commonly considered before pursuing prostate biopsy, many patients at lower risk did not undergo biopsy. This likely contributed to the overall higher observed PCa risk relative to the predicted risk across all 4 models. The PHI cohort was also from North America, with the 2 North American models outperforming the European models in this setting and the UCLA-Cornell model demonstrating the best calibration. The findings suggest the PHI cohort and UCLA-Cornell model development cohort may have come from settings with more similar screening practices relative to the other cohorts. Notably, while not a variable in the model, a subgroup of patients at UCLA did clinically use percentage of free PSA, which is 1 component of the PHI.
The implication of these findings is that while use of MRI-based PCa risk calculators is justified in conjunction with advanced biomarkers, screening could be further optimized if the advanced biomarker value was directly incorporated into the risk calculator. 20 One study recently developed flexible models that consider additional data on percentage of free PSA levels or PHI, when available, in addition to MRI to make risk predictions for biopsy-naive men. 21 The Stockholm3 test also serves as a proprietary risk calculator, including biomarker and clinical information without MRI, and although the risk calculator can be augmented with MRI data as an exercise, the additional value over sequential screening may be minor. 22 Last, an alternative approach to screening would be to use a

Additionally, we found the Van Leeuwen model was poorly calibrated across all 3 settings we evaluated, arguing against its use. The results suggest an opportunity to consider recalibration of models prior to prospective clinical use if no well-calibrated model with preserved diagnostic discrimination exists in a given population. The PHI cohort is a unique setting where models have not been previously evaluated and constitutes a screening approach that may increase in practice. 20 To our knowledge, the present study also provides the first external validation of the UCLA-Cornell risk calculator. 11

Limitations
This study has a few limitations that should be acknowledged. The study was retrospectively designed and conducted among patients receiving MRI and prostate biopsy. While general PCa screening practices in each cohort are known, the rate of PCa among men selected not to receive MRI and biopsy is unknown, and the evaluated models require complete data on both MRI and biopsy outcomes. Additional factors that could predict the presence of PCa, such as anterior lesion location, were not evaluated due to lack of inclusion in any evaluated model. 26,27 Despite the limitations, the study compared 4 promising MRI-based PCa risk prediction models to identify the optimal risk calculator choice in 3 distinct cohorts.

Conclusions
In this diagnostic study, the PLUM, UCLA-Cornell, Van Leeuwen, and RPCRC-MRI risk calculators had good discrimination in the European (AUC, 0.90) and North American (AUCs, 0.83-0.85) cohorts, with better calibration for the RPCRC-MRI and PLUM models. In a cohort with high use of an advanced serum biomarker, all models were prone to underestimate clinically significant PCa risk, with the highest AUC and best calibration for the UCLA-Cornell model followed by the PLUM model. The results support the use of the PLUM or RPCRC-MRI models in MRI-based screening pathways in both European and North American settings. However, tools specific to screening pathways incorporating advanced biomarkers as reflex tests are needed due to underprediction by available models.

Consecutive patients without a prior PCa diagnosis receiving multiparametric MRI before prostate biopsy were included. The European cohort consisted of patients from the Norwegian University of Science and Technology, Trondheim, Norway (January 1, 2016, to December 31, 2017); the North American cohort, patients from the University of Alabama at Birmingham Prospective MRI-Targeted Prostate Biopsy Cohort (January 1, 2015, to December 31, 2020); and the PHI cohort, patients from Northwestern Medicine hospitals affiliated with the Northwestern University Feinberg School of Medicine, Chicago, Illinois (January 1, 2018, to December 31, 2022). Institutional review board approval was obtained at each institution with a waiver of informed consent owing to the use of deidentified retrospective data. We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.

All statistics and modeling were performed using R, version 4.2.1 (R Project for Statistical Computing), and Stata, version 15.0 (StataCorp LLC). Additional statistical details and code are provided in the eMethods in Supplement 1.

Figure 2. Calibration Plots for the Outcome of Clinically Significant Prostate Cancer and Decision Curve Analysis in the Prostate Health Index Advanced Serum Biomarker Cohort

JAMA Network Open. 2024;7(3):e241516. doi:10.1001/jamanetworkopen.2024.1516. March 7, 2024.

Table 1 ,
A total of 303 patients were included in the European cohort (0 of African ancestry), 371 in the North American cohort (87 [23.5%] of African ancestry), and 1507 in the PHI cohort (189 [12.5%] of African ancestry), for an overall count of 2181 patients with a median age of 65 (IQR, 58-70) years and median PSA level of 5.92 (IQR, 4.32-8.94) ng/mL. The prevalence of any PCa and clinically significant  including biopsy-naive proportions of 239 patients (78.9%) for the European cohort, 178 (48.0%) for the North American cohort, and 1459 (96.8%) for the PHI cohort. Notably, the rate of biopsy-naive patients in the development cohorts for the PLUM model was 550 of 1010 (54.5%); for the UCLA-Cornell model, 1449 of 2354 (61.6%); and for the Van Leeuwen model, 344 of 393 (87.5%). 3,6,11 While the development cohort for the RPCRC-MRI model had 504 of 961 (52.4%) biopsy-naive patients, models were developed separately for the biopsy-naive subgroup and the subgroup with a history of negative biopsy findings.

Table 2. AUC Estimates Based on Receiver Operating Curves for the 4 Selected Models in the North American, European, and PHI Advanced Serum Biomarker Cohorts

Figure 1. Calibration Plots for the Outcome of Clinically Significant Prostate Cancer in the European and North American Cohorts

Abbreviations: AUC, area under the curve; PHI, Prostate Health Index; PLUM, Prospective Loyola University Multiparametric Magnetic Resonance Imaging (MRI); RPCRC-MRI, Rotterdam Prostate Cancer Risk Calculator-MRI; UCLA, University of California, Los Angeles.
An alternative approach to screening would be to use a risk calculator solely among men with PI-RADS 3 lesions, where the most uncertainty exists, or to consider forgoing systematic biopsy. 23,24

A few prior studies 25-27 have performed comparisons of MRI-based PCa risk models. Each of these studies has generally identified the European-based RPCRC-MRI or Van Leeuwen models as the most promising for clinical use, but the studies predated the publication of the North American PLUM and UCLA-Cornell models. Püllen et al 26 included 307 European patients and found RPCRC-MRI provided the greatest net benefit among 3 assessed models, but the Van Leeuwen model was not evaluated. Lee et al 25 included 449 Asian patients and found the Van Leeuwen model provided the greatest net benefit among 6 assessed models, which also included RPCRC-MRI. Finally, Saba et al 27 included 468 European patients and found RPCRC-MRI followed by the Van Leeuwen model provided the greatest net benefit at a threshold probability of 15% among 4 assessed MRI-based models. The present study includes a much larger overall sample of 2181 patients across European and North American cohorts while evaluating models from both settings. We corroborate prior results by demonstrating that the RPCRC-MRI model performed well in a European cohort and extend the findings to the North American setting, showing performance comparable to that of the PLUM model in both.