Figure 1. PREMM1,2 indicates Prediction of Mutations in MLH1 and MSH2 model; HNPCC, hereditary nonpolyposis colorectal cancer. Model is available online at http://www.dfci.org/premm. Adapted with permission from the Dana-Farber Cancer Institute.
Figure 2. PREMM1,2 indicates Prediction of Mutations in MLH1 and MSH2 model. A, Receiver operating characteristic curve illustrating sensitivity and specificity of the PREMM1,2 model at different cutoffs for predicted probabilities. The square represents the sensitivity and 1 − specificity value for fulfillment of the Amsterdam II Criteria and the triangle represents the sensitivity and 1 − specificity value for fulfillment of the revised Bethesda Guidelines at single cutoff points, as these criteria are dichotomous. B, Receiver operating characteristic curve illustrating sensitivity and specificity of the PREMM1,2 model and the Leiden model at different cutoffs for predicted probabilities.
Balmaña J, Stockwell DH, Steyerberg EW, et al. Prediction of MLH1 and MSH2 Mutations in Lynch Syndrome. JAMA. 2006;296(12):1469–1478. doi:10.1001/jama.296.12.1469
Context Lynch syndrome is caused primarily by mutations in the mismatch repair genes MLH1 and MSH2.
Objectives To analyze MLH1/MSH2 mutation prevalence in a large cohort of patients undergoing genetic testing and to develop a clinical model to predict the likelihood of finding a mutation in at-risk patients.
Design, Setting, and Participants Personal and family history were obtained for 1914 unrelated probands who submitted blood samples starting in the year 2000 for full gene sequencing of MLH1/MSH2. Genetic analysis was performed using a combination of sequence analysis and Southern blotting. A multivariable model was developed using logistic regression in an initial cohort of 898 individuals and subsequently prospectively validated in 1016 patients. The complex model that we have named PREMM1,2 (Prediction of Mutations in MLH1 and MSH2) was developed into a Web-based tool that incorporates personal and family history of cancer and adenomas.
Main Outcome Measure Deleterious mutations in MLH1/MSH2 genes.
Results Overall, 14.5% of the probands (130/898) carried a pathogenic mutation (MLH1, 6.5%; MSH2, 8.0%) in the development cohort and 15.3% (155/1016) in the validation cohort, with 42 (27%) of the latter being large rearrangements. Strong predictors of mutations included proband characteristics (presence of colorectal cancer, especially ≥2 separate diagnoses, or endometrial cancer) and family history (especially the number of first-degree relatives with colorectal or endometrial cancer). Age at diagnosis was particularly important for colorectal cancer. The multivariable model discriminated well at external validation, with an area under the receiver operating characteristic curve of 0.80 (95% confidence interval, 0.76-0.84).
Conclusions Personal and family history characteristics can accurately predict the outcome of genetic testing in a large population at risk of Lynch syndrome. The PREMM1,2 model provides clinicians with an objective, easy-to-use tool to estimate the likelihood of finding mutations in the MLH1/MSH2 genes and may guide the strategy for molecular evaluation.
Lynch syndrome (also called hereditary nonpolyposis colorectal cancer) is the most common hereditary colorectal cancer syndrome in Western countries, accounting for 2% to 5% of all colorectal cancers (CRCs).1,2 Lynch syndrome is associated with underlying mutations in the mismatch repair system,3,4 most commonly in the MLH1 and MSH2 genes.5 Existing clinical criteria to identify Lynch syndrome families include the Amsterdam Criteria6 and Bethesda Guidelines,7 and these have been updated, modified, and revised by authorities in the field.8,9 However, the Amsterdam Criteria and some components of the Bethesda Guidelines remain complex, and the relative importance of the specific aspects of personal and family history included in these guidelines is unclear. In hereditary breast-ovarian cancer syndrome, multiple models have been developed to predict mutations in the BRCA1 and BRCA2 genes,10,11 and these models are widely implemented by health care professionals as they assess their patients' genetic risk.
Using data from a large cohort of individuals undergoing genetic testing of MLH1 and MSH2, we developed a clinical model, the PREMM1,2 model (Prediction of Mutations in MLH1 and MSH2) to predict the presence of mutations in the MLH1 and MSH2 genes based on personal and family history of individuals. For practical application, we have made it available in a Web-based format, so it can be easily accessible to clinicians evaluating individuals with a personal or family history suggestive of Lynch syndrome.
The original cohort for model development consisted of 1219 consecutive unrelated probands who submitted blood samples for full gene sequencing of MLH1 and MSH2 to Myriad Genetic Laboratories Inc, Salt Lake City, Utah, starting in 2000. Testing was ordered by health care professionals (mainly geneticists, oncologists, gastroenterologists, or gynecologists) for individuals with a personal or family history suggestive of Lynch syndrome. Data were obtained from the test order form (completed by the health care professional ordering genetic testing) and included the patient's age, sex, and ancestry as well as specific details about personal and family cancer history. We excluded 278 probands for whom the personal and family cancer history were not available and 43 probands who reported a personal history of a Lynch syndrome–associated diagnosis but not age at diagnosis, leaving 898 probands included in the analysis.
Among these 898 probands, there were 2382 relatives reported. We narrowed this group of relatives to include only those who fulfilled the following criteria: (1) first- or second-degree relatives of the proband; (2) affected with Lynch syndrome–associated cancers (of the stomach, ovaries, urinary tract, small intestine, pancreas, bile ducts, brain [glioblastoma multiforme], or sebaceous glands) or colonic adenomas; (3) on the affected side of the family; and (4) age at diagnosis known. This left a total of 1618 reported relatives in the final cohort.
The validation cohort consisted of 1057 consecutive unrelated probands who submitted blood samples for full gene sequencing and large rearrangement analysis of MLH1 and MSH2 genes to the same diagnostic laboratory after August 2004. Personal and family history data were obtained in the same way as described for the development cohort. After excluding 41 individuals who did not meet the aforementioned criteria, the validation cohort included 1016 probands.
The test order form used in both cohorts asks specifically for maternal or paternal origin of each relative. When both sides of the family were affected by Lynch syndrome–associated tumors (which occurred in 3 instances of 1914 kindreds), the family history was carefully reviewed to make an assessment of the lineage most likely to be affected. Ethnicity was classified based on the information provided by the health care professional using prespecified categories on the test order form. These data are included because they are relevant for generalizability of the results and demonstrate the heterogeneity of the study population.
The study was investigator-initiated. Data collection and model development occurred independently: collection of clinical data and molecular analyses occurred at Myriad Genetic Laboratories, and an anonymized data set was provided to Dana-Farber/Harvard Cancer Center investigators for all further data analyses. The statistical analysis was conducted by clinical researchers (J.B. and D.H.S.) and an independent statistician (E.W.S.) not affiliated with Myriad Genetic Laboratories. The study was reviewed and approved by the Dana-Farber/Harvard Cancer Center institutional review board; a waiver of informed consent for study participants was obtained because the analysis was performed on deidentified data, without the need for patient contact.
From each sample of blood, DNA from white blood cells was extracted and purified, amplified by polymerase chain reaction, and directly sequenced in forward and reverse directions. For the MLH1 gene, approximately 2300 base pairs were sequenced, comprising 19 exons and approximately 560 adjacent noncoding intronic base pairs. For the MSH2 gene, approximately 2800 base pairs were sequenced, comprising 16 exons and approximately 470 adjacent noncoding intronic base pairs. Chromatographic tracings of each amplicon were analyzed by a proprietary computer-based review followed by visual inspection and confirmation. Genetic variants were detected by comparison with a consensus wild-type sequence constructed for each gene. All potential genetic variants were independently confirmed by repeat polymerase chain reaction amplification and sequencing.
For large rearrangement analysis, aliquots of genomic DNA were digested individually with 3 restriction enzymes or combinations of enzymes for MLH1 analysis and 3 restriction enzymes or combinations of enzymes for MSH2 analysis. Digested DNA was electrophoresed in an agarose gel, transferred to a membrane, and hybridized with a gene-specific probe labeled with phosphorus 32. The probe binds to all fragments containing coding sequences of that gene. Autoradiographs and phosphorimages were produced and analyzed for the presence of novel bands and for fragment dosage, from which it was determined which, if any, exons had been deleted or duplicated. Positive and negative controls were run with each batch. All potential mutations were independently confirmed.
Mutations were classified as deleterious, suspected deleterious, uncertain, favor polymorphism, or polymorphism. All nonsense and frameshift mutations that occurred at or before amino acids 733 and 888 of MLH1 and MSH2, respectively, were considered to be deleterious. In addition, specific missense mutations and noncoding intervening sequence mutations were considered to be deleterious on the basis of data derived from linkage analysis of high-risk families, functional assays, biochemical evidence, and/or demonstration of abnormal messenger RNA transcript processing. Genetic variants for which the available evidence indicates a likelihood, but not proof, that the mutation is deleterious were classified as “suspected deleterious.” Examples include mutations that occur at the conserved locations of splice acceptors and splice donors. Missense mutations, mutations that occurred in intronic regions whose clinical significance has not yet been determined, and nonsense and frameshift mutations that occurred distal to amino acid position 733 of MLH1 and distal to amino acid position 888 of MSH2 were considered to be variants of uncertain significance. Genetic variants that are highly unlikely to contribute substantially to cancer risk were considered to be polymorphisms. For the purposes of this study, we classified individuals found to have either deleterious or suspected deleterious mutations as mutation-positive. Those with all other genetic variants and polymorphisms were included in the mutation-negative group.
Variables related to the proband were the presence and age at diagnosis of CRC, colonic adenomas, endometrial cancer, and other Lynch syndrome–associated cancers (the latter were considered as 1 group). Variables related to the family history included the number of relatives with CRC, endometrial cancer, and other Lynch syndrome–associated cancers, the relationship to the proband (first- vs second-degree), the minimum age at diagnosis for each cancer in the family, and the presence of a relative with more than 1 Lynch syndrome–associated cancer. Because adenomas were reported in only 5% of the relatives (79/1618), we were concerned that this information was unreliable and we therefore did not analyze the effect of adenomas in relatives. Age was treated as a continuous variable, and the effect of age was analyzed separately for each diagnosis. In probands diagnosed as having the same cancer more than once, the age at diagnosis was defined as the youngest age. Restricted cubic spline functions in logistic regression models were used to explore the possibility that the effect of age at diagnosis was nonlinear.12 In relatives, the minimum age and mean age for any given diagnosis in the family appeared to have similar effects, so we chose to use minimum age for ease of use in clinical practice.
We used univariate analyses to determine how best to include each element of personal and family history in a single multivariable model. We created a variable for probands with 2 or more separate CRCs since this group was reasonably large and had a high prevalence of mutations. Similarly, for relatives with CRC and endometrial cancer, we created variables indicating both the number of affected relatives (1 vs ≥2) and their relationship to the proband (first- vs second-degree). We included 2 variables for each diagnosis in the multivariable model: an indicator variable for the presence or absence of that diagnosis and a variable relating to the age at diagnosis. Finally, the magnitude of the age effect for each diagnosis is presented in decades rather than years for ease of interpretation.
We aimed to create a prediction rule that would be simpler to use than a full multivariable model and would generate more robust predictions of mutation risk. All decisions about model specification were based on the development cohort. First, we critically assessed all variables in the model with a P value greater than .20 and eliminated 3 that did not achieve this P value: age at diagnosis of other Lynch syndrome cancers in the proband, minimum age at diagnosis of other Lynch syndrome cancers in the relatives, and the presence of a relative with multiple cancers. Second, we combined clinically similar age variables with similar statistical effects. One such composite variable included the effects of the age at diagnosis of CRC or adenoma in the proband as well as the minimum age at diagnosis of CRC in first- and second-degree relatives, and the other reflected the effects of age at diagnosis of endometrial cancer in the proband and first- and second-degree relatives. Finally, we created summary variables for each cancer diagnosis in relatives, in which second-degree relatives were weighted to have half the effect of first-degree relatives.
The modeling process was internally validated by bootstrap resampling. Two hundred random samples were drawn with replacement; predictive models were developed in each sample, including variable selection, and evaluated in the development cohort.13,14 For external validation, we assessed the performance of the prediction rule derived from the development cohort in the validation cohort. An updated version of the prediction rule was based on logistic regression coefficients estimated from both cohorts, after testing for differences in effects between the development and validation cohorts by statistical interaction terms (“interaction by cohort”).
To test the accuracy of the updated model in predicting MLH1 or MSH2 mutations, we categorized predicted probabilities of mutation into 5 prespecified but arbitrary categories: 5% or less, 5.1% to 10%, 10.1% to 20%, 20.1% to 40%, and more than 40%. Sensitivity and specificity were calculated and were plotted in a receiver operating characteristic curve. We also included the sensitivity and specificity for the Amsterdam Criteria and revised Bethesda Guidelines6-9 and assessed predictions made using the Leiden model for 1086 probands with CRC.15 Multivariable modeling was performed using SAS version 8 software (SAS Institute Inc, Cary, NC), and internal and external validation were performed using S-Plus version 6 software (Insightful Corp, Seattle, Wash). Discrimination between patients with and without mutations was quantified by the area under the receiver operating characteristic curve (AUC), calculated with 95% confidence intervals (CIs). Calibration was assessed graphically and by the Hosmer-Lemeshow goodness-of-fit statistic.13
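As a minimal sketch of the categorization and cutoff analysis described above (not the SAS or S-Plus code used in the study), the Python below maps a predicted probability to the 5 risk categories and computes sensitivity and specificity at a single cutoff. The function names, the rule that a patient counts as test-positive when the predicted probability exceeds the cutoff, and the 4-patient toy data are assumptions for illustration only.

```python
def risk_category(p):
    """Map a predicted probability (0 to 1) to the 5 categories named in the text."""
    if p <= 0.05:
        return "5% or less"
    if p <= 0.10:
        return "5.1% to 10%"
    if p <= 0.20:
        return "10.1% to 20%"
    if p <= 0.40:
        return "20.1% to 40%"
    return "more than 40%"


def sensitivity_specificity(probs, carriers, cutoff):
    """Sensitivity and specificity when 'probability > cutoff' is test-positive.

    Assumes at least one carrier and one noncarrier are present.
    """
    tp = fn = tn = fp = 0
    for p, carrier in zip(probs, carriers):
        positive = p > cutoff  # assumed convention, not stated in the paper
        if carrier and positive:
            tp += 1
        elif carrier:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: predicted probabilities and true mutation status.
toy_probs = [0.02, 0.08, 0.30, 0.50]
toy_carriers = [False, False, True, True]
sens, spec = sensitivity_specificity(toy_probs, toy_carriers, cutoff=0.05)
print(risk_category(0.08), sens, spec)
```

Repeating the second function over a range of cutoffs yields the (1 − specificity, sensitivity) pairs that make up a receiver operating characteristic curve.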
The median ages of individuals undergoing genetic testing were 49 and 50 years in the development and validation cohorts, and 63% and 73% of the 898 and 1016 probands were women, respectively (Table 1). Patients were mainly of European ancestry, but other ancestries were also represented, including Latin American, African, Asian, Native American, and Middle Eastern. Ordering health care professionals were mainly geneticists (28% and 26%, respectively) and oncologists (36% and 41%, respectively), but other specialties (ie, gastroenterologists and gynecologists) were also represented. The majority of tests (98% and 99%) were ordered from within the United States, with tests ordered from all 50 states. Overall, 29% and 27% of patient histories fulfilled the Amsterdam I/II Criteria, and 57% and 53% met one of the revised Bethesda Guidelines, respectively. There were no significant differences between demographic and clinical characteristics in the 2 cohorts.
In the development cohort, 14.5% (130/898) of the study individuals were found to have mutations: 6.5% (58/898) had mutations in MLH1 and 8.0% (72/898) had mutations in MSH2. In the validation cohort, the overall prevalence of mutations was 15.3% (155/1016): 5.3% (54/1016) had mutations in MLH1 and 9.9% (101/1016) had mutations in MSH2. Of the 155 mutations detected in the validation cohort, 113 (73%) were point mutations and 42 were large rearrangements, the majority in MSH2 (83% [35/42]). Mutations were particularly prevalent among probands with 2 or more separate CRCs (45% and 44%, respectively), endometrial cancer (24% and 30%), other Lynch syndrome–associated cancers (22% and 19%), and multiple diagnoses (28% and 39%) (Table 2). The prevalence of mutations in the probands increased with increasing numbers of first-degree relatives with CRC or endometrial cancer. As expected, probands with mutations had a younger mean age at CRC diagnosis than those who did not have mutations, and the age at diagnosis of CRC and endometrial cancer was also younger among the relatives of probands with mutations (Table 3). In probands, the age difference was most apparent for CRC and colonic adenomas, and in relatives, it was most apparent for CRC and endometrial cancer.
In the multivariable model (Table 4), the risk of finding a mutation was similarly increased in probands diagnosed as having 1 CRC (odds ratio [OR], 2.2; 95% CI, 1.9-2.5), endometrial cancer (OR, 2.5; 95% CI, 2.1-3.1), or other Lynch syndrome–associated cancers (OR, 2.1; 95% CI, 1.7-2.5). Probands with adenomas also had a significantly increased risk of a mutation, with an OR of 1.8 (95% CI, 1.5-2.2). Probands with metachronous or synchronous CRC had a very high OR at 8.2 (95% CI, 5.6-12.0). Among relatives, both the presence and number of first-degree relatives with CRC and endometrial cancer strongly increased the risk of finding a mutation in the proband. Diagnosis of CRC or endometrial cancer at a younger age was clearly associated with an increased risk of finding a mutation (OR per decade younger at time of diagnosis, 1.5; 95% CI, 1.5-1.5).
The multivariable model had an AUC of 0.79 (95% CI, 0.76-0.83) at internal validation. The effects were similar in the development and validation cohorts for most predictors (Table 4). However, effects were significantly larger in the validation cohort for one or multiple CRCs in the proband (interaction by cohort, P = .002 and P = .02) and for endometrial cancer in the proband (interaction by cohort, P = .01) without any obvious reason. When only point mutations were considered, external validation of the model in the validation cohort showed an AUC of 0.79 (95% CI, 0.74-0.83), as previously predicted with bootstrapping. Interestingly, the AUC increased to 0.80 (95% CI, 0.76-0.84) when large rearrangement mutations were accounted for in the validation cohort, reflecting that some patients had previously been misclassified as not having a mutation.
The updated prediction rule was based on the combination of the development and validation cohorts (Table 4). A small effect for cohort was incorporated (OR, 1.28), reflecting the higher prevalence of mutations due to rearrangement analysis in the validation cohort. The equation with the variables included in this updated prediction rule is presented in the Box. The Web-based clinical model is shown in Figure 1 and is accessible to health care professionals at the Dana-Farber Cancer Institute Web site (http://www.dfci.org/premm).
Box. Predicted probability of a mutation in MLH1 or MSH2 = 1/[1 + exp(−L)], where L = −3.87 + 1.33V1 + 2.78V2 + 1.44V3 + 0.59V4 + 0.41V5 + 0.951V6 + 1.27V7 + 0.964V8 + 2.48V9 + 0.404V10 − 0.358(V11/10) − 0.293(V12/10).
V1 = presence of CRC in the proband; V2 = 2 or more CRC in the proband; V3 = endometrial cancer in the proband; V4 = other HNPCC cancer in the proband; V5 = adenoma in the proband; V6 = 1 for presence of 1 CRC in first-degree relative + 0.5 for presence of CRC in second-degree relatives; V7 = 2 or more first-degree relatives with CRC; V8 = 1 for presence of 1 first-degree relative with endometrial cancer + 0.5 for presence of any second-degree relatives with endometrial cancer; V9 = 2 or more first-degree relatives with endometrial cancer; V10 = first- or second-degree relatives with other HNPCC cancer; V11 = sum of ages at diagnosis of CRC/adenoma; V12 = sum of ages at diagnosis of endometrial cancer.
For each diagnosis, brackets are interpreted as [diagnosis] = 1 if the proband or relatives have had the diagnosis, [diagnosis] = 0 otherwise.
For V11 and V12, ages at diagnosis are calculated as [youngest age at diagnosis in years − 45] if the proband or relatives have had the diagnosis. For V11, we consider the sum of 4 ages at diagnosis: age at diagnosis of CRC in the proband, age at diagnosis of adenoma in the proband, age at diagnosis of CRC in a first-degree relative, age at diagnosis of CRC in a second-degree relative. For V12, we consider the sum of 3 ages at diagnosis of endometrial cancer: age in the proband, age in a first-degree relative, and age in a second-degree relative.
If a proband or relative has had a given diagnosis, but the age at diagnosis is unknown, then the age at diagnosis should be estimated. If no age is entered, the model defaults to age at diagnosis = 45 years.
Abbreviations: CRC, colorectal cancer; HNPCC, hereditary nonpolyposis colorectal cancer (other HNPCC-associated cancers: stomach, ovaries, urinary tract, small intestine, pancreas, bile ducts, brain [glioblastoma multiforme], sebaceous glands).
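The equation in the Box can be sketched in Python as follows. The coefficients are copied from the Box; the function and argument names are illustrative and not part of the published model, and this sketch is not the Web-based tool itself. The caller is responsible for coding V1 through V12 as defined above (for example, whether V1 is also set to 1 when V2 is 1 is not spelled out in the Box and is left to the caller).

```python
import math


def centered_age(age_at_diagnosis=None):
    """Age term used for V11 and V12: youngest age at diagnosis minus 45.

    An unknown age defaults to 45 years, as stated in the Box, giving 0.
    """
    age = 45 if age_at_diagnosis is None else age_at_diagnosis
    return age - 45


def premm12_probability(v1=0, v2=0, v3=0, v4=0, v5=0, v6=0.0, v7=0,
                        v8=0.0, v9=0, v10=0, v11=0.0, v12=0.0):
    """Predicted probability of an MLH1/MSH2 mutation from the Box equation.

    v11 and v12 are sums of centered ages (see centered_age) for the
    diagnoses that are present; absent diagnoses contribute nothing.
    """
    linear_predictor = (
        -3.87
        + 1.33 * v1 + 2.78 * v2 + 1.44 * v3 + 0.59 * v4 + 0.41 * v5
        + 0.951 * v6 + 1.27 * v7 + 0.964 * v8 + 2.48 * v9 + 0.404 * v10
        - 0.358 * v11 / 10
        - 0.293 * v12 / 10
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical proband: one CRC diagnosed at age 40, no other history.
p = premm12_probability(v1=1, v11=centered_age(40))
print(f"Predicted probability: {p:.3f}")
```

Because the age terms enter with negative coefficients, a diagnosis before age 45 raises the predicted probability and a diagnosis after age 45 lowers it.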
Upon grouping by predicted likelihood of carrying a mutation, patients in the combined cohort were distributed reasonably evenly across 5 categories of risk, with a predicted risk of mutation of 5% or less for 482, 5.1% to 10% for 540, 10.1% to 20% for 460, 20.1% to 40% for 282, and greater than 40% for 150. The model demonstrated excellent ability to discriminate between risk groups (Table 5) with an AUC of 0.80 (Figure 2A). Sensitivity and specificity depended on the cutoff used for the predicted risk of mutation. If a low cutoff, such as 5%, was used, many patients would be considered for testing, with a sensitivity of 94% but a specificity of 29%. If a high cutoff, such as 40%, was used, specificity would be much better (92%), but many patients with mutations would be missed (sensitivity of 29%). Fulfillment of the Amsterdam II Criteria had a sensitivity of 63% with a 78% specificity, while the revised Bethesda Guidelines had a 74% sensitivity with a specificity of 48% (Figure 2A). Of the 1914 individuals, 105 and 75 mutation carriers did not fulfill the Amsterdam II Criteria or the revised Bethesda Guidelines, respectively, and therefore would not have been tested if only these criteria had been considered (Table 6). Compared with the revised Bethesda Guidelines, a 10% cutoff led to testing of fewer patients (47% vs 55%), while missing fewer mutation carriers (15% vs 26%). A safer cutoff of 5% led to more testing (75%) and a lower miss rate (6%). For 1086 probands with CRC, the Leiden model had an AUC of 0.755 compared with 0.806 for the PREMM1,2 model (Figure 2B).
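The testing fractions quoted for the 5% and 10% cutoffs can be reproduced from the category counts given in this paragraph. The short Python sketch below does only that arithmetic; the variable names are illustrative, and it assumes that patients above a cutoff are the ones who would be tested.

```python
# Patients per predicted-risk category in the combined cohort (from the text).
counts = {
    "5% or less": 482,
    "5.1% to 10%": 540,
    "10.1% to 20%": 460,
    "20.1% to 40%": 282,
    "more than 40%": 150,
}

total = sum(counts.values())

# Assumption: everyone with a predicted risk above the cutoff is tested.
tested_above_5 = total - counts["5% or less"]
tested_above_10 = tested_above_5 - counts["5.1% to 10%"]

print(f"Total patients: {total}")
print(f"Tested at 5% cutoff: {100 * tested_above_5 / total:.0f}%")
print(f"Tested at 10% cutoff: {100 * tested_above_10 / total:.0f}%")
```

The counts sum to the 1914 probands in the combined cohort, and the two fractions round to the 75% and 47% reported above.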
Our study reports the prevalence of MLH1/MSH2 mutations detected from a large and diverse cohort of probands undergoing genetic testing on the basis of clinical history, largely without prior molecular prescreening. We found deleterious point mutations in 14.5% of individuals in a cohort of 898 tested with gene sequencing alone; prevalence rose to 15.3% (155/1016) with the addition of Southern blot analysis, with 27% (42/155) of detected mutations corresponding to large rearrangements.
Previous estimates for the prevalence of mutations in MLH1 and MSH2, the genes most commonly associated with Lynch syndrome, have ranged from 0.3% to 88% and depend greatly on the population studied, prior selection based on microsatellite instability and/or immunohistochemistry, and the sensitivity of the laboratory techniques used for germline mutation detection.16-22 Although not truly “population-based,” our findings likely closely reflect what one would expect to see among individuals currently undergoing direct clinical genetic testing for Lynch syndrome in the US population at risk of the disease.
Because of the large sample size of our cohorts, we were able to precisely quantitate the relative importance of known clinical parameters in Lynch syndrome. The most significant clinical predictors of finding a mutation according to the proband's history were the presence of 2 or more CRCs (associated with an OR of 16) and according to the family history were the number of first-degree relatives with CRC or endometrial cancer. Age at diagnosis was more important as a factor for CRC than for endometrial or other Lynch syndrome cancers. In the latter cases, the clustering of such tumors with CRC in a kindred was much more important than the age at which they were diagnosed. Although history of adenomas could only be assessed for probands, we observed that they were a significant predictor of mutation status in both the derivation and validation cohorts, although less strong than that of a CRC diagnosis.
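In a logistic regression model, an odds ratio is the exponential of the corresponding coefficient. Assuming the OR of 16 quoted above corresponds to the Box coefficient for V2 (2 or more CRCs in the proband, 2.78), the relationship can be checked with a short Python sketch; the function name is illustrative, and the per-decade age value is derived from the Box coefficient for V11 rather than taken from Table 4.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


# Box coefficient for V2 (2 or more CRCs in the proband).
or_two_or_more_crc = odds_ratio(2.78)

# Box coefficient for V11 is applied per 10 years, so this is the OR
# per decade younger at diagnosis of CRC or adenoma in the updated rule.
or_per_decade_younger_crc = odds_ratio(0.358)

print(f"OR for 2 or more CRCs: {or_two_or_more_crc:.1f}")
print(f"OR per decade younger (CRC/adenoma): {or_per_decade_younger_crc:.2f}")
```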
Despite the fact that Lynch syndrome is the most common hereditary CRC predisposition syndrome, the identification of at-risk families, the approach to molecular evaluation, and clinical management continue to pose significant challenges for researchers and clinicians.23 One of the main topics of debate has been how to approach the molecular evaluation of patients and their families. Strategies ranging from using existing diagnostic criteria alone to population-based molecular testing of all colorectal tumors using immunohistochemistry have been proposed.21,22,24,25
In hereditary breast-ovarian cancer syndrome, several models have been developed for risk stratification of BRCA1 and BRCA2 gene mutations and are widely used in clinical practice to assist in genetic evaluation and counseling.10,11 The availability of similar models has been more limited for Lynch syndrome. The most widely used diagnostic criteria, the Amsterdam Criteria and Bethesda Guidelines, help researchers and clinicians identify individuals and families at risk of this syndrome but include broad and often complex variables that may encompass multiple diagnoses across generations and are not designed to determine the likelihood of carrying a genetic mutation for an individual patient. Wijnen et al15 developed a multivariable model to identify predictors of MLH1 and MSH2 point mutations in 184 unrelated kindreds referred to high-risk clinics that contained 3 predictors of mutations in MLH1/MSH2: fulfillment of the Amsterdam Criteria, younger mean age at diagnosis of CRC in the family, and presence of endometrial cancer in the kindred. Recently, a quantitative model was developed from a familial cancer clinic population in the United Kingdom,26 which added 5 variables to the Amsterdam Criteria to improve its ability to predict mismatch repair gene mutations (number of CRC and endometrial cancers in the family, number of individuals with ≥2 CRC or endometrial primaries, mean age at diagnosis, and number of individuals with ≥5 adenomas). Both of these models include the rather complex variables within the Amsterdam Criteria and were developed using relatively small populations from dedicated high-risk clinics. Our larger and heterogeneous study population allowed the PREMM1,2 model to be more detailed, taking into account the age at diagnosis in probands and relatives, the presence of colonic adenomas in probands, and the different effect of each cancer diagnosis among first- and second-degree relatives. 
This increased level of detail led to better sensitivity and specificity combinations than achieved with the Amsterdam II Criteria and revised Bethesda Guidelines. In contrast with the model of Wijnen et al,15 which provides a family estimate, PREMM1,2 can be used to generate separate probabilities of carrying a mutation for each individual in a family and may help to determine which family member might be most appropriate for testing. More recently, 2 models have included microsatellite instability or immunohistochemistry data to refine the estimated probability of finding a mutation. The first, a mendelian model for determining MLH1 and MSH2 carrier probabilities, is based on published estimates of mutation frequencies and cancer penetrances in both mutation carriers and noncarriers and includes clinical microsatellite data.27 How this model performs on actual data from clinical practice is not yet known. The second model was developed in a large population-based cohort of early onset (<55 years) CRC patients undergoing genetic testing for DNA mismatch repair genes.28 Data from microsatellite instability and immunohistochemistry were incorporated to refine carrier prediction at different cutoffs. However, its applicability in CRC patients aged 55 years or older or patients with other Lynch syndrome–associated tumors has not been assessed.
As is shown by its good discriminatory ability, the PREMM1,2 model may become an effective tool for mismatch repair gene mutation risk stratification, which will complement the existing molecular diagnostic tools and other Bayesian models currently in development. The PREMM1,2 model can be used to give accurate estimates of a priori risk of carrying MLH1/MSH2 mutations. How these risks are translated into clinical decision making depends on a variety of factors, including the availability of comprehensive genetic testing services (sequencing and large rearrangement analysis), the timeliness of testing information for clinical management decisions, insurance coverage for testing, and the availability of tissue for analysis. Based on the risk estimate generated from the model and the above factors, a clinician may choose whether genetic evaluation should be pursued, as well as the approach to testing, such as prescreening of a tumor specimen with microsatellite instability or immunohistochemistry vs direct germline analysis. Microsatellite instability results were reported for only 47 probands and, hence, were not included in the model. The PREMM1,2 model might well be used in the initial assessment of individuals at risk of Lynch syndrome, before microsatellite instability information is available to the clinician. A health care professional may use the tool to decide whether to refer the patient for further risk assessment and whether to pursue molecular prescreening.
It is important to consider some limitations of our study. The main potential source of error is that our model relies on the clinical history reported by health care professionals on the test order form and the inability to verify diagnoses or collect additional information on certain diagnoses. Previous evidence in the literature shows that accuracy of self-reported family history in first-degree relatives by probands is quite reliable, while it may not be as accurate in second-degree relatives.29-32 Although reporting errors certainly are likely to occur, the fact that health care professionals are the sources of data likely minimizes those based on erroneous diagnoses. Reporting errors are likely to represent both underreporting and overrepresentation of cancer diagnoses. For example, because of time limitations when completing the test order form, health care professionals may only report diagnoses that are considered sufficient to justify ordering the genetic test. Conversely, unaffected relatives are not reported on the test order form and, therefore, overrepresentation of cancer diagnoses may occur in large families in which many unaffected individuals may be present. Despite these inaccuracies, strong predictive effects were found that were similar in the development and validation cohorts, which illustrates the capability of routinely obtained information for selection of patients for further diagnostic workup. We did not have detailed pedigree information on each family and, therefore, could not incorporate the impact of family size and unaffected individuals on the likelihood of carrying a mutation. Finally, our model predicts mutation status only for MSH2 and MLH1. However, we plan to continue to update the model with incorporation of data from MSH6 sequence analysis when sufficient data are available.
In conclusion, we determined which aspects of personal and family history were most important in predicting the outcome of clinical genetic testing in a large, diverse population at risk of Lynch syndrome from across the United States. Our prediction rule includes specific and discrete variables and does not rely on complex combinations of diagnoses across generations. The PREMM1,2 model has been externally validated and is available as a user-friendly Web-based model to provide clinicians with an objective tool to estimate the likelihood of finding mutations in the MLH1 and MSH2 genes and to help guide the strategy for molecular evaluation.
Corresponding Author: Sapna Syngal, MD, MPH, Dana-Farber Cancer Institute, 44 Binney St, Boston, MA 02115 (firstname.lastname@example.org).
Author Contributions: Dr Balmaña had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Balmaña and Stockwell contributed equally to this work.
Study concept and design: Balmaña, Stockwell, Deffenbaugh, Syngal.
Acquisition of data: Deffenbaugh, Ward, Scholl, Tazelaar, Burbidge, Syngal.
Analysis and interpretation of data: Balmaña, Stockwell, Steyerberg, Stoffel, Deffenbaugh, Reid, Scholl, Hendrickson, Syngal.
Drafting of the manuscript: Balmaña, Stockwell, Steyerberg, Syngal.
Critical revision of the manuscript for important intellectual content: Balmaña, Stockwell, Steyerberg, Stoffel, Deffenbaugh, Reid, Ward, Scholl, Hendrickson, Tazelaar, Burbidge, Syngal.
Statistical analysis: Balmaña, Stockwell, Steyerberg, Stoffel, Reid.
Obtained funding: Syngal.
Administrative, technical, or material support: Deffenbaugh, Ward, Scholl, Hendrickson, Tazelaar, Burbidge, Syngal.
Study supervision: Balmaña, Deffenbaugh, Scholl, Syngal.
Financial Disclosures: Dr Stoffel reports having received lecture honoraria from Myriad Genetics Laboratories Inc; Dr Syngal reports having received lecture honoraria from and having served on a clinical advisory board for Myriad Genetics Laboratories Inc; and Mss Deffenbaugh, Reid, and Burbidge, Drs Ward, Scholl, and Tazelaar, and Mr Hendrickson were employed by Myriad Genetics Laboratories Inc when the study was conducted. No other disclosures were reported.
Funding/Support: Dr Balmaña was supported by a grant from La Caixa, Barcelona, Spain; Dr Steyerberg by grants from the Trust Fund Erasmus University Rotterdam and a Marx Fellowship from Dana-Farber Cancer Institute; Dr Stoffel by a Junior Faculty Career Development Award from the American College of Gastroenterology; and Dr Syngal by National Institutes of Health/National Cancer Institute grant K24 CA 113433. No grant support was obtained from Myriad Genetics Laboratories Inc to support the study.
Role of the Sponsor: The funding sources played no role in the design and conduct of the study; collection, management, analysis, and interpretation of data; or preparation, review, or approval of the manuscript.
Acknowledgment: We thank Meredith Regan, ScD, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, for her assistance in data preparation; Jimmy L. Dias, BA, and William B. Dilworth, BA, Interactive Communications, Dana-Farber Cancer Institute, for their technical assistance in the Web-based model development; and Beth M. Ford, BA, Medical Oncology, Dana-Farber Cancer Institute, for her assistance in data collection and administrative issues. None of those listed herein received compensation for their work.