Yourman LC, Lee SJ, Schonberg MA, Widera EW, Smith AK. Prognostic Indices for Older Adults: A Systematic Review. JAMA. 2012;307(2):182-192. doi:10.1001/jama.2011.1966
Author Affiliations: Division of Geriatrics, Department of Medicine, University of California, San Francisco (Drs Yourman, Lee, and Widera); San Francisco Veterans Affairs Medical Center, San Francisco (Drs Lee, Widera, and Smith); and Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts (Dr Schonberg).
Context: To better target services to those who may benefit, many guidelines recommend incorporating life expectancy into clinical decisions.
Objective: To assess the quality and limitations of prognostic indices for mortality in older adults through systematic review.
Data Sources: We searched MEDLINE, EMBASE, Cochrane, and Google Scholar from their inception through November 2011.
Study Selection: We included indices if they were validated and predicted absolute risk of mortality in patients whose average age was 60 years or older. We excluded indices that estimated intensive care unit, disease-specific, or in-hospital mortality.
Data Extraction: For each prognostic index, we extracted data on clinical setting, potential for bias, generalizability, and accuracy.
Results: We reviewed 21 593 titles to identify 16 indices that predict risk of mortality from 6 months to 5 years for older adults in a variety of clinical settings: the community (6 indices), nursing home (2 indices), and hospital (8 indices). At least 1 measure of transportability (the index is accurate in more than 1 population) was tested for all but 3 indices. By our measures, no study was free from potential bias. Although 13 indices had C statistics of 0.70 or greater, none of the indices had C statistics of 0.90 or greater. Only 2 indices were independently validated by investigators who were not involved in the index's development.
Conclusion: We identified several indices for predicting overall mortality in different patient groups; future studies need to independently test their accuracy in heterogeneous populations and their ability to improve clinical outcomes before their widespread use can be recommended.
Failure to consider prognosis in the context of clinical decision making can lead to poor care. Hospice is underutilized for patients with nonmalignant yet life-threatening diseases.1 Healthy older patients with good prognosis have low rates of cancer screening.2 Older adults with advanced dementia or metastatic cancer are screened for slow-growing cancers that are unlikely to ever cause them symptoms but may lead to distress from false-positive results, invasive workups, and treatments.3,4 In recognition of these phenomena, guidelines increasingly incorporate life expectancy as a central factor in weighing the benefits and the burdens of tests and treatments (Table 1). Prognostic indices offer a potential role for moving beyond arbitrary age-based cutoffs in clinical decision making for older adults.2 However, little is known about the quality of prognostic indices for older adults, limiting their clinical use.
We performed a systematic review to describe the quality and limitations of validated non–disease-specific prognostic indices that predict absolute risk of all-cause mortality in older adults. Recognizing that older adults are more likely to have more than 1 chronic illness than younger adults, we focused on non–disease-specific indices.
We used broad Medical Subject Heading terms (eg, mortality, prognosis, aged) to search MEDLINE, EMBASE, Cochrane, and Google Scholar from their inception through November 2011 for validated prognostic indices, published in English, that predicted absolute risk of all-cause mortality in patients whose average age was 60 years or older. Authors of included studies and experts in the field were contacted and asked for additional published and unpublished sources. We excluded indices that estimated intensive care unit (ICU), in-hospital, or disease-specific mortality. Two investigators (L.C.Y. and A.K.S.) independently applied these inclusion and exclusion criteria to select prognostic indices and independently abstracted their data. Disagreements were resolved by consensus or, if necessary, the involvement of a third investigator (S.J.L.).
There are no accepted criteria to assess the quality of prognostic indices. Therefore, we adapted criteria from previous work published by experts in medicine and epidemiology.28-35 We abstracted data on the quality of prognostic indices, including information on potential bias, generalizability, and accuracy (Table 2). For discrimination, we considered C statistics in the range of 0.50 to 0.59 to indicate poor, 0.60 to 0.69 to indicate moderate, 0.70 to 0.79 to indicate good, 0.80 to 0.89 to indicate very good, and 0.90 or greater to indicate excellent discrimination.44 For calibration, we considered 10 or more percentage points' difference between predicted and observed mortality to be evidence of poor calibration and less than 10 percentage points' difference to be evidence that the model was well calibrated. To further assess the potential limitations of these indices in clinical practice, we tracked indices that predicted greater than 50% mortality, since the time at which mortality reaches 50% corresponds to median remaining life expectancy. We report 95% confidence intervals on measures of discrimination and calibration where available.
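The accuracy thresholds described above can be written down as a short sketch. The function names are illustrative and not part of the review; the label for C statistics below 0.50 is an assumption, since the text does not name that range.

```python
def classify_discrimination(c_statistic: float) -> str:
    """Map a C statistic to the descriptive category used in this review."""
    if not 0.0 <= c_statistic <= 1.0:
        raise ValueError("C statistic must lie between 0 and 1")
    if c_statistic >= 0.90:
        return "excellent"
    if c_statistic >= 0.80:
        return "very good"
    if c_statistic >= 0.70:
        return "good"
    if c_statistic >= 0.60:
        return "moderate"
    if c_statistic >= 0.50:
        return "poor"
    # Assumption: the review does not label values under 0.50.
    return "below chance"


def is_well_calibrated(predicted_pct: float, observed_pct: float) -> bool:
    """True when predicted and observed mortality differ by < 10 percentage points."""
    return abs(predicted_pct - observed_pct) < 10.0


if __name__ == "__main__":
    print(classify_discrimination(0.75))   # falls in the 0.70 to 0.79 band
    print(is_well_calibrated(30.0, 50.0))  # 20-point gap, so poorly calibrated
```

The calibration check would be applied per risk group, so an index could be well calibrated in some groups and poorly calibrated in others.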
One investigator title-screened 21 593 studies to identify 4120 potentially relevant abstracts (eFigure). After excluding studies with participants whose average age was less than 60 years; studies that predicted only relative risk; or indices that predicted only disease-specific, in-hospital, or ICU mortality, there were 341 studies published between January 1987 and November 2011. After review of the full text of these studies, 317 studies were excluded, leaving 24 studies (eFigure).36-43,45-60 Three of these studies presented updated versions of an index,36,40,53 and 5 provided additional validation for an index,38,43,54,58,59 resulting in a total of 16 unique indices.
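As a quick consistency check on the selection flow, the counts reported above can be reproduced with simple arithmetic. The variable names are illustrative; the numbers are taken directly from the text.

```python
# Counts reported in the study-selection paragraph.
full_text_reviewed = 341
excluded_after_full_text = 317
included_studies = full_text_reviewed - excluded_after_full_text

# Studies that did not introduce a new index.
updated_versions = 3          # references 36, 40, 53
additional_validations = 5    # references 38, 43, 54, 58, 59

unique_indices = included_studies - updated_versions - additional_validations

print(included_studies, unique_indices)
```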
All indices were developed using secondary analysis of existing data sets of participants from the United States (11 indices)36,39-41,45,46,49,51,55-57 and western Europe (4 indices).42,47,48,52 The most common final predictors of mortality were functional status and comorbidities (each absent from fewer than 5 indices). Three indices tested only reproducibility and did not evaluate any form of transportability (split sample validation only47,48 and bootstrapping only57) (Table 2). Only a single form of transportability was tested for 4 indices (geographic39,46,52 and historical51). For 4 indices, the investigators who developed the index tested the transportability of their index in a separate validation study.37,38,42,43,49,55,58,59 Two indices were additionally validated by an investigator not involved in the index's development.36,38,41,54
None of the examined indices had a C statistic ≥0.90; 3 indices had C statistics between 0.80 and 0.89, suggesting very good discrimination39,40,49; 10 indices had C statistics between 0.70 and 0.79, suggesting good discrimination36,37,41,42,46,48,52,55-57; and 3 indices had C statistics between 0.60 and 0.69, suggesting moderate discrimination.45,47,51 Indices were generally well calibrated across risk groups (Table 3). Two indices reported a greater than 10% difference between predicted and observed mortality.36,40
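The tally above can be cross-checked against the abstract by listing the cited reference numbers in each discrimination band. This is only a bookkeeping sketch; the dictionary name is illustrative, and the reference numbers are those cited in the paragraph.

```python
# Reference numbers cited for each discrimination category.
indices_by_discrimination = {
    "very good": [39, 40, 49],
    "good": [36, 37, 41, 42, 46, 48, 52, 55, 56, 57],
    "moderate": [45, 47, 51],
}

all_refs = [ref for refs in indices_by_discrimination.values() for ref in refs]
total_indices = len(set(all_refs))

# Indices with a C statistic of 0.70 or greater (good or very good).
at_least_070 = (len(indices_by_discrimination["very good"])
                + len(indices_by_discrimination["good"]))

print(total_indices, at_least_070)
```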
We present a descriptive summary of each index by setting. Results of data abstraction regarding potential bias, generalizability, and accuracy are shown in Table 3 and Table 4.
Our review identified 6 indices for community-dwelling older adults. Indices estimated mortality risk from 1 year56 to 5 years.55 The highest-risk group from Schonberg et al at 9-year follow-up predicted 92% mortality (95% CI, 86%-96%).58
Gagne et al56 developed a mortality risk score to predict 1-year mortality by combining conditions in the Romano et al62 implementation of the Charlson et al index63 and the van Walraven et al64 implementation of the Elixhauser et al system.65 The sample was a secondary analysis of Medicare enrollees 65 years and older who in 2004 participated in a pharmacy assistance contract for low-income seniors who did not qualify for Medicaid prescription drug coverage in Pennsylvania (development cohort, n = 120 679) and New Jersey (validation cohort, n = 123 855). The model had good discrimination and was well calibrated (Table 3). Reclassification measures compared the model favorably against the Romano/Charlson and van Walraven/Elixhauser indices.
The 15-month index by Mazzaglia et al52 is a 7-item questionnaire for primary care physicians that was developed in 2470 primary care patients who were 65 years and older residing in northwestern Florence, Italy, and validated in a sample of 2926 similar patients residing in southwestern Florence. The model was well calibrated and had good discrimination, but it predicted the narrowest range of mortality of any examined index (0%-10% risk).
Carey et al46 developed a 2-year index for community-dwelling elderly individuals from a sample of 4516 adults 70 years and older from the eastern, western, and central United States who had been interviewed in the Asset and Health Dynamics Among the Oldest Old (AHEAD) study in 1993. Carey et al subsequently validated the index in 2877 similar interviewees from the southern United States. The index had good discrimination and was well calibrated across all 3 risk levels but predicted only a narrow range of mortality (5%-36% risk).
The index by Carey et al for 3-year mortality45 was developed in functionally impaired, nursing home–eligible, community-dwelling adults who were 55 years and older in the years 1988 through 1996, living in the western United States (n = 2232), and enrolled in the Program of All-Inclusive Care for the Elderly (PACE), a senior daycare program providing multidisciplinary services. Validation was conducted in PACE participants from the eastern and midwestern United States (n = 1667). The index was well calibrated but showed only moderate discrimination. Accuracy was similar for 1-year mortality.
Lee et al39 developed a 4-year mortality index in community-dwelling adults older than 50 years from the eastern, western, and central United States who were interviewed in the Health and Retirement Study of 1998 (81% participation rate, n = 11 701). To test geographic transportability, the index was validated in interviewees from the southern United States (n = 8009). The Lee et al index was well calibrated and showed very good discrimination.
The index by Schonberg et al55 to predict 5-year mortality was developed from a nationally representative sample of adults older than 65 years (n = 16 077) who responded to the 1997-2000 National Health Interview Survey (NHIS) (74% participation rate); it was well calibrated and had good discrimination in a random sample of n = 8038 adults drawn from the same data source. Schonberg et al58 then further validated the index in respondents to the 2001-2004 NHIS (n = 22 057, 25% aged >80 years, 57% female, 12% dependent in at least 1 instrumental activity of daily living, 18% with diabetes, 15% with cancer) and found no change in discrimination (C statistic, 0.75). The Kaplan-Meier method demonstrated widening separation between risk groups out to 9 years.
Two indices were developed for the nursing home, both using the Minimum Data Set (MDS), a clinical and administrative data set that is federally required of all US nursing homes. The MDS Mortality Rating Index by Porock et al37 to estimate 6-month mortality in nursing home patients was developed using data from all Missouri long-term care residents in 1999. Study authors later created a simplified version of this model using the same data set.53 The revised Flacker and Kiely36,50 long-stay index for 1-year mortality was developed and validated from the MDS using a split sample of nursing home residents who were 65 years and older and residing longer than 1 year in Medicare-certified nursing homes within New York (n = 63 077). Both indices demonstrated good discrimination and were well calibrated across a wide range of mortality risk levels, except the revised Flacker and Kiely index in the highest-risk group (20% difference).
Kruse et al38 validated the indices by Porock et al and Flacker and Kiely in a small, prospective, single nursing home study in 2007 (n = 130, mean age 83 years, 61% female, 24% dementia, 23% congestive heart failure). For the Porock et al index, discriminatory ability was lower in the validation study by Kruse et al (C statistic, 0.59; 95% CI, 0.46-0.72) than in the original derivation study by Porock et al (C statistic, 0.75) or using the simplified score (C statistic, 0.76). For the revised Flacker and Kiely index, discriminatory ability was similar in the original derivation study by Flacker and Kiely (C statistic, 0.71) and the external validation by Kruse et al (C statistic, 0.72; 95% CI, 0.62-0.81).
We identified 8 indices that estimated mortality risk for hospitalized older adults. Seven indices estimated 1-year mortality. Five were intended for use in the emergency department or on hospital admission40,42,47,49,57 and 3 after hospital discharge.41,48,51
The “Silver Code” by Di Bari et al,47 a 1-year index for emergency triage of individuals aged 75 years and older, was developed and validated using administrative records of patients admitted to the hospital via the emergency department from Florence, Italy, in 2005 (n = 10 913). They achieved 91% linkage across 4 administrative data sets (demographics, hospitalizations, prescription medications, and mortality). Random split sample validation was conducted on half the cohort. The index was well calibrated and discriminatory ability was moderate.
Fischer et al49 conducted a retrospective medical record review to develop a 1-year index for hospitalized elderly individuals using 4 prespecified predictors called the CARING criteria, collected at admission. Their sample included patients admitted to the medical service of a US Department of Veterans Affairs hospital in a 4-month period in 1999 (n = 873). Participants admitted in the first 2 months of the study period were included in the development cohort; the remainder were in the validation cohort. The model had very good discrimination and a reported error rate of 0.26 in the validation cohort. Youngwerth et al59 later prospectively tested the external validity of the CARING criteria in a younger, sex-balanced sample from a university hospital in 2005 (n = 427, average age 54 years, 50% female). No C statistic was reported for the external validation.
The Burden of Illness Score for Elderly Persons by Inouye et al40 updated previous indices developed by the same group60,66 by adding functional and laboratory data to diagnoses from administrative data to estimate 1-year mortality. Participants were drawn from a prospective study of individuals aged 70 years and older who were hospitalized at Yale–New Haven Hospital from 1989 through 1991 (n = 525). The study was validated in a sample of 1246 participants from 27 Connecticut hospitals who were 65 years and older with a principal discharge diagnosis of pneumonia from 1995 through 1996. The investigators demonstrated improvement in the C statistic with the addition of laboratory and functional and cognitive measures to administrative data (validation C statistics, administrative alone, 0.59; all measures, 0.77). The model was well calibrated at the extremes but was less accurate in middle-risk groups (Table 4).
Pilotto et al42 used information from the standardized Geriatrics Assessment, performed at admission, to develop a 1-year prognostic index for hospitalized individuals aged 65 years and older in a sample of 838 consecutively admitted patients to the geriatrics unit of an Italian hospital in 2004, validating in 857 participants from 2005. They subsequently tested the model's accuracy at 1 year and 1 month in participants from the same hospital from 2005 to 2007 (n = 4088).43 The model was well calibrated and demonstrated good discrimination in the larger validation study (C statistic, 0.71; 95% CI, 0.70-0.74), and performance was similar at 1 month (C statistic, 0.76; 95% CI, 0.73-0.79).
Teno et al57 developed a nomogram to predict 1- and 2-year mortality based on medicine and ICU patients aged 80 years and older who were enrolled in the Hospitalized Elderly Longitudinal Project (HELP) from 5 different hospitals across the United States from 1993 to 1994 (n = 1266). Teno et al tested the reproducibility of the index in 150 random samples from the original 1266 patients. The Teno et al nomogram is convenient in that it predicts multiple end points from a single score. The index includes the APACHE III scale, which requires arterial blood gas measurement.
Levine et al51 developed a 1-year prognostic model for hospitalized elderly individuals after discharge using data from a cohort of patients admitted to hospitalist and nonhospitalist physicians at the University of Chicago Hospitals from July 1997 through June 1999 (development cohort, n = 2739) and July 1999 through June 2001 (validation cohort, n = 3643). The index had moderate discriminatory ability and was well calibrated.
Walter et al41 developed a 1-year index for elderly individuals after hospital discharge using secondary data from a study of patients aged 70 years and older who were hospitalized between 1993 and 1997 at the University Hospitals of Cleveland (development cohort, n = 1495) and the Akron City Hospital (validation cohort, n = 1427). The model demonstrated good discrimination and was well calibrated across risk groups. Rozzini et al54 subsequently externally validated the index's performance predicting 6-month mortality in a retrospective analysis of 840 consecutively admitted participants to a hospital in Italy and found monotonic increases in mortality for each predicted risk level (observed 4%, 10%, 25%, and 46% 6-month mortality).
The Dramé et al48 index for 2-year mortality was developed in hospitalized adults aged 75 years and older based on secondary data obtained in the emergency department as part of the SAFES study in France (n = 870). It showed good calibration and discrimination in a split sample validation of 436 older adults.
Our review identified 16 validated non–disease-specific prognostic indices for older adults. Studies were abstracted for information about index quality, including potential for bias, generalizability, and accuracy.
We highlighted criteria for evaluating prognostic indices and identified several high-quality prognostic indices. Unfortunately, although these indices hold the promise of improving the targeting of interventions in older adults, there is insufficient evidence at this time to recommend the widespread use of prognostic indices in clinical practice. Only 2 indices were validated by investigators not involved in the studies' development, and no index had been prospectively tested and found to be accurate in a large diverse sample. Confidence intervals were not presented for measures of either discrimination or calibration for 14 indices. By our measures, no study was completely free from potential sources of bias. Testing of transportability was limited, raising concerns about overfitting and underfitting. These factors limit a clinician's ability to assess the accuracy of these indices across patient groups that differ according to severity of illness, methodology of data collection, geographic location, and time.
Even if quality barriers are overcome, important limitations remain. Several indices require collection of information that may not be routinely assessed in elderly patients, such as activities of daily living. Many of these indices rely on clinical information from administrative data sets, and the accuracy of codes from the International Classification of Diseases, Ninth Revision, has been called into question.61 Thus, indices by Gagne et al, Inouye et al, and Levine et al may be better suited to risk adjustment than clinical use. Moreover, coding algorithms are subject to change. The MDS has been updated to a new version (3.0) since the development of indices for nursing home patients, and some variables in indices by Porock et al and Flacker and Kiely have been changed or are no longer present.67 Finally, PubMed has no single Medical Subject Heading term for prognostic index, making it difficult for a busy clinician to locate these studies.
Ultimately, an index will be judged not only on its accuracy across diverse settings, but also on its clinical effect. Studies that demonstrate effect on prognostic estimates, clinician behavior, and patient outcomes have a higher level of evidence for use in clinical decision making (eg, Ottawa ankle rules).35 We are aware of only 2 small studies that tested the effect of these indices on clinical outcomes.51,68 The highest level of evidence, however, would come from large prospective trials that randomize clinicians to using the index or not, evaluating the effect of the index on prognostic estimates, clinical decision making, and patient outcomes. Such large randomized trials have not been performed.
None of the C statistics for the included indices were higher than 0.90, suggesting unexplained variation in mortality. However, discriminatory ability of these indices is consistent with other indices that commonly drive clinical decisions, such as the CHADS2 index to help determine warfarin therapy (C statistic, 0.68-0.72)69; the Framingham risk score to help determine lipid therapy (C statistic, 0.63-0.83)70; and the TIMI risk score to help determine invasive therapy for unstable angina (C statistic, 0.65).71
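For readers less familiar with the C statistic, a minimal sketch follows. It assumes a binary outcome (died or survived by a fixed time point) with no censoring, which is a simplification of how the reviewed studies may have computed it. The C statistic is then the proportion of (died, survived) patient pairs in which the patient who died had the higher predicted risk, with ties counted as one-half. The function name and the toy data are hypothetical.

```python
from itertools import product


def c_statistic(predicted_risk, died):
    """Concordance for a binary outcome: share of (died, survived) pairs
    in which the patient who died had the higher predicted risk."""
    deaths = [p for p, d in zip(predicted_risk, died) if d]
    survivors = [p for p, d in zip(predicted_risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for risk_dead, risk_alive in product(deaths, survivors):
        if risk_dead > risk_alive:
            concordant += 1.0
        elif risk_dead == risk_alive:
            concordant += 0.5  # ties count as half
    return concordant / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes for six patients.
    risks = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    outcomes = [1, 1, 1, 0, 0, 0]
    print(round(c_statistic(risks, outcomes), 3))
```

A value of 0.50 corresponds to chance and 1.00 to perfect separation, so a C statistic below 0.90 means that some patients who survived were assigned higher risk than some who died.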
There may be a limited role for the highest-quality indices in the right settings. If patient characteristics align closely with those of the development or validation cohorts, clinicians may find prognostic information useful to help inform, though not replace, their clinical judgment. Prediction rules have been shown to outperform clinicians in terms of prognostication,72,73 whereas human prediction on its own is fraught with bias.74 The indices we identified were developed from heterogeneous groups of patients. Applying this information to the individual patient, therefore, requires a nuanced use of the index. Patients are likely to have conditions that are not included in the index (eg, Parkinson disease). The clinician must account for these conditions and decide whether their effect is adequately accounted for by the indices' predictors.
Indices are most likely to be clinically useful when they predict a wide range of mortality. Clinical decisions are most likely to be influenced by either very low or very high mortality risk. Although 10 indices predicted greater than 50% mortality, only 3 predicted greater than 80% risk in the highest risk group. Midrange probabilities may still be useful in clinical decisions in which life expectancy plays a role, allowing patient preferences to drive the physician's recommendation. The following case illustrates this issue.
Ms A is a 75-year-old clinic patient who has been hospitalized twice in the past year for chronic obstructive pulmonary disease and has a history of diabetes and difficulty walking a quarter mile. She has not been previously screened for colon cancer. The US Preventive Services Task Force recommends that individual factors should determine the decision to screen or not screen patients aged 75 to 85 years; patients must live at least 7 years to benefit from screening, and the net benefits in this age group are small.16 Using indices developed for community-dwelling elderly individuals, it is determined that Ms A has a 54% to 67% mortality risk at 4 years (Lee et al index) and 75% at 9 years (Schonberg et al index). Should Ms A undergo colorectal cancer screening?
In this case, the prognostic information may be helpful as her physician discusses the possibility of colon cancer screening in relation to other health priorities, such as maintaining mobility. Because her median life expectancy is less than 4 years, Ms A will probably not live long enough to benefit from screening. And if screening is difficult for her, there is enough uncertainty in her likelihood of benefit that she probably should focus on other priorities. However, if she feels strongly about wanting to be screened, the estimates are not strong enough on their own to refute that decision.
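The reasoning in this case can be made explicit with a small sketch. It is an illustration of the logic only, not a clinical decision rule: if predicted mortality at a given horizon exceeds 50%, median life expectancy is shorter than that horizon, and that horizon can then be compared with the time needed to benefit from screening. All names are hypothetical; the numbers are those given for Ms A.

```python
def median_survival_is_before(mortality_pct_at_horizon: float) -> bool:
    """If more than half of similar patients die by the horizon,
    median life expectancy is shorter than the horizon."""
    return mortality_pct_at_horizon > 50.0


# Values from the case of Ms A.
lee_4_year_risk_range = (54.0, 67.0)  # percent mortality at 4 years
horizon_years = 4
years_needed_to_benefit = 7           # lag to benefit from screening

# Both ends of the predicted range exceed 50%.
median_under_horizon = all(
    median_survival_is_before(risk) for risk in lee_4_year_risk_range
)

# Median life expectancy falls short of the time needed to benefit.
median_shorter_than_lag = (
    median_under_horizon and horizon_years <= years_needed_to_benefit
)

print(median_under_horizon, median_shorter_than_lag)
```

As the case discussion notes, such an estimate informs the conversation but does not override the patient's preferences.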
We have refrained from explicitly ranking or categorizing the quality of these indices, recognizing that no agreed-on scientifically developed system for rating index quality currently exists. Some will argue that minimizing risk for potential bias is of critical importance, while others might argue that an index should be judged on its ability to perform accurately across diverse settings. Our review excluded indices that estimated only relative risk or had not been validated, and future research may find that some of these indices are generalizable and accurate. Our ability to assess publication bias was limited by our small sample size.
While neither a clinician nor an index can predict with absolute certainty how long an older adult will live, validated prognostic indices might improve the accuracy of the prognostic assumptions that influence clinical decisions. However, further research is needed before general prognostic indices for elderly individuals can be recommended for routine use. Future research should focus on prospectively testing the validity of these indices across diverse clinical settings and analyzing their effect on clinical decision making and patient outcomes.
Corresponding Author: Alexander K. Smith, MD, MPH, Division of Geriatrics, University of California, San Francisco, 4150 Clement St (181G), San Francisco, CA 94121 (firstname.lastname@example.org).
Author Contributions: Dr Smith had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Yourman, Lee, Schonberg, Widera, Smith.
Acquisition of data: Yourman, Smith.
Analysis and interpretation of data: Yourman, Lee, Schonberg, Smith.
Drafting of the manuscript: Yourman, Lee, Smith.
Critical revision of the manuscript for important intellectual content: Yourman, Lee, Schonberg, Widera, Smith.
Obtained funding: Smith.
Administrative, technical, or material support: Yourman, Schonberg, Widera, Smith.
Study supervision: Widera, Smith.
Conflict of Interest Disclosures: The authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Funding/Support: Dr Yourman was supported by a National Institute on Aging T32 predoctoral fellowship position (5T32AG000212). Dr Smith was supported by career development grants from the Greenwall Foundation and the National Center for Research Resources to the Clinical and Translational Science Institute–University of California, San Francisco (UL1 RR024131).
Role of the Sponsor: The funding organizations had no role in the design and conduct of the study; in the collection, analysis, and interpretation of the data; or in the preparation, review, or approval of the manuscript.
Additional Contributions: We gratefully acknowledge Gloria Won, MLIS, H. M. Fishbon Memorial Library, University of California, San Francisco, for her help with searching EMBASE. She did not receive compensation for her contribution.