Figure. Creation of derivation data set and test data set.
Donnan PT, Dorward DWT, Mutch B, Morris AD. Development and Validation of a Model for Predicting Emergency Admissions Over the Next Year (PEONY): A UK Historical Cohort Study. Arch Intern Med. 2008;168(13):1416–1422. doi:10.1001/archinte.168.13.1416
Current international health policy has emphasized the importance of managing long-term conditions in the community with the aim of preventing emergency hospitalizations. Previous algorithms and rules have been developed but are limited to those older than 65 years and generally only for readmission. Our aim was to develop an algorithm to predict emergency hospital admissions in the whole population of those 40 years or older.
The design was a historical cohort observational study from 1996 to 2004 with at least 1 year of follow-up and split-half validation, set in the population of Tayside, Scotland (n = 410 000). Participants were 40 years or older with a 3-year history of prescribed drugs and hospital admissions. The main outcome measure was first emergency hospital admission in the following year, analyzed using logistic regression.
A total of 186 523 subjects 40 years or older were identified at baseline. A derivation data set (n = 90 522) yielded 6793 participants (7.5%) who experienced an emergency hospital admission in the following year. Strong predictors of admissions were age; being male; high social deprivation; previously prescribed analgesics, antibacterials, nitrates, and diuretics; the number of respiratory medications; and the number of previous admissions and previous total bed-days. Discriminatory power was good (c statistic, 0.80) and split-half validation gave good calibration, especially for the highest decile of risk.
A population-derived algorithm provided the first easy-to-use algorithm, to our knowledge, to predict future emergency admissions in all individuals 40 years or older. The model can be implemented at individual patient level as well as family practice level to target case management.
Recent international and UK health policy has emphasized the key issue of the management of long-term conditions and the need to anticipate future care to prevent the increasing trend of emergency hospital admissions.1 The Scottish UK Kerr report called on
all NHS [National Health Service] Boards to put in place a systematic approach to caring for the most vulnerable (especially older people) with long-term conditions with a view to managing their conditions at home or in the community and reducing the chance of hospitalisation1(p6)
The achievement of this goal is likely to involve screening of individuals incorporating risk stratification to target care, better integration of services and information technology systems, and support for self-care.1 Many predictive algorithms have been produced in the United States to identify high risk of admission among elderly subjects, although these have tended to be applied to those enrolled in mostly Medicare health maintenance organizations and their generalizability is questionable.2-5 The NHS of England has piloted several US-inspired models of risk stratification and case management aimed at the top 1% to 2% of patients in terms of need.6,7 However, initial objectives were limited to reducing future emergency admissions in those who had already experienced 2 or more emergency admissions.8 A risk stratification model based on English admissions data has been developed by the King's Fund, London, England, but currently this too predicts only readmission in those who have already experienced an admission for certain trigger conditions such as chronic obstructive pulmonary disease.9 More recent algorithms have aimed at even older age groups,10 and all have fairly poor predictive ability, with an area under the receiver operating characteristic curve typically less than 0.7.9-11 These algorithms clearly leave out of consideration the important group of people at high risk who are younger than 65 years and those who have never experienced an emergency admission but are likely to experience one in the future. Because complications of long-term conditions tend to increase in those older than 40 years, we chose 40 years as the lower age limit for this study.
It has clearly been demonstrated that emergency admissions in those older than 65 years fall over time, partly owing to regression to the mean but also because patients die or go into long-term care.8 Also, the success of interventions has been variable, and identifying high-risk patients has not yet demonstrated a reduction in emergency admissions following implementation of these algorithms.12
The aim of this study was therefore to develop a more comprehensive, robust algorithm for use by clinicians and policy makers in predicting future emergency hospital admissions in all individuals 40 years or older in the next year (the Predicting Emergency Admissions Over the Next Year [PEONY] algorithm). This work has immediate relevance to developing policy and service reorganization, as well as informing future intervention studies of different models of case management.
The study population consisted of all subjects 40 years or older in the Tayside region of Scotland, United Kingdom (population, approximately 410 000) who were registered with a Tayside general practice. The Tayside population is a mixture of urban and rural communities, with a low nonwhite population, and is considered representative of Scotland. In a dynamic population, entry into the study occurred at any time from January 1, 1996, to March 31, 2004. To be eligible, each individual had to have data on the history of hospital use and drug prescribing over a 3-year period, as well as a minimum of 12 months of follow-up information. Baseline was defined after the initial 3 years and follow-up occurred over the following year. Subjects with either less than a 3-year history or less than 1 year of follow-up data were excluded.
In Tayside, every individual registered with a general practitioner has a unique identifier, the community health index. This 10-digit number is used for all health encounters in the Tayside region and enables record linkage of both primary care and secondary care data. For the purposes of research, all data sets are made anonymous according to Standard Operating Procedures of the Health Informatics Centre, and this study received ethical approval from the Tayside research ethics committee and from the Caldicott Guardian and will be subject to an independent external audit.13
For each individual 40 years or older in the population, a baseline time point of January 1, 1996 (or date thereafter for people entering Tayside later) was derived such that all the risk factors under consideration were present within a period of 3 years retrospectively, and each individual had at least 1 year of complete follow-up. One group of risk factors assessed for potential inclusion was previous hospital use, as determined from the Scottish Morbidity Record 1. The Scottish Morbidity Record 1 consists of records of finished consultant episodes, which can be contiguous and together form a single admission. Each finished consultant episode contains the date of admission and discharge, whether emergency or not, along with a set of International Classification of Diseases codes classifying the finished consultant episode. We determined the number of previous emergency admissions, using the same definition as the Information Services of the Scottish Executive, the number of previous admissions (planned or unplanned), total bed-days in the previous 3 years, and mean length of stay in the previous 3 years. The demographic factors of age at baseline, sex, and social deprivation (high/low and unknown) based on census information linked to postal code14 were also considered. In addition, indicators of receipt of drugs, some of which are markers of long-term conditions, were constructed based on the class of medication in the British National Formulary,15 together with the number of prescriptions in each category obtained in the previous 3 years. The classes of drugs correspond to particular chapters so that, for example, British National Formulary 2.2 includes all diuretics (these are listed in appendix 1; http://www.dundee.ac.uk/hic/links/).
The 23 factors were created using data from the community-based database of pharmacist-dispensed prescriptions in Tayside held in the Health Informatics Centre as described elsewhere.13 Hence, it is known that the patients actually received the drugs.
The main outcome for this study was the first emergency hospital admission in the follow-up year, as defined by the Information Services of the Scottish Executive based on a field in the Scottish Morbidity Record register of hospitalizations that indicates nonelective admission; the Tayside portion of this register is held in the Health Informatics Centre.13 Hence, ascertainment of events was close to 100%. Information on all causes of death was obtained from the General Registry Office for Scotland.
The characteristics of the study population were summarized by means and standard deviations for continuous measurements and as percentages for categorical factors. The data set was split in half at random into a derivation data set and a test set. In the derivation data set, the main binary outcome of first emergency admission in the following year was modeled using logistic regression.16 From this model, odds ratios (ORs) and associated 95% confidence intervals (CIs) were obtained by exponentiating the regression coefficients. Absolute risk was estimated from the linear predictor of the final model.
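The two transformations described above (odds ratios from exponentiated coefficients and absolute risk from the linear predictor) can be sketched in a few lines. This is a minimal illustration only: the function names, the coefficients, and the standard error are hypothetical and are not taken from Table 3, and the 95% CI uses the usual normal approximation (z = 1.96), which the text does not spell out.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error.

    Assumes a Wald-type interval, exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def absolute_risk(intercept, coefficients, values):
    """Inverse logit of the linear predictor: risk = 1 / (1 + exp(-lp))."""
    lp = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical numbers for illustration only (not the PEONY coefficients).
or_, lower, upper = odds_ratio_with_ci(beta=0.12, se=0.03)
risk = absolute_risk(intercept=-3.0, coefficients=[0.12, 0.5], values=[1, 2])

print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
print(f"Predicted 1-year risk: {100 * risk:.1f}%")
```

Multiplying the predicted probability by 100 gives the percentage scale on which risk is reported later in the article.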
Factors were initially considered for inclusion in the model if they had a univariate significance level of P < .20 and/or were judged to be of clinical importance. In total, 53 potential factors were considered initially. The multiple regression then proceeded as follows: first, stepwise regression was implemented in both forward and backward directions, using the standard criterion of significance at the 5% level for entry, to assess whether the same model resulted. Tests for interactions between risk factors were then performed at the P < .05 level of significance.
However, with such large data sets there is a danger of adding spuriously significant factors, and so the final model was selected using Schwarz's Bayesian Information Criterion, which penalizes the likelihood heavily for large data sets as well as for large numbers of parameters in the model.17
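To make the size of this penalty concrete, the sketch below uses the standard form of Schwarz's criterion, BIC = −2 log L + k ln(n), where k is the number of parameters and n the number of observations. With the derivation data set of 90 522 subjects, each extra parameter costs ln(90 522) ≈ 11.4 units, whereas a likelihood ratio test at P < .05 with 1 degree of freedom requires an improvement of only 3.84. The log-likelihood values and parameter counts in the example are hypothetical and chosen purely to show a term that is significant yet rejected by the criterion.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz's Bayesian Information Criterion; lower values indicate a better model."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_DERIVATION = 90522  # derivation data set size reported in the article
CHI2_1DF_05 = 3.84    # chi-square critical value, 1 df, P = .05

# Hypothetical models: adding one term improves -2 log L by 6.0.
ll_small, k_small = -20000.0, 34
ll_large, k_large = -19997.0, 35

lr_stat = 2.0 * (ll_large - ll_small)
significant = lr_stat > CHI2_1DF_05
bic_prefers_small = bic(ll_small, k_small, N_DERIVATION) < bic(ll_large, k_large, N_DERIVATION)

print(f"Penalty per parameter: {math.log(N_DERIVATION):.2f}")
print(f"Extra term significant at P < .05: {significant}")
print(f"BIC still prefers the smaller model: {bic_prefers_small}")
```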
Finally, earlier rejected variables were added again to the final model to confirm that they were not statistically significant, clinically important, or potential confounders.
Performance of the algorithm obtained from the derivation data set was tested on the randomly selected test data set.18 First, overall discrimination ability was assessed for the derived function on the derivation data itself, and second, this model was used on the test data set. Discrimination was assessed using the c statistic (area under the receiver operating characteristic curve), which is an estimate of the probability of assigning a higher risk to those who have an emergency admission in the following year compared with those who do not. This is an important criterion when ranking people by risk and is clearly essential for risk stratification. Calibration of the algorithm on the derivation data set was also assessed by comparing observed and expected emergency admissions by deciles of predicted risk using a χ2 test. Finally, the c statistic and calibration test were also calculated for the derived algorithm applied to the test data set. Sensitivity, specificity, positive predictive value (PPV), yield (1/PPV), and likelihood ratio (sensitivity/[1 − specificity]) were calculated for different cutoff points of the predicted risk, expressed as a percentage (risk × 100). In addition, using methods derived for Framingham scores, a clinical scoring system was developed to simplify use in a clinical scenario.19 All analyses were implemented in SAS version 9 (SAS Institute Inc, Cary, North Carolina) statistical software.
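The performance measures defined above can be illustrated on a small toy example. The sketch below is not the SAS analysis used in the study: the function names and the 8 predicted risks and outcomes are invented for illustration, the c statistic is computed by brute-force comparison of every admitted and non-admitted pair (adequate for a toy example but too slow for data sets of this size), and ties are counted as one-half, which is a common convention rather than something stated in the text.

```python
def c_statistic(risks, outcomes):
    """Proportion of (admitted, not admitted) pairs in which the admitted
    person has the higher predicted risk; tied pairs count as 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                score += 1.0
            elif case_risk == control_risk:
                score += 0.5
    return score / (len(cases) * len(controls))


def cutoff_metrics(risks, outcomes, cutoff):
    """Classify risk >= cutoff as high risk and return the measures named in the text.

    Assumes every cell needed as a denominator is nonzero.
    """
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= cutoff and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= cutoff and y == 0)
    fn = sum(1 for r, y in zip(risks, outcomes) if r < cutoff and y == 1)
    tn = sum(1 for r, y in zip(risks, outcomes) if r < cutoff and y == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "yield": 1.0 / ppv,  # people flagged per true admission
        "likelihood_ratio": sensitivity / (1.0 - specificity),
    }


# Invented toy data: predicted risks and observed admissions (1 = admitted).
toy_risks = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_outcomes = [1, 1, 0, 1, 0, 0, 1, 0]

toy_c = c_statistic(toy_risks, toy_outcomes)
toy_metrics = cutoff_metrics(toy_risks, toy_outcomes, cutoff=0.5)
print(toy_c, toy_metrics)
```

Raising the cutoff generally trades sensitivity for PPV, which is the trade-off the article's cutoff table quantifies for the actual algorithm.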
A total of 186 523 individuals 40 years or older (80%) were included from the 234 133 in the total Tayside population (Figure). The randomly selected derivation data set consisted of 93 156 subjects; after the exclusion of 2634 subjects with less than 1 year of follow-up, 90 522 remained for derivation of the model. Table 1 gives the characteristics of the derivation data set population. Those who experienced an outcome of an emergency admission (7.5%) tended to be older and more likely to be from a deprived area. They were also twice as likely to have had previous emergency admissions, with mean total bed-days 6 times greater than those who had no emergency admission in the following year. There was little difference in practice characteristics in terms of rurality, list size, and existence of a chronic disease management program. All British National Formulary groups of drugs as defined in appendix 1 were more prevalent in those who experienced emergency admissions (Table 2). For example, those who experienced emergency admissions were 4 times more likely to have been prescribed anticoagulants in the previous 3 years.
Initially, univariate logistic regression was performed considering all 53 potential variables for entry. Of these, a total of 49 had a P value < .20, and so these were considered for potential inclusion in multiple regression models. Both backward and forward methods were used, with the standard P < .05 significance level for entry and removal, giving the same model with 32 parameters. Interactions were considered, and a further 7 parameters were added, giving a 39-parameter model. Finally, factors were removed and added using the Bayesian Information Criterion as the criterion for model fit. This criterion penalizes additional parameters more heavily as the data set grows and so eliminates some terms significant at the P < .05 level, giving a final model with 35 parameters, including the intercept.
Table 3 gives the odds ratios for these factors in the final model. Although there was not much evidence of differences between men and women univariately, in the final model, men had greater risk than women, and this risk was attenuated by the number of previous admissions (planned or unplanned). Not surprisingly, age was a strong risk factor, but this was also attenuated by experiencing a previous emergency admission (Table 3). Previous emergency admissions (yes or no) also had a number of interactions with previous drug prescribing. There were increased risks of emergency admission in those prescribed gastrointestinal drugs, diuretics, antiplatelets, and antibacterials. However, in those who had experienced a previous emergency admission, these drugs were no longer significant markers of high risk. Indeed, for antibacterials there was some evidence of a reduced risk for those with previous admissions, although not reaching statistical significance (Table 3). Independently of the other measures of hospital use, the mean number of bed-days in the previous 3 years was a strong risk factor for emergency admission in the following year. Those who lived in socially deprived areas were 13% more likely to have an emergency admission compared with more affluent areas (Table 3). Of the other drugs prescribed, antihypertensives, nitrates, antiparkinsonian drugs, antipsychotics, and drugs to treat anemia were all associated with higher risk. Nitrates alone were associated with increased risk, but in combination with anticoagulants they showed a reduced risk of emergency admission, which was statistically significant (Table 3). Receipt of respiratory drugs as well as an increasing number of drugs were strongly significant markers of high risk. An increasing number of prescriptions of hypnotics and anxiolytics, analgesics, and diabetes medication were all associated with increased risk. Antidepressants were associated with higher risk but less so in more elderly subjects. 
Antiosteoporotic drugs were associated with a reduced risk of emergency admission. Appendix 2 (http://www.dundee.ac.uk/hic/links/) contains the factors required to estimate the risk of emergency admission.
The final model was then used to derive probabilities of emergency admission in the next year, expressed as a percentage from 0% to 100%. For clinical purposes, we also derived a simpler points system from the regression coefficients, and this is shown in appendix 3 (http://www.dundee.ac.uk/hic/links/).19 As an example, consider a man who is aged 72 years and is from a highly deprived area; has previous emergency admissions, 8 previous admissions, and 106 total bed-days in the previous 3 years; and is in receipt of hypertension and heart failure drugs, nitrates and calcium channel blockers, respiratory drugs, anxiolytics, antidepressants, analgesics, and antibacterials: the absolute risk of admission in the next year is 52.8% (in the top percentile, 47 points). In contrast, for a woman who is aged 50 years and is from an affluent area; has no previous emergency admissions, 1 previous admission, and 1 total bed-day; and is in receipt of ulcer-healing drugs, diuretics, and antibacterials, the absolute risk of admission in the next year is 2.4% (10 points).
The discriminatory power for the prediction algorithm on the Tayside data was c = 0.80, with no significant lack of calibration on the derivation data set (P = .14). On applying the model to the random split-half test data set, discriminatory power was still good, with a c statistic of 0.79. The calibration was extremely good for the top decile of risk in the test data set, but there was some indication of lack of fit to the ninth decile (P = .006). Sensitivity, specificity, and PPV are given for different cutoff points to identify high risk in terms of percentages and points from the clinical scoring system (Table 4). In a typical family practice with 3000 people 40 years or older, a cutoff point of greater than 46 would identify 29 people for potential intervention (Table 4).
This study, to our knowledge, provides the first population-derived model for the prediction of emergency admissions in those 40 years or older, validated on a random split-half test data set. The estimates are based on individual history of hospital and medication use.
Existing algorithms have tended to deal only with the small subset of patients who have already experienced an admission, such as the King's Fund model,9,11 which has a discriminatory performance of 69% compared with the PEONY model, which has a discriminatory performance of 80%, similar to algorithms in coronary heart disease, such as in the Framingham Heart Study.20 This measure is the probability that the algorithm assigns a higher risk to those who go on to be admitted than to those who do not, and it should be high, especially if population risk stratification is a major goal.1 In addition, the King's Fund model does not appear to penalize the selection of variables, even though spurious significance is likely with very large data sets and testing of large numbers of candidate factors. The superior performance of the PEONY algorithm is probably because of the use of community prescribing as markers of ill health and long-term conditions in particular. Although it is clear that the majority of patients who experience emergency admissions are in the older-than-65-year age group, this is a somewhat arbitrary cutoff since it is also clear that there is considerable NHS resource use in those younger than 65 years. In our study, 33.7% of admissions were in this age group, demonstrating that those younger than 65 years are an important group in whom an intervention is likely to have greatest effect. This algorithm takes a more realistic account of this by considering those 40 years or older as potential patients for emergency admissions. It is sobering to consider that in some very deprived areas in the United Kingdom, life expectancy is on average 7 years lower than in the most affluent areas.21
Age also appears to be important in relation to other predictors of emergency admission. Factors such as previous emergency admission and antidepressant use became less strong predictors in the more elderly, suggesting that these are stronger risk factors in younger patients. Factors also differed according to whether the patients had had previous admissions, either emergency or planned. Many drug markers were stronger risk factors in those with no history of admissions, and risk became attenuated once admissions had occurred. The evidence from this study also points to areas where intervention might be beneficial, for example, in younger individuals with polypharmacy who have not yet experienced an admission. Previous high intensity of hospital use is clearly a strong risk factor for future admissions. The results also indicate that poor control in conditions such as asthma and diabetes (reflected in an increasing number of prescriptions) may be a risk marker for future admissions.
When considering the practical use of the algorithm at scores of 47 or greater to identify high risk, PPV is good, indicating that 6 of every 10 people identified by the algorithm would have gone on to have emergency admissions without any intervention, although sensitivity is low (ie, most emergency admissions would be missed). For an average practice list size of 6000 (assuming 50% are 40 years or older), this represents about 29 people for case management. At the other extreme, a cutoff score of 23 or greater for high risk would give better sensitivity, but more resources would be wasted in that only 1 of every 5 people would have gone on to be admitted. This would represent about 790 people for case management. Where to intervene and how intensively are clearly important and would require judgment of the cost-effectiveness of different forms of intervention.
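The practice-level figures above follow from simple arithmetic, sketched below as a rough check. The function name is hypothetical, and the PPV values of 0.6 and 0.2 are the rounded "6 of every 10" and "1 of every 5" from the text rather than exact values, so the results are approximate; Table 4 remains the authoritative source.

```python
def case_management_load(list_size, share_40_plus, n_flagged, ppv):
    """Back-of-envelope workload for one practice at a given cutoff score."""
    eligible = list_size * share_40_plus
    return {
        "eligible": eligible,                                # people 40 years or older
        "flagged_share": n_flagged / eligible,               # fraction flagged as high risk
        "expected_admissions_flagged": n_flagged * ppv,      # flagged people expected to be admitted
        "flagged_per_admission": 1.0 / ppv,                  # yield (1/PPV)
    }


# Figures quoted in the text for an average list of 6000 with 50% aged 40 or older.
strict = case_management_load(6000, 0.5, n_flagged=29, ppv=0.6)    # score of 47 or greater
lenient = case_management_load(6000, 0.5, n_flagged=790, ppv=0.2)  # score of 23 or greater

for label, result in (("score >= 47", strict), ("score >= 23", lenient)):
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Under these rounded inputs, the stricter cutoff flags about 1% of the eligible list and the more lenient cutoff about a quarter of it, which is the resource trade-off the paragraph describes.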
One potential weakness of this algorithm is that it does not account for other primary care factors such as intensity of use of primary care services, and we will be assessing this in further work funded by the Chief Scientist Office of Scotland to improve the performance of the algorithm with detailed practice level data. Of course, as with all predictions, the algorithm needs to be assessed on independent data sets to assess validity further.18
A next step is to evaluate the algorithm in practice because there are well-documented examples of the failure of prediction to alter management.22,23 Finally, interventions to case manage individuals at high risk need to be evaluated to assess their effect on emergency admissions,24 and evidence for any effect is still limited,25,26 although there is some evidence of success in particular conditions such as mental health,27 chronic obstructive pulmonary disease,28 and vaccination against influenza.29
In conclusion, this study provides, for the first time to our knowledge, a validated algorithm to predict emergency admission in the next year for all patients 40 years or older. It provides a useful tool for the individual clinician in face-to-face consultation as well as for the general practice as a whole, and for health insurers, policy makers, and planners at the health board level faced with the increasing prevalence of hospital use. This work has immediate relevance to developing policy and service reorganization, as well as informing future intervention studies of different models of case management.
Correspondence: Peter T. Donnan, PhD, Tayside Centre for General Practice, Health Informatics Centre, Community Health Sciences, Mackenzie Bldg, University of Dundee, Kirsty Semple Way, Dundee DD2 4BF, Scotland (email@example.com).
Accepted for Publication: January 13, 2008.
Author Contributions: Dr Donnan had full access to all data and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Donnan, Dorward, Mutch, and Morris. Acquisition of data: Donnan and Morris. Analysis and interpretation of data: Donnan. Drafting of the manuscript: Donnan, Mutch, and Morris. Critical revision of the manuscript for important intellectual content: Donnan and Dorward. Statistical analysis: Donnan. Obtained funding: Donnan and Mutch. Study supervision: Mutch and Morris.
Financial Disclosure: None reported.
Funding/Support: This study was funded by an unrestricted grant from NHS Tayside.
Additional Information: The Health Informatics Centre is a member of the Medical Research Council Health Services Research Collaboration.