Outcomes of a Citywide Campaign to Reduce Medicaid Hospital Readmissions With Connection to Primary Care Within 7 Days of Hospital Discharge

This cohort study of adult Medicaid patients discharged from the hospital assesses the outcomes of a program to reduce readmissions by increasing access to timely primary care appointments after hospitalization.


Data sets
Data used in this analysis include internal programmatic data that record staff interactions with patients, patient insurance eligibility information provided by payers, Camden Health Information Exchange records, and all-payer claims data from four regional health care systems. Administrative records from these sources were linked to obtain a more complete picture of patients' medical encounters across all institutions. Each data set is described below.
A) Medicaid managed care organization (MCO) eligibility files
Two Medicaid MCOs, UnitedHealthcare and Horizon NJ Health, provided monthly lists of their eligible members to the Camden Coalition of Healthcare Providers (Camden Coalition). Any individuals who appeared on these lists between January 2014 and April 2016 were included for the purposes of the data file linkage. The MCO files included 86727 records with 77194 unique combinations of gender, date of birth, first name, and last name.
Fields from the MCO eligibility files that were used in linkage:
Field: subscriber_id
Description: MCO-generated account id, nominally unique per person within a given MCO's records. In practice, individuals can have more than one id over time.

B) All-payer hospital claims data
The Camden Coalition has a dataset of hospital insurance claims from 2010 through 2016 containing 6899261 records from all inpatient, emergency, and observation encounters and a limited number of outpatient encounters from regional health care systems at several sites. All-payer claims from the following health care systems are included in the database: Cooper University Health Care, Lourdes Health System, Virtua Health System, and Kennedy Health System (now Jefferson Health).
Fields from hospital claims data that were used in linkage:

C) Camden Health Information Exchange
The Camden Coalition's 7-Day Pledge daily workflow began with triaging inpatient records from the Camden Health Information Exchange (CHIE) to identify patients eligible for the intervention.
Several cleaning steps were performed on the data files before they were linked. Two or more records with identical values in all but one field were combined into a single record. Because there were no facially invalid genders or birth dates, no values for these variables were removed before linkage.

Social security numbers
After removing all non-numerals and discarding any value that did not have exactly 9 digits, Social Security Numbers (SSNs) with impermissible or overly common default values (even where the latter otherwise follow a valid pattern) were removed because they are not unique to an individual and thus would be more likely to cause records to be improperly linked together. Removed SSNs included those ending in 4 identical digits, having a sequence of 5 or more consecutive identical digits, starting with 666 or 999, or consisting of only 2 different digits regardless of the pattern. Some sequences of numbers occurring in particular positions are never used by the Social Security Administration (SSA). These were removed, but care was taken to retain the number patterns that are not used in "true" SSNs but are used in non-citizen tax ID numbers or temporary numbers for those awaiting an SSN. There are also several SSNs that have been invalidated by the SSA because they have been used as examples in advertising, instructions on common documents, or other similar situations. In addition, some SSN anomalies specific to the data files were also removed. Some SSN values were actually composed of 3 random or repeated digits followed by 6 digits in sequence from the date of birth.
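The explicitly stated removal rules can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function name `clean_ssn` is hypothetical, and the sketch covers only the four patterns named above. It does not implement the SSA never-issued ranges, the list of publicly invalidated SSNs, or the data-file-specific anomalies, whose details are not given in the text.

```python
import re
from typing import Optional


def clean_ssn(raw: Optional[str]) -> Optional[str]:
    """Return a 9-digit SSN string, or None if the value should be dropped.

    Hypothetical helper; implements only the rules spelled out in the text.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)        # remove all non-numerals
    if len(digits) != 9:                   # keep only values with exactly 9 digits
        return None
    if len(set(digits[-4:])) == 1:         # ends in 4 identical digits
        return None
    if re.search(r"(\d)\1{4}", digits):    # run of 5 or more identical digits
        return None
    if digits.startswith(("666", "999")):  # prefixes named in the text
        return None
    if len(set(digits)) <= 2:              # built from at most 2 different digits
        return None
    return digits


print(clean_ssn("123-45-6789"))
print(clean_ssn("121-21-2121"))
```

Returning `None` rather than raising an error lets a dropped SSN simply fall out of the SSN-based matching step while the record remains available for matching on other fields.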

Names
First names that were generic values indicating a newborn who had not yet been named at the time the record was created were removed. Name suffixes (e.g. "Sr."), prefixes (e.g. "Mr."), and middle initials were parsed out into separate fields because they are inconsistently captured in administrative data and add a false specificity. Names with internal spaces ("San Gabriel", "Von Trapp", "Jo Ann", etc.) had those spaces collapsed for the same reason. Hyphenated surnames were split into "last_name1" and "last_name2". Where possible, surnames that would have been hyphenated but had been compressed into a single name without hyphens or spaces were also split into two last name fields by comparing them to a list of known two-part names. Administrative name capture is often not well adapted to multipart surnames, such as many Hispanic names, which are often recorded or provided in a different order, or using only one of the two parts, at different times and in different settings. Records with artificial name values like "SEE MEMO" had those values removed before linkage.
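A rough sketch of the surname steps (suffix parsing, space collapsing, hyphen splitting, and placeholder removal) might look like the following. The function name `clean_surname`, the suffix list, and the choice to uppercase values are assumptions made for illustration; the lookup against a list of known two-part names is not shown because that list is not part of the text.

```python
SUFFIXES = {"SR", "JR", "II", "III", "IV"}  # illustrative list (assumption)
PLACEHOLDERS = {"SEE MEMO"}                 # artificial value named in the text


def clean_surname(raw: str) -> tuple:
    """Return (last_name1, last_name2, suffix) for one raw surname string."""
    name = raw.upper().replace(".", "").strip()
    if name in PLACEHOLDERS:
        return ("", "", "")                 # artificial values are blanked out
    tokens = name.split()
    suffix = ""
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        suffix = tokens.pop()               # parse the suffix into its own field
    # Split a hyphenated surname into two parts at the first hyphen.
    first, _, second = " ".join(tokens).partition("-")
    # Collapse internal spaces ("VON TRAPP" becomes "VONTRAPP").
    return (first.replace(" ", ""), second.replace(" ", ""), suffix)


print(clean_surname("Von Trapp"))
print(clean_surname("Garcia-Lopez Jr."))
```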

Data linkage methodology MCO↔Claims
Records were considered to belong to the same individual if they shared any of the following: 1. the same hieid; 2. the same subscriber id; 3. the same cleaned ssn; 4. the same gender, date of birth, phonetic first_name, and phonetic last_name1; or 5. the same gender, date of birth, phonetic first_name, and phonetic last_name2. When a record belonged to more than one group, the groups with one or more records in common were merged. This transitive linkage process was repeated until no remaining groups overlapped.
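One common way to implement this kind of transitive merging is a union-find (disjoint-set) structure, sketched below. This is an illustration under stated assumptions, not the authors' implementation: the function and field names (`link_records`, `first_phon`, `last1_phon`, `last2_phon`) are hypothetical, and the phonetic codes in the example are hand-written placeholders rather than real double metaphone output.

```python
from collections import defaultdict


def single(field):
    """Key on one identifier; a missing value yields no key."""
    return lambda rec: rec.get(field) or None


def combo(*fields):
    """Key on several fields together; every field must be present."""
    def key(rec):
        values = tuple(rec.get(f) for f in fields)
        return values if all(values) else None
    return key


KEYS = [
    single("hieid"),
    single("subscriber_id"),
    single("ssn"),
    combo("gender", "dob", "first_phon", "last1_phon"),
    combo("gender", "dob", "first_phon", "last2_phon"),
]


def link_records(records, key_funcs=KEYS):
    """Group record indices that are connected through any shared key."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for key_func in key_funcs:
        first_seen = {}
        for idx, rec in enumerate(records):
            key = key_func(rec)
            if key is None:
                continue
            if key in first_seen:
                root_a, root_b = find(first_seen[key]), find(idx)
                if root_a != root_b:
                    parent[root_b] = root_a  # merge the two groups
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


EXAMPLE = [
    {"subscriber_id": "A1", "ssn": "123456789"},
    {"ssn": "123456789", "gender": "F", "dob": "1980-01-01",
     "first_phon": "MR", "last1_phon": "KRS"},
    {"subscriber_id": "B2", "gender": "F", "dob": "1980-01-01",
     "first_phon": "MR", "last1_phon": "KRS"},
    {"subscriber_id": "C3"},
]
print(link_records(EXAMPLE))
```

In the example, records 0 and 2 share no field directly but end up in the same group because each links to record 1. The same chaining is what makes an unreliable SSN risky: a single shared value can join otherwise unrelated groups.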
A phonetic transformation of a name substitutes standardized representations of phonemes for letter sequences in a way that emphasizes most consonants over most vowels subject to other general pronunciation rules. The phonetic transformation serves two purposes: 1) it accounts for many transcription errors from spoken information; and 2) it generalizes the names to minimize the noise introduced by many types of spelling and typographic errors. The phonetic algorithm used was "double metaphone" rather than the more common SOUNDEX because it is intended to be more sensitive to a wider variety of pronunciations. [1][2] The linkage is conservative about dates of birth and gender, more flexible with names, and literal but trusting about SSN, allowing it to be sufficient on its own to link records. Because SSN can therefore have an outsized impact on linkage, the results were reviewed to screen out any more extreme cases of false-positive over-linking that may have occurred. Eight nominally valid SSNs occurred multiple times in the dataset with an array of entirely different names, subscriber ids, dates of birth, and genders in a way that caused the formation of large groups of overlapping but clearly unrelated records. These SSNs were removed and the linkage process was repeated.
As a final step, the linked records were compared to the previously linked claims records on SSN, gender, birthdate, and name combinations as described above. Any groups of eligibility records that matched to the same linked group in the claims data, and any single eligibility group that matched to more than one linked claims group (total n = 306), were inspected for possible false-positive links and split apart if needed based on clerical review.

Data linkage methodology (MCO↔Claims)↔CHIE
After the linkage between the MCO and claims data sets was complete, we had a data set consisting of hospital encounters for the population listed in the MCO eligibility files. We then linked those data to data in the CHIE. The goal was to verify that each record in the client tracking system existed in the hospital claims data. We performed a strict deterministic linkage at this stage with the following matching conditions: 1. Subscriber_ID, last name, first name, date of birth, and gender must match; 2. Visit type must be inpatient; 3. Admit date and discharge date must match exactly.
After this linkage, around 90% of our records had an exact match in the hospital claims data. We examined the unmatched 10% further and grouped them into three categories: 1. Patients' identifiers matched and admit and discharge dates matched, but visit types did not match; 2. Patients' identifiers and visit types matched, but either the admit date or the discharge date did not match; 3. Patients' identifiers matched, but none of the visit types, admit dates, or discharge dates matched.
Our care team staff reviewed the electronic medical records to investigate situations 1 and 2 above. For visit type (situation 1), it was difficult to determine which data source was more reliable, since some records were correct in the hospital claims data and some were correct in the CHIE. For the date mismatches in situation 2, we found that the hospital claims data were more reliable and therefore updated the dates to those from the hospital claims data. For situation 3, we applied a "3-day rule": if we found an inpatient record in the hospital claims data for the same patient with dates that differed by only three days from our CHIE dates, we treated the two as the same record and used the dates from the hospital claims data.
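The "3-day rule" can be sketched as a small date comparison. The sketch assumes that the rule means both the admit date and the discharge date fall within three days of the CHIE dates, and that the list of claims stays has already been limited to inpatient records for the same patient; the text does not spell out either detail, and the function name `reconcile_dates` is hypothetical.

```python
from datetime import date


def reconcile_dates(chie_admit, chie_discharge, claims_stays, tolerance_days=3):
    """Return the claims (admit, discharge) pair matching a CHIE stay, or None.

    claims_stays: iterable of (admit, discharge) date pairs for one patient.
    Assumption: both dates must be within tolerance_days of the CHIE dates.
    """
    for admit, discharge in claims_stays:
        admit_gap = abs((admit - chie_admit).days)
        discharge_gap = abs((discharge - chie_discharge).days)
        if admit_gap <= tolerance_days and discharge_gap <= tolerance_days:
            return admit, discharge         # claims dates are treated as correct
    return None


stays = [
    (date(2015, 2, 10), date(2015, 2, 12)),
    (date(2015, 3, 2), date(2015, 3, 7)),
]
print(reconcile_dates(date(2015, 3, 1), date(2015, 3, 5), stays))
```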

eAppendix 2. Logistic regression modeling to produce propensity scores
We used logistic regression to develop the propensity scores for our analysis. The Quan index, used to capture the count and severity of medical chronic conditions, and mental health and substance use chronic conditions, was calculated using the ICD-9 and ICD-10 diagnostic codes in the claims data. We grouped the diagnostic codes based on the clinical classification methods from the Agency for Healthcare Research and Quality (AHRQ). [3][4][5][6] Each group category was counted at most once. We used the R package ICD to compute the Quan index.
We performed bivariate analysis to determine the form of each covariate to include in our model. Decisions were based on empirical distributions and each variable's association with post-discharge primary care appointment timing, the dependent variable. In some cases (e.g. number of emergency department visits and number of chronic mental health conditions) we grouped the data to smooth the extreme effect of outliers. We maintained the original distribution for the number of substance use diagnoses because of the large differences in the 7-day primary care appointment rate for patients with 0, 1, and 2 diagnoses: 355 of 1098 (32%) patients with 0 substance use diagnoses had a primary care appointment within 7 days of discharge, compared with 86 of 350 (25%) patients with 1 substance use chronic condition and 9 of 83 (11%) patients with 2 substance-related chronic conditions.
To build our full model and generate propensity scores, we used forward selection based on the Akaike Information Criterion (AIC). [7] The details of our logistic regression model and a comparison of the fit of the model with the raw values to the fit of the model with the transformed values are shown in the table below. Propensity scores were produced from the model with the transformed covariates. Although age, Quan index, and prior hospitalizations showed no significant relationship to the timing of a post-discharge primary care appointment, they were included in the final model to produce the propensity scores because of their clinical relevance.
Binary logistic regression results examining the relationship between timing of post-discharge primary care follow-up and covariates in raw and transformed format (n=1531) a

eAppendix 3. Standard error calculation for nearest neighbor propensity score matching
For each patient in the treatment group, we used nearest neighbor matching with replacement to identify five matched referents in the non-treatment pool. To quantify uncertainty, we used a method for calculating the standard error for nearest neighbor matching with replacement discussed in the work of Michael Lechner. [10][11] The formula is expressed in terms of N1, the number of matched treated individuals; w_j, the number of times individual j from the control group has been used; and a, the number of nearest neighbors that we select for each treatment record. Lechner found little difference between bootstrapped variances and the variances calculated according to this formula. [10][11]
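A form of the variance approximation that is consistent with the three quantities described above can be sketched as follows. This is a reconstruction under assumptions, not the authors' typeset equation: the symbol w_j, the outcome Y, the treatment indicator D, and the placement of a in the denominator are all assumed, based on the commonly cited statement of Lechner's approximation for matching with replacement.

```latex
\[
\widehat{\operatorname{Var}}(\hat{\tau})
  = \frac{1}{N_1}\,\operatorname{Var}\!\left(Y \mid D = 1\right)
  + \frac{\sum_{j \in \{D = 0\}} w_j^{2}}{\left(a\,N_1\right)^{2}}\,
    \operatorname{Var}\!\left(Y \mid D = 0\right)
\]
```

As a sanity check on this form, setting a = 1 with every matched control used exactly once gives a second term of Var(Y | D = 0)/N1, the usual variance of a difference in means. Reusing the same controls many times increases the sum of squared w_j and therefore the standard error.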

eAppendix 4. Return on investment calculation
To estimate return on investment for the 7-DP program, we considered fixed costs and variable costs, along with the estimated savings associated with avoided hospitalizations.
• Fixed costs to run the program for a year were $230000.
• The variable costs associated with each successful 7-day primary care visit were $200, made up of a $150 incentive payment, a $22.50 gift card for the patient (includes fees not passed on to the patient for a $20 gift card), and a round trip taxi which we estimate at $27.50. Taxi fare is a conservative estimate, as only about 25% of patients required a taxi and many rides were cheaper than this estimate.
• Based on the analysis of claims data during the period evaluated, we estimated that each avoided hospitalization created an average of $10300 in cost savings.
Using these figures, we applied the following reasoning to determine the number of patients needed to treat in a year for the program to break even:
• Based on the analysis presented in the paper, the treatment 90-day average number of admissions was 0.502 and the non-treatment 90-day average was 0.629, a difference of 0.127 admissions. Each successful visit was therefore associated with a reduction of 0.127 inpatient admissions within a 90-day period.
• For X 7-day visits, 0.127*X inpatient admissions will be avoided.
• Estimated cost per avoided hospital admission is $10300.
• The program breaks even when savings equal costs: 0.127*X*$10300 = $230000 + $200*X, which gives X = 207.6, rounded up to 208 successful visits per year.
• If we assume a 30% 7-day appointment success rate per year, achieving 208 successful visits will require 208/0.3 = 694 attempted engagements.
• 694 engagements per year works out to just under 3 patients per day based on 20 work days in a month. This is a manageable benchmark given the same number and level of staff around which the fixed costs were estimated.

a. The rate is the number of records for which the first associated readmission occurred within 30 or 90 days of discharge divided by the total number of records in the appropriate category (treatment or non-treatment).
b. The total number of readmissions is the number of readmissions associated with an index discharge that occurred within 30 or 90 days of that discharge. The numerator for the mean is the count of all readmissions for the specified category (treatment or non-treatment); the denominator is the total number of records for that category.
c. Differences were calculated as the proportion of records in the non-treatment pool, matched pairs treatment group, or reweighted treatment groups with any readmission occurring in 30 or 90 days of index discharge, or the mean number of readmissions in 30 or 90 days, minus the proportion or mean in the treatment group.

eTable 3. Readmission results for sensitivity analysis excluding patients who refused services or were unreachable in the hospital or following discharge from the hospital a
a. This analysis excludes 52 cases for which the patient refused to schedule an appointment through the program and 219 cases for which program staff were unable to reach the patient in the hospital or after discharge. If the patient was unreachable in the hospital, program staff attempted to reach the patient for up to 7 days following discharge by phone or through a letter sent to the patient's home address.
b. The rate is the number of records for which the first associated readmission occurred within 30 or 90 days of discharge divided by the total number of records in the appropriate category (treatment or non-treatment).
c. The total number of readmissions is the number of readmissions associated with an index discharge that occurred within 30 or 90 days of that discharge. The numerator for the mean is the count of all readmissions for the specified category (treatment or non-treatment); the denominator is the total number of records for that category.
d. Differences were calculated as the proportion of records in the non-treatment pool, matched pairs treatment group, or reweighted treatment groups with any readmission occurring in 30 or 90 days of index discharge, or the mean number of readmissions in 30 or 90 days, minus the proportion or mean in the treatment group.
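The break-even reasoning in eAppendix 4 can be checked with a short calculation. The sketch below uses only the figures stated there; the constant and function names are hypothetical, and rounding up to whole visits and whole engagements is an assumption that happens to reproduce the reported 208 and 694.

```python
import math

FIXED_COST = 230_000                   # annual fixed program cost ($)
VARIABLE_COST = 150 + 22.50 + 27.50    # incentive + gift card + taxi = $200
SAVINGS_PER_ADMISSION = 10_300         # estimated saving per avoided admission ($)
ADMISSIONS_AVOIDED_PER_VISIT = 0.127   # 0.629 - 0.502, from the 90-day averages
SUCCESS_RATE = 0.30                    # assumed 7-day appointment success rate
WORK_DAYS_PER_YEAR = 20 * 12           # 20 work days per month


def break_even_visits():
    """Smallest number of successful visits at which savings cover costs."""
    net_saving_per_visit = (
        ADMISSIONS_AVOIDED_PER_VISIT * SAVINGS_PER_ADMISSION - VARIABLE_COST
    )
    return math.ceil(FIXED_COST / net_saving_per_visit)


visits = break_even_visits()
engagements = math.ceil(visits / SUCCESS_RATE)
per_day = engagements / WORK_DAYS_PER_YEAR

print(visits, engagements, round(per_day, 2))
```

Because the net saving per successful visit is the difference between two estimates (expected savings and variable cost), the break-even count is sensitive to the 0.127 effect size; rerunning the sketch with a smaller value shows how quickly the required volume grows.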