Trends and Variations in Emergency Department Use Associated With Diabetes in the US by Sociodemographic Factors, 2008-2017

Key Points
Question: Has emergency department (ED) use for diabetes-related illness changed over time in the United States, and does it vary by state, level of urbanization, race and ethnicity, and insurance type?
Findings: In this serial cross-sectional study of 32 million ED visits from 2008 to 2017, the rates of diabetes-related ED use increased across all geographic areas and subgroups studied. Racial and ethnic disparities and rural and urban disparities in ED use varied widely across states and remained significantly different within states, with Black adults having a mean of approximately 3 times as many diabetes-specific ED visits as White adults.
Meaning: This study suggests that, despite health reforms during the past decade, disparities in diabetes-related ED use persisted from 2008 to 2017, warranting further study and policy action to address access and underlying social determinants of health.


eTable 1. Missing Observation and Percentage Missing by Variable for NEDS Dataset
Within the NEDS datasets, we observed missingness primarily in the patient location (rural/urban) and primary payer variables. No year-specific NEDS dataset had more than 1% missingness for any analytical variable (eTable 1). Analyses of NEDS data used complete cases only: the 0.63% of observations in the combined NEDS datasets with a missing value for any analytical variable were excluded.
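The missingness summary and complete-case selection described above can be illustrated with a minimal Python sketch. The record layout, the variable names (`location`, `payer`), and the toy values are hypothetical and are not taken from the NEDS files; the study itself reports using R.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def percent_missing(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Percentage of records whose value is missing (None), per variable."""
    n = len(records)
    return {
        var: 100.0 * sum(1 for r in records if r.get(var) is None) / n
        for var in variables
    }


def complete_cases(records: List[Record], variables: List[str]) -> List[Record]:
    """Keep only records with no missing value in any analytical variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical toy records (not real discharge data).
records = [
    {"location": "rural", "payer": "Medicaid"},
    {"location": None, "payer": "Medicare"},
    {"location": "urban", "payer": None},
    {"location": "urban", "payer": "Private"},
]
variables = ["location", "payer"]

missing = percent_missing(records, variables)
kept = complete_cases(records, variables)
```

In this toy example, a record is dropped if either variable is missing, which mirrors the "any missing value" exclusion rule stated above.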

eTable 2. Descriptive Statistics of NEDS Discharge Records With Complete Data Compared to Data With Missing Observations for ≥1 Analytic Variable
Discharge records within the NEDS dataset with missing values were more likely to be from patients who were rural (missing: 29.9%; complete: 20.0%), Medicaid-insured (missing: 24.7%; complete: 18
Dataset generated from National Emergency Department Sample data, years 2008, 2011, 2014, 2016. All-cause diabetes visits include all discharge records with a diabetes-related diagnosis, while diabetes-specific visits include all visits with a principal diagnosis of a diabetes-specific condition/complication.

eAppendix 1. Study Selection and Missing Data Methods
Within the SEDD datasets, we observed missingness primarily in the race, patient location (rural/urban), and primary payer variables. Missingness for race in the imputed state-year datasets ranged from 0.03% to 40.23% (the latter observed in North Carolina). Race-specific estimates for North Carolina in 2008 were not reported because of data quality issues reported by HCUP. Missingness for the patient location variable remained below 3% in all datasets, while primary payer missingness remained below 2%. Overall, 0.59% of the observations within the combined SEDD datasets had data missing for analytical variables (eTable 3).
To impute these missing observations, we used an algorithm proposed and evaluated by Ma, Zhang, Lyman, and Huang. 1,2 Ma et al. focus explicitly on imputing missing data in the HCUP SID, which makes their approach well suited to our work. They show that conditional multiple imputation (MI) is the optimal approach when using HCUP data. We applied their algorithm to impute data for variables with more than 1% missingness per year. The algorithm was applied to each state-by-year dataset. We did not impute values for variables that were completely missing for a given year and state.
Consequently, states such as NE, which did not collect race data, were not imputed and were excluded from our state-specific race estimates. Further, for states such as NC and IA, whose race data were not uniformly coded across years, race was imputed; however, race-specific rates were not reported in our state-specific race estimates.
As noted by Ma et al., it is impossible to directly test whether the data are missing not at random (MNAR). For practical purposes, we similarly assume that the data are missing at random (MAR), an assumption that has been recommended for practical applications, especially when a large number of predictors can be included in the imputation model. 3-5 The HCUP data themselves include many quality predictors but, following Ma et al., we additionally include information on racial and socioeconomic status distributions from the U.S. Census and information on hospital characteristics from the American Hospital Association database.
Conditional MI imputes data variable by variable rather than using a joint distribution. The benefit of this approach is that it allows different types of variables to be modeled separately (e.g., categorical versus continuous variables).
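The variable-by-variable idea can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of Ma et al. and not MICE: it performs a single deterministic fill-in (no multiple imputed datasets, no random draws, no iteration), and the variable names and toy rows are hypothetical. It only shows how a categorical and a continuous variable can each get their own conditional model, with the second model using values filled in by the first.

```python
from collections import Counter, defaultdict
from statistics import mean


def impute_categorical(rows, target, given):
    """Fill missing `target` with the most common observed value
    among rows sharing the same value of `given`.
    Assumes every `given` group has at least one observed `target`."""
    counts = defaultdict(Counter)
    for r in rows:
        if r[target] is not None:
            counts[r[given]][r[target]] += 1
    for r in rows:
        if r[target] is None:
            r[target] = counts[r[given]].most_common(1)[0][0]


def impute_continuous(rows, target, given):
    """Fill missing `target` with the mean of observed values
    among rows sharing the same value of `given`."""
    groups = defaultdict(list)
    for r in rows:
        if r[target] is not None:
            groups[r[given]].append(r[target])
    for r in rows:
        if r[target] is None:
            r[target] = mean(groups[r[given]])


# Hypothetical toy rows; `location` is fully observed.
rows = [
    {"location": "rural", "payer": "Medicaid", "age": 60},
    {"location": "rural", "payer": "Medicaid", "age": None},
    {"location": "rural", "payer": None, "age": 50},
    {"location": "urban", "payer": "Private", "age": 40},
    {"location": "urban", "payer": "Private", "age": 30},
    {"location": "urban", "payer": None, "age": None},
]

# Step 1: categorical model for payer, conditional on location.
impute_categorical(rows, "payer", "location")
# Step 2: continuous model for age, conditional on the (now complete) payer.
impute_continuous(rows, "age", "payer")
```

A real conditional MI procedure would replace each deterministic fill with a draw from a fitted regression model, cycle through the variables repeatedly, and produce several completed datasets whose estimates are then pooled.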
We use the MICE algorithm in R. 6

Identifying analogous population figures to match subgroup populations broken down by rural/urban status, race, and region in the HCUP data was straightforward. To align insurance definitions with ACS population denominator estimates, if an ACS participant had multiple insurance providers, we considered their age, disability status, and employment status to determine the likely primary payer based on a 2018 Medicare User Guide. 10 We recoded insurance categories in HCUP to align with the ACS estimates using coding guidelines from methodology reports published by AHRQ. 11 The rural/urban variable provided by IPUMS ACS had high missingness (>10%). We replaced it with the rural-urban continuum codes published by the United States Department of Agriculture Economic Research Service, which allocate all Public Use Microdata Area data to rural/urban categories based on metro and nonmetro population shares. 12

Calculation of Crude Population Utilization Rates by Subpopulation

NEDS
Counts and standard errors of inpatient and emergency department events generated using NEDS data were weighted and adjusted for complex survey design factors. HCUP provides discharge weights for the NEDS that are used to generate nationally representative estimates of ED utilization. Thus, the crude diabetes-related ED utilization rate for a given demographic/insurance group using NEDS data is given by equation (1),

$$\text{Rate} = 10{,}000 \times \frac{\sum_{i} w_i v_i}{N} \qquad (1)$$

where $w_i$ is the discharge weight associated with patient $i$, $v_i$ is the variable of interest (e.g., rural: 1 = rural, 0 = not rural), and $N$ is the population estimate for the variable of interest. 13 Variance for survey-weighted estimates was estimated using the Taylor series linearization methodology provided in the survey package for R. 14 We assumed constant denominator population counts; 13 consequently, the standard error of the crude rate was calculated as:

$$SE_{\text{rate}} = 10{,}000 \times \frac{SE_S}{N}$$

where $SE_S$ is the standard error of the estimated count.
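As a worked illustration of the weighted crude rate and its standard error, the following Python sketch assumes rates are expressed per 10,000 population (the multiplier that appears in the standard error expression) and takes the standard error of the weighted count as a given input. It does not reproduce the Taylor series linearization, which the study obtained from survey software. All weights, indicators, and population figures are hypothetical.

```python
from typing import Sequence


def weighted_count(weights: Sequence[float], indicators: Sequence[int]) -> float:
    """Survey-weighted count: sum of w_i * v_i over discharge records."""
    return sum(w * v for w, v in zip(weights, indicators))


def crude_rate(count: float, population: float, per: float = 10_000) -> float:
    """Crude utilization rate per `per` population."""
    return per * count / population


def crude_rate_se(count_se: float, population: float, per: float = 10_000) -> float:
    """Standard error of the rate, treating the denominator as a constant."""
    return per * count_se / population


# Hypothetical inputs (not NEDS values).
weights = [4.0, 5.0, 6.0, 5.0]   # discharge weights w_i
is_rural = [1, 0, 1, 0]          # indicator v_i for the subgroup of interest
population = 2_000               # denominator N for that subgroup
count_se = 2.0                   # SE of the weighted count, assumed supplied

count = weighted_count(weights, is_rural)
rate = crude_rate(count, population)
rate_se = crude_rate_se(count_se, population)
```

Because the denominator is treated as fixed, the rate and its standard error are both simple rescalings of the weighted count and its standard error.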

SEDD
Although the SEDD are considered a census of visits to hospitals in a given state, we are estimating age-standardized rates, which describe a different quantity than the direct rate of use within a state for a given year. We therefore considered the state databases to be samples from a "super-population" for the purposes of variance estimation and assumed a Poisson distribution of counts. 15,16 The state databases are not weighted because each file contains nearly all discharges from nearly all hospitals within a given state; thus, the crude ED utilization rate using SEDD data is given by equation (2),

$$\text{Rate} = 10{,}000 \times \frac{S}{N} \qquad (2)$$

where $S$ is the estimated count and $N$ is the corresponding denominator population. We assumed constant denominator population counts; consequently, the standard error of rates was calculated as:

$$SE_{\text{rate}} = 10{,}000 \times \frac{SE_S}{N}$$

where $SE_S$ is the standard error of the estimated count.
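A minimal Python sketch of the unweighted SEDD rate follows. It assumes rates are per 10,000 population and, as an assumption not spelled out in the text, takes the standard error of a Poisson count to be the square root of the count (the usual Poisson result, since the variance equals the mean). The count and population are hypothetical.

```python
import math
from typing import Tuple


def poisson_rate_and_se(count: float, population: float,
                        per: float = 10_000) -> Tuple[float, float]:
    """Crude rate per `per` population and its SE under a Poisson count model.

    Assumption: SE of the count is sqrt(count); the denominator is constant.
    """
    rate = per * count / population
    se = per * math.sqrt(count) / population
    return rate, se


# Hypothetical state-year cell: 400 visits among 1,000,000 adults.
rate, se = poisson_rate_and_se(400, 1_000_000)
```

With these inputs the relative standard error is 1/sqrt(400) = 5%, which shows why small cells produce unstable rates under this model.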

Standardization of Rates
Standardized rates were calculated using the direct method. Standardized rates reflect rates that would be expected for the observed study population if it had the same population distribution as the standard population.

Standard Population
All rates, excluding age-specific and insurance-specific rates, are age standardized to the distribution of the U.S. adult population. Standard population estimates of the national age distribution were generated from the 2010 CDC National Mortality Database. 17 We estimated the observed crude rate for each of the following age-specific bins for each demographic group: 18-29, 30-44, 45-64, 65-74, and 75+.

Calculating Standardized Rates and Standardized Rate Error Estimates
We calculated the weighted average of bin-specific observed rates using weights proportional to the percentage of the standard population in each age-specific cell.

$$SR = \sum_{i=1}^{5} w_i P_i$$

where $i$ indexes the 5 standard population bins, $w_i$ is the proportion of the total standard population in population group $i$, and $P_i$ is the observed age-specific crude rate. 18 The variance of standardized rates was calculated as:

$$\mathrm{Var}(SR) = \sum_{i=1}^{5} w_i^{2}\, \mathrm{Var}(P_i)$$

where $i$ indexes the 5 standard population bins, $w_i$ is the proportion of the total standard population in population group $i$, and $P_i$ is the estimated age-specific crude rate.
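Direct standardization can be sketched in Python as below. The sketch assumes the usual direct-standardization variance, the sum of squared weights times the variance of each age-specific rate, and a normal-approximation 95% interval. The standard-population proportions, rates, and standard errors are hypothetical placeholders, not values from the CDC data or from the study.

```python
import math
from typing import Sequence, Tuple


def standardized_rate(weights: Sequence[float], rates: Sequence[float]) -> float:
    """Weighted average of age-specific crude rates (weights sum to 1)."""
    return sum(w * p for w, p in zip(weights, rates))


def standardized_variance(weights: Sequence[float],
                          rate_ses: Sequence[float]) -> float:
    """Assumed variance: sum of w_i^2 * SE(P_i)^2 across age bins."""
    return sum((w ** 2) * (se ** 2) for w, se in zip(weights, rate_ses))


def confidence_interval(estimate: float, variance: float,
                        z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical inputs for bins 18-29, 30-44, 45-64, 65-74, 75+.
std_weights = [0.2, 0.3, 0.3, 0.1, 0.1]      # standard-population shares
age_rates = [10.0, 20.0, 40.0, 60.0, 80.0]   # crude rates per 10,000
age_ses = [1.0, 1.0, 2.0, 3.0, 4.0]          # SEs of those rates

sr = standardized_rate(std_weights, age_rates)
var_sr = standardized_variance(std_weights, age_ses)
ci_low, ci_high = confidence_interval(sr, var_sr)
```

Because the weights come from a fixed standard population, two groups with different age structures can be compared on the same footing.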
95% confidence intervals for standardized rates were calculated as:

$$\left( SR - 1.96\sqrt{\mathrm{Var}(SR)},\; SR + 1.96\sqrt{\mathrm{Var}(SR)} \right)$$

where $SR$ is the standardized rate and $\mathrm{Var}(SR)$ is its variance.

Age and Insurance Specific Rates
Age-specific rates and insurance-specific rates were not standardized because of the age distributions within these groups. Insurance-specific crude rates were reported because direct standardization is unstable in the case of a zero-cell count (e.g., the number of diabetes-specific ED visits in Vermont among the Medicare population aged 18-29). Therefore, insurance-specific rates do not control for population age composition, and variability between states in these rates may be a result of variation in age distribution between geographies. 19

Rate Ratios
To compare rates of ED use between groups and by diabetes status, we calculated rate ratios:

$$RR = \frac{R_1}{R_2}$$

where $R_1$ is the rate being compared and $R_2$ is the reference rate. Confidence intervals for standardized rate ratios were then estimated assuming a log-normal distribution. 20 We generated 95% confidence intervals around our rate ratios by first estimating the variance of the RR.

a Dataset generated from National Emergency Department Sample data, years 2008, 2011, 2014, and 2016. All-cause diabetes visits include all discharge records with a diabetes-related diagnosis, while diabetes-specific visits include all visits with a principal diagnosis of a diabetes-specific condition/complication.
b Continuous variables reported as mean (SD) and proportions presented as proportion (95% CI).
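A rate ratio with a log-normal confidence interval can be sketched in Python as follows. The variance expression used here is the common delta-method approximation for the log of a ratio of two independent rates; it is an assumption for illustration and is not taken from the source, whose own variance formula is not shown. The input rates and standard errors are hypothetical.

```python
import math
from typing import Tuple


def rate_ratio_ci(rate1: float, se1: float, rate2: float, se2: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Rate ratio (rate1 / rate2) with a log-normal confidence interval.

    Assumption (delta method, independent rates):
        Var(ln RR) ~= (se1 / rate1)^2 + (se2 / rate2)^2
    """
    rr = rate1 / rate2
    var_log_rr = (se1 / rate1) ** 2 + (se2 / rate2) ** 2
    half_width = z * math.sqrt(var_log_rr)
    # Build the interval on the log scale, then exponentiate.
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)


# Hypothetical standardized rates per 10,000 and their SEs.
rr, rr_low, rr_high = rate_ratio_ci(rate1=30.0, se1=1.5, rate2=10.0, se2=0.5)
```

Working on the log scale keeps the interval strictly positive and makes it symmetric around the ratio in multiplicative terms, which is why the lower and upper bounds multiply to the square of the ratio.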

eAppendix 3. Analysis by Diabetes Status
We also estimated rates of ED use among people with and without diabetes using the NEDS. A record was classified as diabetes-related if any recorded discharge diagnosis was an ICD-9 code (250.XX) or ICD-10 code (E10.XXX, E11.XXX, E13.XXX) indicative of diabetes. Events with both a diabetes diagnosis and patient age ≥18 years were selected. We estimated health care use among those without diabetes by selecting all discharge records that did not contain an ICD-9 or ICD-10 code indicative of diabetes. The denominator population estimates were generated using the CDC's Behavioral Risk Factor Surveillance System (BRFSS), which includes a variable defining self-reported diabetes status. We used the BRFSS to estimate the national US adult population with and without diabetes.
All rates are directly age standardized to the 2010 U.S. adult population distribution, using estimates available from the CDC Mortality Database. Standardized rate ratios (SRRs) were calculated across years, using 2008 service utilization rates as the reference. SRRs were also calculated by diabetes status, using non-diabetes service utilization rates as the reference.
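The diabetes-status rule can be sketched as a small Python classifier. The function names and the record layout are hypothetical; the sketch assumes diagnosis codes may arrive with or without the decimal point and matches only the code families listed above (ICD-9 250 and ICD-10 E10, E11, E13).

```python
from typing import Iterable

# Code families named in the text; codes are compared without decimal points.
DIABETES_PREFIXES = ("250", "E10", "E11", "E13")


def is_diabetes_code(code: str) -> bool:
    """True if a single diagnosis code falls in a listed diabetes family."""
    normalized = code.replace(".", "").strip().upper()
    return normalized.startswith(DIABETES_PREFIXES)


def has_diabetes(diagnosis_codes: Iterable[str]) -> bool:
    """True if any recorded diagnosis on the record indicates diabetes."""
    return any(is_diabetes_code(code) for code in diagnosis_codes)


def is_adult_diabetes_event(age: int, diagnosis_codes: Iterable[str]) -> bool:
    """Selection rule: diabetes diagnosis present and age at least 18 years."""
    return age >= 18 and has_diabetes(diagnosis_codes)


# Hypothetical records: (age, list of diagnosis codes).
example_a = is_adult_diabetes_event(54, ["4019", "25000"])   # ICD-9 style
example_b = is_adult_diabetes_event(67, ["I10", "E11.9"])    # ICD-10 style
example_c = is_adult_diabetes_event(45, ["I10", "J45909"])   # no diabetes code
example_d = is_adult_diabetes_event(16, ["E10.9"])           # under 18
```

Records for which `has_diabetes` is false would form the comparison group without diabetes under the rule described above.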