Assessment of Relative Utility of Underlying vs Contributory Causes of Death

IMPORTANCE In etiological research, investigators using death certificate data have traditionally extracted underlying cause of mortality alone. With multimorbidity being increasingly common, more than one condition is often compatible with the manner of death. Using contributory cause plus underlying cause would also have some analytical advantages, but their combined utility is largely untested.
OBJECTIVE To compare the relative utility of cause of death data extracted from the underlying cause field vs any location on the death certificate (underlying and contributing combined).
DESIGN, SETTING, AND PARTICIPANTS This study compares the association of 3 known risk factors (cigarette smoking, low educational attainment, and hypertension) with health outcomes based on where cause of death data appear on the death certificate in 2 prospective cohort study collaborations (UK Biobank [N = 502655] and the Health Survey for England [15 studies] and the Scottish Health Surveys [3 studies] [HSE-SHS; N = 193873]). Data were collected in UK Biobank from March 2006 to October 2010 and in HSE-SHS from January 1994 to December 2008. Data analysis began in June 2018 and concluded in June 2019.


Introduction
Death records have long been collected for the purposes of monitoring the health of populations, 1,2 quantifying disease prognosis, 3,4 and evaluating the impact of primary 5 and secondary interventions. 6 To examine the influence of environmental and genetic characteristics on disease and injury events, mortality records have also been extensively deployed in ecological, 7 case-control, 8 experimental, 9 and, most frequently, prospective cohort studies. [10][11][12] The use of death records as a proxy for a health end point of interest is particularly important in contexts where linkages to other electronic health registries, such as hospital data or cancer records, are not viable, and clinical examination of study members is financially or logistically prohibitive. Linkage to death registers has the further advantage of having no additional burden on study participants. Scientific endeavors in which death data have been central to the understanding of disease etiology include the Framingham studies, where hypertension was first shown to be a risk factor for heart disease 13 and stroke; 14 the original Whitehall study, 15 where it was demonstrated that elevated blood glucose within the normal range was related to vascular events; and the British Doctors Study, 16 where, perhaps most famously, smoking was first prospectively linked to lung cancer. There are numerous other examples. 17 To accord with the World Health Organization guidelines, 18 death certificates are formulated in 2 parts. For the purposes of epidemiological research, the underlying (ie, immediate or direct) cause of death is almost exclusively extracted. Other diseases or injuries that contributed to the death but were not directly implicated appear in another section of the certificate. However, in practice, this contributory information is very rarely used. 19 With multimorbidity being common in an era of effective treatments, more than one condition can be compatible with the manner of death. 20 Analyses that use only the underlying cause of death may therefore omit valuable information that is readily available.
In estimating burden of disease, reliance on underlying cause compared with incorporating contributory causes appears to lead to underestimates of the importance of several leading causes of death. 21 However, the impact for etiological research is largely unknown. In the only study of which we are aware, 22 investigators found the same predictive capacity for classic risk factors in analyses featuring cardiovascular disease deaths irrespective of placement on the death certificate. No such comparison was made for other important causes of death. Using data from the contributory field of death may have the analytical advantage of facilitating investigation of the determinants of rarer causes of death (eg, intentional injury 23 and dementia 24 ) where, particularly in smaller cohort studies, a reliance on underlying cause alone may result in too few events to facilitate statistical computations. The value to investigators of larger studies of commonly occurring conditions might be enhanced statistical precision.
Using data from 2 large cohort studies, we examined associations of 3 known risk factors with major causes of death, including cancer, cardiovascular disease, injury, and dementia. To provide findings of interest to a range of disciplines, we used physiological (hypertension 25 ), psychosocial (educational attainment 26 ), and behavioral (cigarette smoking 27,28 ) risk factors. Our aim was to examine if these risk factors had the same magnitude of association with cause-specific mortality when end point data were extracted from the underlying field alone vs the underlying and contributory fields combined (ie, any mention).

Included Cohort Studies
We used data from UK Biobank, 29 a prospective cohort study, and a pooling of 18 identical cohort studies from the Health Survey for England (15 studies) and the Scottish Health Surveys (3 studies) (HSE-SHS). [30][31][32] These studies were selected because they offer similar, standard processes for data collection. Participants in both studies gave full informed consent. In the UK Biobank, ethical approval was received from the North West Multi-center Research Ethics Committee, and the research was carried out in accordance with the Declaration of Helsinki. 33 In HSE-SHS, ethical approval for data collection was granted by the London research ethics council or the local research ethics councils. This study analyzed existing anonymized data, and therefore, no further ethical approval was required. This report follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Baseline Data Collection
The sampling and protocols of these studies have been well described. [29][30][31][32] In brief, baseline data  (Table).
Responses to history of cigarette smoking habits (ie, ever smoker vs never) and highest attained educational qualification (ie, no university degree vs university undergraduate degree or higher) were collapsed into binary categories for brevity of presentation. In UK Biobank, systolic and diastolic blood pressure measurements were taken twice while the participant was seated using the Omron HEM-7015IT digital blood pressure monitor (Omron Healthcare). 20 Blood pressure in the present analyses was based on the average of the 2 measurements. In HSE-SHS, blood pressure was measured using the Dinamap 8100 automated device (GE Critikon). 34 Following a 5-minute seated rest, 3 readings of systolic and diastolic blood pressure were taken from the right arm at 1-minute intervals. Blood pressure used in the present analysis was based on the mean of the second and third measurements. We defined hypertension according to existing guidelines as systolic/diastolic blood pressure of at least 140/90 mm Hg and/or use of antihypertensive medication. 35

Ascertainment of Cause-Specific Mortality
Deaths were classified as attributable to (1) all cancers combined (International Classification of Diseases [ICD] codes C00-C97), (2) lung cancer (C34), (3) cardiovascular disease (I20-I25, I50, I60-I70, I73, and I74), (4) coronary heart disease (I20-I25), (5) cerebrovascular disease (I60-I69), (6) external causes (V01-Y99), and (7) dementia (F00-F02, F03, F05, F10, G30, G31, I67, and A81). 24 Where necessary, corresponding codes from earlier revisions of the ICD were used. The any mention category was a combination of the underlying and contributory cause of death fields.
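The distinction between the underlying field and any mention can be illustrated with a short sketch. It assumes that each death record holds one underlying-cause code and a list of contributory codes in ICD-10 format; the record layout, the names `OUTCOME_RANGES`, `matches`, and `death_indicators`, and the example codes are hypothetical and are not taken from the study data sets.

```python
# Outcome definitions as inclusive ranges of 3-character ICD-10 categories,
# following the code lists given in the text.
OUTCOME_RANGES = {
    "all_cancers": [("C00", "C97")],
    "lung_cancer": [("C34", "C34")],
    "cardiovascular_disease": [("I20", "I25"), ("I50", "I50"),
                               ("I60", "I70"), ("I73", "I74")],
    "coronary_heart_disease": [("I20", "I25")],
    "cerebrovascular_disease": [("I60", "I69")],
    "external_causes": [("V01", "Y99")],
    "dementia": [("F00", "F03"), ("F05", "F05"), ("F10", "F10"),
                 ("G30", "G31"), ("I67", "I67"), ("A81", "A81")],
}


def matches(code, outcome):
    """Return True if an ICD-10 code falls in any range defined for the outcome."""
    # Keep only the 3-character category, so "I21.9" and "I219" both become "I21".
    category = code.strip().upper()[:3]
    # A letter followed by 2 digits sorts correctly as a plain string.
    return any(low <= category <= high for low, high in OUTCOME_RANGES[outcome])


def death_indicators(underlying, contributory, outcome):
    """Flag a death for the outcome by underlying field and by any mention."""
    in_underlying = matches(underlying, outcome)
    in_any = in_underlying or any(matches(c, outcome) for c in contributory)
    return {"underlying": in_underlying, "any_mention": in_any}


# Hypothetical record: lung cancer as underlying cause, myocardial infarction
# listed as a contributory cause.
record = {"underlying": "C34.9", "contributory": ["I21.9"]}
for outcome in ("lung_cancer", "coronary_heart_disease"):
    print(outcome, death_indicators(record["underlying"],
                                    record["contributory"], outcome))
```

In this sketch every death counted under the underlying field is also counted under any mention, so the any mention count for an outcome can never be lower than the underlying count.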

Statistical Analysis
Hazard ratios (HRs) and accompanying 95% CIs were computed using Cox regression models. 36
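As a minimal sketch of how an HR and its 95% CI arise from a Cox model, the code below fits a model with a single binary exposure by Newton-Raphson maximization of the partial likelihood. It is a simplification under stated assumptions: there is no adjustment for age or sex, ties are handled with the Breslow form, and the function names and toy data are hypothetical. In practice an established survival analysis package would be used.

```python
import math


def cox_single_covariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit a Cox model with one covariate; return (log HR, standard error).

    times  : follow-up time for each participant
    events : 1 if the participant died of the outcome, 0 if censored
    x      : exposure value (eg, 1 = ever smoker, 0 = never smoker)
    Raises ZeroDivisionError if the data carry no information on the exposure.
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored participants contribute only to risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # participant j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean_x = s1 / s0
            score += x_i - mean_x              # first derivative of log partial likelihood
            info += s2 / s0 - mean_x * mean_x  # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # info was evaluated just before the final (negligible) step.
    return beta, 1.0 / math.sqrt(info)


def hazard_ratio_with_ci(beta, se, z=1.959964):
    """Convert a log HR and its standard error to an HR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy data (hypothetical): 6 participants, all of whom died.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
smoker = [1, 1, 0, 1, 0, 0]
beta, se = cox_single_covariate(times, events, smoker)
hr, lower, upper = hazard_ratio_with_ci(beta, se)
print(f"HR = {hr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

With so few events the interval is very wide, which illustrates why a larger number of events, such as that obtained from any mention, narrows the CI.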

Results
The number of deaths in the any mention group was necessarily higher for all conditions, a differential that was least pronounced for cancer, which may reflect dissemination of the primary malignancy (ie, underlying cause).
In Figure 1, we show the age-adjusted and sex-adjusted HRs for baseline cigarette smoking status associated with deaths from cardiovascular disease, its different presentations (ie, coronary heart disease and cerebrovascular disease), and all cancers combined. The expected associations were apparent, such that ever having smoked cigarettes was associated with an elevated rate of mortality from all conditions. Within studies, the size of the effect estimates was very similar, irrespective of whether the underlying or any mention field was used, so that the ratios of hazard ratios (RHRs) all hovered around unity (P value for difference ≥ .09). Owing to the higher numbers of deaths in the any mention group, statistical precision was somewhat higher, as evidenced by the tighter 95% CIs.
In Figure 2, we show HRs for the association of educational attainment with the same mortality outcomes featured in Figure 1, with the addition of dementia and external causes.
Figure 1. Shaded squares indicate the hazard ratios (HRs), and error bars denote the 95% CIs for the association of smoking status with the risk of death from a range of diseases. The reference group is never having smoked cigarettes. The ratio of hazard ratios (RHR) summarizes the difference, with underlying cause as the reference group, between the effect estimate for the outcome as ascertained from different locations on the death certificate. The number of study participants and deaths in the sample used in this survival analysis is marginally lower than the full cohort owing to missing data for the exposure of interest.
Figure 2. Shaded squares indicate the hazard ratios (HRs), and error bars denote the 95% CIs for the association of educational attainment with the risk of death from a range of long-term diseases and injury. The reference group is having a university undergraduate degree or higher. The ratio of hazard ratios (RHR) summarizes the difference, with underlying cause as the reference group, between the effect estimate for the outcome as ascertained from different locations on the death certificate. The number of study participants and deaths in the sample used in this survival analysis is marginally lower than the full cohort owing to missing data for the exposure of interest.
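The RHR described in the figure legends can be sketched as the any mention HR divided by the underlying cause HR, computed on the log scale. The sketch below is an approximation under a stated assumption: it treats the 2 estimates as independent, although deaths counted under any mention include those counted under the underlying field, so the resulting interval is likely to be wider than one that accounts for this overlap. The function names and input values are hypothetical and are not results from this study.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a 2-sided 95% CI


def se_from_ci(lower, upper, z=Z_95):
    """Recover the standard error of a log HR from its 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_of_hazard_ratios(hr_any, ci_any, hr_underlying, ci_underlying, z=Z_95):
    """Return (RHR, lower, upper) with underlying cause as the reference.

    Assumes the 2 log HRs are independent (an approximation, see lead-in).
    """
    log_rhr = math.log(hr_any) - math.log(hr_underlying)
    se = math.sqrt(se_from_ci(*ci_any) ** 2 + se_from_ci(*ci_underlying) ** 2)
    return (math.exp(log_rhr),
            math.exp(log_rhr - z * se),
            math.exp(log_rhr + z * se))


# Hypothetical inputs: identical point estimates, tighter CI for any mention.
rhr, lower, upper = ratio_of_hazard_ratios(
    hr_any=1.50, ci_any=(1.40, 1.61),
    hr_underlying=1.50, ci_underlying=(1.35, 1.67),
)
print(f"RHR = {rhr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

An RHR whose interval includes 1 indicates that the association does not differ detectably between the 2 ways of ascertaining the outcome.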

Discussion
In this study, known associations of risk factors with an array of health end points were essentially the same irrespective of whether death data were drawn from the underlying cause field on the death certificate or a combination of underlying and contributory categories. These observations were confirmed in independent data sets. The any mention field is, as described, a combination of underlying and contributory fields. For outcomes where there is a small difference in absolute numbers of cases between these groups, such as cancer, the HRs based on analyses of each group will necessarily be nearly identical. More surprising is the similarity in effect estimates where discordance in the number of events is high, that is, for all other outcomes featured herein: cardiovascular disease (and the different presentations it comprises), dementia, and external causes.
An implication of our findings is that using the contributory field alongside the underlying cause field may have the advantage of facilitating investigation of risk factors for the occurrence of rarer forms of death where, particularly in smaller cohort studies, there may be too few events to compute effect estimates using the underlying field alone. The value in larger studies, such as those used here, might be marginally improved statistical precision.
As described, we were able to identify only 1 other study that has systematically compared the utility for etiological research of using a combination of the underlying cause and contributory cause fields on death certificates with using the underlying cause field alone. 22 Using mortality records from the Western Electric Study, the 5 risk factors examined (age, systolic blood pressure, blood cholesterol, body mass index, and cigarette smoking status) revealed near-identical HRs for the association of these risk factors with cardiovascular disease mortality for the 2 sources of the mortality outcome. There are also examples of investigators who have followed this analytical process but reported their findings qualitatively only. Thus, in our previous work, 40 psychological distress was associated with an elevated risk of death as drawn from the major ICD-10 chapters, whether extracted from underlying cause on the death certificate or in combination with contributory cause. Similar results were seen when the predictive capacity of alcohol intake and obesity for liver disease was assessed, 41 when pulmonary function was associated with dementia death, 42 and when we investigated the association of neuroticism with mortality from various causes. 43
Shaded squares indicate the hazard ratios (HRs), and error bars denote the 95% CIs for the association of hypertension status with the risk of death from different presentations of cardiovascular disease. The reference group is not having hypertension. The ratio of hazard ratios (RHR) summarizes the difference, with underlying cause as the reference group, between the effect estimate for the outcome as ascertained from different locations on the death certificate. The number of study participants and deaths in the sample used in this survival analysis is marginally lower than the full cohort owing to missing data for the exposure of interest.

Strengths and Limitations
While the present study has strengths (its relative novelty and the comparison of results across large, well-powered studies), there are inevitably some shortcomings. First, biomedical data, while available for analyses in HSE-SHS, were not available in UK Biobank at the time of analyses. It has therefore not been possible to compare the association of cholesterol fractions, glycated hemoglobin, and inflammatory markers, all linked etiologically with cardiovascular disease, 44 with mortality across different placements of cause of death on the death certificate. Based on the present results, we think it is unlikely that these risk indices would yield results very different from the patterns described here. Second, our exposure variables are known to be associated with the major stroke subtypes (ie, ischemic and hemorrhagic); however, on UK death certificates, stroke subtype is usually too ill defined to be useful. 45 Therefore, we were not able to run such analyses. Third, while UK Biobank is undoubtedly rare in its scale and broad in its content, it had an unconventionally low response to its baseline survey, at approximately 6%. This has prompted debates about the generalizability of its findings. 46-49 This notwithstanding, HSE-SHS, an independent data set, had response rates in the normal range (64%-78%). 50

Conclusions
Risk factor-end point associations were not sensitive to the placement of mortality data on the death certificate. Using cause of death positioned anywhere on a death certificate may have the advantage of facilitating investigation of risk factors for the occurrence of rarer forms of death where, particularly in smaller cohort studies, there may be too few events to facilitate effect estimate computations using the underlying field alone.