Global, Regional, and National Cancer Incidence, Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted Life-Years for 29 Cancer Groups, 1990 to 2017

This systematic analysis describes cancer burden for 29 cancer groups across 195 countries from 1990 through 2017 to provide data needed for cancer control planning.


Bias of categories of input data
Bias of the input data included for the COD database is described elsewhere. 13 Cancer registry data can be biased in multiple ways. A high proportion of ill-defined cancer cases in the registry data requires redistribution of these cases to other cancers, which introduces a potential for bias. Changes between coding systems can lead to artificial differences in disease estimates; however, we adjust for this bias by mapping the different coding systems to the GBD causes. Underreporting of cancers that require advanced diagnostic techniques (e.g., leukemia, brain, pancreatic, and liver cancer) can be an issue in cancer registries from low-income countries. On the other hand, misclassification of metastatic sites as primary cancer can lead to overestimation of cancer sites that are common sites for metastases like brain or liver. Since many cancer registries are located in urban areas, the representativeness of the registry for the general population can also be problematic. The accuracy of mortality data reported in cancer registries usually depends on the quality of the vital registration system. If the vital registration system is incomplete or of poor quality, the mortality-to-incidence ratio can be biased to lower ratios.

Data analysis
Flowcharts describing the conceptual overview of the data processing are available in eFigure 1 and eFigure 2.

Cancer registry data formatting
Cancer registry data went through multiple processing steps before integration with the COD database. First, the original data were transformed into standardized files, which included standardization of format, categorization, and registry names (#1 in eFigure 1). Second, some cancer registries report individual codes as well as aggregated totals (e.g., C18, C19, and C20 are reported individually, but the aggregated group of C18-C20 (colorectal cancer) is also reported in the registry data). The data processing step, "subtotal recalculation" (#2 in flowchart), verifies these totals and subtracts the values of any individual codes from the aggregates. In the third step (#3 in the flowchart), cancer registry incidence data and cancer registry mortality data are mapped to GBD causes. A different map is used for incidence and for mortality data because of the assumption that there are no deaths for certain cancers. One example is basal cell carcinoma of the skin. In the cancer registry incidence data, basal cell carcinoma is mapped to non-melanoma skin cancer (basal cell carcinoma). However, if basal cell skin cancer is recorded in the cancer registry mortality data, the deaths are instead mapped to non-melanoma skin cancer (squamous cell carcinoma) under the assumption that they were indeed misclassified squamous cell skin cancers. Other examples are benign or in situ neoplasms. Benign or in situ neoplasms found in the cancer registry incidence dataset were simply dropped from that dataset since cancer registries do not collect non-malignant neoplasms in a standardized way. The same neoplasms reported in a cancer registry mortality dataset were mapped to the respective invasive cancer (e.g., melanoma in situ in the cancer registry incidence dataset was dropped from the dataset; melanoma in situ in the cancer registry mortality dataset was mapped to melanoma). Mapping for incidence and mortality data can be found in eTable 4 and eTable 5. 
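The subtotal recalculation in step two can be illustrated with a minimal sketch. The function name, the dictionary layout, and the case counts are hypothetical and chosen for illustration only; the actual GBD processing code is not shown in this appendix.

```python
def recalculate_subtotals(counts, aggregates):
    """Subtract individually reported codes from their aggregate totals.

    counts: dict mapping a code label (e.g. "C18" or "C18-C20") to a case count.
    aggregates: dict mapping an aggregate label to the list of its member codes.
    Returns a new dict in which each aggregate keeps only the remainder
    that is not already covered by individually reported codes.
    """
    result = dict(counts)
    for aggregate, members in aggregates.items():
        if aggregate not in counts:
            continue
        reported = sum(counts.get(code, 0) for code in members)
        remainder = counts[aggregate] - reported
        if remainder < 0:
            # Assumption: a negative remainder signals inconsistent registry data.
            raise ValueError(f"members of {aggregate} exceed the reported total")
        result[aggregate] = remainder
    return result


# Hypothetical registry counts: the aggregate C18-C20 is reported alongside its parts.
example = recalculate_subtotals(
    {"C18": 120, "C19": 30, "C20": 50, "C18-C20": 210},
    {"C18-C20": ["C18", "C19", "C20"]},
)
```

In this example only 10 cases remain under the aggregate label after the individually coded cases are subtracted, so no case is counted twice in later steps.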
In the fourth data processing step (#4 in the flowchart), cancer registry data were standardized to the GBD age groups. Age-specific incidence rates were generated using age weights from administrative claims data as specified in appendix section 2.1.5 (James SL, Abate D, Abate KH, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet. 2018;392(10159):1789-1858. doi:10.1016/S0140-6736(18)32279-7), 14 while age-specific mortality rates were generated from the CoD data. 13 Age-specific weights were then generated by applying the age-specific rates to a given registry population that required age-splitting to produce the expected number of cases/deaths for that registry by age. The expected number of cases/deaths for each sex, age, and cancer were then normalized to 1, creating final, age-specific proportions. These proportions were then applied to the total number of cases/deaths by sex and cancer to get the age-specific number of cases/deaths. In the rare case that the cancer registry only contained data for both sexes combined, the age-specific cases/deaths were split and reassigned to separate sexes using the same weights that are used for the age-splitting process. Starting from the expected number of deaths, proportions were generated by sex for each age (e.g., if for ages 15-19 years old there are 6 expected deaths for males and 4 expected deaths for females, then 60% of the combined-sex deaths for ages 15-19 years would be assigned to males and the remaining 40% would be assigned to females). In the fifth step (#5 in the flowchart), data for cause entries that are aggregates of GBD causes were redistributed. Examples of these aggregated causes include some registries reporting ICD-10 codes C00-C14 together as "lip, oral cavity, and pharyngeal cancer."
These groups were broken down into subcauses that could be mapped to single GBD causes. In this example, those include lip and oral cavity cancer (C00-C08), nasopharyngeal cancer (C11), cancer of other parts of the pharynx (C09-C10, C12-C13), and "Malignant neoplasm of other and ill-defined sites in the lip, oral cavity, and pharynx" (C14). To redistribute the data, weights were created using the same method employed in age-sex splitting (see step four above). For the undefined code (C14 in the example) an "average all cancer" weight was used, which was generated by adding all cases from SEER/NORDCAN/CI5 and dividing those by the combined population. Then, proportions were generated by subcause for each aggregate cause as in the sex-splitting example above (see step four). The total number of cases from the aggregated group (C00-C14) was recalculated for each subgroup and the undefined code (C14). C14 was then redistributed as an insufficiently specific code in step six. Distinct proportions were used for C46 (Kaposi sarcoma). C46 entries were redistributed as "other cancer" and HIV. In the sixth step (#6 in the flowchart), unspecified codes ("garbage code") were redistributed. Redistribution of cancer registry incidence and mortality data mirrored the process of the redistribution used in the cause of death database and has not changed compared to GBD 2013. 19 In the seventh step (#7 in the flowchart), duplicate or redundant sources were removed from the processed cancer registry dataset. Duplicate sources were present if, for example, the cancer registry was part of the CI5 database but we also had data from the registry directly. Redundancies occurred and were removed as described in "Inclusion and Exclusion Criteria," where more detailed data were available, or when national registry data could replace regionally representative data.
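The age- and sex-splitting logic of step four, which is reused for the subcause redistribution in step five, can be sketched as follows. This is a simplified illustration: the function names, age labels, rates, and populations are hypothetical, and only the 6-versus-4 expected-deaths example comes from the text.

```python
def split_total_by_age(total_cases, rates_by_age, population_by_age):
    """Distribute a registry total across age groups.

    Expected cases per age group are reference rate x registry population;
    they are normalized to sum to 1 and applied to the reported total.
    """
    expected = {
        age: rates_by_age[age] * population_by_age[age] for age in rates_by_age
    }
    expected_sum = sum(expected.values())
    if expected_sum <= 0:
        raise ValueError("expected cases must be positive to form proportions")
    return {age: total_cases * value / expected_sum for age, value in expected.items()}


def split_by_sex(combined_cases, expected_male, expected_female):
    """Split combined-sex cases in proportion to the expected cases by sex."""
    total = expected_male + expected_female
    if total <= 0:
        raise ValueError("expected cases must be positive to form proportions")
    male_share = expected_male / total
    return combined_cases * male_share, combined_cases * (1.0 - male_share)


# Hypothetical rates (per person) and registry populations for two age groups.
by_age = split_total_by_age(
    100,
    {"15-19": 0.001, "20-24": 0.003},
    {"15-19": 1000, "20-24": 1000},
)

# The example from the text: 6 expected male and 4 expected female deaths.
male_deaths, female_deaths = split_by_sex(50, 6, 4)
```

With 6 and 4 expected deaths, 60% of the 50 combined-sex deaths are assigned to males and 40% to females, matching the worked example in the text.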
From here, two parallel selection processes were run to generate input data for the MI models and to generate incidence for final mortality estimation. Higher priority was given to registry data from the most standardized source when creating the final incidence input, whereas for the MI model input, only sources that reported incidence and mortality were used. In the eighth step (#8 in the flowchart), the processed incidence and mortality data from cancer registries were matched by cancer, age, sex, year, and location to generate MI ratios. These MI ratios were used as input for a three-step modeling approach using the general GBD ST-GPR 17 approach with the HAQ Index as a covariate in the linear step mixed effects model using a logit link function. 20

$$\operatorname{logit}\left(MI_{c,a,s,t}\right) = \alpha + \beta_1 HAQI_{c,t} + \sum \beta_2 I_a + \beta_3 I_s + \epsilon_{c,a,s,t}$$

c: country; a: age group; t: time (years); s: sex; HAQI: Healthcare Access and Quality index; I: indicator variable; $\epsilon_{c,a,s,t}$: error term

This is different compared to GBD 2016, where we used the Socio-demographic Index (SDI) as a predictor. Predictions were made without the random effects. The ST-GPR model has three main hyperparameters that control for smoothing across time, age, and geography. The time adjustment parameter (λ) was set to 0.07, which aims to borrow strength from neighboring time points (i.e., the exposure in this year is highly correlated with exposure in the previous year but less so further back in time). The age adjustment parameter (ω) was set to 1, which borrows strength from data in neighboring age groups. The space adjustment parameter (ζ) was set to 0.02. Zeta aims to borrow strength across the hierarchy of geographical locations. 18 For the amplitude parameter in the Gaussian process regression we used 1 and for the scale we used a value of 15.
The data cleaning has remained the same as in GBD 2016, where we excluded data based on the SDI quintile categorization. For each cancer, MI ratios from locations in SDI quintiles 1-4 (low to high-middle SDI) were dropped if they were below the median of MI ratios from locations in SDI quintile 5 (high SDI). We also dropped MI ratios from locations in SDI quintiles 1-4 if the MI ratios were above the third quartile + 1.5 * IQR (interquartile range). We dropped all MIRs that were based on fewer than 25 cases to avoid noise due to small numbers, except for mesothelioma and acute lymphoid leukemia, where we dropped MIRs that were based on fewer than 10 cases because of lower data availability for these two cancers. We also aggregated incidence and mortality to the youngest five-year age bin where we had at least 50 data points to avoid MIR predictions in young age groups that were based on few data points. The MIR in the age bin that was used to aggregate MIRs was used to backfill the MIR for younger age groups. Since MI ratios can be above 1, especially in older age groups and cancers with low cure rates, we used the 95th percentile of the cleaned dataset that only included MIRs that were based on 50 or more cases to cap the MIR input data. This "upper cap" was used to allow MIRs over 1 but to constrain the MIR to a maximum level. To run the logit model, the input data were divided by the upper caps, and model predictions after ST-GPR were rescaled by multiplying them by the upper caps. Upper caps used for GBD 2017 were the following: To constrain the model at the lower end, we used the 5th percentile of the cancer-specific cleaned MIR input data to replace all model predictions below it with this lower cap. Final MI ratios were matched with the cancer registry incidence dataset in the ninth step (#9 in the flowchart) to generate mortality estimates (Incidence * Mortality/Incidence = Mortality) (#10 in the flowchart).
The final mortality estimates were then uploaded into the COD database (#11 in the flowchart). Cancer-specific mortality modeling then followed the general CODEm process.
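The use of the upper and lower caps around the logit model, and the final conversion of incidence to mortality, can be sketched as below. This is an illustration under stated assumptions rather than the GBD implementation: the function names and the cap values (1.2 and 0.1) are hypothetical, and the ST-GPR step itself is not reproduced.

```python
import math


def to_logit_space(mir, upper_cap):
    """Divide an MI ratio by its upper cap and apply the logit transform."""
    scaled = mir / upper_cap
    if not 0.0 < scaled < 1.0:
        # Assumption: values at or beyond the cap need separate handling,
        # which the text does not describe.
        raise ValueError("scaled MI ratio must lie strictly between 0 and 1")
    return math.log(scaled / (1.0 - scaled))


def from_logit_space(prediction, upper_cap, lower_cap):
    """Back-transform a model prediction, rescale by the upper cap,
    and replace values below the lower cap with the lower cap."""
    scaled = 1.0 / (1.0 + math.exp(-prediction))
    return max(scaled * upper_cap, lower_cap)


def mortality_from_incidence(incident_cases, mir):
    """Incidence x (mortality / incidence) = mortality."""
    return incident_cases * mir


# Hypothetical caps: upper cap 1.2 (95th percentile), lower cap 0.1 (5th percentile).
round_trip = from_logit_space(to_logit_space(0.9, 1.2), 1.2, 0.1)
floored = from_logit_space(-10.0, 1.2, 0.1)
deaths = mortality_from_incidence(200, 0.5)
```

Dividing by the upper cap keeps the logit defined for MI ratios above 1, which is why the cap is applied before modeling and undone afterwards.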

Cause of death database formatting
Formatting of data sources for the cause of death database has been described in detail elsewhere (#11 in the flowchart). 13

CODEm models
Mortality estimates for each cancer were generated using CODEm (#12 in the flowchart). The CODEm approach has been described elsewhere. 2,21 In brief, the CODEm modeling approach is based on the principles that all types of available data should be used even if data quality varies; that individual models but also ensemble models should be tested for their predictive validity; and that the best model or sets of models should be chosen based on the out-of-sample predictive validity. Models were run separately for countries with extensive and complete vital registration data and countries with less VR data to prevent an inflation in the uncertainty around the estimates in "data-rich" countries. Covariates were selected based on a possible predictive relationship between the covariate and the specific cancer mortality. Level 1 covariates have a proven strong relationship with the outcome such as etiological or biological roles. Level 2 covariates have a strong relationship but not a direct biological link. Covariates that are more distal in the causal chain or are mediated through Level 1 or 2 covariates are categorized as Level 3. 21 Differences in covariate selection between GBD 2016 and GBD 2017 by cause and direction of the covariate can be found in eTable 9.

Liver cancer etiology split models
For GBD 2017, the etiologies for liver cancer were expanded to include a separate etiology of liver cancer due to non-alcoholic steatohepatitis (NASH). To find the proportion of liver cancer cases due to the five etiology groups included in GBD (1. Liver cancer due to hepatitis B, 2. Liver cancer due to hepatitis C, 3. Liver cancer due to alcohol, 4. Liver cancer due to NASH, 5. Liver cancer due to other causes), a systematic literature search was performed in PubMed on 10/24/2016 using the following search string: "("liver neoplasms"[All Fields] OR "HCC"[All Fields] OR "liver cancer"[All Fields] OR "Carcinoma, Hepatocellular" [Mesh]) AND (("hepatitis B"[All Fields] OR "Hepatitis B" [Mesh] OR "Hepatitis B virus" [Mesh] OR "Hepatitis B Antibodies" [Mesh] OR "Hepatitis B Antigens"[Mesh]) OR ("hepatitis C"[All Fields] OR "Hepatitis C" [Mesh] OR "hepatitis C antibodies" [MESH] OR "Hepatitis C Antigens" [Mesh] OR "Hepacivirus"[Mesh]) OR ("alcohol"[All Fields] OR "Alcohol Drinking" [Mesh] OR "Alcohol-Related Disorders" [Mesh] OR "Alcoholism" [Mesh] OR "Alcohol-Induced Disorders"[Mesh])) NOT (animals [MeSH] NOT humans [MeSH])". Also, studies not found through this search but included in the meta-analysis by de Martel et al, were included. 22 We also included the study by Hong et al, after the authors provided us with additional data on the overlap in risk factors. 23 Studies were included if the study population was representative of liver cancer population for the respective location. For each study, the proportions of liver cancer due to the five specific risk factors were calculated. Cases were considered to be due to NASH when the manuscript explicitly listed the etiology to be NASH or non-alcoholic fatty liver disease (NAFLD). Cases where the etiology was listed as "cryptogenic," "idiopathic," or "unknown" were included within the "other causes" category. 
In manuscripts where the etiology for a case was not known but major categories could not be ruled out (for example, the study tested for hepatitis B and C, but did not assess alcohol use), these cases were excluded from the numerator of the study (in other words, did not contribute a proportion to any etiology). Remaining risk factors were included under a combined "other" group (for example, hemochromatosis, autoimmune hepatitis, Wilson's disease, etc.). If multiple risk factors were reported for an individual patient, these were apportioned proportionally to the individual risk factors. The proportion data found through the systematic literature review were used as input for five separate DisMod-MR 2.1 models to determine the proportion of liver cancers due to the five subgroups for all locations, both sexes, and all age groups (step #16 in the flowchart). A study covariate was used for publications that only assessed liver cancer in a cirrhotic population. The reference or "gold standard" that was used for crosswalking was the compilation of all studies that assessed the etiology of liver cancer in a general population. For liver cancer due to hepatitis C and hepatitis B, a prior value of 0 was set between age 0 and 0.01. For liver cancer due to alcohol, a prior value of 0 was set for ages 0 to 5 years. For liver cancer due to hepatitis C, hepatitis C (IgG) seroprevalence was used as a covariate as well as a covariate for alcohol (liters per capita), hepatitis B prevalence (HBsAg seroprevalence), and NASH/NAFLD prevalence, forcing a negative relationship between the alcohol, hepatitis B, and NASH/NAFLD covariates and the outcome of liver cancer due to hepatitis C proportion.
For liver cancer due to hepatitis B, seroprevalence of HBsAg was used as a covariate as well as a covariate for alcohol, hepatitis C IgG seroprevalence, NASH/NAFLD prevalence, and the population coverage of three-dose Hepatitis B vaccination, forcing a negative relationship between these covariates and the outcome of liver cancer due to hepatitis B proportion. For liver cancer due to alcohol, alcohol (liters per capita) was used as a covariate as well as a covariate for proportion of alcohol abstainers, hepatitis B and hepatitis C seroprevalence, and NASH/NAFLD prevalence, forcing a negative relationship between the proportion of alcohol abstainers, NASH/NAFLD, and hepatitis B and hepatitis C covariates and the outcome of liver cancer due to alcohol proportion. For liver cancer due to NASH, NASH/NAFLD prevalence was used as a covariate as well as a covariate for obesity prevalence and mean body mass index (BMI), forcing a positive relationship between these covariates and the outcome of liver cancer due to NASH proportion. All covariates used were modeled independently. To ensure consistency between cirrhosis and liver cancer estimates and to take advantage of the data for the respective other related cause (e.g., liver cancer due to hepatitis C and the related cause cirrhosis due to hepatitis C), we generated covariates from the liver cancer proportion models that we used in the cirrhosis etiology proportion models. We then created covariates from the cirrhosis etiology proportion models and used those in the liver cancer etiology models.
Since the proportion models are run independently of each other, the final proportion models were scaled to sum to 100% within each age, sex, year, and location, by dividing each proportion by the sum of the five (step # 17). For the liver cancer subtype mortality estimates, we multiplied the parent cause "liver cancer" by the corresponding scaled proportions (step # 18). Single cause estimates were adjusted to fit into the separately modeled all-cause mortality in the process CoDCorrect.
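The scaling and splitting in steps 17 and 18 amount to a normalization followed by a multiplication. A minimal sketch for a single age, sex, year, and location is shown below; the function names and the input proportions and death count are hypothetical.

```python
def scale_proportions(proportions):
    """Rescale independently modeled proportions so that they sum to 1."""
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    return {etiology: value / total for etiology, value in proportions.items()}


def split_parent_deaths(parent_deaths, scaled_proportions):
    """Multiply the parent-cause deaths by each scaled etiology proportion."""
    return {
        etiology: parent_deaths * share
        for etiology, share in scaled_proportions.items()
    }


# Hypothetical model outputs for one age, sex, year, and location (sum = 1.10).
raw = {
    "hepatitis_b": 0.30,
    "hepatitis_c": 0.25,
    "alcohol": 0.20,
    "nash": 0.10,
    "other": 0.25,
}
scaled = scale_proportions(raw)
etiology_deaths = split_parent_deaths(1000.0, scaled)
```

After scaling, the five etiology-specific death counts add back up to the parent "liver cancer" total, which is the consistency property the scaling step is meant to guarantee.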

CoDCorrect
CODEm models estimate the individual cause-level mortality without taking into account the all-cause mortality (#13 in the flowchart). To ensure that all single causes add up to the all-cause mortality and that all child-causes add up to the parent cause, an algorithm called "CoDCorrect" is used (#14 and #15 in the flowchart). Details regarding the algorithm can be found elsewhere. 13

Incidence estimation
GBD cancer incidence estimates were generated by dividing final mortality estimates (after CoDCorrect adjustment) by the MI ratio for the specific cancer (#1 eFigure 2). To propagate uncertainty from the MI ratios and the mortality estimates to incidence, this process was done at the 1,000-draw level. It was assumed that uncertainty in the MI ratio is independent of uncertainty in the estimated age-specific death rates.
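Draw-level propagation of uncertainty can be sketched as follows. The distributions used to simulate the draws, and the use of the 2.5th and 97.5th percentiles as an uncertainty interval, are assumptions made for this illustration; in practice the draws would come from the CoDCorrect and MI ratio outputs.

```python
import random


def incidence_draws(death_draws, mir_draws):
    """Divide mortality draws by MI ratio draws, pairing them by index."""
    if len(death_draws) != len(mir_draws):
        raise ValueError("both inputs need the same number of draws")
    return [deaths / mir for deaths, mir in zip(death_draws, mir_draws)]


def summarize(draws):
    """Return the mean and the 2.5th and 97.5th percentile draws."""
    ordered = sorted(draws)
    n = len(ordered)
    mean = sum(ordered) / n
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return mean, lower, upper


# Simulated inputs (hypothetical): 1,000 independent draws of deaths and MI ratios.
rng = random.Random(0)
death_draws = [rng.gauss(1000.0, 50.0) for _ in range(1000)]
mir_draws = [rng.gauss(0.5, 0.02) for _ in range(1000)]

draws = incidence_draws(death_draws, mir_draws)
mean, lower, upper = summarize(draws)
```

Generating the two sets of draws separately reflects the stated assumption that uncertainty in the MI ratio is independent of uncertainty in the death estimates.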

Prevalence and YLD estimation
Prevalence is estimated as 10-year prevalence for all cancers. After transforming the final GBD cancer mortality estimates to incidence estimates (step 1 in the flowchart), incidence was combined with the relative yearly survival estimates up to 10 years (step 7 in the flowchart). For GBD 2017 we updated our methods to more directly utilize MIRs to generate these yearly cancer relative survival estimates. Previous reports suggest that the value of (1 - MIR) may serve as a proxy for 5-year relative survival, with the exact correlation varying slightly by cancer type. 24 We used SEER*Stat to obtain national mortality, incidence, and relative survival statistics from the nine SEER registries reporting from 1980 to 2014 (step 2), by cancer type, sex, 5-year blocks (i.e., 1980-1984, 1985-1989, etc.), and 5-year age groups (except combining 80+). For each cancer, we modeled 5-year relative survival with the SEER MIRs using Poisson regression, weighted by the number of incident cases (step 3). To reduce variability due to small samples, we only included MIRs based on at least 25 incident cases (except for the rarer cancers mesothelioma, nasopharyngeal cancer, and acute myeloid leukemia, where MIRs based on at least 10 cases were included). These models were then applied to the GBD MIR estimates to predict an estimated 5-year survival for each age/sex/year/location (winsorized to between 0 and 100% survival; step 4). To obtain yearly survival estimates up to 10 years, we compared these estimates to the SEER sex-specific all-ages relative survival statistics from 2004 (the latest year with 10-year survival available). The proportion of the predicted GBD survival estimate to the SEER survival statistic was used to scale the SEER 10-year relative survival curve for each country (step 5).
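One way to read the scaling in step 5 is as a single multiplicative factor applied to every year of the SEER curve. The sketch below follows that reading; the function name, the survival values, and the clamping of the result to the interval from 0 to 1 are assumptions for illustration and are not taken from the text.

```python
def scale_survival_curve(seer_curve, seer_5yr, predicted_5yr):
    """Scale a yearly relative survival curve by predicted / reference survival.

    seer_curve: relative survival at years 1..10 from the reference population.
    seer_5yr: 5-year relative survival in the reference population.
    predicted_5yr: predicted 5-year relative survival for the target location.
    """
    if seer_5yr <= 0:
        raise ValueError("reference 5-year survival must be positive")
    factor = predicted_5yr / seer_5yr
    # Assumption: scaled survival is kept within [0, 1].
    return [min(max(value * factor, 0.0), 1.0) for value in seer_curve]


# Hypothetical reference curve for years 1 through 10 after diagnosis.
reference = [0.90, 0.80, 0.70, 0.60, 0.50, 0.45, 0.40, 0.36, 0.33, 0.30]
scaled_curve = scale_survival_curve(reference, seer_5yr=0.50, predicted_5yr=0.25)
```

By construction the scaled curve passes through the predicted 5-year survival at year 5 while retaining the shape of the reference curve.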
To transform relative to absolute survival (adjusting for background mortality), GBD 2017 lifetables were used (step 6 and 7 in the flowchart) to calculate lambda values: $\lambda = \ln\left({}_nL_{x_n} / {}_nL_{x_{n+1}}\right) / 5$, where ${}_nL_x$ = person-years lived between ages $x$ and $x+n$ (from the GBD lifetable). Absolute survival was then calculated using an exponential survival function (absolute survival = relative survival $\times\, e^{\lambda t}$).
Survivors beyond 10 years were considered cured. The survivor population prevalence was divided into two sequelae (1. diagnosis and primary therapy; 2. controlled phase). The yearly prevalence of the population that did not survive beyond 10 years was divided into the four sequelae by assigning the fixed durations for each of the diagnosis and primary therapy phase, metastatic phase, and terminal phase, and assigning the remaining prevalence to the controlled phase (step 9 in the flowchart). Duration of these four sequelae remained the same as for GBD 2016. eTable 12 lists the duration of each, along with the sources used to determine their length. YLDs were calculated by multiplying each phase with the respective disability weight (eTable 13). To generate the total YLDs for each cancer (with the exception of cancers where additional disability is added due to procedures; see next paragraph), the YLDs for each cancer sequela were added (step 13 in eFigure 2).
Additional disability was estimated for breast cancer (disability due to mastectomy), larynx cancer (disability due to laryngectomy), colon and rectum cancer (disability due to stoma), bladder cancer (disability due to incontinence), and prostate cancer (disability due to incontinence and impotence following prostatectomy) (#10 in eFigure 2). Hospital data were used to estimate the number of cancer patients undergoing mastectomy, laryngectomy, stoma, prostatectomy, and cystectomy. These proportions remained the same as in GBD 2013, GBD 2015, and GBD 2016 and were used as input for proportion models that were run in DisMod-MR 2.1 (#9 in eFigure 2). 24 The procedure proportion (proportion of cancer population that undergoes procedures) from hospital data was used as input for a proportion model in DisMod-MR 2.1 in order to estimate the proportions for all locations, by age, and by sex.
Since colostomy or ileostomy procedures are done for reasons other than cancer, a literature review was done to determine the proportion of ostomies due to colorectal cancer. The "all cause" colostomy proportions were multiplied by 0.58 based on the results of the literature review showing that on average 58% of ostomies are done for colorectal cancer. [27][28][29] The final procedure proportions were applied to the incidence cases of the respective cancers and multiplied with the proportion of the incidence population surviving for 10 years to determine the incident cases of the cancer population that underwent procedures and that survived beyond 10 years. These incident cases were used again as an input for DisMod-MR 2.1, with a remission specification of zero and an excess mortality rate prior of 0 to 0.1, as well as with increasing the age of the population and the year by 10 years to reflect prevalence after that population has survived 10 years. This approach was updated compared to GBD 2016, where we did not include an age or time shift. The results from this model are incidence and lifetime prevalent cases of persons with these cancer-related sequelae who have survived beyond 10 years.
Since disability associated with prostatectomy comes from impotence and incontinence, and not from the prostatectomy itself, 18% of the prostatectomy prevalence was assumed to have incontinence and 55% was assumed to have impotence, based on a literature review done for GBD 2013. [30][31][32][33][34][35][36][37] Cases were assigned disability for either impotence or incontinence, but no cases were assigned disability from both.
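The fixed fractions described above (58% of ostomies attributed to colorectal cancer; 18% incontinence and 55% impotence among prostatectomy cases, with no overlap) translate into simple arithmetic. The sketch below uses hypothetical function names and input values; only the three fractions come from the text.

```python
COLORECTAL_SHARE_OF_OSTOMIES = 0.58  # from the literature review cited in the text
INCONTINENCE_SHARE = 0.18            # share of prostatectomy prevalence
IMPOTENCE_SHARE = 0.55               # share of prostatectomy prevalence


def colorectal_ostomy_proportion(all_cause_ostomy_proportion):
    """Restrict an all-cause ostomy proportion to ostomies due to colorectal cancer."""
    return all_cause_ostomy_proportion * COLORECTAL_SHARE_OF_OSTOMIES


def split_prostatectomy_prevalence(prevalent_cases):
    """Assign prostatectomy cases to mutually exclusive disability groups."""
    incontinence = prevalent_cases * INCONTINENCE_SHARE
    impotence = prevalent_cases * IMPOTENCE_SHARE
    no_added_disability = prevalent_cases - incontinence - impotence
    return {
        "incontinence": incontinence,
        "impotence": impotence,
        "no_added_disability": no_added_disability,
    }


# Hypothetical inputs: a 10% all-cause ostomy proportion and 1,000 prevalent cases.
ostomy = colorectal_ostomy_proportion(0.10)
groups = split_prostatectomy_prevalence(1000.0)
```

Because no case is assigned both sequelae, the remaining 27% of prostatectomy cases carry no procedure-related disability in this scheme.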
We assumed that for the population surviving up to 10 years, only the prevalence population being in remission experiences additional disability due to procedures (e.g., a woman suffering from metastatic breast cancer does not experience additional disability due to a mastectomy during this phase). To estimate the prevalence of the cancer population in remission during the first 10 years after diagnosis with and without procedure-related disability, we multiplied the prevalence of the population in the remission phase with the proportion of the population undergoing a procedure. This step allowed us to estimate disability during the remission phase for both the population experiencing disability due to the remission phase alone, as well as the population experiencing disability from the remission phase and the additional procedure-related disability.
Lastly, the procedure sequelae prevalence and general sequelae prevalence were multiplied with their respective disability weights (eTable 13) to obtain the number of YLDs (steps 11, 12, 13 in the flowchart). The sum of these YLDs is the final YLD estimate associated with each cancer.
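The final YLD calculation is a weighted sum of sequela prevalence. In the sketch below the sequela names, prevalence counts, and disability weights are placeholders chosen for illustration; the actual weights are those listed in eTable 13.

```python
def total_ylds(prevalence_by_sequela, disability_weights):
    """Sum prevalence x disability weight over all sequelae of one cancer."""
    missing = set(prevalence_by_sequela) - set(disability_weights)
    if missing:
        raise KeyError(f"no disability weight for: {sorted(missing)}")
    return sum(
        cases * disability_weights[sequela]
        for sequela, cases in prevalence_by_sequela.items()
    )


# Placeholder prevalence counts and weights (not the eTable 13 values).
prevalence = {
    "diagnosis_and_primary_therapy": 100.0,
    "controlled_phase": 400.0,
    "metastatic_phase": 50.0,
    "terminal_phase": 10.0,
}
weights = {
    "diagnosis_and_primary_therapy": 0.30,
    "controlled_phase": 0.05,
    "metastatic_phase": 0.45,
    "terminal_phase": 0.55,
}
ylds = total_ylds(prevalence, weights)
```

Procedure-related sequelae would enter the same sum as additional entries with their own prevalence and disability weight.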

Probability of cancer
The cumulative probability of developing cancer for certain age groups and an approximated lifetime risk for all cancer groups (age 0 to 79) as well as the odds of developing cancer for 2017 were calculated. The method used does not take into account competing risks of death. The cancer risk is approximated using the following formula 38 :

Additional method summaries for NMSC, benign and in situ neoplasms, and myelodysplastic, myeloproliferative, and other hematological neoplasms

Non-melanoma skin cancer (squamous and basal cell carcinoma)

Case definition
Non-melanoma skin cancer (NMSC) is defined as basal cell carcinoma and squamous cell carcinoma. NMSC does not include other types of skin cancer (e.g., melanoma, Merkel cell carcinoma).

Input data
We estimated squamous cell and basal cell skin cancer incidence using cancer registry data as well as primary literature and MarketScan data. Only cancer registries that were listed in CI5 VIII as registering squamous cell carcinoma or basal cell carcinoma, respectively, were included in the analysis.

Modeling strategy
For cancer registry data reported at the three-digit level (i.e., C44: Other and unspecified malignant neoplasm of skin), proportions from Karagas et al were used to split C44 into squamous cell carcinoma and basal cell carcinoma. 39 The only new data we added compared to GBD 2015 were MarketScan data. DisMod-MR 2.1 was used to model incidence and prevalence. Prevalence was calculated as a function of two extreme scenarios (duration of 1 versus 5 years). Country-, age-, sex-, and year-specific duration was estimated using a country-age-sex-year-specific relative access-to-care score.
The access to care score was based on the melanoma mortality to incidence ratio:

Input data
We estimated MDS/MPN deaths using vital registration data (as outlined above). We did not use cancer registry data for these neoplasms, as it has only been reported within cancer registries since 2001 and is recognized to be underreported. 40 We estimated MDS/MPN prevalence using MarketScan claims data from the United States in the years 2000, 2010, and 2012, as well as hospital and outpatient data from other health systems worldwide.

Modeling strategy
We modeled deaths for all locations and years, by age and by sex, using CODEm. As MDS/MPN can be a precursor to leukemia, our MDS/MPN CODEm model used the same covariates as the CODEm model for acute myeloid leukemia.
We modeled the prevalence of these diseases for all locations, by age, year, and sex, using a prevalence model in DisMod-MR 2.1. The MarketScan 2000, MarketScan 2010, and hospital data sources were each crosswalked to the 2012 MarketScan data. For DisMod model specifications, cause-specific mortality rates came from the CODEm model, remission was specified to be zero, and the excess mortality rate was set to be inversely related to the Healthcare Access and Quality index covariate.
While this broad category of hematological neoplasms is heterogeneous in its components' severity or propensity for transformation to leukemia, modeling these components separately was not feasible for 2017. This is an admitted limitation, and an area of desired future improvement as data availability improves. For GBD 2017, the generic medication disability weight was assigned for all MDS/MPN cases.
Benign and in situ intestinal neoplasms; benign and in situ cervical and uterine neoplasms; other benign and in situ neoplasms

Case definition
For GBD 2017 we newly estimated three categories of benign and in-situ neoplasms: intestinal neoplasms; cervical and uterine neoplasms; and other neoplasms. Benign and in situ intestinal neoplasms were defined as any diagnosed non-invasive intestinal growth. Benign and in situ cervical and uterine neoplasms were defined as any non-invasive cervical and uterine growth, except for uterine fibroids. Other benign and in situ neoplasms were defined as any non-invasive neoplasms not covered by other causes.

Input data
To estimate the prevalence of each of these categories for all locations, by age, year, and sex, the prevalence of these neoplasms from hospital data was used as input for a prevalence model in DisMod-MR 2.1. These inputs included MarketScan claims data from the United States in the years 2000, 2010, and 2012, as well as hospital and outpatient data from other health systems worldwide. Each of these data sources was crosswalked to the 2012 MarketScan data.

Modeling strategy
For benign and in situ intestinal neoplasms, in the DisMod model, excess mortality rate was specified to be zero, and remission was allowed to vary from 0 to 1. For benign and in situ cervical and uterine neoplasms, in the DisMod model, excess mortality rate was specified to be zero, and remission was allowed to vary from 0 to 0.75. For other benign and in situ neoplasms, in the DisMod model, excess mortality rate was specified to be zero, and remission was allowed to vary from 0 to 1.
All three of these benign and in-situ neoplasms are by definition benign, localized, and not malignant. As such, no deaths or disability were attributed to their occurrence in GBD 2017.

5. Provide information about all included data sources and their main characteristics. For each data source used, report reference information or contact name/institution, population represented, data collection method, year(s) of data collection, sex and age range, diagnostic criteria or measurement method, and sample size, as relevant.
Appendix: "Bias of categories of input data"

For data inputs that contribute to the analysis but were not synthesized as part of the study:
7. Describe and give sources for any other data inputs.
http://ghdx.healthdata.org/gbd-2017

For all data inputs:
8. Provide all data inputs in a file format from which data can be efficiently extracted (e.g., a spreadsheet rather than a PDF), including all relevant meta-data listed in item 5. For any data inputs that cannot be shared because of ethical or legal reasons, such as third-party ownership, provide a contact name or the name of the institution that retains the right to the data.
http://ghdx.healthdata.org/gbd-2017

DATA ANALYSIS
9. Provide a conceptual overview of the data analysis method. A diagram may be helpful.
• Appendix Figure 1: Flowchart GBD cancer mortality, YLL estimation
• Appendix Figure 2: Flowchart GBD cancer incidence, prevalence, YLD estimation
10. Provide a detailed description of all steps of the analysis, including mathematical formulae. This description should cover, as relevant, data cleaning, data pre-processing, data adjustments and weighting of data sources, and mathematical or statistical model(s).
Appendix: "Data Analysis"
11. Describe how candidate models were evaluated and how the final model(s) were selected.