The American Medical Association1 and National Academies of Sciences, Engineering, and Medicine2 have emphasized the importance of appropriately distinguishing between sex and gender in research and medicine. Many US federal and state databases supply the US Congress, state leadership, and academic researchers with information upon which policies are developed. Therefore, ensuring that sex and gender are characterized appropriately in these databases is fundamental to population health and policy guidelines. We investigated whether databases use these terms in accordance with current recommendations and represent the intended individual demographic variable(s) collected.
This cross-sectional study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. The study was exempt from institutional review board approval because all data were publicly available, and it did not involve patient or protected information. Two authors (J.W.J. and L.A.B.) assessed 75 databases, reports, and surveys (eTable in the Supplement) published by US federal, state, and local entities via the US government’s open-data website.3 Analysis was limited to 3395 databases containing a gender and/or sex variable and 512 databases that explicitly stated the variable(s) queried and options or responses reported. Of these databases, 10% were selected using a randomization method in Microsoft Excel with additional oversampling of national health-related databases. The terms gender and sex were evaluated to determine appropriate use.
The American Medical Association1 defines sex on the basis of biological differences, including reproductive anatomy and chromosomes, with demographic variables including female, male, and intersex. Gender is self-defined, relates to one’s societal roles and expectations, and encompasses the totality of one’s internal psychological, emotional, social, and overall sense of identity with numerous demographic variables, including woman, man, and nonbinary, among others. Because the optimal categorization methodology remains debated, we considered appropriate those databases that reported female and male as sex variables and man and woman as gender variables. Although this terminology is consonant, we acknowledge that it is also limiting.
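The working rule described above can be sketched in code. This is a minimal illustration, not the authors' procedure: the function name `is_appropriate`, the lowercase normalization, and the choice to treat a variable as inappropriate when it also lists the other variable's terms (eg, a gender variable offering "male") are all assumptions introduced here. Additional options such as intersex or nonbinary are assumed not to disqualify a database.

```python
from typing import Iterable

# Terms the study treated as appropriate for each variable.
SEX_TERMS = {"female", "male"}
GENDER_TERMS = {"man", "woman"}


def is_appropriate(variable: str, options: Iterable[str]) -> bool:
    """Apply the working rule to one database variable (hypothetical helper).

    Assumptions (not stated in the source): matching is case-insensitive,
    extra options are allowed, and mixing in the other variable's terms
    counts as conflation.
    """
    normalized = {option.strip().lower() for option in options}
    if variable == "sex":
        required, conflated = SEX_TERMS, GENDER_TERMS
    elif variable == "gender":
        required, conflated = GENDER_TERMS, SEX_TERMS
    else:
        raise ValueError(f"unknown variable: {variable!r}")
    # Appropriate when both required terms are offered and none of the
    # other variable's terms appear among the options.
    return required <= normalized and not (conflated & normalized)


# A sex variable reporting female and male passes.
sex_ok = is_appropriate("sex", ["Female", "Male"])
# A gender variable reporting male and female is flagged as conflation.
gender_conflated = is_appropriate("gender", ["Male", "Female"])
# A gender variable with an additional option still passes.
gender_extended = is_appropriate("gender", ["man", "woman", "nonbinary"])
```

Under these assumptions, the first and third calls return `True` and the second returns `False`.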
A total of 75 databases were assessed; because some databases reported both variables, the per-variable counts sum to more than 75. Of 40 databases that reported the variable sex, 36 (90%) characterized it appropriately (female and male). Of 38 databases that reported the variable gender, 5 (13%) characterized it appropriately (man and woman). Of the 75 analyzed databases, 37 (49%) used gender and sex terminology inappropriately (Table).
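The reported percentages can be checked with simple arithmetic. The sketch below uses only the counts given above; the helper name `percent` and rounding to the nearest whole number are assumptions made for illustration. It also shows that, because 40 + 38 exceeds 75, at least 3 databases must have reported both variables.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


# Counts taken from the results paragraph.
sex_pct = percent(36, 40)      # sex characterized appropriately
gender_pct = percent(5, 38)    # gender characterized appropriately
misuse_pct = percent(37, 75)   # inappropriate terminology overall

# Inclusion-exclusion lower bound on databases reporting both variables.
min_reporting_both = 40 + 38 - 75

print(sex_pct, gender_pct, misuse_pct, min_reporting_both)
```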
No database that assessed the variable sex reported the option intersex. One database (2%) distinguished between cisgender and transgender when gender was assessed, and 8 of 38 (21%) provided additional gender options. Two databases (3%) defined or explicitly delineated the terms sex and gender; however, 1 still conflated the terms.
Databases maintained by US federal, state, and local organizations continue to misuse gender and sex terminology. This incorrect usage carries substantial medical, social, and political consequences4-6 because conflation of these terms in databases may lead to inherently flawed results and erroneous conclusions in medical and population health studies. The primary database flaws were interchangeable use of sex and gender and collection of only 1 of the 2 variables; the latter is a flaw because evidence has demonstrated that each variable has unique associations with health.6 For example, although the optimal method for measuring gender has not been determined and is an active area of investigation, research demonstrates that gender is associated with poor outcomes in acute coronary syndrome, independent of sex.4 Therefore, these databases must be intentionally designed to obtain the desired information, and further research into the most appropriate gender-collection methodology (eg, man, woman, nonbinary, fill-in option) is warranted.
A limitation of this analysis is that the results pertain to data already collected and not the collection instrument going forward. Nonetheless, these findings demonstrate flaws in US data collection systems, with various organizations failing to reach modern medical standards or meet contemporary societal expectations. Therefore, we advocate for the improvement of future data collection systems to ensure optimal data quality and reduce the misuse of sex and gender as demographic variables,2 because such misuse may contribute to the implementation of suboptimal governmental policies and medical practice.
Accepted for Publication: April 19, 2022.
Published Online: June 13, 2022. doi:10.1001/jamainternmed.2022.2026
Corresponding Author: Jeremy W. Jacobs, MD, MHS, Department of Laboratory Medicine, Yale School of Medicine, 55 Park St, New Haven, CT 06520 (jeremy.jacobs@yale.edu).
Author Contributions: Dr Jacobs had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Jacobs, Bibb, Booth.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Jacobs, Bibb.
Administrative, technical, or material support: Shelton, Booth.
Supervision: Jacobs, Booth.
Conflict of Interest Disclosures: None reported.
2. National Academies of Sciences, Engineering, and Medicine. Measuring Sex, Gender Identity, and Sexual Orientation. The National Academies Press; 2022. doi:10.17226/26424
3. US General Services Administration. Data.gov. Published September 14, 2020. Accessed January 19, 2022. https://data.gov/