An intercept-only linear model was fitted to sex bias values from before and during 1993 and subsequently in 5-year increments. Estimated sex bias intercept coefficients were plotted against time for studies (blue) and participants as measurement unit (orange), with error bars representing 95% confidence intervals for the mean coefficients. The points for total at the right of each graph represent the mean sex bias totals for each category. Sex bias was defined as female participant fraction (determined separately for studies and participants as measurement unit) minus female prevalence fraction (values for sex bias ranged from −1 to 1, with 0 indicating no bias; negative sex bias indicates that female participants were represented less than male participants).
a Difference between sex bias value vs 0; P < .001 for studies as measurement unit.
b Difference between sex bias value vs 0; P < .001 for participants as measurement unit.
An intercept-only linear model was fitted to sex bias values from before and during 1993 and subsequently in 5-year increments. Estimated sex bias intercept coefficients were plotted against time for studies (blue) and participants as measurement unit (orange), with error bars representing 95% confidence intervals for the mean coefficients. For HIV/AIDS before or during 1993, sex bias values for studies (−0.40) and participants (−0.42) were not plotted because they were based on only 3 articles (total, 138 participants). Sex bias was defined as female participant fraction (determined separately for studies and participants as measurement unit) minus female prevalence fraction (values for sex bias ranged from −1 to 1, with 0 indicating no bias; negative sex bias indicates that female participants were represented less than male participants).
An intercept-only linear model was fitted to sex bias values from before and during 1993 and subsequently in 5-year increments. Estimated sex bias intercept coefficients were plotted against time for studies (blue) and participants as measurement unit (orange), with error bars representing 95% confidence intervals for the mean coefficients. The total number of published articles (all categories combined) increased from before or during 1993 (total, 482 articles) to 2014 to 2018 (18 627 articles). Sex bias in articles for all categories combined was unchanged over time with studies as measurement unit (range, −0.15 [−0.16 to −0.13] to −0.10 [−0.14 to −0.06]), but improved from before or during 1993 (−0.11 [−0.16 to −0.05]) to 2014 to 2018 (−0.05 [−0.09 to −0.02]) with participants as measurement unit. Sex bias was defined as female participant fraction (determined separately for studies and participants as measurement unit) minus female prevalence fraction (values for sex bias ranged from −1 to 1, with 0 indicating no bias; negative sex bias indicates that female participants were represented less than male participants).
Each point represents 1 article. A, With studies as the measurement unit of sex bias, each study point has equal intensity of blue shade and contribution to the overall estimate of sex bias. B, With participants as the measurement unit of sex bias, study point orange shade intensity is proportional to the number of participants; small studies are essentially invisible and contribute little to the overall sex bias estimate.
eAppendix. Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction
eTable 1. Previous Studies About Enrollment Sex Bias in Clinical Research
eTable 2. Global Disease Category Prevalence
eTable 3. Categories in the XML PubMed Field <PublicationTypeList> Used to Identify Study-Related Articles in PubMed
eTable 4. Example Table Used for Illustration of Sex Data Extraction Using PubMed-Extract
eTable 5. Relation Between Disease Category and Medical Subject Headings Terms
eTable 6. Linear Model for Number of Participants vs Enrollment Sex Bias, Controlling for Publication Year and Disease Category
Feldman S, Ammar W, Lo K, Trepman E, van Zuylen M, Etzioni O. Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction. JAMA Netw Open. 2019;2(7):e196700. doi:10.1001/jamanetworkopen.2019.6700
Question
What is the magnitude of female underrepresentation in clinical studies?
Findings
In this cross-sectional study, machine reading to extract sex data from 43 135 published articles and 13 165 clinical trial records showed substantial underrepresentation of female participants, with studies as measurement unit, in 7 of 11 disease categories, especially HIV/AIDS, chronic kidney diseases, and cardiovascular diseases. Sex bias in articles for all categories combined was unchanged over time with studies as the measurement unit but improved with participants as measurement unit.
Meaning
This study suggests that sex bias against female participants in clinical studies persists, but results differ when studies vs participants are the measurement units.
Importance
Analyses of female representation in clinical studies have been limited in scope and scale.
Objective
To perform a large-scale analysis of global enrollment sex bias in clinical studies.
Design, Setting, and Participants
In this cross-sectional study, clinical studies from published articles from PubMed from 1966 to 2018 and records from Aggregate Analysis of ClinicalTrials.gov from 1999 to 2018 were identified. Global disease prevalence was determined for male and female patients in 11 disease categories from the Global Burden of Disease database: cardiovascular, diabetes, digestive, hepatitis (types A, B, C, and E), HIV/AIDS, kidney (chronic), mental, musculoskeletal, neoplasms, neurological, and respiratory (chronic). Machine reading algorithms were developed that extracted sex data from tables in articles and records on December 31, 2018, at an artificial intelligence research institute. Male and female participants in 43 135 articles (792 004 915 participants) and 13 165 records (12 977 103 participants) were included.
Main Outcomes and Measures
Sex bias was defined as the fraction of female participants among study participants minus the female prevalence fraction for each disease category. A total of 1000 bootstrap estimates of sex bias were computed by resampling individual studies with replacement. Sex bias was reported as mean and 95% bootstrap confidence intervals from articles and records in each disease category over time (before or during 1993 to 2018), with studies or participants as the measurement unit.
Results
There were 792 004 915 participants, including 390 470 834 female participants (49%), in articles and 12 977 103 participants, including 6 351 619 female participants (49%), in records. With studies as measurement unit, substantial female underrepresentation (sex bias ≤ −0.05) was observed in 7 of 11 disease categories, especially HIV/AIDS (mean for articles, −0.17 [95% CI, −0.18 to −0.16]), chronic kidney diseases (mean, −0.17 [95% CI, −0.17 to −0.16]), and cardiovascular diseases (mean, −0.14 [95% CI, −0.14 to −0.13]). Sex bias in articles for all categories combined was unchanged over time with studies as measurement unit (range, −0.15 [95% CI, −0.16 to −0.13] to −0.10 [95% CI, −0.14 to −0.06]), but improved from before or during 1993 (mean, −0.11 [95% CI, −0.16 to −0.05]) to 2014 to 2018 (mean, −0.05 [95% CI, −0.09 to −0.02]) with participants as the measurement unit. Larger study size was associated with greater female representation.
Conclusions and Relevance
Automated extraction of the number of participants in clinical reports provides an effective alternative to manual analysis of demographic bias. Despite legal and policy initiatives to increase female representation, sex bias against female participants in clinical studies persists. Studies with more participants have greater female representation. Differences between sex bias estimates with studies vs participants as measurement unit, and between articles vs records, suggest that sex bias with both measures and data sources should be reported.
For proper application of clinical study results, enrolled participants should represent the populations for which treatments are intended. When female patients receive treatment based on the results of studies of male participants, unanticipated adverse events may occur because of sex-specific differences in disease patterns, metabolism, and drug pharmacokinetics and clearance.1,2 Health risks were greater in female patients than in male patients for 8 of 10 prescription drugs withdrawn from the US market from 1997 to 2000.3 The slower metabolism of the insomnia drug zolpidem in female patients than in male patients may have contributed to multiple zolpidem-related motor vehicle crashes before the recommended dose was decreased in female patients by 50%.4-6 Female patients may experience more adverse drug reactions, more disease and disability, later diagnosis, less aggressive treatment, and lower case survival rates for some diseases than male patients.7-9
The National Institutes of Health Revitalization Act of 1993 established legal requirements and guidelines to ensure the inclusion of female participants and racial/ethnic minority participants in clinical research.10 However, underrepresentation of female participants in studies relative to disease prevalence (known as enrollment sex bias or sex bias) persists.11,12 In treatment trials of 11 non–sex-specific cancers (9671 patients), underrepresentation of female participants was noted in trials of 3 cancer types.13 In 120 randomized clinical trials (total, 160 801 participants) in 12 specialties, 24.6% of participants were female, with no improvement observed in sex-balanced enrollment or sex-specific analyses.11 From 2000 to 2002, female participants had lower enrollment fraction—defined as the number of trial participants divided by the estimated number of cancer cases in the population—than male participants for colorectal (total, 8434 participants) and lung cancer (4297 participants) trials.12 A literature search for 1999 to 2018 showed 13 major analyses of sex bias in clinical studies, but these analyses were limited in size (range, 36-865 studies and 2339-398 801 participants) and disease categories and were performed with manual methods or analysis of isolated data sets (eAppendix and eTable 1 in the Supplement).12-24
Computerized, automated data extraction (also known as machine reading) of published research articles enables the development of large, complex systems to organize, integrate, and communicate information from numerous studies.25-29 However, a literature search did not show previous studies of machine automation for quantifying sex bias in clinical studies at the national or global scale.
The purpose of this study was to develop a scalable automated machine reading method to extract sex data from numerous clinical studies and analyze sex bias in published articles and clinical trial records at scale.30,31 We hypothesized that computerized data extraction from numerous articles and records may provide comprehensive and longitudinal information about sex bias in clinical studies at scale.
We analyzed the number of male and female participants in clinical studies that were identified and extracted in electronic searches from 2 sources on December 31, 2018: (1) published articles from the search engine Semantic Scholar, which had 41 million articles indexed, including more than 20 million full-text articles and all articles in PubMed Central from 1966 to 2018,30,32 and (2) clinical trial records in the Aggregate Analysis of ClinicalTrials.gov (AACT) database, which contained metadata for 288 515 studies registered at ClinicalTrials.gov in 205 countries from 1999 to 2018.33,34
Global disease prevalence data for male and female participants were obtained from the Global Health Data Exchange (GHDx), a database synthesized from multiple data sources, including scientific literature and population representative surveys.35,36 Prevalence values for selected disease categories defined by GHDx were obtained from an online catalog of health-related data (eTable 2 in the Supplement).35
This study was not considered human subjects research according to the Federal Policy for the Protection of Human Subjects because it was a secondary analysis of data from published articles and trial records. Therefore, the study was not submitted for institutional review board approval.
We identified all articles related to clinical studies in PubMed using article categories selected from the XML PubMed publication type attribute <PublicationTypeList> (1 038 324 articles) (eTable 3 in the Supplement).37 Semantic Scholar accessed the full text of 388 227 articles (37%). We restricted the analysis to articles about medical disorders by including only articles labeled with any Medical Subject Headings (MeSH) terms under “disease,” “vaccination,” “disorder,” “pathological,” or “neoplasms” in the MeSH taxonomy tree, and processed these articles with optical character recognition (OmniPage; Nuance Communications) (295 139 articles). As the analysis was based on automated extraction of male and female participant numbers from tables, we included articles with at least 1 table extracted (249 845 articles).
We developed an algorithm (PubMed-Extract) to extract articles and sex data from tables of articles in portable document format (eTable 4 in the Supplement). PubMed-Extract was designed to parse the tables, identify relevant semantics of rows and columns by matching patterns, and aggregate information across table rows and columns (eAppendix in the Supplement). We limited the analysis to 11 GHDx disease categories for which morbidity frequency data were available in GHDx and more than 1000 articles were identified: cardiovascular diseases, diabetes, digestive diseases, hepatitis (types A, B, C, and E), HIV/AIDS, kidney diseases (chronic), mental disorders, musculoskeletal disorders, neoplasms, neurological disorders, and respiratory diseases (chronic). We mapped articles to disease categories using the MeSH terms associated with each article (eTable 5 in the Supplement). In the 249 845 articles that were processed by optical character recognition and had at least 1 table extracted, 147 807 articles (59%) were mapped to at least 1 disease category, from which PubMed-Extract extracted male and female participant numbers in 43 135 articles (17%).
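The pattern-matching and aggregation steps described above can be illustrated with a minimal sketch. It is not the PubMed-Extract implementation (which is described in the eAppendix); the row-label patterns, the cell format, and the names `leading_count` and `extract_sex_counts` are hypothetical. The sketch assumes that a table has already been parsed into rows of text cells, that the first cell of each row is its label, and that the remaining columns are disjoint groups (for example, treatment arms) with no total column.

```python
import re

# Hypothetical label patterns; word boundaries keep "male" from matching inside "female".
FEMALE = re.compile(r"\b(female|females|women|woman|girls)\b", re.IGNORECASE)
MALE = re.compile(r"\b(male|males|men|man|boys)\b", re.IGNORECASE)
COUNT = re.compile(r"^\s*(\d[\d,]*)")


def leading_count(cell):
    """Parse a leading integer from a cell such as '1,234 (56.7%)'; None if absent."""
    match = COUNT.match(cell)
    if not match:
        return None
    return int(match.group(1).replace(",", ""))


def extract_sex_counts(table):
    """Return (female, male) totals summed across columns, or None if either row is missing.

    Assumes columns are disjoint groups; a table with a total column would be
    double counted by this sketch.
    """
    totals = {"female": 0, "male": 0}
    found = set()
    for row in table:
        if not row:
            continue
        label = row[0]
        if FEMALE.search(label):
            key = "female"
        elif MALE.search(label):
            key = "male"
        else:
            continue
        counts = [c for c in map(leading_count, row[1:]) if c is not None]
        if counts:
            totals[key] += sum(counts)
            found.add(key)
    if found == {"female", "male"}:
        return totals["female"], totals["male"]
    return None


# Hypothetical baseline characteristics table
table = [
    ["Characteristic", "Treatment (n = 50)", "Placebo (n = 50)"],
    ["Age, mean (SD), y", "61 (9)", "60 (10)"],
    ["Female, No. (%)", "20 (40)", "25 (50)"],
    ["Male, No. (%)", "30 (60)", "25 (50)"],
]
sex_counts = extract_sex_counts(table)
```

A sketch of this kind would fail on the table layouts that the study reports as sources of error, such as tables with multiple total columns or columns merged by optical character recognition.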
We developed another algorithm (AACT-Query) to extract sex data from tables in AACT records that could be queried with Structured Query Language. We identified AACT records of 33 361 studies that had male and female participant numbers. After excluding incomplete studies, there were 28 187 studies. After mapping records to disease categories using MeSH terms, we retained 13 165 records (47%) that mapped to at least 1 disease category, and used AACT-Query to extract male and female participant numbers.
Female prevalence fraction (F-Prev) for each disease category was defined as the fraction of female patients among all patients in the disease category and was estimated by dividing the global morbidity count for female patients by the global morbidity count for male and female patients combined using GHDx data. Female participant fraction (F-Particip) was defined as the fraction of female participants among all participants who were included in the studies and was estimated in 2 ways: (1) with studies as measurement units, by computing the ratio of female participants to all participants for each study and determining the simple average of this ratio for all studies without any weighting by study size, and (2) with participants as measurement units, by dividing the total number of female participants in all studies by the total number of male and female participants in all studies combined. The female participant fraction was estimated from articles using PubMed-Extract and from records using AACT-Query. The primary outcome variable was enrollment sex bias in clinical studies, defined as F-Particip minus F-Prev (values for sex bias ranged from −1 to 1, with 0 indicating no bias; negative sex bias indicates that female participants were represented less than male participants).
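The 2 estimates of F-Particip and the resulting sex bias can be sketched as follows. The function names, the example studies, and the F-Prev value of 0.5 are hypothetical and serve only to show how the 2 measurement units can diverge when a large study has a different sex ratio than the small studies.

```python
from typing import List, Tuple

Study = Tuple[int, int]  # (female participants, male participants)


def f_particip_by_study(studies: List[Study]) -> float:
    """Studies as measurement unit: unweighted mean of per-study female fractions."""
    fractions = [f / (f + m) for f, m in studies if f + m > 0]
    return sum(fractions) / len(fractions)


def f_particip_by_participant(studies: List[Study]) -> float:
    """Participants as measurement unit: pooled female count over pooled total."""
    total_female = sum(f for f, _ in studies)
    total = sum(f + m for f, m in studies)
    return total_female / total


def sex_bias(f_particip: float, f_prev: float) -> float:
    """Sex bias = F-Particip minus F-Prev; negative values indicate underrepresentation."""
    return f_particip - f_prev


# Hypothetical data: 2 small studies and 1 large study
studies = [(10, 30), (20, 20), (600, 400)]
f_prev = 0.5  # hypothetical prevalence fraction

bias_studies = sex_bias(f_particip_by_study(studies), f_prev)
bias_participants = sex_bias(f_particip_by_participant(studies), f_prev)
```

In this hypothetical example, the per-study fractions are 0.25, 0.50, and 0.60, so sex bias is −0.05 with studies as measurement unit, whereas the pooled fraction is 630/1080, so sex bias is approximately 0.08 with participants as measurement unit.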
We evaluated the accuracy of sex bias estimates from PubMed-Extract by comparing them with the true F-Particip that was determined from manually extracted numbers of male and female participants from 100 randomly selected articles. Mean absolute error was calculated by averaging the absolute difference between the PubMed-Extract estimates and true value of F-Particip in individual articles.
We evaluated the recall of PubMed-Extract, defined as the percentage of articles for which PubMed-Extract produced exactly the same numbers of male and female participants as were manually extracted, in another random set of 100 articles on cardiovascular diseases. Mean absolute error was sensitive to severity of estimation errors, whereas recall equally penalized all estimation errors.
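The 2 evaluation measures can be sketched as follows, with hypothetical function names and hypothetical data. The sketch assumes that each article contributes one automated estimate and one manually determined value, and that a missing extraction is represented by `None`.

```python
def mean_absolute_error(estimated, true):
    """Average absolute difference between estimated and true F-Particip per article."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)


def exact_match_recall(extracted_counts, manual_counts):
    """Fraction of articles whose extracted (female, male) counts equal the manual counts.

    A missing extraction (None) and a wrong extraction are penalized equally.
    """
    hits = sum(1 for e, m in zip(extracted_counts, manual_counts) if e == m)
    return hits / len(manual_counts)


# Hypothetical F-Particip values for 3 articles
mae = mean_absolute_error([0.50, 0.40, 0.30], [0.50, 0.42, 0.35])

# Hypothetical (female, male) counts for 4 articles
recall = exact_match_recall(
    [(10, 12), None, (5, 6), (7, 9)],
    [(10, 12), (3, 4), (5, 7), (7, 9)],
)
```

In this example, the third article is off by 1 participant and counts as a full miss for recall, although it would contribute only a small amount to mean absolute error.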
To evaluate differences between sex bias estimated with PubMed-Extract vs AACT-Query, we analyzed studies that were represented in both estimates. We identified 1400 articles for which (1) PubMed-Extract produced numerical estimates of sex bias, (2) the articles were linked each to exactly 1 AACT record, (3) the AACT record included numbers of male and female participants, and (4) the full text of the articles was available through PubMed. We compared the numbers of male and female participants between these articles and records and manually inspected a sample of 50 discordant articles and records to determine the reasons for discrepancies. We contacted study authors for comments when we were unable to determine reasons for discrepancies.
For each disease category, we computed 1000 bootstrap estimates of sex bias by resampling individual studies with replacement. Sex bias was reported as mean and 95% bootstrap confidence interval, determined from the 2.5th and 97.5th percentiles of the bootstrap estimates. The P value for the null hypothesis of zero sex bias was equal to the probability of type I error corresponding to the narrowest confidence interval that contained zero. We calculated P values by repeating the bootstrap confidence interval procedure over a fine grid of confidence levels (decreasing from 99.999%) and taking the smallest confidence level whose interval contained zero; the P value was the probability of type I error = 1 − confidence level. For each disease category and time period, statistical significance for a hypothesis test for sex bias was defined by P ≤ .001 using 2-tailed tests.
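The bootstrap procedure can be sketched with the Python standard library as follows. This is an illustration with hypothetical function names and data, not the study code. It assumes studies as measurement unit, equal-tailed percentile intervals with linear interpolation, and the convention that the P value is 1 minus the smallest confidence level on the grid whose interval contains zero; when no interval on the grid contains zero, the sketch returns 1 minus the largest level as an upper bound.

```python
import random
from statistics import mean


def percentile(sorted_values, q):
    """Percentile (q in [0, 100]) of pre-sorted values, with linear interpolation."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def bootstrap_sex_bias(studies, f_prev, n_boot=1000, seed=0):
    """Resample studies with replacement; return sorted bootstrap sex bias estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(studies, k=len(studies))
        estimates.append(mean(f / (f + m) for f, m in sample) - f_prev)
    estimates.sort()
    return estimates


def percentile_ci(estimates, level=0.95):
    """Equal-tailed percentile interval at the given confidence level."""
    alpha = 1 - level
    return (
        percentile(estimates, 100 * alpha / 2),
        percentile(estimates, 100 * (1 - alpha / 2)),
    )


def p_value_by_ci_inversion(estimates, levels):
    """1 minus the smallest level whose interval contains zero (assumed convention)."""
    for level in sorted(levels):
        low, high = percentile_ci(estimates, level)
        if low <= 0 <= high:
            return 1 - level
    return 1 - max(levels)  # upper bound: P is below the resolution of the grid


# Hypothetical (female, male) counts in which every study enrolls fewer women than men
studies = [(f, 100 - f) for f in (30, 35, 40, 32, 38, 36, 41, 33)]
estimates = bootstrap_sex_bias(studies, f_prev=0.5)
ci_low, ci_high = percentile_ci(estimates, 0.95)
levels = [0.5, 0.8, 0.9, 0.95, 0.99, 0.999]
p_value = p_value_by_ci_inversion(estimates, levels)
```

A finer grid of confidence levels gives a finer resolution for the P value, at the cost of more percentile computations; the coarse grid here is only for illustration.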
For analysis of sex bias in articles vs time, we fitted an intercept-only linear model to sex bias values before or during 1993 and subsequent 5-year increments separately with studies and participants as measurement unit and plotted estimated intercept coefficients vs time with error bars representing 95% confidence intervals for the mean coefficient. We assumed Gaussian distribution because bootstrapping was precluded by dividing the data into 5-year increments.
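An intercept-only linear model estimates a single coefficient that equals the mean of the outcome, so the time series analysis can be sketched as a grouped mean with a Gaussian confidence interval. The sketch below uses hypothetical names and data, takes studies as measurement unit, and assumes a normal quantile of 1.96 for the 95% confidence interval; the period boundaries follow the text (before or during 1993, then 1994 to 1998 through 2014 to 2018).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def period_label(year):
    """Map a publication year to '<=1993' or a 5-year period such as '1994-1998'."""
    if year <= 1993:
        return "<=1993"
    start = 1994 + 5 * ((year - 1994) // 5)
    return "{}-{}".format(start, start + 4)


def intercept_with_ci(values, z=1.96):
    """Intercept-only fit: the estimate is the mean; CI is mean +/- z * standard error."""
    estimate = mean(values)
    standard_error = stdev(values) / sqrt(len(values))
    return estimate, estimate - z * standard_error, estimate + z * standard_error


def bias_by_period(records):
    """records: iterable of (publication year, sex bias) pairs, one per study."""
    groups = defaultdict(list)
    for year, bias in records:
        groups[period_label(year)].append(bias)
    # At least 2 studies are needed to estimate a standard error.
    return {p: intercept_with_ci(v) for p, v in groups.items() if len(v) > 1}


# Hypothetical per-study sex bias values
records = [
    (1990, -0.20), (1993, -0.10),
    (1995, -0.15), (1997, -0.05),
    (2016, -0.04), (2018, -0.06),
]
by_period = bias_by_period(records)
```

With participants as measurement unit, the per-study values would need to be weighted by study size, which this sketch does not attempt.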
The association between estimated sex bias and number of participants in each study was evaluated with fixed-effects linear regression, with number of participants defined as a categorical variable with 10 equal-sized bins (eTable 6 in the Supplement). We controlled for publication year (continuous variable) and disease category (categorical variable). Analyses were performed with the statistical functions of the Python programming language, version 3.6 (Python Software Foundation).
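The binning of study size into 10 equal-sized categories can be sketched as follows. The sketch covers only the binning step and a descriptive mean of sex bias per bin; it is not the fixed-effects regression and does not control for publication year or disease category. The function names and data are hypothetical, and ties in study size are broken by input order.

```python
from collections import defaultdict


def decile_bins(sizes, n_bins=10):
    """Assign each study a bin index (0 = smallest) so that bins have near-equal counts."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    bins = [0] * len(sizes)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(sizes)
    return bins


def mean_bias_per_bin(sizes, biases, n_bins=10):
    """Descriptive mean sex bias within each study-size bin."""
    grouped = defaultdict(list)
    for bin_index, bias in zip(decile_bins(sizes, n_bins), biases):
        grouped[bin_index].append(bias)
    return {b: sum(v) / len(v) for b, v in sorted(grouped.items())}


# Hypothetical data: 20 studies in which sex bias is less negative in larger studies
sizes = [10 * (i + 1) for i in range(20)]
biases = [-0.20 + 0.01 * i for i in range(20)]
bins = decile_bins(sizes)
per_bin = mean_bias_per_bin(sizes, biases)
```

In a regression setting, the bin indices would typically be expanded into indicator variables with the lowest decile as the reference category, so that each coefficient is read relative to the smallest studies.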
There were 792 004 915 participants, including 390 470 834 female participants (49%), in articles and 12 977 103 participants, including 6 351 619 female participants (49%) in records. The F-Prev was highest for digestive diseases and lowest for hepatitis (Table). With studies as measurement unit, substantial female underrepresentation (sex bias ≤ −0.05) in articles and records was observed in 7 of 11 disease categories, including HIV/AIDS (mean for articles, −0.17 [95% CI, −0.18 to −0.16]), kidney diseases (chronic) (mean, −0.17 [95% CI, −0.17 to −0.16]), cardiovascular diseases (mean, −0.14 [95% CI, −0.14 to −0.13]), neoplasms, digestive diseases, neurological disorders, and hepatitis (Table). The only category with female overrepresentation was musculoskeletal disorders (Table).
With participants as measurement unit, sex bias against female participants in articles was highest for chronic kidney diseases and lowest for musculoskeletal disorders and HIV/AIDS, and in records was highest for HIV/AIDS, chronic kidney diseases, and cardiovascular diseases. Sex bias usually was less negative when the measurement unit was participants vs studies (eg, for articles about cardiovascular disease with participants as the measurement unit, mean sex bias was −0.02 [95% CI, −0.06 to −0.01]; with studies as the measurement unit, mean sex bias was −0.14 [95% CI, −0.14 to −0.13]) (Table). Most articles and records mapped to a single disease category (Table).
With studies as measurement unit, sex bias was stable from before or during 1993 to 2018 for most disease categories (Figure 1, Figure 2, and Figure 3). With participants as measurement unit, sex bias improved (became less negative by ≥0.10) over time for cardiovascular diseases, HIV/AIDS, neoplasms, and neurological disorders (Figure 1, Figure 2, and Figure 3). Sex bias in articles for all categories combined was unchanged over time with studies as measurement unit (range, −0.15 [95% CI, −0.16 to −0.13] to −0.10 [95% CI, −0.14 to −0.06]), but improved from before or during 1993 (mean, −0.11 [95% CI, −0.16 to −0.05]) to 2014 to 2018 (mean, −0.05 [95% CI, −0.09 to −0.02]) with participants as the measurement unit.
The mean absolute error between true F-Particip from data extracted manually vs automatically (PubMed-Extract) was 0.008. Errors made by PubMed-Extract occurred when (1) the table varied from typical table organization, (2) there were 2 or more columns for total counts and no single column for grand total, and (3) there were optical character recognition errors such as incorrect merging of multiple columns or splitting of single columns (eTable 4 in the Supplement). Manual analysis of automatically extracted participant numbers showed that 14 of the 100 articles evaluated did not report the number of male and female participants, that PubMed-Extract returned correct numerical estimates for 43 of the other 86 articles (recall, 50%), and that mean precision for exact row extraction of male and female numbers was 0.75.
Comparison of the 1400 studies that had both articles and records showed that 675 studies (48%) had numbers of male and female participants that differed between articles and records, with the magnitude of the difference ranging from a minimum of 35 participants (52% of participants in the AACT record) to a maximum of 15 746 participants (92%). In 50 studies selected randomly from the 675 discordant studies, manual evaluation showed that discrepancies between articles and records occurred because the article was based on a subset of the trial data in the record (19 studies), PubMed-Extract extractions were incorrect or from the wrong table (14 studies), the article reported the number of participants who completed the trial whereas the record included enrolled participants who did not complete the trial (7 studies), the article was published before completion of the trial (3 studies), there was author error (1 study), or the article included patients from multiple trials (1 study); in 5 studies, the causes of discrepancies were unknown despite contacting authors for comments. In 6 of the 50 studies, the reasons for discrepancies were provided through email communication with study authors.
Linear regression with fixed effects to evaluate the association between publication year, disease category, and study size and sex bias in articles showed that the coefficients for number-of-participants deciles were positive and different from zero for the fifth decile (121-188 participants) through 10th decile (≥2990 participants), indicating that larger study size was associated with greater female representation (eTable 6 in the Supplement).
Using a large amount of data from articles and records, we observed substantial female underrepresentation in studies for diverse disease categories, especially HIV/AIDS and chronic kidney diseases. There was little increase in female representation in studies from before or during 1993 to 2018 using studies as measurement unit but improved female representation with participants as measurement unit (Figure 1, Figure 2, and Figure 3). Most disease categories were not evaluated previously (eTable 1 in the Supplement). The algorithms provided an effective and accurate automated scalable method for extracting male and female participant numbers and enabled expansion of analyses about sex bias to varied disease categories and integration of new data.
Previous studies of sex bias used studies or participants, but not both, as measurement unit (eTable 1 in the Supplement). With studies as measurement unit, each study has an equal contribution to the overall sex bias estimate, regardless of study size, providing a study-by-study evaluation of sex bias (Table, Figure 4). In contrast, with participants as measurement unit, participants may have an equal contribution to the overall sex bias estimate, providing a population estimate; however, larger studies contribute proportionally more, and smaller studies have a nearly invisible contribution to overall sex bias estimates (Figure 4). The marked difference in sex bias in articles with studies vs participants as measurement unit for cardiovascular diseases (−0.14 vs −0.02) and neoplasms (−0.11 vs −0.03) is evidence that sex bias determined with both measurement units should be reported, and that sex bias results may be less sensitive to female underrepresentation with participants than studies as measurement unit (Table, Figure 1, Figure 2, and Figure 3). The use of studies as measurement unit may ensure that small studies of less prevalent diseases receive equal representation in estimates of overall sex bias (Figure 4). The limited change in sex bias over time for all categories combined with studies as measurement unit (Figure 3) may be addressed with policy and funding initiatives that focus on sex bias regardless of proposed study size. Furthermore, the importance of study size was underscored by the relation between study size and female representation in articles (eTable 6 in the Supplement).
With studies as measurement unit, sex bias estimates from articles and records were consistent in polarity and magnitude for diabetes, HIV/AIDS, kidney diseases, mental disorders, neoplasms, neurological disorders, and respiratory diseases but differed in magnitude for digestive diseases and musculoskeletal disorders (Table). Differences in sex bias estimates may, in part, be due to having fewer records than articles (digestive diseases, 348 records vs 1282 articles), and AACT data may have been biased geographically because trial registration requirements for ClinicalTrials.gov may apply only to US clinical trials.34 Geographic differences may be important because of marked variation in regional disease profiles, such as differences in HIV/AIDS incidence between sub-Saharan Africa vs East Asia.38 Future studies may include machine reading algorithms to evaluate study location.
Differences in sex bias estimates between articles vs records also may be due to discrepancies in male and female participant numbers between articles and records observed in 48% of studies. Manual evaluation of these discrepancies was limited to 50 studies because it was time-consuming and associated with delays inherent with email queries to authors when reasons for discrepancies could not be ascertained from the article and record. A previous comparison of randomized clinical drug trials in ClinicalTrials.gov vs counterpart published articles concluded that trial results should be evaluated systematically from both sources because of important differences, including more complete reporting in records than articles, variation in reporting between articles from specialty vs general journals, and absence of an article corresponding to 50% of trials posted on ClinicalTrials.gov (so-called abandoned trials).39,40 Trial registration and reporting on ClinicalTrials.gov may vary between studies funded by industry or government sources, and the requirement of mandatory posting of trial results on ClinicalTrials.gov within 1 year of completion of data collection is adhered to infrequently and may promote the posting of cursory reports that may include inaccurate or incomplete data that are not peer reviewed.6,41-43 Journal publication may be associated with partial and altered reporting (so-called filtered data) due to space limitations, publication bias, revised analyses and data exclusion due to suggestions from peer reviewers, and delays inherent in journal submission and peer review.40,41 The observation of sex bias differences between articles and records is further evidence to support the need for greater transparency and accuracy in trial reporting in both media.
The comparison of data from articles vs records may have been affected by our decision to include data from articles about studies other than clinical trials, such as observational studies, case series studies, and quality improvement analyses. Although a focus on trials alone may provide a more direct comparison between data from articles vs records, the inclusion of all published articles may provide a more realistic description of current sex bias in funded and nonfunded clinical research. Observational studies may be considered lower in evidence quality than trials but remain important because they provide valuable context for trial results and data in areas with limited trials.44-46 Furthermore, randomized trials may not necessarily represent general disease populations because of participant exclusion criteria.47 Nevertheless, sex bias estimates for trials alone may be determined in future work by applying different filters to the data extraction algorithms.
In selecting disease categories that previously were defined in GHDx, we recognized potential overlap between categories, such as cardiovascular, kidney, or neurological diseases in studies of patients who had diabetes. Nevertheless, the disease categories were used because they represented large, important, clinically relevant categories. Most studies were limited to only 1 of the 11 disease categories, and only 11% of articles and 4% of records contributed to sex bias estimates for more than 1 disease category (Table). The attribution of cost and resource allocation to overlapping disease categories is an inherent issue in epidemiology and public health that we addressed by specifying the sources of disease category definitions and data and quantifying the number of studies that mapped to more than 1 category.48
Limitations of the present study include the analysis of sex bias without other variables. Sex bias may vary with age for colorectal and lung cancer12; further evaluation using our algorithms may enable robust analysis of the interaction between sex, age, and race in study enrollment. We did not evaluate diagnoses that have marked variation of sex prevalence within disease categories, such as different types of cancer (eg, breast vs prostate cancer), because our goal was to provide a broad overview about sex bias for different disease categories; in future work, filters added to the data extraction code may enable more focused sex bias data for specific diseases. In addition, we included participant counts from primary studies and secondary analyses such as meta-analyses and systematic reviews, but in estimating sex bias, we did not account for multiple inclusion of the same primary study participants in the secondary analyses; therefore, estimates of sex bias from articles may have been affected preferentially by primary studies that were included in secondary analyses, and the magnitude of this effect is unknown. The total number of more than 792 million participants may seem unrealistically high because it may imply that 10% of the 7.7 billion people globally were involved in a clinical study; the large number of participants may have been affected by large population-based studies including a survey from China (381 million participants) and study of death records from the United States, England, and Wales (almost 86 million participants) that accounted for 467 million participants (59%).49,50 In future big data studies that are based on articles, it may be advisable to modify the data extraction coding to exclude duplicate use of studies and analyze large outlier studies separately. For the time series, we used publication date of articles and did not extract information about the time range of study execution; that may be considered in future work.
Automated extraction of participant numbers in clinical reports provides an effective alternative to manual analysis of demographic bias and may expedite analyses for multiple diseases globally. Our findings indicate that studies with more participants have greater female representation. However, sex bias against female participants in clinical studies persists despite legal and policy initiatives to increase female representation.
Accepted for Publication: May 17, 2019.
Published: July 3, 2019. doi:10.1001/jamanetworkopen.2019.6700
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Feldman S et al. JAMA Network Open.
Corresponding Author: Sergey Feldman, PhD, Allen Institute for Artificial Intelligence, 2157 N Northlake Way, Ste 110, Seattle, WA 98103 (firstname.lastname@example.org).
Author Contributions: Dr Feldman had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Feldman, Ammar, Etzioni.
Acquisition, analysis, or interpretation of data: Feldman, Ammar, Lo, Trepman, van Zuylen.
Drafting of the manuscript: Feldman, Ammar, Lo, Trepman, van Zuylen.
Critical revision of the manuscript for important intellectual content: Feldman, Ammar, Lo, Trepman, Etzioni.
Statistical analysis: Feldman, Lo.
Administrative, technical, or material support: Ammar, Trepman, Etzioni.
Supervision: Feldman, Ammar, Etzioni.
Conflict of Interest Disclosures: Dr Feldman reported serving as a consultant for the Bill & Melinda Gates Foundation outside the submitted work. No other disclosures were reported.
Funding/Support: This study was funded by the Allen Institute for Artificial Intelligence.
Role of the Funder/Sponsor: The authors did this work as part of their work duties at the Allen Institute for Artificial Intelligence, including the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: We thank the Clinical Trials Transformation Initiative for access to the Aggregate Analysis of ClinicalTrials.gov database and the Institute for Health Metrics and Evaluation for the Global Health Data Exchange. We also thank the authors of studies who were contacted to help determine reasons for discrepancies between articles and records. Ruth Etzioni, PhD (Fred Hutchinson Cancer Research Center), Brad H. Pollock, MPH, PhD (University of California Davis), Margaret Rosenfeld, MD, MPH (University of Washington), Lucy Lu Wang, MS (University of Washington), and Dan Weld, PhD (University of Washington and Allen Institute), performed manuscript review and provided comments. Luca Weihs, BA (University of Washington), contributed to helpful discussion about statistical methods. David Orentlicher, MD, JD (UNLV William S. Boyd School of Law), Craig Shapiro, MD (retired, US Public Health Service), and Carole Stipelman, MD, MPH (University of Utah School of Medicine), provided helpful discussion and comments. No compensation was received by the acknowledged individuals.