Contemporary Views of Research Participant Willingness to Participate and Share Digital Data in Biomedical Research

Key Points
Question: Are people willing to participate in research advertised on the internet, and is willingness to participate associated with type of study sponsor?
Findings: This mixed-methods survey and qualitative study of 914 respondents indicated that respondents were more likely to participate and to share their social media data with researchers in university-led research studies than in studies conducted by the US federal government or pharmaceutical companies. However, only 49.3% indicated they would share their social media data at all.
Meaning: These findings indicate that researchers may face challenges in recruiting representative samples when recruiting from internet platforms.


Introduction
With 9 in 10 US adults seeking information on the web 1 and 7 in 10 using social media platforms, 2 the use of online media to recruit and to collect research data from diverse populations has become a common and cost-effective practice in health sciences research over the last 5 years. [3][4][5][6] This form of recruitment and data collection is currently in use in large-scale biomedical research projects, such as the National Institutes of Health's Precision Medicine Initiative, 7 which plans to recruit a diverse sample of 1 000 000 Americans through social media campaigns. Such projects also intend to collect digital information (electronic health records information, data from fitness devices, and even social media and web searches) to enhance our understanding of early risk factors for different disease states. Even social media companies are using digital data to inform better outcomes; for instance, Facebook has been able to use social media data to identify suicide risk among its users and, as a result, has formed a Compassion Team to address these issues. 8 Recent data privacy violations [9][10][11] potentially threaten the ability of biomedical researchers to recruit participants through online platforms and collect digital data from participants. Paramount to recruitment and subsequent participation in biomedical research is participant trust in science, the investigative team, and the management of personal information. Generations of biomedical research misconduct such as the Tuskegee syphilis experiment have influenced the public's trust in biomedical research. 12 A recent Pew Charitable Trust survey 13 of trust in the internet found that even experts in digital security were divided on whether the general population will continue to share personal data online, with fewer than 50% of experts saying trust will improve with new regulations, and the remainder indicating that it will stay the same or erode over time.
Another study 14 from Australia found that while patients still feel that sharing personal information is important for biomedical research, they voiced considerable concerns about how the data will be managed, and their willingness to share such data depended on who was collecting the data. Lack of trust in studies advertised via the internet and social media and concerns about data security may bias samples collected in this manner. 15 As a result, the use of these platforms for recruitment and data collection for biomedical research raises significant data privacy, ethics, ownership, and stewardship challenges 16 for institutional review boards, researchers, and participants.
The purpose of this mixed-methods study was to ascertain (1) the general population's willingness to participate (WTP) in biomedical research advertised on different digital platforms, (2) whether the study sponsor further modified the decision and WTP, (3) whether people are willing to share digital data in biomedical research, and (4) whether WTP improves in association with announcements regarding new data privacy laws. 17

Recruitment and Eligibility
Participants were recruited using Amazon's Mechanical Turk (MTurk), 18 an online crowdsourcing platform where workers are paid to complete tasks such as data processing, problem-solving, and surveys. The platform is regularly used in health research 19 and allows investigators to sample study participants from a larger, more representative, and more diverse population 20 than typically seen in an in-person study at a fraction of the cost and time.
To be eligible, participants had to live in the United States, be aged 18 years or older, and use at least 1 social media platform. To ensure we were recruiting appropriate participants from the United States, we set the MTurk survey criteria to only include workers who lived and graduated high school in the United States (see eAppendix 1 in the Supplement for screening questions). Participant recruitment was stratified to match race/ethnicity proportions to those of the 2010 US Census data. 21 2018. Participants were given up to 3 reminders to complete the second survey. Figure 1 gives an overview of the study procedures and eAppendix 2 and eAppendix 3 in the Supplement include the Screening, T1, and T2 surveys. This study followed the American Association for Public Opinion Research (AAPOR) reporting guidelines.

Demographic Information
Demographic data (sex, race/ethnicity, age, and education) and social media use were self-reported by participants. Participants were also asked whether they had ever volunteered for an online study before and whether they had ever shared social media data for research purposes.

Survey Questions
The survey was developed by the authors and pilot tested to ensure clarity and understanding. Participants were given an opportunity to explain the reason for their answers in an open-field text box. See eAppendix 2 and eAppendix 3 in the Supplement for a copy of the T1 and T2 surveys.

Figure 1. Overview of study procedures. The initial survey (April 2018) was deployed a month after the news about Facebook-Cambridge Analytica data privacy violations surfaced. The second survey was sent 5 months later in September 2018 to all the participants who responded to the first survey. A total of 3 reminders were sent to participants for completing the second survey. GDPR indicates General Data Protection Regulation; and MTurk, Amazon's Mechanical Turk.

Statistical Analysis
Participants' responses to structured survey questions were summarized using summary statistics.
Differences in demographic characteristics between the participants who completed T1 and the subset who responded to T2 were assessed using a χ² test. We used a conservative minimum response rate based on AAPOR reporting guidelines 24 to report the participant response rate for the T2 survey. Participant responses to the main outcomes of interest (WTP and willingness to share social media data) across the 2 survey points were evaluated using a logistic regression model based on generalized estimating equations. 25 Briefly, the generalized estimating equation approach is a semiparametric method to estimate population-averaged effects by accounting for correlations in repeated-measures data (that is, participant responses at T1 and T2) using robust standard errors. We also accounted for differences in participant responses due to demographic characteristics such as age, sex, and race/ethnicity. Due to small subgroup sample sizes, race/ethnicity was collapsed into a binary variable of minority or nonminority and participants within age groups 55 to 69 years and older than 70 years were collapsed into 1 age group of 55 years and older.
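As an illustration of the χ² comparison between T1 completers and T2 responders described above, the following is a minimal sketch for a single binary characteristic. The counts are hypothetical and are not study data, and the function name `chi2_2x2` is introduced here for illustration only. It computes the Pearson statistic without continuity correction and uses the closed-form P value that holds for 1 degree of freedom.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]]. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square upper tail probability
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (NOT study data):
# rows = sex (female, male); columns = (responded to T2, did not respond)
hypothetical_counts = [[300, 120], [330, 150]]
stat, p_value = chi2_2x2(hypothetical_counts)
print(f"chi-square = {stat:.3f}, P = {p_value:.3f}")
```

A characteristic with more than 2 levels, such as the age groups, would need a general r x c version with a P value from the χ² distribution with (r - 1)(c - 1) degrees of freedom, which this sketch does not cover.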
To assess the stability of response over time, an interaction term indicating survey time (T1 vs T2) was included for each covariate in the generalized estimating equation model. We also assessed the combined association of the recruitment platform and study sponsors with WTP and data sharing using an interaction term. The P values of the model estimates were corrected for multiple testing using the false-discovery rate method. A 2-tailed false-discovery rate-corrected P < .05 was considered statistically significant.
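The text does not name the specific false-discovery rate procedure. As a sketch only, assuming the commonly used Benjamini-Hochberg step-up procedure, the adjustment of a set of P values can be written as follows. The raw P values are hypothetical, and `benjamini_hochberg` is an illustrative helper, not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order.

    Assumption: the 'false-discovery rate method' is taken to be
    Benjamini-Hochberg; the source does not specify which procedure was used.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values are monotone: q_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for 5 model estimates (NOT study data).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adjusted_p]
```

In this hypothetical set, the third and fourth raw P values are below .05 but their adjusted values are not, which shows why the corrected threshold is the more conservative criterion.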
A mixed-methods approach combined quantitative and qualitative data with the function of expansion, 26 allowing inductive qualitative data to provide the "why" to questions uncovered by the quantitative data. Missing data were not included in the analysis. Qualitative data were imported into Dedoose 27 and analyzed using thematic analysis. 28

Willingness to Share Social Media Data
Willingness to share social media data decreased significantly for all but university-led studies. While 43.1% of T2 respondents were willing to share social media data with university-led studies, willingness to share with pharmaceutical companies decreased 6.84% to 29.5% (OR, 0.50; 95% CI, 0.44-0.56; P < .001) and decreased 7.32% to 35.3% with federally led research studies (OR, 0.65; 95% CI, 0.58-0.72; P < .001) (Figure 2B). Continued privacy and data security concerns reported in the news were noted as a problem in the qualitative data.

a Odds ratios were determined using logistic regression based on the method of generalized estimating equations, including assessing the association of participants' demographic characteristics, study sponsor, and recruitment platform with willingness to share social media data.
b Statistically significant at false discovery rate-corrected P < .001.
c Statistically significant at false discovery rate-corrected P < .05.
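The odds ratios and 95% CIs reported above are linked through the log-odds scale of the logistic model: an OR is the exponential of a model coefficient, and a Wald-type CI is symmetric around that coefficient before exponentiation. The sketch below illustrates this relationship using the reported estimate for pharmaceutical company-led studies. The standard error is back-calculated from the published CI, which is an assumption made for illustration; the model's actual coefficient and robust standard error are not given in the text, and the function names are hypothetical.

```python
import math

# Standard normal quantile for a 2-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate the log-scale SE implied by a Wald-type CI.

    Assumption: the published interval is a Wald interval on the log scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported: OR, 0.50; 95% CI, 0.44-0.56 (pharmaceutical company-led studies).
beta = math.log(0.50)
se = se_from_ci(0.44, 0.56)
or_est, lower, upper = odds_ratio_ci(beta, se)
print(f"OR {or_est:.2f}; 95% CI, {lower:.2f}-{upper:.2f}")
```

Because the bounds were rounded to 2 decimals before publication, the round trip only recovers the reported interval to that precision.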

European GDPR Law
Four hundred thirteen participants (63.0%) reported seeing GDPR-related emails and/or advertisements by T2. No significant difference in WTP or willingness to share social media data was found between participants who reported seeing the GDPR-related emails and those who did not.
Of these, 112 participants (27.1%) said that the GDPR-related messages made them feel more secure about their data and provided proof that the organization was working on its data security. As 1 respondent explained, "It shows me that they make notice of our concerns and are fixing them." However, 301 participants (72.9%) felt the GDPR-related messages did not regain their trust. As 1 respondent stated, "I think the ads are just aimed at fixing a public relations problem. They still make their money from collecting our data and selling it and they aren't going to stop."

Discussion
This mixed-methods study found that trust in the use of digital platforms, such as Google search and This could have a significant impact on the generalizability of outcomes from biomedical research.
Although our thematic analysis indicated that better data security measures and transparency of data use may mitigate concerns regarding participation, less than a quarter of our sample indicated that they were reassured by recent attempts at regulation such as the GDPR policies. The findings from this study are understandable in light of growing evidence that data privacy policies available on digital platforms do not accurately disclose how that information is used. One recent article 33 found that many health apps share digital data with companies like Facebook and Google but fail to disclose this in their data privacy policies. A qualitative study 32 of participants' willingness to share research data reported similar findings, with trust in the research team and fears related to misuse arising as major concerns among potential participants. Our findings, combined with others, suggest that social media campaigns and policies describing how privacy and data security will be improved may not be sufficient to increase WTP in online research or willingness to share digital data. As our survey results showed, participants remained mistrustful of these platforms several months after the platforms had sent out messages addressing their data security problems. However, partnership with universities and other trusted entities to develop better policies may be a useful solution, given how consistently our participants expressed trust in university-led research. As a number of studies indicate, participants' trust in research is closely linked to the institution conducting the research. 34

Limitations
The findings from this survey would benefit from further research and should be viewed with the following limitations in mind. First, this is a general population survey of participant impressions about WTP in research and willingness to share personal data. To confirm our findings, a study specifically comparing recruitment avenues would need to be conducted. Second, participants were identified through MTurk, and therefore the representativeness of the findings may be influenced by our sample selection method, even though participants were recruited to match the race/ethnicity of the 2010 US Census data. 21 Although MTurk participants are likely to be more aware of data sharing policies and more comfortable with online research than the general public, recent studies suggest that for research of this nature, these samples tend to be as good as, or better than, in-person surveys. 35 Third, our sample was not recruited specifically to test hypotheses about racial/ethnic, sex, or age differences. Particularly in regard to our findings that WTP seemed to improve in older populations over time, we believe this to be an artifact of the small sample of older adults who participated in this survey, and that in all likelihood these are not participants who are representative of older adults in the general population. Thus, to truly understand demographic differences in WTP, a large study that oversamples participants from different demographic groups will need to be conducted. In addition, the phrasing of survey questions listing depression as a health condition could also have negatively affected study participants' WTP, given the stigma associated with the disorder. 36 Despite these limitations, the data are still useful for both informing recruitment practices and providing information about the concerns people have regarding the secure management of social media data for research purposes, particularly at this time.

Conclusions
In conclusion, WTP in biomedical research advertised on social media platforms and search engines, as well as the willingness to share digital data with researchers, has been affected by recent news about the misuse of such data. Although university-led research is seen as more trustworthy than federally led or pharmaceutical company-led research, WTP is still affected. Despite these concerns, social media provides opportunities for conducting biomedical research at scale, 37 including enrolling minority populations, 5 and could help improve diversity in clinical trials, many of which are discontinued early due to recruitment challenges. 38 It will be important for researchers and research organizations to work more closely with participant communities to address concerns about data sharing and privacy.