Organizational Factors in Clinical Data Sharing for Artificial Intelligence in Health Care

Key Points Question Are organizational factors associated with the motivation of health organizations to share clinical data for artificial intelligence (AI) development? Findings In this qualitative study, 27 leaders from 18 health organizations were interviewed, and a predominant concern among them was data privacy risks. Most stakeholders viewed these as a substantial barrier for public health data sharing due to potential liability and reputational consequences; however, they identified external incentives as key factors for enhancing organizational motivation and fostering both within and across-sector data-sharing collaborations for AI development. Meaning The findings of this study suggest that data-sharing policies should be rooted in feasibility and incentivization strategies to promote responsible and equitable AI development in the health care sector.


Introduction
[7][8][9] Current research highlights problematic trends in health data sharing.A systematic review by Kaushal et al 10 highlighted that large data-sharing initiatives are often performed by large academic institutions from a few geographic locations with access to funding and the technical expertise to transform and share well-curated health data sets responsibly.While these data sets have been foundational for AI development, they often lack comprehensive representation across geographic, racial, and socioeconomic dimensions.This underrepresentation in health data sets precludes effectively generalizing algorithms to the wider population. 1,9though previous studies 11,12 have acknowledged the presence of organizational factors, there remains a gap in understanding how these factors specifically impact organizational readiness for sharing health data for AI development.While insights from the biomedical and genetic realms have highlighted challenges, such as data standardization, ethical dilemmas, and reidentification risks, AI introduces additional barriers to health data sharing that remain unexplored. 13,14Thus, studying the organizational factors affecting the flow of health data in the health care sector is crucial to the future of AI development.
To address this knowledge gap, we designed a qualitative case-based study to explore health data-sharing behavior across 18 organizations in the academia, government, nonprofit, and private health sectors.Our primary study objectives addressed (1) factors that motivate data sharing in health organizations, (2) organizational factors facilitating or hindering data-sharing efforts, (3) how the organizational structure (by sector) relates to motivation for clinical data sharing, and (4) factors that affect cross-sector data-sharing collaboration.

Methods
This qualitative study was approved by the Stanford School of Medicine Institutional Review Board.
All participants read waiver of consent forms and provided verbal consent to be interviewed; no financial compensation was provided.This study followed the Consolidated Criteria for Reporting Qualitative Research (COREQ) reporting guideline for qualitative research.

Study Design, Participants, and Recruitment
This study used a qualitative case-based design, an embedded multiple case study, to investigate organizational factors affecting data-sharing behaviors of organizations across 4 sectors (academia, government, nonprofit, and private) 15,16 (Figure 1).The academic sector included academic health centers.Government organizations comprised foundations and funding agencies that engage with health data, overseeing and funding health data-sharing initiatives, the nonprofit sector included data brokers and developers engaged in data sharing for public good, and the private sector included data brokers and developers engaged in data sharing for monetary gains.The selection of sectors and organizations for inclusion in this study was informed by a comprehensive literature review that identified entities actively involved in AI development or possessing the ability to share healthrelated data.Invitations were extended to individuals in leadership roles guiding data-sharing initiatives in their respective organizations.
A purposive, nonprobabilistic sampling strategy was used to maximize organization diversity and identify 78 individuals from 52 health organizations. 17Individuals who agreed to participate in this study were invited to complete an in-depth semistructured interview conducted in the English language (August 29, 2022, to January 9, 2023).All interviews were recorded with participant permission for publication of findings and quotations and transcribed verbatim by a professional transcription service (Athreon).According to constructivist-grounded theory techniques, interview sampling was iterative and informed by parallel data analysis that continued until additional interviews did not yield novel data or nuances to the emerging themes, indicating theoretical saturation at both the conceptual and thematic levels. 18ring the interview, participants completed a brief demographic survey that captured the participant's role and organization's data-sharing activity (eAppendix in Supplement 1).This information guided the content and direction of the discussions in the semistructured interviews.
Interviews were facilitated using an interview guide that the research team (A.Y., M.Y.N., J.L., A.M., D.L., and C.P.L.) iteratively developed (eAppendix in Supplement 1).The interview guide was further refined based on feedback from an external advisory group (identified in the Additional Contributions section) comprising experts in machine learning development, data set sharing, clinical informatics, data harmonization, curation, AI bias, organizational leadership, ethics, and qualitative research.All interviews were led by a researcher with more than 7 years' experience in qualitative AI health research (A.Y.).Interviews lasted between 30 and 60 minutes and were conducted in English through Zoom (Zoom Video Communications).In each interview, participants provided insights into their positions and an overview of the data-sharing procedures of their respective organizations.
Subsequent open-ended and probing questions encouraged them to elaborate on their motivations for engaging in data-sharing efforts, identify factors that facilitated or hindered such efforts, and discuss the outcomes observed with organizational norms and policies regarding organizational data-sharing decisions.

Data Analysis
Quantitative data from these surveys were analyzed using Microsoft Excel (Microsoft Corporation, version 16.78.3)to compute descriptive statistics.Qualitative interview transcripts were analyzed from a constructivist paradigm, informed by the constructivist-grounded theory approach, allowing for interpretive understandings and iterative logic to question the data and advance conceptual development.First, the interview transcripts were inductively coded by one of us (A.Y.) to generate a preliminary codebook.This codebook was subsequently used by the study members (M.Y.N. and additional contributor D.S.) who independently deductively coded the interview transcripts to validate coding credibility.The study team (A.Y., M.Y.N., and D.S.) met and reached consensus on any coding disagreements-a process that achieved more than 80% agreement.
9][20] The entire team (A.Y., M.Y.N.,  J.L., T.H.B., A.M., D.L., and C.P.L.) met on a weekly basis to deliberate on developing conceptual categories, ensuring the triangulation of themes by drawing from their diverse expertise in machine learning, organizational leadership, data sharing, clinical informatics, and qualitative research.These collaborative discussions contributed to conceptual refinement, ensuring that the emerging themes were both representative of the participants' perspectives and grounded in the data.All qualitative data were coded using analytical software (AXQDA Analytics Pro, Release 22.2.0).Throughout the research, one of us (A.Y.) engaged in memo-writing and maintained an audit trail of analytical decisions to promote reflexivity and transparency.

Conceptual Framework and Theory Development
A conceptual framework that explains how specific organizational factors may be associated with data-sharing behavior was developed through inductive and deductive approaches in 4 phases: (1)   identify the generalizable key factors explaining data sharing, (2) generate and refine emerging hypotheses through data collection and analysis, (3) map the variables and outcomes using the abstraction hierarchical approach, and (4) triangulate the emerging framework through iterative team analysis and establish organizational theories in health care. 11,12,21This framework is considered a determinant framework through the implementation science typology. 22

Organizational Motivation
Organizational motivation refers to the degree to which data sharing and its perceived value align with organizational values, missions, and priorities.Although organizational motivation is a crucial aspect of organizational readiness to engage in data-sharing initiatives, it can greatly vary between sectors.For example, some organizations may be willing to support data-sharing efforts, even without direct financial gains, if they align with their mission.However, private organizations may be more motivated by financial gains and remaining competitive in the data-sharing market.
"Government organizations and academia ultimately have the good of the public in mind…we are meant to be a neutral entity that evaluates science and medicine in a way that isn't biased by profit.The principles of our mission and nonprofit status really do guide us here.That's why we can put governance in place that prioritizes ethics over profit" (academia, chief operating officer).
"As a company, we have pressure from our venture capitalists-our investors say you need to make money…."(private, chief executive officer).

Organizational Capabilities
The second theme centers around organizational capabilities, or the infrastructure, resources, people, and data-sharing governance required to develop and ensure responsible data sharing.In other words, this theme is rooted in whether an organization has the capacity to engage in and oversee data sharing, because transforming clinical data to be AI ready is complex, especially when data aggregated from multiple sites have disparate data formats and content. "

Data-Sharing Behavior by Sector
Within the above-described framework, we found that sector-specific differences related to the greater data-sharing ecosystem that comprises these entities (Figure 3).For example, a key capability of academic medical centers is accessibility to relevant clinical data, along with other capabilities that promote data-sharing champions, garner public trust, and attract extramural funding.A strength of government organizations is the ability to create incentives for other organizations to share data, helping align an organization's mission with its prioritization for data sharing.
"What makes an institution share data when they wouldn't ordinarily share it?Sometimes it's just other peer institutions who are sharing when they're not.If they feel as though they're going to miss out on important research, that can be an incentive" (data privacy expert, academia).
These collaborations are often bolstered by nonprofit and private intermediaries serving as trusted data brokers, facilitating access to deidentified ealth data that complies with privacy laws.For example, private organizations have begun to enable large-scale data sharing by providing cloud storage and environments that improve access to AI-ready health data.
"Industry originally helped us design our platform….We didn't want to reinvent the wheel.Why would you reinvent the wheel when there's already security and other great features that we could make HIPAA-compliant, FISMA [Federal Information Security Management Act]-low certified as well.
We improved the security, and they provided us with research credits because we're a large organization" (nonprofit, data science lead).
Nevertheless, organizations within the academic sector often display risk-averse tendencies to sharing data, primarily due to reputational and public trust concerns.To minimize these risks, these

Factors influencing data-sharing
The black upward arrowheads symbolize increase (facilitator) whereas red arrowheads represent barrier associated with decreased data sharing.Sideways arrowheads represent a variable factor.Shaded cells distinguish unique strengths or challenges for a sector compared with the others.
a Academia has distinct access to health data.
b Private sector faces challenges in public data-sharing collaborations.
c Private sector is notably less risk averse to datasharing compared with the other 3 sectors.

JAMA Network Open | Health Policy
Organizational Factors in Clinical Data Sharing for Artificial Intelligence in Health Care organizations tend to favor a conservative approach to data deidentification, which incurs cost and labor, making data sharing burdensome.
"[Patients have] entrusted us with that data to make decisions regarding their care, and we want to make sure that again, with our collaborations, whether it is research related or we're doing something to advance knowledge, to work with one of our business partners….We want to have assurances that while we're in that partnership or that collaboration, if we were to choose to go with another vendor… we have a way to get our data back" (academia, regulatory privacy expert).
Similarly, although academic and government organizations aim to promote public good through scientific discovery and even if they are motivated to share data, how much each organization engages with such efforts can be associated with additional extrinsic incentives to make this process sustainable.These incentives, encompassing both financial and reputation benefits, can catalyze data-sharing behavior in organizations otherwise unmotivated to participate in such efforts.
"I'm all for the public good.I love the public good, but I think we need to define public good in ways that are sharper.Like, if you have specific costs attached to making data available, you can't just say, believe in public good (capital P and capital G).That's not going to work" (academic and nonprofit, organizational leader).
To fulfill their missions, academic and government organizations have developed a symbiotic relationship based on their complementary capabilities and aligned motivations.For example, government organizations rely on academic institutions to contribute valuable data to their repositories, while academic institutions capitalize on government funding to build the technical infrastructure required to enable responsible data sharing across multi-institutional systems.Such relationships are a key accelerator of public and research network collaborations across these sectors.
Cross-sector collaborations are also essential for nonprofit and private organizations.However, a common barrier here is building trustworthy relationships with academic medical centers and integrated health systems (Figure 3).Because academic medical centers have higher standards for data stewardship, private organizations struggle to establish data-sharing agreements, particularly with large academic medical centers.This is where nonprofit organizations can act as data intermediaries.For example, professional societies, which fall under the umbrella of nonprofit organizations, can act as trusted data brokers facilitating data-sharing partnerships among the academic, government, and private sectors.
Overall, organizational readiness across sectors is dependent on the unique capabilities of each organization and the extent to which extrinsic incentives can stimulate organizational motivation by aligning data sharing with the organization's mission.

Discussion
The flow of health data between data providers and users is essential for AI development.However, the factors affecting data-sharing behavior have not been extensively studied.To address this gap, we conducted a qualitative cross-sectional study of factors shaping data-sharing behaviors in the academic, government, nonprofit, and private sectors.This investigation generated 3 key findings with implications for accelerating responsible data sharing for AI in health care.Second, our cross-sector analysis found that organizational capabilities surrounding data sharing varied greatly across sectors.While government, nonprofit, and private organizations exhibited greater resource availability, academic organizations held distinct advantages in access to health data, granting them more control on data-sharing agreements and data availability as a whole.
Because of this variation, collaborations across sectors are symbiotic, helping to compensate for sector-specific barriers and representing a key approach to ensuring equitable data sharing and more generalizable AI algorithms. 10ird, in-depth understanding of organizational structure and roles enabled us to identify the diverse approaches of each sector to the regulations surrounding data sharing.For example, the conservative approach of academic organizations stemmed from the necessity for increased compliance and mitigation strategies to safeguard data privacy and ensure proper use of the shared data.Interviews with stakeholders showed data ownership and stewardship to be major concerns, often stifling data-sharing partnerships, with privacy issues being the primary concern.Thus, collaborations both within and across sectors were more likely to be successful if the organizations involved demonstrate aligned values and complementary capabilities to overcome regulatory hurdles.
Overall, the degree to which an organization's values align with its prioritization of data sharing appears to define its motivation to engage in data-sharing efforts (Figure 2).However, organizations with a low degree of alignment, and thus low motivation, can be persuaded by extrinsic financial or reputational incentives.For example, extramural funding from government organizations can compel academic institutions to participate in data-sharing collaborations, and private organizations may agree to partnerships with intermediary nonprofit data brokers to bolster their public perception.

Limitations
This study has limitations.Despite extensive efforts to reach private data brokers and integrated health systems, the response from these sectors was low, which may alter the overarching narratives on data sharing.By focusing on organizations already engaged in data sharing, this study may have inherent limitations in capturing the broader challenges and reservations faced by organizations not represented in this study.Additionally, most (88.9%) of the studied organizations were US based, potentially limiting the generalizability of our findings to non-US organizations, particularly given differences in data privacy regulations.
This exploratory study has shed light on the factors affecting the readiness of organizations for health data sharing for AI development.The study offers a plausible explanation to recognize the relationship between organizational motivation, capabilities, and context.However, there remains a gap in understanding private health care organizations possibly less inclined toward data sharing to shed light on the organizational barriers from this perspective.Future research should explore how extrinsic incentives can boost organizational motivation and address competing priorities that stifle cross-sector data-sharing collaborations.Addressing these barriers will be pivotal to ensure a sustainable flow of health data through the ecosystem to enhance AI development in the health care sector.

Conclusions
Our research suggests that an organization's readiness for health data sharing for AI development is determined by its motivation and capabilities.We identified that sector-specific factors play a pivotal role in organizations' data-sharing behavior and that extrinsic incentivization can substantially bolster collaborations to overcome these sector-specific barriers.To improve the sharing of health data, aligning health data sharing policies with incentives that motivate organizations is crucial.This alignment may help increase the sharing of health data in a way that is sustainable, responsible, and fair.

Figure 1 .
Figure 1.Overview of Study Design

Figure 3 .
Figure 3. Strengths and Weaknesses in Data-Sharing Factors Across Academia, Government, Nonprofit, and Private Sectors

First, organizational motivation
and capabilities are the core factors affecting organizational behavior surrounding data sharing.Although previous studies have proposed alternative approaches and frameworks that attempt to relate the involved parties and stakeholders,6 they have mostly focused on specific factors, without elucidating the dynamic interactions among the identified factors.Our conceptual framework more broadly categorizes organizational readiness for data sharing into organizational motivation and capabilities before further exploring how these constructs affect each organization's behavior and role in the data-sharing ecosystem.Recognizing and differentiating the components within these constructs is paramount to addressing the various barriers impacting organizations' readiness for clinical data sharing.

Table . Demographic
Characteristics and Breakdown of the 18 Organizations and 27 Leaders Interviewed USAbbreviation: EHR, electronic health record.
If you've got dozens and dozens of sites that are streaming data into you, then you need to have people that are going to manage that data and organize it as it's coming in.It's a huge effort…and everybody has a certain level of expertise for doing each piece of it" (government, data curation expert).Interviewees described that, despite willingness to participate in data-sharing efforts, the appropriate infrastructure and people to both manage and process the data are key requirements.
Clinical experts are critical to ensuring appropriate context and quality for data to be meaningful for AI research."…Itrequiresbackground contextual knowledge….For every single variable that you're going to curate out of EHR [electronic health record] data into a feature table or any sort of downstream transformation, that contextual knowledge is absolutely key in order to make sure that what you get downstream isn't a total misrepresentation of what really happens with a patient" (government, data informatics expert).Another dimension is the regulation surrounding data sharing.The authorization process for sharing data is arduous and considers the distinct use case of the data, the data type, and contextual factors.Moreover, given the various risks and consequences of data deidentification and inappropriate handling, organizations tend to incorporate a variety of safeguards to ensure data privacy."There are a lot of agreements in place when it comes to institutional sharing….If it is private funded, sponsor funded, sponsored research, again, there's going to be contractual obligations between the sponsor and the participating institutions.The federal government also oversees, they want to make sure if it is human subjects research, there are regulations that the investigators, the institutions have to meet and be in compliance, otherwise there's fines and penalties.Just like if