Breadth and Exclusivity of Hospital and Physician Networks in US Insurance Markets

Key Points
Question: How does the breadth of health care networks and the degree to which they overlap vary within and across specialties and insurance markets?
Findings: In this cross-sectional study of 1192 health care networks, large-group employer networks were broader than small-group employer, marketplace, Medicare Advantage, and Medicaid managed care networks. In many states, narrower networks had as much, if not more, overlap across different insurers' networks than the broadest networks; areas with less concentrated insurance, physician, and hospital markets had narrower and more exclusive networks.
Meaning: These findings suggest that the structure of plan networks may be a factor in determining care affordability and continuity in the United States, particularly given how frequently individuals change insurance plans.


Introduction
This document provides methodological details and supplemental analyses for the manuscript. This report was compiled on 2020-10-16 using R version 4.0.2 (2020-06-22). Replication code is available on GitHub.

Geographic Measures

Defining Geographic Accessibility
Our primary measure of geographic accessibility was based on a driving-time isochrone centered on the population-weighted centroid of each ZIP-code tabulation area (ZCTA). We identified centroids based on 2010 Census block population data using the Geographic Correspondence Engine at the Missouri Census Data Center. We constructed thirty- and sixty-minute isochrones for each ZIP using the Mapbox Application Programming Interface (API).
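As a concrete illustration, the isochrone request for a ZCTA centroid might be formed as follows. This is a minimal Python sketch (our replication code is in R) assuming the URL pattern of the Mapbox Isochrone API; the access token and coordinates below are placeholders, not values from our analysis.

```python
# Sketch of forming a driving-time isochrone request against the Mapbox
# Isochrone API (endpoint pattern assumed from Mapbox documentation).
def isochrone_url(lon, lat, minutes=(30, 60), token="YOUR_TOKEN"):
    """Return a Mapbox Isochrone API request URL for a ZCTA centroid."""
    contours = ",".join(str(m) for m in minutes)
    return (
        "https://api.mapbox.com/isochrone/v1/mapbox/driving/"
        f"{lon},{lat}?contours_minutes={contours}"
        f"&polygons=true&access_token={token}"
    )

# Illustrative coordinates for a centroid near ZIP 53005 (placeholder values)
url = isochrone_url(-88.1, 43.06)
```

The returned GeoJSON polygons can then be intersected with provider locations to count accessible clinicians and facilities.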
Isochrones for an Example ZIP Code
eExhibits 1 and 2 map the isochrones for ZIP 53005 (blue polygon) near Milwaukee, WI. Also plotted in eExhibit 2 are the ZIP's surrounding county (solid black line) and the locations of primary care physicians (blue dots) and hospitals (pink squares) within the 30-minute isochrone.
eExhibit 1: 30- and 60-Minute Isochrones for Example ZIP
As eExhibit 2 shows, isochrone-based definitions of geographic accessibility do not (arbitrarily) limit measures of access to only those providers located within geopolitical boundaries (e.g., the county). Indeed, eExhibit 1 demonstrates (for the 60-minute isochrone) that geographic access can often reach into neighboring states.
For our primary results we utilize a 60-minute isochrone for every ZIP code; in sensitivity analyses, however, we considered a 30-minute isochrone for ZIPs located within metropolitan core-based statistical areas (i.e., non-rural areas).

Validation of Hospital and Physician Data
While the Vericred data included the specialty (for physicians) and geocoded address location for each clinician and hospital facility, we implemented additional sample-inclusion safeguards to ensure that the "denominator" of geographically accessible clinicians/facilities from a given ZIP code included only those currently practicing in the area.

Common Reasons for Network Errors
According to CMS audits, the most common reason for provider network errors is incorrect information on clinic location and contact information for in-network clinicians (74% of all errors; see table below). A frequent reason for the other 26% of errors (e.g., a physician who should not be listed as in-network) is retirement or a move away from the area or clinic/facility. This often occurs because insurance carriers have historically been fairly passive about updating provider network data (i.e., they do not routinely canvass the directory to ensure that every provider is still practicing at each location).

eExhibit 2: 30-Minute Isochrone and Surrounding County Boundary Points for Example ZIP 53005, with Geographic Location of Primary Care Physicians (circles) and Hospitals (squares)
To address this limitation we augmented the insurance network data with a comprehensive validation process to ensure that the information we used reflected the best possible information on the specialty and active status of physicians and hospitals in the U.S. We describe our validation process separately for hospitals and physicians below.

Hospital Data Validation Process
One challenge with hospital network data is that often only a single national provider identifier (NPI) is provided for facilities in a given network. However, hospitals often have multiple NPIs registered for different buildings and units. For example, our final sample includes 4,127 unique facilities that, collectively, have 14,680 NPIs associated with them. This creates opportunities for data errors because the NPI number listed by one carrier may not be the same NPI provided by another. Without further adjustment we might incorrectly determine that the same facility does not appear in both insurers' networks, which would invalidate our measures of exclusivity.
We addressed this challenge by constructing a master hospital NPI crosswalk that identified all NPIs associated with a hospital. This crosswalk was constructed using American Hospital Association (AHA) and National Plan and Provider Enumeration System (NPPES) data. Following the methodology outlined by Cooper et al (Q J Econ 2019), the following steps were used to construct the crosswalk:
10. Some hospitals in the NPI Registry were not in the AHA survey data files. For these hospitals, we pick one NPI as 'PRIMARY' and, using the match steps outlined above, add an 'X' to the AHA ID column and append the 'PRIMARY' NPI to all matched rows.
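The effect of the crosswalk can be illustrated with a minimal sketch; the dict-based structure and the NPIs below are hypothetical stand-ins for our actual AHA/NPPES-based crosswalk.

```python
# Hypothetical sketch: collapse multiple facility NPIs to a single primary NPI
# so the same hospital is recognized across carriers (NPIs are invented).
crosswalk = {
    "1111111111": "1111111111",  # primary NPI maps to itself
    "2222222222": "1111111111",  # satellite-building NPI -> primary
    "3333333333": "1111111111",  # unit-level NPI -> primary
}

def primary_npi(npi, crosswalk):
    """Resolve any facility NPI to its primary NPI (unknown NPIs pass through)."""
    return crosswalk.get(npi, npi)

# Two carriers listing different NPIs now resolve to the same facility
same_facility = primary_npi("2222222222", crosswalk) == primary_npi("3333333333", crosswalk)
```

Without this resolution step, the two carriers' listings would be treated as different facilities and overlap would be understated.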

Physician Data Validation Process
To validate physician specialty, active status, and practice location information we drew on 2019 data from IQVIA and Hospital Compare. IQVIA routinely canvasses office-based physician clinics nationwide to collect information on specialty and organizational relationships, among other things. Because the IQVIA data are primarily sold for marketing purposes, the data contain up-to-date contact information (including clinic ZIP code) for nearly all active office-based (as well as some facility-based) physicians nationwide.
One downside to the IQVIA data is that the canvassing frame (mostly office-based physician clinics) undercounts certain physician specialties and types. The Medicare Payment Advisory Commission, for example, has found that IQVIA data do not cover roughly 30 percent of physicians who bill Medicare in a given year. Not surprisingly given its sampling frame, the IQVIA undercount is concentrated among certain hospital-based specialties (e.g., radiologists, pathologists, and anesthesiologists) and among physicians working predominantly in hospitals and other facilities (e.g., emergency medicine physicians and general internists working as hospitalists and intensivists).
To address these data gaps and to further validate the information contained in the IQVIA and Vericred data we drew on additional December 2019 data from Physician Compare. Physician Compare is updated twice monthly by CMS and captures current information on clinic addresses, primary specialty, and licensure data. Critically, Physician Compare captures all active physicians who submitted a Medicare claim within the last 12 months of data collection, or who newly registered within the Medicare Provider Enrollment, Chain, and Ownership System (PECOS) within 6 months of data collection. Thus, it is effectively a continually updated census of physicians billing Medicare. That said, Physician Compare also has its own limitations since it will not capture physicians who do not bill Medicare.
We geocoded the location of all physicians using address data from these three data sources (Vericred, IQVIA, and Physician Compare) based on the following process. Generally speaking, this process utilized the most detailed address information available from the data sources with the broadest agreement on clinic/facility location. Our process also reflects the assumption that any address information in IQVIA and Hospital Compare was more current than information in the Vericred data. This assumption rests on the observation (relayed to us by Vericred) that their address data were largely based on the NPPES (which often lists a home address if the physician has licensure data sent there, and is less frequently updated to reflect changes in address, residency/training status, and clinic location) and network scrapes from the web. Likewise, we assumed that if a physician did not appear as active in either the IQVIA or Hospital Compare data, then he or she was likely not directly engaged in patient care in 2019. Finally, our process ensured that we identified all clinics where a physician practices.
1. We first compared unique NPIs in either IQVIA or Physician Compare to get a "master" listing of active physicians in the U.S.
2. We then matched all NPIs, their specialties, and the geographic locations of all listed clinics from this master listing to the Vericred data.
3. We next determined whether the clinic ZIP code listed in the Vericred data matched any of the ZIP codes associated with the NPI in the IQVIA and Hospital Compare data.
6. With this master data created, we then geocoded each address (or population-weighted ZIP centroid, in cases where IQVIA data were used). These geocoded coordinates were then used to isolate the physicians within each ZIP code's isochrone.
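The ZIP-match step above amounts to a set-membership check on (NPI, ZIP) pairs. A minimal sketch, with invented NPIs and ZIP codes:

```python
# Hypothetical sketch: check whether the clinic ZIP listed in Vericred agrees
# with any ZIP associated with the NPI in the IQVIA/Physician Compare data.
validated_zips = {  # NPI -> set of validated clinic ZIPs (invented values)
    "1234567890": {"53005", "53226"},
    "1098765432": {"10001"},
}

def zip_matches(npi, vericred_zip, validated_zips):
    """True if the Vericred clinic ZIP agrees with a validated ZIP for the NPI."""
    return vericred_zip in validated_zips.get(npi, set())

match = zip_matches("1234567890", "53005", validated_zips)     # ZIPs agree
mismatch = zip_matches("1234567890", "60601", validated_zips)  # ZIPs disagree
```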
To ensure consistency, we utilized primary specialty information from the data source used to inform the clinic address locations. For example, if Hospital Compare was used to determine the address or addresses for a given NPI, then we utilized the primary specialty information from Hospital Compare for that NPI. Generally speaking, the data sources agreed on specialty, though there was some disagreement between IQVIA and Hospital Compare for certain sub-specialties. For example, among physicians identified as emergency medicine in either IQVIA or Hospital Compare, 8% had a different primary specialty in one of the data sources. Likewise, there was 6.9% disagreement for cardiology. All other subspecialties considered had much lower rates of disagreement (e.g., 0.3% for anesthesiology, 0.7% for radiology, 0.3% for behavioral health, 0.9% for orthopedic surgery, and 3.6% for primary care).
Based on this process we then compared counts of active physicians by specialty to another data source: the 2015 American Medical Association (AMA) Master File. While these comparisons were somewhat apples-to-oranges (for example, the AMA data were from 2015 and our data reflected 2019 counts, and the AMA data did not include clinical psychologists in its definition of behavioral health physicians), our counts aligned well with counts of active physicians in the Master File.

eExhibit 3: Total and Unique Counts of Hospitals and Physicians by Input Data Source

Network Landscape Files
Our analyses also required identification of the set of plan networks available in each ZIP. For some markets (marketplace, small-group) this was straightforward because CMS and Vericred had crosswalks that allowed us to map each plan marketed in the ZIP to its network. However, for other markets (Medicaid managed care, large group, and Medicare Advantage) we had to infer the set of networks using additional data sources. Below, we describe the process of identifying provider networks available and/or marketed in each ZIP.

Marketplace and Small Group Plans
To identify marketplace and small group plans available in a ZIP code, we first mapped each (population-weighted) ZIP centroid to its county and 3-digit ZIP code. States, in conjunction with CMS, define health insurance rating areas for marketplace and small group plans based on clusters of contiguous counties or 3-digit ZIP codes. This allowed us to assign each ZIP code to its geographic rating area.
Crosswalks provided by CMS and Vericred (including HIX Compare, which Vericred curates) facilitated matching of each rating area to the set of marketplace and small group plans marketed in the area. We then mapped each of these plans to its network using Vericred crosswalks. The resulting output provided us with the set of networks available in the ZIP.
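The chained crosswalk logic (ZIP → rating area → plans → networks) can be sketched as follows; all plan, network, and rating-area identifiers below are hypothetical.

```python
# Illustrative chained crosswalk from ZIP to available networks, mirroring the
# mapping described above (all identifiers are invented for illustration).
zip_to_rating_area = {"53005": "WI-11"}
rating_area_to_plans = {"WI-11": ["planA", "planB"]}
plan_to_network = {"planA": "netX", "planB": "netX"}

def networks_in_zip(zcta):
    """Return the de-duplicated set of networks available in a ZIP."""
    plans = rating_area_to_plans.get(zip_to_rating_area.get(zcta, ""), [])
    return sorted({plan_to_network[p] for p in plans})

nets = networks_in_zip("53005")
```

Note that two plans can share one network, so the final step de-duplicates at the network level rather than the plan level.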
Based on this process, we had network data for 99.72% of all possible ZIP-network matches in the marketplace market and 97.7% in the small-group market.

Medicaid Managed Care Plans
Identifying the set of available Medicaid managed care plans required the use of enrollment data for individual insurance carriers. These data were obtained for January 2019 from Decision Resources Group (DRG). Specifically, the DRG data contained county-level enrollment (based on enrollee residence) for each carrier. For each county we identified the set of carriers with non-zero Medicaid managed care enrollment based on beneficiary residence. We then mapped each ZIP's population-weighted centroid to the underlying county to match this set of carriers to every ZIP code in the county. We then matched this set of carriers to the Vericred data to identify the Medicaid networks available. Notably, certain states (AL, AK, AR, CT, ID, ME, MT, NC, OK, SD, VT) do not utilize Medicaid managed care and consequently were excluded from our analyses of Medicaid networks. In addition, Vericred did not capture network information for all Medicaid carriers nationwide. However, our data matching process determined that the Vericred data captured networks for 76.8% of Medicaid managed care enrollment nationwide. The table below shows, however, that the fraction of matched networks varied across states.

Commercial (Large-Group) Networks
We identified the set of commercial (large-group self-insured) carriers using a similar method as for Medicaid managed care. Specifically, we used the county-level DRG data to isolate the set of carriers with non-zero enrollment (based on beneficiary residence). We then mapped this set of carriers to each ZIP code with a population-weighted centroid in the county, as well as to the set of networks associated with those carriers in the Vericred data. The Vericred data also included a market identifier for each network, allowing us to identify only those large-group networks associated with carriers in the ZIP.
Because the DRG data included enrollment data on all carriers in the ZIP, we were able to estimate, to a rough approximation, what fraction of carriers matched between the DRG data and the Vericred large group networks. Based on the above process we matched networks to carriers with approximately 64% of large-group self-insured enrollment nationwide.
One challenge to identifying large group networks is that we only knew enrollment at the carrier level, not the plan level. Moreover, even if we knew plan-level enrollment figures for each ZIP, we lacked a crosswalk mapping each network to each plan. Therefore, our analysis of large group networks has several limitations: (1) we were not able to capture large-group networks for carriers covering 36% of enrollment; and (2) we could not verify that, among the 64% for whom we did have networks, those networks were exhaustive of the networks offered by the carriers. In other words, unlike in all other markets considered, while we might match at the carrier level, we lacked data to verify that we matched at the network level.

Medicare Advantage
To isolate the MA plans available in the area we used the July 2019 enrollment by contract/plan/state/county files constructed by CMS. We first mapped each ZIP centroid to its county, then matched this county information with MA enrollment data in the county. Because some individuals live in multiple locations and thus may buy their MA plans from a different residence, we restricted the set of MA networks to only those with at least 2% enrollment market share in the county. We then mapped each MA plan ID to its network ID in the Vericred data using a crosswalk provided to us by Vericred. This process resulted in matching networks to plans covering 97% of Medicare Advantage enrollment in July 2019.
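The 2% county market-share screen can be sketched as follows; the plan names and enrollment counts are invented for illustration.

```python
# Hypothetical sketch of the 2% county-level market-share screen for MA plans.
county_enrollment = {"planA": 5000, "planB": 4000, "planC": 150}

def plans_above_share(enrollment, threshold=0.02):
    """Keep plans whose county enrollment share meets the threshold."""
    total = sum(enrollment.values())
    return sorted(p for p, n in enrollment.items() if n / total >= threshold)

kept = plans_above_share(county_enrollment)  # planC (~1.6% share) is dropped
```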

Measures of Market Concentration
Our analysis relies on measures of market concentration for hospitals, physicians, and insurers based on the Herfindahl-Hirschman Index, or HHI. HHI measures are commonly used to quantify the degree of concentration among market participants, and are constructed as the sum of squared market shares (expressed as percentages) within a defined market. Values closer to zero indicate markets in which market shares are more evenly distributed among multiple market participants. By comparison, an HHI value of 10,000 (i.e., 100^2) indicates a completely consolidated market.
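The HHI calculation can be stated compactly in code; a minimal sketch with illustrative counts (not data from our analysis):

```python
# HHI = sum of squared percentage market shares: a monopoly yields
# 100^2 = 10,000, while more evenly split markets yield lower values.
def hhi(counts):
    """Compute HHI from raw counts (enrollment, physicians, beds, etc.)."""
    total = sum(counts)
    return sum((100 * c / total) ** 2 for c in counts)

monopoly = hhi([500])               # one participant: completely consolidated
four_equal = hhi([25, 25, 25, 25])  # four equal shares: 4 * 25^2 = 2,500
```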
The geographic market definition we used for all HHI measures was the commuting zone.
Commuting zones are composed of geographically contiguous counties with strong within-area clustering of commuting ties between residential and work counties, and weak across-area ties. Thus, this set of approximately 600 commuting zones nationwide approximates areas of economic activity over which it is reasonable to consider measures of market concentration. We used commuting zones based on Census data from 2010 constructed by researchers at Penn State.
While we provide basic details on our HHI measures here, we provide extensive documentation, code, and analyses of HHI methods in our GitHub document Defining Markets for Health Care Services.

Physician HHI
Our measure of physician HHI relies on the methodology outlined in Richards et al (Health Serv Res 2017). As described in that study: [HHI measures reflect] the allocation and organization of all physician specialists within a given geographic area. In other words, we capture if an insurer would have relatively few or many physician practices to negotiate with in regard to enrollee access and payments (as well as other contractual terms).
To construct our HHI measure by commuting zone we utilized detailed information on organizational relationships reported in the IQVIA data. By assigning each physician/NPI to her organization, we were able to construct count measures of the total number of physicians per organization in an area. These count measures were the basis for the market shares that fed into standard HHI equations (i.e., the sum of squared market shares).

Hospital HHI
Our measures of hospital HHI draw on the 2016-2017 CMS hospital service area files, which capture ZIP-level utilization of each general acute care hospital. We use these data to identify the set of hospitals that treat Medicare patients who reside in each ZIP. We calculate an HHI value for each ZIP code and then aggregate up to the commuting zone level by taking a weighted (by 2010 Census population) average across ZIPs with centroids located within the commuting zone.
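The population-weighted aggregation step can be sketched as follows; the ZIP-level HHI values and populations below are invented for illustration.

```python
# Hypothetical sketch: aggregate ZIP-level hospital HHIs to a commuting zone
# via a 2010 Census population-weighted average (all values are invented).
zip_hhi = {"53005": 3000.0, "53045": 5000.0}
zip_pop = {"53005": 20000, "53045": 10000}

def weighted_cz_hhi(zips, hhi_by_zip, pop_by_zip):
    """Population-weighted mean HHI across ZIPs in one commuting zone."""
    total_pop = sum(pop_by_zip[z] for z in zips)
    return sum(hhi_by_zip[z] * pop_by_zip[z] for z in zips) / total_pop

cz_hhi = weighted_cz_hhi(["53005", "53045"], zip_hhi, zip_pop)
```

Weighting by population keeps sparsely populated ZIPs from dominating the commuting-zone measure.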

Insurer HHI
To construct measures of insurer HHI we calculated total enrollment by insurance carrier in each county, then aggregated these enrollment totals up to the commuting zone level. These market shares then fed through standard HHI formulas. We did not calculate HHIs separately by market (e.g., large group, small group, etc.) under the assumption that insurers use the full weight of their enrollment totals as leverage when negotiating networks.

Network Measures

Network Definitions
We measure provider network connections using binary bipartite networks that capture in-network relationships between the m individual hospitals or physicians of a given specialty and the n provider networks available in a geographic market. That is, these m × n matrices capture binary information on whether each physician/hospital is in-network for each provider network available in the geographic market. As described above, the rows of these matrices are determined by the total number of physicians/hospitals within the isochrone, while the columns of the matrix are determined by the number of networks marketed as part of plans available in the ZIP. Thus, each combination of ZIP and specialty type receives its own bipartite matrix.
Physicians and hospitals are identified using their national provider identifier (NPI). In the case of hospitals (which, as noted above, can have multiple NPIs), we utilize a single NPI so we do not overcount hospitals in the bipartite networks.
We further define another set of binary matrices to identify networks offered by the same insurer, and networks for different insurance markets (large-group, small-group, Medicaid managed care, Medicare Advantage, and marketplace). These matrices are utilized to construct measures separately by market, and to construct measures of the degree of connections across insurers.
Example Network
eExhibit 6 below provides an example bipartite network matrix that will be used throughout this section. In this example, there are 10 clinicians (NPI A-NPI J), 10 provider networks (A1-G4), and 7 insurers (A-G). One insurer (G) offers 4 separate networks (G1, G2, G3, and G4). Each cell contains binary information on whether the clinician is in-network for the network.
While the network breadth measure provides information on the overall size of a provider network, it does not capture information on how connected the provider network is to other networks in the area. That is, provider networks for two competing insurers could be relatively broad but each may have exclusive contracts with its in-network physicians and/or hospitals.
We thus construct a second measure, the normalized strength of the network. This measure is defined as the total number of connections to other networks divided by the total possible number of connections. Thus, a completely exclusive network (i.e., no connections with other insurers) will receive a value of 0, while a network with many connections will receive a value closer to 1.
It is easiest to build intuition for the normalized strength measure using the bipartite (i.e., provider-network) matrix. eExhibit 10 highlights how we measure strength for a single network (G1).
In this example we are interested in the degree to which a given insurer's network is connected to other insurers' networks. Thus, a count of the total number of shared connections with other insurers' networks (shown in red in eExhibit 10) is the numerator for measuring strength. In the case of network G1 the total number of shared connections is 5.
The denominator for the normalized strength measure is a count of the total number of possible connections a given network could have with other networks. For the example network, there are 4 in-network clinicians and 6 networks offered by other insurers. Thus, if those 4 in-network clinicians for G1 were also in-network for each of the other 6 networks, there would be a total of 24 possible connections. The normalized strength value for network G1 is therefore 0.2083 (5/24).
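The strength calculation can be reproduced in code. The small matrix below is a hypothetical example constructed to match the arithmetic above (4 in-network clinicians, 6 rival networks, 5 shared connections); it is not the eExhibit 6 matrix.

```python
# Hypothetical bipartite membership data: each network maps to the set of
# in-network clinicians (all NPIs and network labels are invented).
in_network = {
    "G1": {"NPI1", "NPI2", "NPI3", "NPI4"},  # focal network: 4 clinicians
    # six networks offered by other insurers
    "A1": {"NPI1", "NPI2"},          # shares 2 clinicians with G1
    "B1": {"NPI3"},                  # shares 1
    "C1": {"NPI1", "NPI4", "NPI5"},  # shares 2 (NPI5 is not in G1)
    "D1": set(),
    "E1": set(),
    "F1": {"NPI6"},                  # shares 0
}

def normalized_strength(focal, rivals, nets):
    """Shared (clinician, rival-network) pairs over total possible pairs."""
    shared = sum(len(nets[focal] & nets[r]) for r in rivals)
    possible = len(nets[focal]) * len(rivals)
    return shared / possible

rivals = ["A1", "B1", "C1", "D1", "E1", "F1"]
strength = normalized_strength("G1", rivals, in_network)  # 5 / 24
```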
Our primary results rely on strength measures as described above, that is, measures constructed by considering only connections with other insurers' networks. However, we also considered a total strength measure whereby we allowed for connections across all networks in the area. eExhibit 11 below adds in our grouped strength ratio measure as hollow "rings" at each node. As can be seen in eExhibit 11, networks only loosely connected to other insurers' networks receive small normalized strength values.