Each data point represents the adjusted mean rate of completeness by condition across all virtual visit companies (A) and by virtual visit company across all conditions (B). The error bars indicate 95% CIs; dotted line, the aggregate mean across conditions or virtual visit companies. Variations in completeness by condition and by virtual visit company were statistically significant (P < .001). UTI indicates urinary tract infection.
Rates of naming the correct diagnosis for each visit are based on whether the physician stated the correct diagnosis for each encounter. Each data point represents the adjusted mean rate of naming the correct diagnosis by condition across all virtual visit companies (A) and by virtual visit company across all conditions (B). The error bars indicate the 95% CIs; dotted line, the aggregate mean across conditions or virtual visit companies. Variations in naming the correct diagnosis by condition and by virtual visit company were statistically significant (P < .001). UTI indicates urinary tract infection.
Each point represents the adjusted mean rate of adherence by condition across all virtual visit companies (A) and by virtual visit company across all conditions (B). The error bars indicate 95% CIs; dotted line, the aggregate mean across conditions or virtual visit companies. Variation in guideline adherence was statistically significant by condition (P < .001) and virtual visit company (P = .009). UTI indicates urinary tract infection.
Each point represents the adjusted mean rate of adherence to guidelines in key management decisions for streptococcal pharyngitis and low back pain (best adherence [A]), for ankle pain and recurrent female urinary tract infection (UTI) (lowest adherence [B]), and for viral pharyngitis and acute rhinosinusitis (intermediate adherence [C]) for each virtual visit company. The error bars indicate 95% CIs; dotted line, the aggregate mean across virtual visit companies. Lower rates indicate lower adherence to guidelines in management decisions. Variation between virtual visit companies in adherence to guidelines was not statistically significant for streptococcal pharyngitis and low back pain (P = .29) or for ankle pain and UTI (P = .33); variation was significant for viral pharyngitis and acute rhinosinusitis (P < .001).
eMethods. Identification of Companies and Power Analysis
eTable 1. Distribution of Cases by Condition and Company
eTable 2. Brief Summary of Cases and Correct Diagnoses
eTable 3. Rates of Completeness of History and Physical Examination by Company
Schoenfeld AJ, Davies JM, Marafino BJ, Dean M, DeJong C, Bardach NS, Kazi DS, Boscardin WJ, Lin GA, Duseja R, Mei YJ, Mehrotra A, Dudley RA. Variation in Quality of Urgent Health Care Provided During Commercial Virtual Visits. JAMA Intern Med. 2016;176(5):635-642. doi:10.1001/jamainternmed.2015.8248
Commercial virtual visits are an increasingly popular model of health care for the management of common acute illnesses. In commercial virtual visits, patients access a website to be connected synchronously (via videoconference, telephone, or webchat) to a physician with whom they have no prior relationship. To date, whether the care delivered through those websites is similar or whether quality varies among the sites has not been assessed.
To assess the variation in the quality of urgent health care among virtual visit companies.
Design, Setting, and Participants
This audit study used 67 trained standardized patients who presented to commercial virtual visit companies with the following 6 common acute illnesses: ankle pain, streptococcal pharyngitis, viral pharyngitis, acute rhinosinusitis, low back pain, and recurrent female urinary tract infection. The 8 commercial virtual visit websites with the highest web traffic were selected for audit, for a total of 599 visits. Data were collected from May 1, 2013, to July 30, 2014, and analyzed from July 1, 2014, to September 1, 2015.
Main Outcomes and Measures
Completeness of histories and physical examinations, the correct diagnosis (vs an incorrect or no diagnosis), and adherence to guidelines of key management decisions.
Sixty-seven standardized patients completed 599 commercial virtual visits during the study period. Histories and physical examinations were complete in 417 visits (69.6%; 95% CI, 67.7%-71.6%); diagnoses were correctly named in 458 visits (76.5%; 95% CI, 72.9%-79.9%), and key management decisions were adherent to guidelines in 325 visits (54.3%; 95% CI, 50.2%-58.3%). Rates of guideline-adherent care ranged from 206 visits (34.4%) to 396 visits (66.1%) across the 8 websites. Variation across websites was significantly greater for viral pharyngitis and acute rhinosinusitis (adjusted rates, 12.8% to 82.1%) than for streptococcal pharyngitis and low back pain (adjusted rates, 74.6% to 96.5%) or ankle pain and recurrent urinary tract infection (adjusted rates, 3.4% to 40.4%). No statistically significant variation in guideline adherence by mode of communication (videoconference vs telephone vs webchat) was found.
Conclusions and Relevance
Significant variation in quality was found among companies providing virtual visits for management of common acute illnesses. More variation was found in performance for some conditions than for others, but no variation by mode of communication.
Commercial virtual visits are a new form of physician-patient interaction in which patients use websites to request synchronous (live) consultation—via videoconference, telephone, or webchat—with a physician whom they have not met previously. Commercial virtual visit companies have no in-person care option.1 Instead, advertisements for commercial virtual visit companies emphasize easy access to care, especially acute care, over the web.2 Virtual visits may be appealing because difficulty accessing timely care for acute problems from local brick-and-mortar health care providers (primary care practices, retail clinics, and urgent care centers) is common. In 2013, less than half of US adults reported being able to get same- or next-day appointments with their physicians and less than 40% reported being able to get care after hours without going to the emergency department.3
Commercial virtual visit companies have experienced rapid growth. One company reports a user base of more than 6 million people.2 Acceptance by payers is also rising; one of the nation’s larger insurers, Anthem, has launched its own national virtual visit initiative.4 Moreover, the percentage of large employers offering virtual visits tripled from 2010 to 2012,5 and the number of virtual visits is projected to continue to grow rapidly in the near future.6
In response to this change, some state medical boards have placed limits on how virtual visits can be performed. Some states only allow telemedicine in situations in which an ongoing physician-patient relationship exists. Others require that a virtual visit occur by videoconference (rather than telephone or webchat), despite a lack of evidence regarding the optimal mode of virtual visit communication.7-10 Recently, the Federal Trade Commission commented not only on the growth of interstate virtual visits but also on the uncertainty about which government agencies should oversee them.11 In addition, the industry organization for commercial virtual visit companies, the American Telemedicine Association, is developing voluntary standards for its members to consider.1
The urgency of the need to develop a regulatory framework or industry-promulgated standards will depend, in part, on how much quality of care varies among virtual visit companies. If the variation is large, then characteristics of the companies or their processes of care likely influence the quality of the care that patients receive. This situation would constitute a rationale to consider standards or regulations to protect patients.
To measure the variation in performance among commercial virtual visit companies, we used an audit method to evaluate the quality of care provided by the companies with the highest volume of web traffic. We selected 6 conditions that the companies advertised that they treat, that have evidence-based guidelines, and that have been used in previous studies to measure quality.12-19 We trained standardized patients to present as mystery shoppers with these conditions and gathered information on processes of care and decision making. In addition, because some states require virtual visits to occur by videoconference,7-10 we compared the quality of care by mode of communication (videoconference vs telephone vs webchat).
Question What is the quality of care provided by companies that offer virtual visits for management of common acute illnesses?
Findings In this audit study using standardized patients, significant variation in quality was found among companies providing virtual visits for management of common acute illnesses. A statistically significant variation was found in guideline adherence among virtual visit companies and by condition but not by mode of communication (videoconference vs telephone vs webchat).
Meaning This study provides, to date, the first evaluation of the variation in quality of care by companies that offer virtual visits for management of common acute illnesses.
We decided a priori to study the 8 most frequently visited companies (as determined by Alexa Rankings; http://www.alexa.com) that met our eligibility criteria (eMethods in the Supplement). These companies included Ameridoc, Amwell, Consult a Doctor, Doctor on Demand, MDAligne, MDLIVE, MeMD, and NowClinic. This study was approved by the institutional review board of the University of California, San Francisco.
All visits were initiated by a standardized patient (described below) visiting a company website. Most companies offered encounters only through videoconferencing or telephone. When given a choice of visit modality, standardized patients were instructed to use a coin flip to choose between telephone and videoconference. On occasion, the virtual visit physician would override the standardized patient’s choice and proceed with a different mode (videoconference, telephone, or webchat) for technical or convenience reasons.
Six clinical case vignettes were designed in consultation with a panel of 4 board-certified physicians representing pediatrics (N.S.B.), emergency medicine (R.D.), internal medicine (G.A.L.), and pulmonary medicine (R.A.D.). Among the set of conditions the websites advertised that they treated, the panel chose situations in which a widely recognized and used guideline applicable to the situation existed. We then wrote the scenarios to have as many vignettes in which guidelines recommended an action (prescribing or ordering tests) as vignettes in which guidelines recommended no action. We were constrained by a limit of 6 total vignettes and by the desire to have adequate power to assess care at the top 8 websites (eMethods in the Supplement). The cases involved low back pain, recurrent female urinary tract infection (UTI), acute rhinosinusitis, viral pharyngitis, streptococcal pharyngitis, and ankle pain. Vignettes were written to represent typical cases that might be seen in an urgent-care setting. In some vignettes, testing, imaging, or treatment is recommended in guidelines. In others, testing, imaging, or treatment is specifically noted as not necessary in guidelines (Table).12-19 For each vignette, a key management decision (undergoing testing or not or prescribing or not) was identified a priori by the physician team.
Sixty-seven individuals who served as standardized patients were recruited from the following 2 groups: (1) actors with prior training as standardized patients at the University of California, San Francisco, Kanbar Center for Simulation, Clinical Skills and Telemedicine Education standardized patient program and (2) students currently enrolled in an accredited US medical school. Each vignette was first taught to experienced standardized patient trainers from the Kanbar Center. These trainers then instructed the standardized patients (while being observed by ≥1 of the study physicians) in the technical and clinical aspects of the study, including typical manifestations of the condition, standardized interaction techniques, and scoring criteria for study variables. During training, standardized patients role-played characteristic encounters, and checklists were scored for data collection reliability. Particular focus ensured that the standardized patients knew not to suggest any specific diagnoses, tests, or treatments. All standardized patients exceeded the prespecified 90% accuracy threshold by the end of the training and were audited by supervising physicians throughout the study.
Standardized patients performed the virtual visits from May 1, 2013, to July 30, 2014. Details of each encounter were recorded immediately after the virtual visit. Using a data collection form, standardized patients recorded physician and company names, whether specific elements of the history and physical examination were performed, the diagnosis (if a diagnosis was named), tests ordered, and prescriptions provided. The websites were paid their usual charges for the study virtual visits. If the websites did not respond to encounter requests in a timely fashion (1-2 days), the standardized patients were unable to complete the visit. Therefore, the completed numbers of cases varied by website (eTable 1 in the Supplement).
The 3 primary outcomes for each virtual visit in our study were performance of a complete history and physical examination, the correct diagnosis, and adherence to the relevant guideline in the key management decision. The completeness of the history and physical examination was scored using items referenced in guidelines as important for the diagnosis and/or treatment decision. Because we anticipated that not all physical examination maneuvers could be performed remotely, we gave credit on the physical examination if the physician sought the relevant information by asking the patient to perform the maneuver. For example, in the case of streptococcal pharyngitis, we gave credit for assessing for tonsillar exudate if the physician asked the patient to look in the back of his or her own throat.
Naming the correct diagnosis was coded as a binary variable, based on whether the physician told the standardized patient the correct diagnosis. For each diagnosis, the physician would be given credit for naming the correct diagnosis if he or she mentioned any one of a list of terms (eTable 2 in the Supplement). For example, if a physician gave the patient a more general diagnosis (UTI for recurrent UTI), the physician was given credit for a correct diagnosis. If, however, the physician diagnosed what was actually viral pharyngitis as a bacterial infection, the diagnosis was considered incorrect. Standardized patients were instructed not to ask physicians for a diagnosis. If a physician gave no diagnosis at all, this was considered a failure to name the correct diagnosis.
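As an illustration only, the coding rule for the diagnosis outcome can be sketched in Python. The term lists below are hypothetical placeholders (the actual accepted terms are given in eTable 2 in the Supplement and are not reproduced here), and the function name and the simple substring matching are assumptions made for this sketch, not the study's scoring procedure.

```python
from typing import Dict, List, Optional

# Hypothetical accepted-term lists for two of the six conditions.
# The study's real lists are in eTable 2 of the Supplement.
ACCEPTED_TERMS: Dict[str, List[str]] = {
    "recurrent UTI": ["urinary tract infection", "uti"],
    "viral pharyngitis": ["viral pharyngitis", "viral sore throat"],
}


def code_correct_diagnosis(condition: str, stated: Optional[str]) -> int:
    """Return 1 if the stated diagnosis mentions any accepted term, else 0.

    A missing diagnosis (None or empty string) is coded 0, mirroring the
    rule that giving no diagnosis counts as a failure to name the correct one.
    Matching is a naive case-insensitive substring test, which is only
    adequate for a sketch.
    """
    if not stated:
        return 0
    text = stated.lower()
    return int(any(term in text for term in ACCEPTED_TERMS[condition]))


# A more general diagnosis still earns credit for recurrent UTI.
print(code_correct_diagnosis("recurrent UTI", "You have a UTI"))
# A bacterial diagnosis for the viral pharyngitis case is coded incorrect.
print(code_correct_diagnosis("viral pharyngitis", "bacterial infection"))
# No diagnosis named.
print(code_correct_diagnosis("viral pharyngitis", None))
```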
Adherence to guidelines was coded as a binary variable for each visit. The score was based on whether the physician’s key management decision agreed with the relevant guideline (Table).
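The binary adherence score can be sketched the same way. The mapping below covers only the 3 key management decisions that are described in the text of this article (urine culture for recurrent UTI, radiograph for ankle pain, and no radiograph for low back pain); the remaining decisions are listed in the Table and are not guessed at here. The function and variable names are hypothetical.

```python
from typing import Dict

# True means the guideline recommends the key action; False means the
# guideline recommends against it. Only the three decisions described in
# the article text are included in this sketch.
GUIDELINE_RECOMMENDS_ACTION: Dict[str, bool] = {
    "recurrent UTI": True,    # urine culture recommended
    "ankle pain": True,       # radiograph recommended
    "low back pain": False,   # radiograph not recommended
}


def code_adherence(condition: str, action_taken: bool) -> int:
    """Return 1 if the physician's key decision agrees with the guideline."""
    return int(action_taken == GUIDELINE_RECOMMENDS_ACTION[condition])


# Not ordering a radiograph for low back pain is guideline adherent.
print(code_adherence("low back pain", action_taken=False))
# Not ordering a radiograph for ankle pain is not.
print(code_adherence("ankle pain", action_taken=False))
```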
We evaluated the variation in performance among individual companies at different points along the performance spectrum by grouping conditions based on the mean percentage of adherence to key management decision guidelines across all companies. We assessed the between-company variation within these groups. We used 3 pairs of conditions (the 2 conditions with the highest overall performance, the 2 conditions with the lowest overall performance, and the 2 conditions with intermediate performance). We also assessed whether performance on the 3 primary outcomes was associated with the communication modality (videoconference vs telephone vs webchat).
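The pairing of conditions by overall performance can be illustrated with a short Python sketch. The rates for low back pain, recurrent UTI, and ankle pain are the adjusted rates reported in the Results; the other 3 values are hypothetical placeholders chosen only so that the example runs, and the function name is an assumption.

```python
from typing import Dict, List


def group_into_pairs(mean_adherence: Dict[str, float]) -> Dict[str, List[str]]:
    """Rank 6 conditions by mean adherence and split them into 3 pairs."""
    if len(mean_adherence) != 6:
        raise ValueError("expected exactly 6 conditions")
    ranked = sorted(mean_adherence, key=mean_adherence.get, reverse=True)
    return {
        "highest": ranked[:2],
        "intermediate": ranked[2:4],
        "lowest": ranked[4:],
    }


rates = {
    "low back pain": 93.1,             # reported adjusted rate
    "recurrent UTI": 34.2,             # reported adjusted rate
    "ankle pain": 15.5,                # reported adjusted rate
    "streptococcal pharyngitis": 85.0,  # hypothetical placeholder
    "viral pharyngitis": 60.0,          # hypothetical placeholder
    "acute rhinosinusitis": 50.0,       # hypothetical placeholder
}

for label, pair in group_into_pairs(rates).items():
    print(label, pair)
```

Between-company variation would then be assessed separately within each pair.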
The secondary outcome was the frequency of referrals to a local brick-and-mortar health care provider for an in-person visit or test. Standardized patients recorded the rationale the physician offered for any such referral.
Data were analyzed from July 1, 2014, to September 1, 2015. We used mixed-effects models to account for clustering by condition, website, and physician. The condition and website effects were treated as fixed effects, whereas the physician-level effects were treated as random. For binary outcomes, we used a mixed-effects logistic regression model. For continuous outcomes, we used a mixed-effects linear regression model. The rates for all binary outcomes are presented as the predicted marginal probabilities, and the rates for all continuous outcomes are presented as the predicted marginal means from these models using the margins command. We used STATA (version 12.1; StataCorp) to perform all statistical analyses.
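For readers who want the model in symbols, one way to write the mixed-effects logistic regression described above is shown below. The notation is introduced here for illustration and is not taken from the article: Y_i is the binary outcome for visit i, c(i) its condition, w(i) its website, and k(i) its physician. The normal distribution for the physician effect is the conventional assumption for such models and is stated here as an assumption.

```latex
\operatorname{logit}\,\Pr\!\left(Y_i = 1\right)
  = \beta_0 + \alpha_{c(i)} + \gamma_{w(i)} + u_{k(i)},
\qquad
u_k \sim \mathcal{N}\!\left(0, \sigma_u^2\right)
```

Here \alpha and \gamma are the fixed effects for condition and website, and u_k is the physician-level random intercept. For continuous outcomes, the analogous linear model replaces the logit of the probability with the expected value of the outcome.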
Sixty-seven standardized patients completed 599 virtual visits (Table) with 157 internal medicine, emergency medicine, or family practice physicians. The median number of visits per site was 77 (interquartile range, 63.5-88.5). These included 372 videoconference encounters (62.1%), 170 telephone encounters (28.4%), and 57 webchat encounters (9.5%). The median number of visits per physician was 1 (interquartile range, 1-4).
Virtual visit physicians asked all recommended history questions and performed all recommended physical examination maneuvers in 417 visits (69.6%; 95% CI, 67.7%-71.6%). Physicians named the correct diagnosis in 458 visits (76.5%; 95% CI, 72.9%-79.9%). Physicians gave the wrong diagnosis in 89 visits (14.8%; 95% CI, 12.0%-17.9%) or provided no diagnosis in 52 visits (8.7%; 95% CI, 6.6%-11.3%).
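The crude proportions above can be checked with simple arithmetic. The Python sketch below recomputes them from the reported counts and adds a Wilson score interval as one common way to attach a 95% CI to a proportion. The choice of the Wilson method is an assumption made for illustration; the article does not state how these intervals were obtained, and they may come from the adjusted models, so the output need not match the reported CIs exactly.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


TOTAL_VISITS = 599
diagnosis_counts = {"correct": 458, "wrong": 89, "none": 52}

# The three diagnosis categories should account for every visit.
assert sum(diagnosis_counts.values()) == TOTAL_VISITS

for label, count in diagnosis_counts.items():
    low, high = wilson_interval(count, TOTAL_VISITS)
    print(f"{label}: {100 * count / TOTAL_VISITS:.1f}% "
          f"(Wilson 95% CI, {100 * low:.1f}%-{100 * high:.1f}%)")
```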
Completeness of histories and physical examinations and the correct diagnosis varied by condition and virtual visit company (P < .001 for the statistical significance of the variation by condition and by company; Figure 1, Figure 2, and eTable 3 in the Supplement). For low back pain, 72 of 90 histories and physical examinations (adjusted for condition, 80.0%; 95% CI, 74.2%-85.8%) were complete, compared with only 58 of 101 (adjusted for condition, 57.8%; 95% CI, 52.3%-63.4%) for ankle pain. The rate of physicians naming the correct diagnosis also varied by condition, from 110 of 121 (adjusted for condition, 91.3%; 95% CI, 86.1%-96.5%) for recurrent UTI to 75 of 105 (adjusted for condition, 70.9%; 95% CI, 61.0%-80.2%) for rhinosinusitis.
When evaluated by company and adjusted for condition, the percentage of virtual visits with complete histories and physical examinations ranged from 51.7% to 82.4%. The percentage of virtual visits with correct diagnoses named ranged from 65.4% to 93.8%.
Across all conditions at all companies, key management decisions were guideline adherent in 325 visits (54.3%; 95% CI, 50.2%-58.3%). We found substantial variation among conditions and among companies (P < .001 and P = .009, respectively; Figure 3). For example, physicians ordered urine cultures for recurrent UTI in only 41 of 121 visits (adjusted for condition, 34.2%; 95% CI, 24.5%-43.8%) and guideline-recommended radiographs for ankle pain in only 17 of 101 visits (adjusted for condition, 15.5%; 95% CI, 7.9%-23.2%), whereas they (appropriately) did not order a radiograph for low back pain in 84 of 90 visits (adjusted for condition, 93.1%; 95% CI, 87.7%-98.5%). Across virtual visit companies, adjusted adherence of key management decisions to guidelines ranged from 34.4% to 66.1%.
The pattern of variation in virtual visit companies’ performance differed by condition (Figure 4). For the 2 conditions (low back pain and streptococcal pharyngitis) with the highest overall adjusted rate of adherence to guidelines (ranging among companies from 74.6% to 96.5%), no statistically significant variation in virtual visit companies’ performance was found (P = .29; Figure 4A). Similarly, for the 2 conditions (ankle pain and recurrent UTI) with the lowest overall performance (3.4% to 40.4%), we found no statistically significant variation in virtual visit companies’ performance (P = .33; Figure 4B). For the 2 remaining conditions (viral pharyngitis and acute rhinosinusitis), however, we found statistically significant variation in performance (P < .001; Figure 4C), with a range among websites from 12.8% to 82.1%.
In 83 patient encounters (13.9%), physicians made a referral to local brick-and-mortar health care providers. The most common stated reasons for referral were that the physician considered the case out of the scope of care that could be provided online or that the case required additional follow-up that could not be provided online.
For naming the correct diagnosis, videoconference (85.8%; 95% CI, 77.6%-93.9%) and telephone encounters (77.7%; 95% CI, 70.8%-84.7%) were superior to webchat (66.1%; 95% CI, 52.2%-80.1%) (P = .01). We found no significant difference between videoconference vs telephone in rate of naming the correct diagnosis (P = .26). We found no statistically significant differences between modes of communication in completeness of history and physical examination (P = .41) or adherence to guidelines (P = .66).
To our knowledge, this study is the first to evaluate variation in the quality of medical encounters provided by commercial virtual visit companies. We found substantial and statistically significant variation in guideline adherence among virtual visit companies and that the variation differs by condition. We found no significant difference in guideline adherence by mode of communication (videoconference vs telephone vs webchat). In some ways, our finding that care varies online is consistent with findings of prior literature about traditional care settings,20-23 where extensive evidence exists of failure to follow guidelines and of variation in quality of care.
In particular, the rate of antibiotic prescribing in commercial virtual visits that we observed is similar to the rate seen nationally in traditional (in-person) settings. For instance, a prior study24 found that antibiotics were prescribed to approximately 60% of patients seen at primary care practices and emergency departments with sore throat nationally, whereas others25,26 have documented 80% prescribing to patients with upper respiratory tract infections.
Previous literature has also compared antibiotic prescribing patterns during virtual visits (with health care professionals who also offered in-person care) to visits in traditional settings. Courneya et al27 found lower prescription rates for acute bronchitis during an online interactive algorithmic visit than for traditional visits, whereas Mehrotra et al28 found that antibiotics were prescribed for presumed acute sinusitis at higher rates during virtual visits than traditional visits. Thus, antibiotic prescribing for viral illnesses appears to be an area needing attention in all care settings.
Conversely, our study demonstrates that the rates of testing in situations in which testing is not recommended may be lower in virtual care than traditional settings, but the rates of obtaining tests that are recommended are also lower. For low back pain, health care professionals in virtual visits adhered to guidelines and did not order radiographs in 93.1% of visits, whereas Rosenberg et al26 found that health care professionals in brick-and-mortar settings ordered additional imaging approximately half the time. In the case of ankle injury, prior studies17 suggest that most patients who present to brick-and-mortar practices receive imaging, whereas only 15.5% of patients were recommended imaging in our study. Avoiding additional testing is appropriate in some cases, but the uniformly low rates of testing in the virtual visits may actually reflect the logistical challenges of ordering or following up on tests to be performed near where the patient lives or concern about the out-of-pocket costs for additional testing. These hypotheses need to be tested in future studies. Because appropriate use of testing is critical to the delivery of medical care, identification and reduction of barriers to testing will be important.
The evidence we found does not appear to support the limitation of virtual visits to videoconferencing. Debate is ongoing about what modes of communication constitute a safe and appropriate telemedicine encounter.7-9 In Texas, for example, the state medical board recently ruled to restrict telemedicine encounters to videoconference owing to safety concerns,10 and the Federation of State Medical Boards has excluded audio-only and webchat visits from the definition of telemedicine. However, with regard to guideline adherence, we found no statistically significant difference by mode of communication.
The fact that some companies can perform considerably better than others suggests that this variation could be addressed if performance leaders were willing to share their best practices with other virtual visit companies. Further research is required to evaluate whether better-performing virtual visit companies have adopted some company-wide policy or protocol(s) that increase guideline adherence.
This study has some limitations. First, we do not know whether virtual visits are superior to or inferior to in-person visits. Second, the market is evolving, and some companies had to be excluded early from the study because they ceased operations. Third, we do not know the exact market share of each company included in our study. However, the companies that remained were the most trafficked, and thus our study presumably captures the major companies in the current market. Finally, we looked at only 8 virtual visit companies and 6 conditions, and sample size is a potential limitation. However, we have adjusted for company and condition in our statistical analysis and thus clustering and collinearity are not responsible for observed differences by company, condition, or modality. Further, we studied the companies that receive the most traffic and studied their care for common conditions that they treat according to their advertising.
Our study provides the first evaluation, to our knowledge, of the variation in quality of care currently being provided during commercial virtual visits. We found significant variation across companies and by condition. The patterns of variation we observed imply an opportunity to improve and point toward approaches to determine how to make these improvements.
Corresponding Author: Adam J. Schoenfeld, MD, Philip R. Lee Institute for Health Policy Studies, University of California, San Francisco, 3333 California St, Ste 265, PO Box 0936, San Francisco, CA 94118 (email@example.com).
Accepted for Publication: December 12, 2015.
Published Online: April 4, 2016. doi:10.1001/jamainternmed.2015.8248.
Author Contributions: Drs Schoenfeld and Davies share first authorship. Drs Schoenfeld and Marafino had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Schoenfeld, Davies, Dean, Lin, Duseja, Dudley.
Acquisition, analysis, or interpretation of data: Schoenfeld, Davies, Marafino, Dean, DeJong, Bardach, Kazi, Boscardin, Lin, Mei, Mehrotra, Dudley.
Drafting of the manuscript: Schoenfeld, Davies, Marafino, Dean, Mei, Dudley.
Critical revision of the manuscript for important intellectual content: Schoenfeld, Davies, Marafino, Dean, DeJong, Bardach, Kazi, Boscardin, Lin, Duseja, Mehrotra, Dudley.
Statistical analysis: Schoenfeld, Davies, Marafino, Boscardin, Dudley.
Obtained funding: Schoenfeld.
Administrative, technical, or material support: Schoenfeld, Davies, Marafino, Dean, DeJong, Mehrotra, Dudley.
Study supervision: Schoenfeld, Dean, Duseja, Dudley.
Conflict of Interest Disclosures: None reported.
Funding/Support: This study was supported by the Changes in Health Care Financing and Organization Program of the Robert Wood Johnson Foundation, the Innovation Fund of the Philip R. Lee Institute for Health Policy Studies, grant R25MD006832 from the National Institute on Minority Health and Health Disparities, National Institutes of Health, and the Grove Foundation.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.