This t-distributed stochastic neighbor embedding, which is a technique that assists in visualizing high-dimensional data in 2 dimensions, depicts the deep neural network’s internal representation of the data derived from the last recurrent layer of the neural network. Each point represents a 10-minute segment of data from our validation (cardioversion) data set; orange points represent atrial fibrillation segments (precardioversion) and blue points represent normal sinus rhythm segments (postcardioversion). The neural network has largely clustered atrial fibrillation from normal sinus rhythm segments, as depicted when plotted on 2 dimensions (axes) that were chosen arbitrarily. Most points classified as normal sinus rhythm are in the upper part of the visualization, while atrial fibrillation points are separated in alternate clusters. The upper inset shows an example of raw smartwatch heart rate data associated with normal sinus rhythm, and the lower inset shows raw atrial fibrillation smartwatch data; each vertical bar represents a 5-second average heart rate color-coded by beats per minute (BPM; blue, <60; orange, 60-99; red, ≥100).
A, Receiver operating characteristic curve among 51 individuals undergoing in-hospital cardioversion. The curve demonstrates a C statistic of 0.97 (95% CI, 0.94-1.00), and the point on the curve indicates a sensitivity of 98.0% and a specificity of 90.2%. B, Receiver operating characteristic curve among 1617 individuals in the ambulatory subset of the remote cohort. The curve demonstrates a C statistic of 0.72 (95% CI, 0.64-0.78), and the point on the curve indicates a sensitivity of 67.7% and a specificity of 67.6%.
eFigure 1. Three cohorts used to develop and validate a deep neural network for detecting atrial fibrillation.
eFigure 2. Geographical distribution of remote cohort participants.
eFigure 3. Deep neural network architecture.
Tison GH, Sanchez JM, Ballinger B, et al. Passive Detection of Atrial Fibrillation Using a Commercially Available Smartwatch. JAMA Cardiol. 2018;3(5):409–416. doi:10.1001/jamacardio.2018.0136
How well can smartwatch sensor data analyzed by a deep neural network identify atrial fibrillation?
In this cohort study of 51 participants presenting for cardioversion, a commercially available smartwatch was able to detect atrial fibrillation with high accuracy. Among 1617 ambulatory individuals who wore a smartwatch, those with self-reported atrial fibrillation were correctly classified with moderate accuracy.
These data support the proof of concept that a commercially available smartwatch coupled with a deep neural network classifier can passively detect atrial fibrillation.
Importance
Atrial fibrillation (AF) affects 34 million people worldwide and is a leading cause of stroke. A readily accessible means to continuously monitor for AF could prevent large numbers of strokes and death.
Objective
To develop and validate a deep neural network to detect AF using smartwatch data.
Design, Setting, and Participants
In this multinational cardiovascular remote cohort study coordinated at the University of California, San Francisco, smartwatches were used to obtain heart rate and step count data for algorithm development. A total of 9750 participants enrolled in the Health eHeart Study and 51 patients undergoing cardioversion at the University of California, San Francisco, were enrolled between February 2016 and March 2017. A deep neural network was trained using a method called heuristic pretraining in which the network approximated representations of the R-R interval (ie, time between heartbeats) without manual labeling of training data. Validation was performed against the reference standard 12-lead electrocardiography (ECG) in a separate cohort of patients undergoing cardioversion. A second exploratory validation was performed using smartwatch data from ambulatory individuals against the reference standard of self-reported history of persistent AF. Data were analyzed from March 2017 to September 2017.
Main Outcomes and Measures
The sensitivity, specificity, and receiver operating characteristic C statistic for the algorithm to detect AF were generated based on the reference standard of 12-lead ECG–diagnosed AF.
Results
Of the 9750 participants enrolled in the remote cohort, including 347 participants with AF, 6143 (63.0%) were male, and the mean (SD) age was 42 (12) years. There were more than 139 million heart rate measurements on which the deep neural network was trained. The deep neural network exhibited a C statistic of 0.97 (95% CI, 0.94-1.00; P < .001) to detect AF against the reference standard 12-lead ECG–diagnosed AF in the external validation cohort of 51 patients undergoing cardioversion; sensitivity was 98.0% and specificity was 90.2%. In an exploratory analysis relying on self-report of persistent AF in ambulatory participants, the C statistic was 0.72 (95% CI, 0.64-0.78); sensitivity was 67.7% and specificity was 67.6%.
Conclusions and Relevance
This proof-of-concept study found that smartwatch photoplethysmography coupled with a deep neural network can passively detect AF but with some loss of sensitivity and specificity against a criterion-standard ECG. Further studies will help identify the optimal role for smartwatch-guided rhythm assessment.
Atrial fibrillation (AF) affects up to 34 million people worldwide, and patients with AF exhibit a higher risk of severe health consequences, including death and stroke.1-3 Atrial fibrillation is often asymptomatic and thus can remain undetected until a thromboembolic event occurs.1,4 Earlier detection of AF would enable the use of anticoagulation therapy known to mitigate the risk of stroke and other thromboembolic complications.1,4-9 Data from implantable cardiac monitors have demonstrated that years of monitoring may be required to detect clinically significant AF.10,11
An accessible means to continuously monitor for AF could have a valuable clinical effect by enabling AF detection, such as in patients with recurrent AF after ablation or pharmacologic cardioversion.12,13 Various types of wearable sensors, ranging from fitness trackers to smartwatches, can measure heart rate and have exhibited rapid adoption among the general population, providing an opportunity for highly scalable AF detection.10,14
Deep neural networks are a type of machine learning algorithm that has shown high accuracy in performing pattern recognition from noisy, complex inputs, such as speech15 and image recognition,16 including in medical applications, such as detection of diabetic retinopathy17 and skin cancer.18 We sought to develop and validate a deep neural network to detect AF from smartwatch data.
This study used 3 cohorts to achieve 3 goals: (1) algorithm development and training using a remote cohort; (2) external validation of AF detection in an in-person cardioversion cohort; and (3) an exploratory analysis to perform ambulatory AF detection in a third cohort. All participants provided written informed consent prior to enrollment. This study was approved by the University of California, San Francisco, institutional review board.
We used the publicly available Cardiogram mobile application (Cardiogram Inc) to access data from standard, commercially available Apple Watches (Apple Inc). Heart rate data obtained from the photoplethysmography sensor and step count data from the accelerometer were accessed by the Cardiogram application and input into a deep neural network. The frequency of heart rate recordings depended on whether the watch was in standby mode, during which heart rates were obtained every 5 minutes, or Workout mode, during which heart rates were continuously recorded and obtained as 5-second averages.
A large, geographically diverse remote cohort was used to perform deep neural network development and training (eFigures 1 and 2 in the Supplement). Data gathering and study management were performed using the Health eHeart Study infrastructure (http://www.health-eheartstudy.org), an online cohort study coordinated at the University of California, San Francisco. The Health eHeart Study enrolls English-speaking adults 18 years and older with an active email address recruited through academic institutions, lay press, social media, promotional events, and, for the current study, the Cardiogram mobile application. From March 2, 2016, to March 15, 2017, Health eHeart Study participants with an Apple Watch were sent an invitation to participate, and individuals providing informed consent were enrolled. Demographic information was self-reported.
We used a purpose-built neural network consisting of 8 layers, each of which had 128 hidden units, for a total of 564 227 parameters (eFigure 3 in the Supplement), that transformed raw sensor measurements—heart rate and step counts—into a sequence of scores corresponding to probabilities that a participant was in AF at each time interval. This deep neural network was trained sequentially using a semisupervised approach, described below. Our primary classification task was to passively detect AF from Apple Watch data while in Workout mode. The network was trained using an unsupervised approach, which we call heuristic pretraining, using Google’s TensorFlow framework.19 This training used 57 675 person-weeks of unlabeled data from the remote cohort (n = 6682) to compute several representations approximating R-R intervals (ie, time between heartbeats). Modeled after a heuristic previously used for AF detection,20 we calculated the average absolute difference between successive heart rate measurements across window sizes of 5 seconds, 30 seconds, 5 minutes, and 30 minutes.
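The pretraining heuristic described above can be illustrated with a minimal sketch. It assumes one possible reading of the text: heart rate samples are averaged within fixed windows, and the target is the mean absolute difference between successive window averages. The function names, the `(timestamp, bpm)` input format, and the handling of empty windows are assumptions for illustration and are not taken from the study's code.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, heart rate in BPM); hypothetical format


def window_means(samples: Sequence[Sample], window_s: float) -> List[float]:
    """Average heart rate within consecutive windows of window_s seconds.

    Windows that contain no samples are skipped (an assumption of this sketch).
    """
    bins: Dict[int, List[float]] = {}
    for t, hr in samples:
        bins.setdefault(int(t // window_s), []).append(hr)
    return [sum(v) / len(v) for _, v in sorted(bins.items())]


def mean_abs_successive_diff(values: Sequence[float]) -> Optional[float]:
    """Mean absolute difference between successive values; None if fewer than 2 values."""
    if len(values) < 2:
        return None
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def heuristic_targets(
    samples: Sequence[Sample],
    windows_s: Sequence[float] = (5, 30, 300, 1800),  # 5 s, 30 s, 5 min, 30 min
) -> Dict[float, Optional[float]]:
    """One heuristic value per window size, keyed by window length in seconds."""
    return {w: mean_abs_successive_diff(window_means(samples, w)) for w in windows_s}


# Irregular alternating rate sampled every 5 seconds
irregular = [(0, 60), (5, 70), (10, 60), (15, 70)]
print(heuristic_targets(irregular)[5])
```

An irregular rhythm yields larger values at short window sizes than a steady rhythm, which is why a quantity of this kind can serve as a label-free training target.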
To validate the neural network, we enrolled consecutive consenting patients with AF presenting to the University of California, San Francisco, for electrical cardioversion or pharmacologic cardioversion (using dofetilide during hospitalization with continuous monitoring) between March 24, 2016, and February 24, 2017. Patients with atrial arrhythmias other than AF at the time of enrollment, ventricular pacing, or a ventricular-assist device were excluded.
A 12-lead electrocardiogram (ECG) was obtained, and an Apple Watch (paired with an iPhone [Apple Inc] preloaded with the Cardiogram application) was applied to the participant’s wrist for at least 20 minutes in Workout mode. Patients undergoing electrical cardioversion remained supine during the study. A study coordinator time-stamped the moment of cardioversion and removed the Apple Watch after at least 20 minutes following cardioversion. Patients receiving dofetilide were monitored for conversion to sinus rhythm on cardiac telemetry, after which a 12-lead ECG and at least 20 minutes of Apple Watch data were obtained. Apple Watch data were used as inputs to the neural network, and rhythm diagnoses were determined by 12-lead ECGs overread by board-certified cardiac electrophysiologists.
Using a subset of the remote cohort, we performed an exploratory analysis for a second classification task: detecting AF from ambulatory data. In this remote cohort subset, the reference standard AF diagnosis was limited to self-reported persistent AF. The heuristic pretrained network underwent a second, supervised training phase using labels of AF or no AF obtained from AliveCor devices. The AliveCor Kardia (AliveCor Inc) is a smartphone-based device equipped with 2 electrodes that enables remote participants to obtain a single-lead ECG. The first 200 remote cohort participants who self-reported an arrhythmia diagnosis were mailed a device and instructed to use it at least once per day while wearing the Apple Watch in Workout mode; 183 (91.5%) were ultimately successfully connected. A validated AliveCor algorithm approved by the US Food and Drug Administration demonstrating high specificity for AF detection21,22 was used to label AliveCor recordings. Supervised training of the network used 6338 mobile ECGs from the 183 participants, including 625 recordings from 100 participants who exhibited AF. Inputs included more than 139 million Apple Watch sensor measurements; unlabeled inputs were retained using output masking.
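Output masking can be sketched as a loss that is averaged only over time steps that carry a label, so that unlabeled sensor measurements still pass through the network as inputs without contributing to the error signal. The following is a generic illustration in plain Python rather than the study's TensorFlow implementation, which is not described in the text; the function name and the use of `None` to mark unlabeled steps are assumptions.

```python
import math
from typing import Optional, Sequence


def masked_binary_cross_entropy(
    probs: Sequence[float],
    labels: Sequence[Optional[int]],
    eps: float = 1e-7,
) -> float:
    """Binary cross-entropy averaged over labeled time steps only.

    probs  : predicted probability of AF at each time step
    labels : 1 (AF), 0 (no AF), or None where no ECG label exists
    """
    total, n_labeled = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # masked output: this step contributes no loss
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        n_labeled += 1
    return total / n_labeled if n_labeled else 0.0


# Only the second step has a label; predictions at the other steps are ignored
print(masked_binary_cross_entropy([0.2, 0.5, 0.9], [None, 1, None]))
```

Because sparse ECG labels are the only positions that affect the loss, a relatively small number of labeled recordings can be combined with a much larger stream of unlabeled measurements.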
For this ambulatory validation, we held out a subcohort of participants randomly selected from the remote cohort not used for algorithm development. Those with pacemakers and implanted cardioverter/defibrillators, those who reported that their AF “comes and goes,” and those with inadequate heart rate measurements (averaging 8 hours or less per day) were excluded. Participants completed online questionnaires regarding demographic characteristics and medical history. The self-reported diagnosis of persistent AF by a health care professional was the primary outcome for this exploratory analysis.
Normally distributed continuous variables are presented as means with standard deviations and were compared using t tests, and continuous variables with skewed distributions are presented as medians with interquartile ranges and were compared using the Wilcoxon rank-sum test. Categorical variables were compared using the χ2 test or Fisher exact test. Test characteristics with 95% CIs and receiver operating characteristic curves with C statistics were calculated using standard techniques. Unadjusted and multivariable logistic regression was used to determine odds ratios.
In the validation cardioversion cohort, we estimated that 76 participants would provide a margin of sampling error of 5% for a sensitivity and specificity of 95%. We used an early stopping rule for efficacy, testing the joint null hypothesis that either the sensitivity or specificity would be less than 85%. The P value for the joint test was calculated by comparing the minimum of the sample sensitivity and specificity to its simulated null distribution, with both parameters set to 0.85. We planned to compare the resulting 2-tailed P values after 50 participants were enrolled and, if needed, after 76 participants to critical α values (as specified by the O’Brien-Fleming stopping rule23) of .0031 and .0490, respectively. Otherwise, for all other aspects of the study, a 2-tailed P value < .05 was considered statistically significant.
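The sample size and interim test described above can be approximated as follows. This is a sketch under stated assumptions: the margin of error uses the normal approximation to a binomial proportion, and the simulation draws sensitivity and specificity independently from binomial distributions at the null value of 0.85 and reports an upper-tail probability. The numbers of AF and sinus rhythm segments (`n_pos`, `n_neg`) are hypothetical, and the authors' exact simulation procedure is not specified in the text.

```python
import math
import random


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for a proportion p with n observations."""
    return z * math.sqrt(p * (1.0 - p) / n)


def simulate_joint_p_value(
    observed_min: float,
    n_pos: int,
    n_neg: int,
    null_rate: float = 0.85,
    n_sims: int = 5000,
    seed: int = 0,
) -> float:
    """Fraction of simulated null data sets whose min(sensitivity, specificity)
    is at least as large as the observed minimum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sens = sum(rng.random() < null_rate for _ in range(n_pos)) / n_pos
        spec = sum(rng.random() < null_rate for _ in range(n_neg)) / n_neg
        if min(sens, spec) >= observed_min:
            hits += 1
    return hits / n_sims


# 76 participants, anticipated sensitivity and specificity of 95%
print(round(margin_of_error(0.95, 76), 3))  # 0.049, ie, about 5%
```

The resulting P value would then be compared with the O'Brien-Fleming critical values quoted in the text (.0031 at the interim look and .0490 at the final look).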
For comparison with the deep neural network, we concurrently analyzed input heart rate data using 2 statistical techniques used previously for AF detection: root mean square of successive difference (RMSSD),20 reflecting heart rate variability, and Shannon entropy (ShE),24 which characterizes rhythm complexity. Odds ratios were computed in R, version 3.4.0 (The R Foundation), and area under the receiver operating characteristic curve was computed using the scikit-learn, version 0.19.1, package in Python (Python Software Foundation).
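For readers unfamiliar with the 2 comparison statistics, a minimal sketch follows. RMSSD is the square root of the mean squared difference between successive values, and Shannon entropy is computed here over a histogram of the values. The bin count of 16 and the equal-width binning are assumptions of this sketch, and the inputs are heart rate values as in the study rather than true R-R intervals.

```python
import math
from collections import Counter
from typing import Sequence


def rmssd(values: Sequence[float]) -> float:
    """Root mean square of successive differences."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs:
        raise ValueError("at least 2 values are required")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def shannon_entropy(values: Sequence[float], n_bins: int = 16) -> float:
    """Shannon entropy (in bits) of an equal-width histogram of the values."""
    if not values:
        raise ValueError("at least 1 value is required")
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin
    width = (hi - lo) / n_bins
    # The maximum value would fall on the upper edge, so clamp it into the last bin
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(rmssd([60, 70, 60]))             # 10.0
print(shannon_entropy([60, 60, 60]))   # 0.0
```

Both statistics rise with beat-to-beat irregularity, which is the property that makes them candidate AF detectors when the input signal has sufficient temporal resolution.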
We enrolled 9750 Health eHeart Study participants with an Apple Watch who downloaded the Cardiogram mobile application, completed an intake survey, and linked their Cardiogram account (eFigures 1 and 2 in the Supplement). Table 1 shows the baseline demographic characteristics of the remote cohort. Those with AF were more likely to be older, male, and white and were more likely to have completed more education and exhibited concurrent clinical comorbidities.
After 50 patients were enrolled, the early stopping rule was applied for efficacy given evidence that neither sensitivity nor specificity was less than 85% (P < .001). One additional patient was enrolled during this interim analysis and was included in the final data set. Of these 51 patients, 43 (84%) underwent electrical cardioversion (all successful) and 8 (16%) converted with dofetilide (Table 2). Fifty-three total hours of Workout mode data were obtained.
Figure 1 shows a t-distributed stochastic neighbor embedding for AF and sinus rhythm.25 The C statistic for AF detection for the deep neural network was 0.97 (95% CI, 0.94-1.00; P < .001) (Figure 2A). At an operating threshold set to the high sensitivity of 98.0%, the specificity was 90.2% to detect ECG-diagnosed AF. In comparison, the C statistic for RMSSD using the same data was 0.91 (95% CI, 0.85-0.97) and for ShE was 0.86 (95% CI, 0.79-0.93). Normalizing all tracings by heart rate did not meaningfully reduce the accuracy of the neural network, yielding a C statistic of 0.96 (95% CI, 0.94-0.97).
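The C statistic reported above can be interpreted as the probability that a randomly chosen AF segment receives a higher score than a randomly chosen sinus rhythm segment, with ties counted as one-half. The study computed it with scikit-learn; the sketch below is a plain-Python equivalent for illustration only, and the function name and example scores are hypothetical.

```python
from typing import Sequence


def c_statistic(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least 1 score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical network scores for AF (positive) and sinus rhythm (negative) segments
af_scores = [0.9, 0.8, 0.3]
sinus_scores = [0.1, 0.4]
print(c_statistic(af_scores, sinus_scores))  # 5 of 6 pairs are concordant
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the 2 rhythms.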
In this ambulatory subcohort, 1617 participants were available, 64 (4%) of whom reported having persistent AF. From these participants, 18.5 million of 27.9 million heart rate measurements (66.3%) were obtained while in Workout mode. The C statistic for the deep neural network to detect an individual with persistent AF was 0.72 (95% CI, 0.64-0.78) (Figure 2B). The C statistic for RMSSD using the same data was 0.45 (95% CI, 0.38-0.52) and for ShE was 0.48 (95% CI, 0.40-0.55). Using an operating cut point for the neural network that maximizes the sum of sensitivity (67.7%) and specificity (67.6%), those with neural network–predicted AF had an unadjusted odds ratio of 3.95 (95% CI, 3.02-5.17; P < .001) for persistent AF. After adjusting for age, sex, race/ethnicity, hypertension, diabetes, heart failure, and coronary artery disease, neural network–predicted AF remained significantly associated, with an adjusted odds ratio of 1.98 (95% CI, 1.48-2.65; P = .02) for persistent AF. Table 3 shows the performance characteristics of the algorithm in both validation cohorts. In the setting of the low (4%) AF prevalence in the remote cohort, the positive predictive value was low.
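Two steps in this analysis lend themselves to a short worked example: choosing the cut point that maximizes the sum of sensitivity and specificity, and seeing why a 4% prevalence produces a low positive predictive value. The sketch below assumes that a score at or above the threshold is classified as AF and derives the predictive value from Bayes' rule using the sensitivity, specificity, and prevalence quoted in the text. The function names and example scores are hypothetical.

```python
from typing import Sequence, Tuple


def best_cut_point(
    pos_scores: Sequence[float], neg_scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing sensitivity + specificity.

    A score >= threshold is classified as AF (an assumption of this sketch).
    """
    best = (0.0, 0.0, 0.0)
    best_sum = -1.0
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= thr for s in pos_scores) / len(pos_scores)
        spec = sum(s < thr for s in neg_scores) / len(neg_scores)
        if sens + spec > best_sum:
            best_sum = sens + spec
            best = (thr, sens, spec)
    return best


def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """P(disease | positive test) from Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values from the ambulatory subcohort: 64 of 1617 participants with persistent AF
ppv = positive_predictive_value(sens=0.677, spec=0.676, prevalence=64 / 1617)
print(round(ppv, 3))  # about 0.08
```

With these inputs, roughly 1 in 12 participants flagged by the network would have self-reported persistent AF, because false-positive results among the large AF-free majority outnumber the true-positive results.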
We demonstrate that a commercially available smartwatch, paired with a readily available mobile application and a deep neural network, can passively detect AF. In external validation using the standard 12-lead ECG as the reference, algorithm performance achieved a C statistic of 0.97. The passive detection of AF from free-living smartwatch data has substantial clinical implications. Importantly, the accuracy of detecting self-reported AF in an ambulatory setting was more modest (C statistic of 0.72). Although the deep neural network’s AF classification in the exploratory analysis exhibited higher odds ratios than other measured AF risk factors, this proof-of-concept experiment likely demonstrates the challenges of accurately detecting ambulatory arrhythmia among constantly mobile individuals in natural environments.
Atrial fibrillation is the leading cause of stroke, and its detection is difficult because of its often asymptomatic nature and paroxysmal frequency.1,4,10,11,14 Readily accessible means to detect and screen for silent AF are needed. Even though monitors with automated capabilities, such as implantable loop recorders, can be used to detect AF, they are invasive, expensive, and inconvenient.26,27 The ideal instrument for AF detection would be noninvasive and provide real-time, accurate AF detection in a passive fashion—specifically, not requiring the user to remember to perform some action and not limited to any one snapshot in time. Smartwatches are well positioned to accomplish these goals in a cost-efficient and resource-efficient fashion. Wearable technology has shown a steady increase in global usage,28 and the smartwatch, most popular among all wearable sensors, is projected to reach 55 million global shipments by 2020.29
Prior efforts to automatically detect AF among free-living participants have predominantly used ambulatory blood pressure monitors,30 although some recent studies21,31 have used smartphones and wearable devices. Two studies20,21 showed that AF can be detected using a photoplethysmography waveform obtained via the iPhone camera. Similar to the limitations of ambulatory blood pressure cuffs, smartphone-based data collection is limited in requiring active participation from the participant (dependent on user adherence) and by the episodic nature of data obtained. A Samsung Simband (Samsung) exhibited high sensitivity and specificity for AF detection among 46 individuals.32 However, validation in an external cohort was not performed, and these findings are tied to a single stand-alone device used for research that is not commercially available. To our knowledge, our study represents the first to use a deep neural network to passively detect AF using smartwatch data.
When tested against 12-lead ECG–diagnosed AF in our validation experiment, the deep neural network outperformed 2 conventional methods for the detection of AF.20 Although the mean heart rate may differ between those in AF and sinus rhythm, our results were not meaningfully changed after heart rate data were normalized. This external validation demonstrates that the neural network can passively detect AF from smartwatch data with excellent performance characteristics obtained in sedentary individuals captured at high temporal resolution (ie, Workout mode). Even within these constraints, public health implications for AF screening may be broad because periods of sleep can provide long, uninterrupted periods of sedentary data, and it is technically feasible to enable high temporal resolution data collection at scheduled periods.
In light of the relative frequency of subclinical AF detected by implanted devices among patients at risk of stroke,33-35 it is very likely that the widespread use of an accurate algorithm to detect AF among the large population continuously wearing smartwatches would result in a substantial increase in new AF diagnoses. While there may be increased costs associated with the care of those patients, the potential reduction in stroke could ultimately provide cost savings.
Several factors make detection of AF from ambulatory data an inherently more difficult classification task: (1) most ambulatory heart rate data are non–Workout mode data sampled every 5 minutes, which translates to a significant loss of temporal resolution; (2) the variability in heart rate in an ambulatory population is significantly increased by a wide range of activities; and (3) heart rate sensor noise is increased with movement. Because the remote cohort data set was limited to using self-reported diagnoses of persistent AF rather than ECG-diagnosed AF as in our validation cohort, we considered this analysis exploratory. We did not know how many participants were actually in AF at the time the Apple Watch measurements were taken. Acknowledging these limitations, when using conventional algorithms to analyze the ambulatory data (RMSSD and ShE), C statistics were similar to predictors based on chance alone. In contrast, even after adjustment for conventional risk factors, the neural network’s algorithm classified individuals with persistent AF. As technology develops, expected improvements in sensors and battery life will likely improve the temporal resolution of ambulatory heart rate measurements, enabling enhanced algorithm performance for disease detection.
In contrast to methods used in other recent medical applications of deep neural networks,17,18 we developed a semisupervised heuristic pretraining procedure that is not dependent on manual physician annotation of training data. Here, the deep neural network automatically learned features representative of a heuristic relevant to our specific task of AF detection, namely an approximation of the R-R interval.20 Deep neural networks pretrained using this approach can subsequently be trained in a supervised manner using a relatively small amount of labeled data to perform more specialized classification tasks, as exemplified in our exploratory analysis. Deep neural networks are data-hungry algorithms, typically requiring tens of thousands of labeled examples for optimal performance.17 However, labeled data at this scale are not readily available for most medical applications and are costly to label when they can be obtained. Our semisupervised method can be generalized to train data-efficient deep neural networks for other medical tasks, requiring significantly less labeled data than previously required.
Our study has several limitations. Many participants initially contacted for the study did not complete surveys and link Cardiogram accounts, which may result in selection bias. This would likely not invalidate positive associations but would limit generalizability. All participants already owned a smartwatch or, among the patients undergoing cardioversion, had a coordinator provide assistance; therefore, it is possible these results would not generalize to less tech-savvy individuals. However, as demonstrated by the growing majority that now use the internet and own smartphones,28,36 the regular use of smartwatches may become more mainstream with time. The training using the AliveCor devices relied on an automated algorithm. Although the algorithm is approved by the US Food and Drug Administration and has exhibited reasonable accuracy in previous studies,21,22 it is possible that manual overreads may have enhanced algorithm performance in the exploratory ambulatory analysis. These data focused on individuals with a known history of AF. Therefore, we did not demonstrate an ability to identify a new diagnosis of the disease. Finally, despite the excellent test characteristics observed among sedentary patients undergoing cardioversion, the modest performance in the ambulatory scenario, a context more representative of the ultimate application of this technology, suggests that these data should be primarily interpreted as a proof of concept.
In conclusion, we showed that a commercially available smartwatch can passively identify AF against the criterion-standard 12-lead ECG among sedentary individuals when heart rate data are collected at a high temporal resolution. We also present an exploratory analysis demonstrating that our deep neural network substantially outperforms standard techniques to detect self-reported persistent AF from ambulatory data, albeit with modest accuracy in these free-living natural environments. Given the broad and growing use of smartwatches and ready accessibility of downloadable mobile applications, this approach may ultimately be applied to perform AF detection at large scale, leveraging common wearable devices to guide AF management and rhythm assessment.
Accepted for Publication: January 17, 2018.
Corresponding Author: Gregory M. Marcus, MD, MAS, Division of Cardiology, Department of Medicine, University of California, San Francisco, 505 Parnassus Ave, M1180B, San Francisco, CA 94143-0124 (firstname.lastname@example.org).
Published Online: March 21, 2018. doi:10.1001/jamacardio.2018.0136
Author Contributions: Drs Tison and Marcus had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Tison and Sanchez contributed equally to this manuscript as cofirst authors.
Study concept and design: Ballinger, Marcus.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Tison, Sanchez, Ballinger, Singh, Fan, Marcus.
Critical revision of the manuscript for important intellectual content: Tison, Sanchez, Ballinger, Singh, Olgin, Pletcher, Vittinghoff, Fan, Gladstone, Mikell, Sohoni, Hsieh.
Statistical analysis: Tison, Sanchez, Ballinger, Singh, Vittinghoff, Hsieh.
Obtained funding: Ballinger, Marcus.
Administrative, technical, or material support: Tison, Ballinger, Singh, Fan, Marcus.
Study supervision: Marcus.
Conflicts of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure and Potential Conflicts of Interest. Dr Tison is an advisor to Cardiogram Inc. Messrs Ballinger, Singh, Sohoni, and Hsieh are employees of Cardiogram Inc. Dr Marcus has received research funding from Medtronic and Cardiogram Inc, is a consultant for Lifewatch and InCarda, and holds equity in InCarda. No other disclosures were reported.
Funding/Support: This study was funded in part by Cardiogram Inc.
Role of the Funder/Sponsor: The funder contributed to the design and conduct of the study, collection and analysis of the data, and review of the manuscript. The funder did not play a direct role in the management and interpretation of the data, preparation or approval of the manuscript, and decision to submit the manuscript for publication.