AF, the most common clinically relevant cardiac arrhythmia, is expected to affect >5 million Americans by 2050.1 The diagnosis is frequently associated with other cardiovascular comorbidities, including hypertension, obesity and sleep apnoea, and AF is an important risk factor for stroke, heart failure and overall mortality.2–4 Although treatment is available, many people remain untreated because the diagnosis can be subclinical or elusive.5,6
Significant efforts are geared towards improving AF screening, early detection and treatment.7–9 Among these efforts, artificial intelligence (AI) technology has demonstrated clinical utility and promise.10 In recent years, AI has enabled wearable technology to aid in the passive detection of cardiac arrhythmias and enhanced the diagnostic power of ECG.11–13 AI has improved the prediction and detection of AF and may ultimately help direct treatment.
In this review, we delineate how AI has enhanced the prediction and detection of AF, both inside and outside of the medical setting (Figure 1). We also highlight how AI has prognosticated treatment success and enhanced intraprocedural techniques. Finally, amidst the great success of AI application in the diagnosis and treatment of AF, we outline limitations and critical considerations of these newly applied technologies, identifying what barriers and future directions exist in this rapidly growing field of AI-enhanced medical care.
Artificial Intelligence Prediction of AF from Clinical Characteristics
Various patient clinical characteristics (e.g. age, sex and medical comorbidities) are known to predict the risk of developing AF. Multivariable prediction models, such as the CHARGE-AF score, integrate these risk factors to generate risk estimates and have been extensively validated.14–16 These models may be further refined by applying AI and machine learning (ML) approaches to electronic health record (EHR) data, although the improvement in model performance is variable.17 For example, a large study of 2 million patients from the University of Colorado used an ML model to evaluate >200 health record features potentially associated with AF.18 From this process, an AF-detection model demonstrated an area under the curve (AUC) of 0.79 for the detection of AF over 6 months of follow-up, but this performance was comparable with several recent non-AI-based clinical AF risk scores with AUCs ranging from 0.71 to 0.78.18–21
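Because most of the comparisons in this review are expressed as AUCs, a brief illustration of what the statistic measures may be helpful. The sketch below computes the AUC as the probability that a randomly chosen patient who develops AF receives a higher risk estimate than a randomly chosen patient who does not. The function name, the two models and all patient values are hypothetical and are not taken from any cited study.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a case outranks a non-case (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical risk estimates from two models for the same six patients
labels = [1, 1, 1, 0, 0, 0]          # 1 = developed AF, 0 = did not
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b = [0.9, 0.3, 0.4, 0.5, 0.6, 0.2]

auc_a = auc(model_a, labels)  # 8 of 9 case/non-case pairs ranked correctly
auc_b = auc(model_b, labels)  # 5 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The statistic describes discrimination only and says nothing about whether the predicted probabilities themselves are accurate.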
In contrast, two larger studies from the UK used similar ML techniques to produce models that outperformed the CHARGE-AF score (AUC 0.827 versus 0.725, respectively).22,23 The authors speculated that the improved model performance could have been related to higher-quality data sets with longer follow-up or to inclusion of various analytical approaches (e.g. considering time-varying covariates).22–24 Investigators are interested in not only predicting AF, but also describing its natural history over time.25 These AI-EHR-based prognosticating tools may eventually find utility in clinical practice for refining screening and treatment protocols.
Artificial Intelligence-enhanced ECG Prediction and Detection of AF
The AI-enhanced ECG (AI-ECG) has recently offered a new pathway for predicting and identifying AF, even while patients are in normal sinus rhythm.26 Due to the slowly progressive and subtle nature of cardiac structural changes, and their associated manifestations on ECG, AI algorithms theoretically are well suited to detect these findings, which might go unnoticed by untrained and trained human observers.27
An initial AI-ECG algorithm was developed from nearly 650,000 ECGs at the Mayo Clinic to predict paroxysmal AF on ECGs from patients in sinus rhythm.26 Single AI-ECG recordings identified patients with underlying paroxysmal AF with an AUC of 0.87, which increased to an AUC of 0.90, when all ECGs during the first month of each patient’s ‘window of interest’ were included (e.g. 31 days prior to first recorded AF event).26 A follow-up study analysed the value of AI-ECG in the prediction of future AF in patients without a prior diagnosis among participants in the Mayo Clinic Study of Ageing, and compared AI-ECG future AF prediction with the CHARGE-AF score.28 When an AI-ECG predicted a >50% probability for AF, the 2- and 10-year cumulative incidence of AF was 21.5% and 52.2%, respectively, which was similar in predictive value to CHARGE-AF.28
A separate deep-learning AI-ECG risk model predicted 5-year AF-free survival with an AUC of 0.823 in three independent cohorts (n>80,000). Interestingly, corresponding saliency maps identified the P-wave segment as the algorithm’s region of interest – a biologically plausible finding.29 Wu et al. used the Physionet database of 100 patients (50 with underlying paroxysmal AF) with 30-minute recordings during sinus rhythm to develop AI models with three different learning methods, and compared them.30 Bagging, AdaBoost and stacking were compared, and stacking was found to have the best AUC of 0.911 (all three AUCs range 0.88–0.91), although the model has not undergone external validation.30
In addition, the AI-ECG for AF prediction was studied in the application of patients with embolic stroke of unknown source, where underlying, silent AF is frequently the suspected culprit.31 While AI-ECG-determined AF probability did not differ in the group with embolic stroke of unknown source compared with patients with strokes from other mechanisms, patients with embolic stroke of unknown source who did show a high AI-ECG probability of AF demonstrated a significantly higher future detection of AF on ambulatory monitoring.31
More recently, the clinical application of AI-ECG for the detection of AF was demonstrated in a prospective interventional trial. Investigators combined AI-ECG probability of paroxysmal AF with EHR-based patient characteristics to determine which patients would benefit most from prolonged ECG monitoring to detect AF.32 Patients who qualified for the study had no prior AF diagnosis, with both elevated probability of AF on sinus rhythm AI-ECG and relevant clinical risk factors indicating anticoagulation need if AF were diagnosed (e.g. elevated CHA2DS2-VASc score). Notably, clinical risk components were identified through both structured and unstructured EHR data via natural language processing.33 Eligible patients who were at elevated AI-ECG risk were fivefold more likely to have a diagnosis of AF on 30-day monitoring compared with their ‘low AI-ECG probability for AF’ control counterparts.32 This study indicates the significant potential impact of clinically applied AI for targeted AF screening strategies.
Overall, there is a growing body of evidence demonstrating the effectiveness of AI-ECG in detecting underlying paroxysmal AF in patients presenting in normal sinus rhythm and with associated clinical risk factors. With these recent groundbreaking advances in clinically applied AI, one can expect to see applied AI-ECGs in clinical practice complementing the traditional methodology of care.
Artificial Intelligence to Detect AF by Photoplethysmography
While a standard ECG is the most widely accepted methodology of AF diagnosis, it is now commonplace for patients to report AF diagnosis from their Apple Watch or other wearable device. Photoplethysmography (PPG) has allowed for real-time, automated detection of AF in non-clinical settings from wearable devices and smartphones.34,35 Notification of potential AF from these consumer-based PPG technologies (Apple, FitBit, Huawei Watch, etc.) is based on several temporal and morphological features to differentiate the arrhythmia from sinus rhythm.
The Apple Heart study used a decentralised recruitment process to enrol patients in a watch-based remote-monitoring study.11 In this study, participants downloaded a study app that would subsequently passively monitor the regularity of heart rate via PPG. Patients with irregular pulse notifications would subsequently be considered for remote monitoring to formally monitor and potentially diagnose AF. A total of 34% of patients who were notified of having an irregular pulse and underwent testing were diagnosed with AF, and 84% of these irregular pulse notifications were indicative of AF on ECG.11
The Huawei Heart study similarly exhibited the strength of wearable PPG technology to identify AF in a decentralised study.12 Of participants who received notification of ‘suspected AF’, 87% (n=227) were confirmed to have AF with a positive predictive value approaching 92%.12 Both of these significant studies demonstrated the scalability of AF screening using PPG via consumer-based wearable technology, allowing for a streamlined pathway for clinical data from beyond the clinic walls.
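The positive predictive values reported above are specific to the populations studied, because PPV depends on how common AF is among those screened. As a minimal sketch of that dependence, the function below applies Bayes' theorem to an assumed sensitivity and specificity. The values of 0.95 are hypothetical and are not the performance of any device or study cited here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / all positives, for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical detector: 95% sensitive and 95% specific
ppv_low_prev = positive_predictive_value(0.95, 0.95, 0.01)   # about 0.16
ppv_high_prev = positive_predictive_value(0.95, 0.95, 0.10)  # about 0.68
```

Under these assumed figures, the same detector yields far more false alarms per true detection when AF is rare, which is one reason why the characteristics of the screened population matter when interpreting notification accuracy.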
These algorithms have evolved from the use of simpler regression models and feature extraction to more complex ML models that include support vector machines, decision trees and deep learning models.36,37 While more advanced deep learning techniques learn and extract relevant features automatically compared with simpler techniques (which required initial manual data extraction), deep learning approaches require a large amount of data to train and extract features to make decisions. It must be noted that more mainstream algorithms, such as those used in the Apple Watch study and Huawei Heart study, remain proprietary and undefined.12,38
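Because the commercial algorithms are undisclosed, the following is only a generic illustration of the manual feature extraction that the simpler approaches rely on, and not a description of any vendor's method. It derives inter-beat intervals from pulse-peak times and summarises their irregularity with two common statistics. The function names, the example values and the 0.1 threshold are all assumptions chosen for illustration.

```python
import math


def interbeat_intervals(peak_times_s):
    """Differences between successive pulse-peak times (seconds)."""
    return [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]


def irregularity_features(intervals):
    """Coefficient of variation and mean-normalised RMSSD of the intervals."""
    if len(intervals) < 2:
        raise ValueError("need at least two intervals")
    mean_ibi = sum(intervals) / len(intervals)
    variance = sum((x - mean_ibi) ** 2 for x in intervals) / len(intervals)
    successive = [b - a for a, b in zip(intervals, intervals[1:])]
    rmssd = math.sqrt(sum(d * d for d in successive) / len(successive))
    return {"cv": math.sqrt(variance) / mean_ibi,
            "normalised_rmssd": rmssd / mean_ibi}


def is_irregular(intervals, threshold=0.1):
    """Flag a rhythm as irregular; the threshold is a hypothetical cut-off."""
    return irregularity_features(intervals)["normalised_rmssd"] > threshold


# Hypothetical examples: a steady 75 BPM pulse and a visibly erratic one
regular = interbeat_intervals([0.8 * k for k in range(9)])
irregular = [0.6, 1.1, 0.5, 0.9, 0.7, 1.2, 0.55]

regular_flag = is_irregular(regular)
irregular_flag = is_irregular(irregular)
```

A rule of this kind responds to any irregular pulse, including ectopic beats and motion artefact, so it cannot by itself establish that the underlying rhythm is AF.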
These technologies are promising, but certain factors must be considered before routine application and use. PPG data are susceptible to artefacts and noise, which may contribute to false-positive results.39 False positives may also result from other arrhythmias, such as atrial flutter or multifocal atrial tachycardia, which PPG-based devices may not differentiate from AF.
One must also consider that these innovative tools are more readily adopted by a younger, more tech-savvy population; future research should therefore focus on older participants at higher arrhythmogenic and thrombogenic risk to assess the accuracy and validity of these novel approaches. An example of this age-targeted evaluation is the eBRAVE-AF trial, which compared smartphone app-based PPG screening against routine symptom-based screening in patients aged 50–90 years.35 Digital app-based screening in this population significantly increased the diagnosis of clinically relevant AF compared with usual care (OR 2.12 in Phase I; 2.75 in Phase II).40
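For readers less familiar with the odds ratio, the sketch below shows how such a ratio and an approximate 95% CI can be derived from a two-arm table of counts, using the standard log-based (Woolf) interval. The counts are hypothetical and are not trial data, and the cited trial may have used a different estimator.

```python
import math


def odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of arm A versus arm B with a log-based (Woolf) interval."""
    a, b = events_a, total_a - events_a   # arm A: diagnosed / not diagnosed
    c, d = events_b, total_b - events_b   # arm B: diagnosed / not diagnosed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 40 of 1,000 diagnosed with app-based screening
# versus 20 of 1,000 with usual care
ratio, lower, upper = odds_ratio(40, 1000, 20, 1000)
```

With these invented counts the ratio is approximately 2.04, and the lower bound of the interval lies above 1, which is the pattern that corresponds to a statistically significant increase in diagnoses.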
With continuous refinement of technology, and validation efforts in clinically relevant populations, novel solutions will improve the performance of these algorithms over time, suggesting the possibility of widely accepted, large-scale AF screening from consumer-based PPG products in the near future.
Artificial Intelligence for AF Risk Stratification
In the setting of a new AF diagnosis, AI/ML techniques may also offer the opportunity to stratify patients for outcomes, such as stroke risk or the expected success of cardioversion. Investigators in Korea extracted 65 features from 750,000 patients with AF to develop a deep-learning model to determine the risk of ischaemic stroke. This model was subsequently tested on 150,000 patients and demonstrated an AUC of 0.73 for the prediction of ischaemic stroke, as compared with CHA2DS2-VASc, which had an AUC of 0.65.41
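For context, the CHA2DS2-VASc comparator is a simple additive point score, which helps to explain why a model drawing on many more features might discriminate better. The sketch below implements the standard published point assignments. The class and field names are hypothetical, and the code is intended as an illustration rather than a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Field names are illustrative, not taken from any cited data set
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia_or_thromboembolism: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p):
    """Sum the standard CHA2DS2-VASc points (range 0 to 9)."""
    score = 0
    score += 1 if p.heart_failure else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:
        score += 2          # age 75 years or older
    elif p.age >= 65:
        score += 1          # age 65 to 74 years
    score += 1 if p.diabetes else 0
    score += 2 if p.prior_stroke_tia_or_thromboembolism else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0
    return score


# Hypothetical patients
older_woman = Patient(age=78, female=True, hypertension=True, diabetes=True)
younger_man = Patient(age=60, female=False)

score_older_woman = cha2ds2_vasc(older_woman)  # 2 + 1 + 1 + 1 = 5
score_younger_man = cha2ds2_vasc(younger_man)  # 0
```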
Similarly, there have been several efforts to use AI/ML to identify potential clinical characteristics that may predict the success of cardioversion. Vinter et al. evaluated a sex-specific model for the success rate of electrical cardioversion with both ML and logistic regressions.42 Several factors, including comorbidities, echocardiogram information and medications, were included in the model (n=332 women and n=790 men); however, each analysis demonstrated only modest predictive values, with an AUC between 0.56 and 0.6 for both women and men.42
A separate effort validated an ML model to predict cardioversion success from patients referred for electrical cardioversion (n=429), and compared the algorithm predictions with the CHA2DS2-VASc and HATCH scores, which have both been shown to be predictive of AF recurrence following cardioversion in a few studies.43–45 The results from this study were mixed. The ML models predicted 6-month AF recurrence, 6-month rhythm control and success of pharmacological cardioversion better than the CHA2DS2-VASc and HATCH scores; however, the model was less favourable than these scores at predicting electrical cardioversion success. While the results of this study require external validation and further refinement prior to clinical application, the study did report ‘feature importance’ to help determine the ML-based value of pertinent patient clinical characteristics in prediction strategy.43
These AI/ML tools for risk stratification and prediction of treatment success may become crucial in clinical practice following further validation, particularly as the medical management of AF is a multistep, shared-decision pathway that weighs disease risks against treatment recommendations.
Artificial Intelligence for AF Using Intracardiac Signals
Implantable cardiac devices often misclassify atrial flutter, atrial tachycardia or even premature atrial ectopic beats as AF based on the rate or irregularity of the intracardiac signals. Rodrigo et al. created a deep learning algorithm to distinguish AF from other tachycardias based on intracardiac electrogram (EGM) features.46 This deep learning algorithm demonstrated excellent performance, with an AUC of 0.95–0.97, depending on unipolar or bipolar EGM, compared with traditional single EGM features, which demonstrated an AUC of 0.67–0.75. These results support the continued evaluation of deep learning as a tool to better identify AF from other arrhythmias using EGMs from cardiac implantable electronic devices.
Application of neural networks in electro-anatomic cardiac mapping has similarly been explored. In a study by Lebert et al., a convolutional neural network (CNN) was used to predict phase maps, rotor positions and phase singularities with accuracy approaching 95%. According to this study, the model was less limited by noise and more generalisable across different species compared with classic phase mapping techniques.47 In another study, a deep learning-based approach was applied to unipolar EGM signals to automate focal source detection as targets for ablation with performance similar to practising cardiologists.48 ML algorithms have also been tested to predict the success of catheter ablation procedures using intracardiac EGM and a composite of EGM, ECG and clinical features in a fusion model with strong performance (AUC approaching 0.86).49 It is clear the novel application of neural networks into electrophysiological practice may complement and enhance patient care at the treatment/interventional level.
Artificial Intelligence to Detect AF by Other Means
Chest radiography is a simple and frequently used examination tool in clinical practice. The most common cardiac pathology seen on a chest radiograph is cardiac enlargement, which may be a non-specific marker of underlying cardiac disease.50 Atrial size can be seen on chest radiography, and atrial enlargement could be a radiographic marker of AF.51 However, as AF is typically diagnosed by ECG, diagnosing arrhythmias by radiographic means remains challenging.52 On a posteroanterior image view, the left atrium is located at the most dorsal part of the heart and is often overshadowed by other cardiopulmonary anatomy. However, this anatomic pattern may be useful for AI-based detection of AF, as it is commonly caused by pathological changes involving the areas of the pulmonary veins and the left atrium.53
Matsumoto et al. studied patients with and without AF with corresponding posteroanterior chest radiograph view images (n=13,868). An AI model for determining AF by chest radiographs was developed using deep learning, which demonstrated an AUC of 0.81 (95% CI [0.78–0.85]). Corresponding saliency maps visually indicated that the AI model paid most attention to the upper left segment of the heart shadow, consistent with the left atrial region.52 However, this AI system was not entirely transparent (i.e. not a ‘glass box’), as it could not indicate whether the area of interest was specifically the left atrium or other structures/features in that area (e.g. descending aorta, cardiac border, etc.). The investigators also noted that the model was much more sensitive in detecting permanent AF compared with paroxysmal AF, likely as a result of varying cardiac anatomy between these AF classifications.52 As the authors aptly noted, this type of AI-based radiographic diagnosis of AF requires external and prospective validation before true clinical application.
Another novel CNN-based AF detection method was described by Yan et al., who used facial PPG signals from digital camera images.54 In this proof-of-concept study, 20 patients with permanent AF were matched with sinus rhythm control patients, and five patients at a time were filmed for 1 minute using a digital camera while sitting 1.5 m away from the lens. Facial PPG signals extracted from these videos were processed by the previously trained deep CNN to detect AF from PPG waveforms. There was significant agreement between facial CNN prediction of AF and patient ECGs, with an AUC of 0.99, and all five patients’ rhythms were simultaneously correctly identified in nearly 80% of the total videos (n=51/64).54 This work is particularly novel, as the CNN was able to accurately detect AF with simultaneous patient analysis without physical contact.
In each of these studies, AI and CNNs significantly enhance tools that are commonly used in medical practice (chest X-ray) and everyday life (digital videography) to identify clinically relevant AF. These alternative methods will require ongoing testing; however, the consequences of incidental discovery of AF by these methodologies could enhance and guide non-invasive, touch-free screening strategies.
Limitations: Accuracy, Pitfalls and Perceptions
While there has been tremendous success in the application of AI to AF screening, diagnosis and treatment, these systems have their own set of pitfalls and limitations.
Although many of these AI algorithms undergo testing and training in large populations, there is typically a lack of racial diversity in these training/testing cohorts, which may limit use for individuals typically marginalised by the medical system until demographic-specific validations take place.55 On a similar note, while AI applications in themselves are relatively inexpensive, the technology to which the AI is applied (e.g. Apple Watch) may not always be affordable to those who already experience limited access to medical care.56 However, with the potential acceptance of AI-applied technology in medical practice, one could hope these AI-enhanced technologies will be viewed as vital and cost-effective medical tools, allowing for greater access and cost coverage in the future.
Our team, identifying these potential racial and economic barriers to AI-based health technology, performed the first-of-its-kind proof-of-concept AI-ECG study in a community-based participatory research effort.57 Twelve-lead ECGs, collected as part of a church-based, African-American heart health effort, were analysed by our CNNs for AI-ECG determination of age, patient sex and heart failure, validating that our AI-ECG performed well with ECGs collected outside the clinic walls, in a classically underrepresented and medically underserved cohort.57,58 This is the first step that could eventually help mediate healthcare disparities, allowing for cardiac disease screening at the bedside (or via portable/wearable technology) and identifying individuals who may be at elevated need for expedited specialist referral in resource-limited areas.
It must be carefully noted that while many of these algorithms demonstrate excellent performance at the institutions in which they were developed, there must be significant effort spent to externally validate the algorithms at other healthcare systems and environments before widespread application. Our team has worked on external validation efforts with the various AI-ECG algorithms developed at our centre, and we aim to move forward with external validation of our AI-ECG for AF in the near future to thoroughly assess its validity and widespread applicability.59
While AI applications are becoming increasingly transparent, many systems in clinical practice still operate from a ‘black box’ or ‘grey box’ model, where little to no explanation is offered behind AI-based results.60 For example, the AI-enhanced ECG to predict AF from a sinus rhythm ECG will give a probability based on a CNN analysis trained and tested on >600,000 ECGs.26 However, the current result from this clinical tool is a simple probability (%) that falls above or below a previously established diagnostic threshold for further testing. There have been strides using saliency mapping and other tools to highlight features (in this specific example, ECG features), which give better insight into the AI rationale, although they may not offer a complete explanation.61 There is a continued push for ‘glass box’ AI, which provides rationale or explanation for AI-based decisions, as this would also potentially provide insight for clinicians to identify novel approaches to disease treatment otherwise previously unidentified (e.g. subtle ECG markers for disease otherwise overlooked via general interpretation methodology).62
It must be considered that the majority of the studies mentioned, including our own, focus on performance with respect to discriminative power (e.g. AUC/sensitivity/specificity etc.). However, the calibration of the models being analysed is infrequently reported or available. Moving forward with external validation and application of models in varying environments (e.g. more confounding factors, varying disease prevalence), reporting calibration alongside discriminatory testing must be considered, particularly as algorithmic adjustment may be necessary for varying environments.63
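To make the distinction between discrimination and calibration concrete, the sketch below scores two sets of predictions that rank patients identically, and therefore share the same AUC, yet differ in how closely the predicted probabilities match the observed event rate. It uses the Brier score and a simple binned comparison of predicted against observed risk. All values are hypothetical, and the binning scheme is one of several reasonable choices.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Per equal-width bin: (mean predicted risk, observed event rate, n)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:  # skip empty bins
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Hypothetical cohort: 2 of 10 patients develop AF
outcomes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
well_calibrated = [0.1] * 8 + [0.6, 0.6]
overestimating = [0.5] * 8 + [0.9, 0.9]   # same ranking, inflated risks

brier_well = brier_score(well_calibrated, outcomes)   # about 0.04
brier_over = brier_score(overestimating, outcomes)    # about 0.202
table_over = calibration_table(overestimating, outcomes)
```

In the second set of predictions, the bin with a mean predicted risk of 0.5 has an observed event rate of 0, a miscalibration that an AUC alone would not reveal. A shift of this kind is what might be expected when a model is moved to a setting with a different disease prevalence.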
Similarly, these models are always tested on independent samples unrelated to the training/testing data sets (establishing internal validity and guarding against overfitting), and, increasingly, these models are being tested in external data sets (for external validation). In our own work, we aimed to use a simple model with raw ECG features, absent of manual ECG feature selection, on very large cohorts for training and testing to reduce the possibility of overfitting. AI/ML algorithms using smaller data sets or complex feature extraction must undergo thorough validation and calibration to investigate proper statistical fit.
Given these known limitations of AI, there must be considerations for the patients at the receiving end of these therapies. A recent study evaluated patient perceptions of AI application in medical care, which revealed mixed perceptions of excitement and concern.64 Study participants share similar excitement to those developing and implementing these technologies because multifaceted AI technologies may help advance the diagnoses and treatment of diseases, as we exemplify in this review. However, participants aptly remark on the safety and accuracy of these technologies with particular focus on AI data integrity, future autonomy in decision-making, clinically appropriate AI application and expected physician oversight of implemented AI technologies.64
Along these lines, physicians’ perceptions of AI-based technology in healthcare are mixed. While studies have highlighted a potential improvement in diagnostic efficiency and a reduction of provider workload in some practices, the potential risk of misdiagnosis, development of operator dependence and downstream costs or increases in healthcare usage associated with AI remain present concerns with this technological jump.65,66 With the continued growth and development of AI-based tools in healthcare, it is vital these pertinent perceptions and apprehensions be considered before wide-scale use. Similarly, while practising evidence-based medicine with the incorporation of AI, there is still the need for continued validation, checks/balances and calibration as required in each of these AI systems, particularly if there is suspected model/data drift or shift over time.67
With the significant advances in AI/ML technology application for arrhythmia screening and diagnosis, the question remains of how to pragmatically incorporate this new technology into clinical practice.
At the Mayo Clinic, each standard 12-lead ECG is processed through our various AI-ECG CNNs to give the probability of various cardiac pathologies, including AF (e.g. heart failure with reduced ejection fraction, aortic stenosis, cirrhosis, etc.).13 The data from these AI-enhanced ECGs are available in real-time via EHR-based ‘AI-ECG dashboard’ to all clinical staff providing patient care at the Mayo Clinic (Figure 2).13,68 As these data have been made easily accessible through Epic-EHR integration, our team performed a pragmatic clinical trial to assess how these available AI-ECG data may impact clinical care/ordering of diagnostic testing.69
In a large, multisite pragmatic trial conducted within our medical system, primary care practitioners were randomised to use the AI-ECG dashboard as a screening tool for heart failure with reduced ejection fraction.69 This study demonstrated that a positive AI-ECG score, indicating potential heart failure with reduced ejection fraction, can lead to earlier detection of disease in patients with minimal symptoms.69 This pragmatic effort illustrated how these integrated AI-ECG data could impact clinical care at the bedside. Similar case-by-case examples have been reported of how the data available on the AI-ECG dashboard led to or could have led to expedited diagnosis of underlying cardiac disease.68,70 As is reflected at our medical centre, other institutions have integrated AI-based data within their EHR or within reports of AI-enhanced tests (e.g. coronary CT scans).71
As further research clarifies the appropriate clinical implementation of these AI/ML-enhanced screening and diagnostic tools, we anticipate other institutions will continue to integrate these systems within their EHR or diagnostic reporting. However, as discussed previously, there must be thorough investigation and education regarding the use of these systems to avoid inappropriate or spurious use when further diagnostic testing may be unwarranted or potentially harmful.
The application of AI to medicine is beginning to sharpen the diagnosis and treatment of AF. Application of AI and ML techniques to clinical variables, EHR data, diagnostic testing (ECGs, chest radiographs, videography), intracardiac signals (derived from implanted devices or during invasive procedures), and wearable devices has facilitated the prediction and detection of AF in many clinical and non-clinical settings. Despite its early promise, we must remain vigilant to ensure equitable, generalisable, transparent and rigorous work in this emerging field.