
Evaluating atrial fibrillation artificial intelligence for the emergency department, statistical and clinical implications

American Journal of Emergency Medicine 57 (2022) 98-102

Ann E. Kaminski, MD, MS a,*, Michael L. Albus, MD, MS a, Colleen T. Ball, MS b, Launia J. White b, Johnathan M. Sheele, MD, MPH, MHS a, Zachi I. Attia, PhD c, Paul A. Friedman, MD c, Demilade A. Adedinsewo, MB,ChB d, Peter A. Noseworthy, MD c

a Department of Emergency Medicine, Mayo Clinic, Jacksonville, FL, United States of America

b Division of Clinical Trials and Biostatistics, Mayo Clinic, Jacksonville, FL, United States of America

c Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States of America

d Department of Cardiovascular Medicine, Mayo Clinic, Jacksonville, FL, United States of America

Article info

Article history:

Received 1 February 2022

Received in revised form 6 April 2022

Accepted 10 April 2022

Keywords:

Artificial intelligence; Atrial fibrillation; Palpitations; Emergency medicine; Diagnosis

Abstract

Objective: An artificial intelligence (AI) algorithm has been developed to detect the electrocardiographic signature of atrial fibrillation (AF) on an electrocardiogram (ECG) obtained during normal sinus rhythm. We evaluated the ability of this algorithm to predict incident AF in an emergency department (ED) cohort of patients presenting with palpitations without concurrent AF.

Methods: This retrospective study included patients 18 years and older who presented with palpitations to one of 15 ED sites and had a 12-lead ECG performed. Patients with prior AF or newly diagnosed AF during the ED visit were excluded. Of the remaining patients, those with a follow-up ECG or Holter monitor in the subsequent year were included. We evaluated the performance of the AI-ECG output in predicting incident AF within one year of the index ECG by estimating the area under the receiver operating characteristic curve (AUC). Sensitivity, specificity, and positive and negative predictive values were determined at the optimum threshold (maximizing the sum of sensitivity and specificity) and at thresholds defined by the deciles of AI-ECG output in the sample.

Results: A total of 1403 patients were included. Forty-three (3.1%) patients were diagnosed with new AF during the following year. The AI-ECG algorithm predicted AF with an AUC of 0.74 (95% confidence interval (CI) 0.68-0.80), and the optimum threshold had a sensitivity of 79.1% (95% CI 66.9%-91.2%) and a specificity of 66.1% (95% CI 63.6%-68.6%).

Conclusions: The AI-ECG AF algorithm predicted incident AF with statistically significant discrimination, but its clinical utility for screening was limited in this ED population with a low incidence of AF.

(C) 2022

1. Introduction
1.1. Background

Palpitations are a common emergency department (ED) chief complaint, generating a broad differential diagnosis that includes both cardiac and noncardiac etiologies. Atrial fibrillation or flutter (AF) affects approximately 5 million people in the United States and is increasing in prevalence worldwide [1,2]. AF may be paroxysmal, and patients may experience intermittent palpitations without evidence of AF at the time of a single electrocardiogram (ECG). Complications of untreated AF include stroke, heart failure, and death [3,4]. Early interventions, including oral anticoagulation and the initiation of rate or rhythm control strategies, can reduce the risk of these complications [5].

* Corresponding author at: Department of Emergency Medicine, 4500 San Pablo Road, Jacksonville, FL 32224, United States of America.

1.2. Importance

Patients with palpitations may be discharged from the ED if a resting ECG is not concerning and the remainder of their evaluation is unremarkable. Some will undergo further surveillance for AF or other arrhythmias. Surveillance methods can include Holter monitors, event recorders, and patient-owned devices (home blood pressure monitors or smartwatches), combined with scheduled follow-up visits with a primary care provider or specialist [6,7]. Although palpitations may be the first presenting symptom of AF, it is often difficult to determine which patients with this presentation would benefit most from further investigation.

https://doi.org/10.1016/j.ajem.2022.04.032

0735-6757/(C) 2022

An artificial intelligence (AI) algorithm has been developed to estimate the probability of concomitant AF (paroxysmal or intermittent) from an ECG obtained during sinus rhythm [8,9]. This convolutional neural network model was developed using the Keras framework in Python with a TensorFlow (Google) backend. Its development used 649,931 ECGs from 180,922 patients to estimate the probability of AF within 30 days of an initial ECG in normal sinus rhythm. The model classified patients with and without AF within one month of an index ECG with an area under the receiver operating characteristic curve (AUC) of 0.87 (95% CI 0.86-0.88), a sensitivity of 79.0% (95% CI 77.5%-80.4%), and a specificity of 79.5% (95% CI 79.0%-79.9%) [8]. Our objective was to evaluate the ability of the AI-ECG model to predict concomitant or future AF within one year among patients presenting to an ED with a chief complaint of palpitations.

2. Methods
2.1. Study design and participant selection

This retrospective study included patients aged 18 years and older who presented to one of fifteen emergency departments across four states from August 2017 to January 2020 with a chief complaint of palpitations or a similar description (N = 5040; Fig. 1). Similar descriptions included ‘irregular heartbeat’, ‘heart racing’, ‘rapid heartbeat’, ‘heart fluttering’, ‘arrhythmia’, and ‘heart pounding.’
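The chief-complaint screen can be sketched as a simple text match. This is a minimal illustration only: the paper does not describe how complaints were searched, so the term list (built from the descriptions above), the lower-casing and whitespace normalization, and the function names `is_palpitation_complaint` and `is_eligible` are all hypothetical.

```python
import re

# Hypothetical search terms built from the descriptions listed in the text.
# "palpitation" (singular) also matches "palpitations" as a substring.
PALPITATION_TERMS = (
    "palpitation",
    "irregular heartbeat",
    "heart racing",
    "rapid heartbeat",
    "heart fluttering",
    "arrhythmia",
    "heart pounding",
)


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text.strip().lower())


def is_palpitation_complaint(chief_complaint: str) -> bool:
    """True if the free-text chief complaint contains any listed term."""
    cc = normalize(chief_complaint)
    return any(term in cc for term in PALPITATION_TERMS)


def is_eligible(age_years: int, chief_complaint: str) -> bool:
    """Adults (18 years and older) with a palpitation-type chief complaint."""
    return age_years >= 18 and is_palpitation_complaint(chief_complaint)


if __name__ == "__main__":
    print(is_eligible(45, "Heart  Racing since this morning"))  # True
    print(is_eligible(17, "palpitations"))  # False: under 18
```

Substring matching is deliberately permissive; a real screen would likely need manual review of edge cases such as negated complaints.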

We excluded patients with a prior history of AF (n = 1292), patients without a 12-lead ECG performed during the index ED visit (n = 72), patients who had a prior ECG that was used in AI-ECG model development (n = 178), patients with a new diagnosis of AF during the index ED visit (n = 440), and patients who had paced rhythms (atrial or ventricular pacing) on their index ECG (n = 31). When more than one ECG was performed during an ED visit, the first ECG was selected for analysis. For each patient, the first ED visit for palpitations within the study time frame was designated the index visit. Non-sinus atrial rhythms other than AF were included, as they may reflect structural changes that precede the development of AF [10]. Of the remaining 3027 patients, 394 had no follow-up within the study time frame, and 1230 had a return visit that did not include an ECG or Holter monitor. The final analysis included 1403 patients. This study was deemed exempt by the institutional review board (IRB).
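The cohort arithmetic can be checked step by step. The sketch below simply applies the exclusion counts from Fig. 1 in order (the order and labels are taken from the figure; the variable and function names are illustrative). With 1292 prior-AF exclusions, the figure's count, the steps reproduce both the 3027 and 1403 totals given in the text.

```python
# Cohort flow arithmetic using the counts shown in Fig. 1.
SCREENED = 5040

# Ordered exclusion steps: (reason, number excluded).
EXCLUSIONS = [
    ("Prior history of AF", 1292),
    ("No ECG performed during ED visit", 72),
    ("Prior ECG used in AI model development", 178),
    ("Paced rhythm on index ECG", 31),
    ("New diagnosis of AF during ED visit", 440),
    ("No follow-up within 365 days", 394),
    ("Follow-up without repeat ECG/Holter monitor", 1230),
]


def remaining_after_each_step(start, steps):
    """Yield (reason, patients remaining) after applying each exclusion in turn."""
    remaining = start
    for reason, n_excluded in steps:
        remaining -= n_excluded
        yield reason, remaining


flow = list(remaining_after_each_step(SCREENED, EXCLUSIONS))
for reason, remaining in flow:
    print(f"{reason:<45} -> {remaining}")
```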

2.2. Measurements and outcomes

The primary outcome was a new diagnosis of AF within 1 year of the index ED visit and ECG, identified through a search of ICD-9 and ICD-10 diagnosis codes in the electronic health record. A physician chart review was conducted for patients identified with new-onset AF to confirm the diagnosis.
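A code-based search of this kind might look like the sketch below. The exact codes the study searched are not reported, so the code sets here are assumptions: ICD-9-CM 427.31 (atrial fibrillation) and 427.32 (atrial flutter), and the ICD-10-CM category I48 (atrial fibrillation and flutter). The function only flags candidates; in the study, a physician chart review confirmed each new diagnosis.

```python
from datetime import date

# Assumed code sets; the study does not list the exact codes it searched.
# ICD-9-CM 427.31 (atrial fibrillation), 427.32 (atrial flutter);
# ICD-10-CM category I48 (atrial fibrillation and flutter).
ICD9_AF_CODES = {"42731", "42732"}
ICD10_AF_PREFIX = "I48"


def is_af_code(code: str) -> bool:
    """Match AF/flutter codes with or without the decimal point."""
    c = code.strip().upper().replace(".", "")
    return c in ICD9_AF_CODES or c.startswith(ICD10_AF_PREFIX)


def has_incident_af(index_date: date, diagnoses, window_days: int = 365) -> bool:
    """True if an AF code is dated after the index visit and within the window.

    diagnoses: iterable of (diagnosis_date, icd_code) pairs.
    Same-day codes are treated as part of the index visit; patients diagnosed
    at the index visit were excluded from the cohort.
    """
    for dx_date, code in diagnoses:
        days_after = (dx_date - index_date).days
        if 0 < days_after <= window_days and is_af_code(code):
            return True
    return False
```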

Additional variables included demographic information; past medical history of stroke, transient ischemic attack, diabetes, hypertension, and heart failure; and the CHA2DS2-VASc score at the time of the visit. The CHA2DS2-VASc score estimates thromboembolic risk in patients with AF, and anticoagulation is recommended for stroke and thromboembolism prophylaxis in patients with a score of 2 or more in men or 3 or more in women [11,12].
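For reference, the CHA2DS2-VASc score is a simple point sum [12]. The sketch below encodes the standard components; the `Patient` record and its field names are hypothetical, and how the study mapped its recorded history (for example, coronary artery disease) onto the vascular-disease item is not described.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names; the components follow the published score.
    age: int
    female: bool
    heart_failure: bool = False     # C: congestive heart failure
    hypertension: bool = False      # H
    diabetes: bool = False          # D
    stroke_tia_te: bool = False     # S2: prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False  # V: prior MI, peripheral artery disease, aortic plaque


def cha2ds2_vasc(p: Patient) -> int:
    """Return the CHA2DS2-VASc score (0 to 9)."""
    score = 0
    score += 1 if p.heart_failure else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:      # A2: age 75 or older scores 2
        score += 2
    elif p.age >= 65:    # A: age 65 to 74 scores 1
        score += 1
    score += 1 if p.diabetes else 0
    score += 2 if p.stroke_tia_te else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0  # Sc: sex category (female)
    return score
```

For example, a 70-year-old woman with hypertension scores 3 (hypertension, age 65 to 74, female sex).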

All index ECGs that met inclusion criteria were analyzed with the AI-ECG model, which generated probability estimates between 0 and 1.
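Turning these probability estimates into the per-decile operating points used later requires a threshold rule and a quantile rule. The sketch below assumes that a score at or above the threshold counts as a positive test and uses linear-interpolation quantiles (the same rule as R's default `quantile` type 7). It is illustrative only and does not claim to reproduce the study's pROC output or its bootstrap confidence intervals.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (R type 7)."""
    n = len(sorted_values)
    pos = (n - 1) * q
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def metrics_at_threshold(scores, labels, threshold):
    """Confusion-matrix metrics when score >= threshold counts as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def decile_table(scores, labels):
    """(decile %, threshold, metrics) at the 10th through 90th percentiles."""
    s = sorted(scores)
    rows = []
    for k in range(1, 10):
        t = quantile(s, k / 10)
        rows.append((k * 10, t, metrics_at_threshold(scores, labels, t)))
    return rows
```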

2.3. Analysis

Continuous variables were summarized as medians with interquartile ranges (IQR), and categorical variables as frequencies and percentages. Fisher's exact test was used to compare categorical variables and Welch's t-test to compare age; the Wilcoxon rank sum test was used for the remaining demographic variables. All tests used a Type I error rate of 0.05.

Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), formed by modeling the AI-ECG AF probability output against a new diagnosis of AF within one year of the index ECG. AUCs were additionally constructed for the follow-up ECG and Holter subgroups, as these follow-up types may represent different levels of clinical concern for AF. The confidence interval for each AUC, and the comparison between the AUCs of the two subgroups, were computed using DeLong's method, with a Type I error rate of 0.05 [13,14]. An optimum cutoff point, defined as the threshold maximizing the sum of sensitivity and specificity, was calculated using Youden's index [15]. Diagnostic performance, including sensitivity and specificity, was calculated at each decile of the distribution of AI-ECG probability scores in the study sample. Ninety-five percent confidence intervals for sensitivity and specificity were calculated using 2000 stratified bootstrap replicates. Descriptive variables that differed significantly between patients with and without a new AF diagnosis were used to define subpopulations; the AUC of each subpopulation was compared with that of the full study sample using DeLong's test. All analyses were performed using R version 4.0.4 (R Foundation for Statistical Computing, Vienna, Austria), including the pROC package version 1.18.0.

Fig. 1. Patient flow diagram.

Study population: patients 18 years of age and over presenting to Mayo Clinic emergency departments with a chief complaint of palpitations (N = 5040)
  Prior history of atrial fibrillation/flutter (AF) (-1292): 3748 for inclusion
  No ECG performed during ED visit (-72)
  Prior ECG data used in AI model development (-178)
  Ventricular or paced rhythm on ECG (-31)
  New diagnosis of AF during ED visit (-440)
  No follow up within 365 days (-394)
  Follow up without repeat ECG/Holter monitor (-1230)
1403 for inclusion: new AF diagnosis, 43; no AF diagnosis, 1360

3. Results

A total of 1403 patients were included in the final analysis (Fig. 1). The median age was 49 years (IQR 34-64), and 67% of patients were female. The median CHA2DS2-VASc score was 1 (IQR 0-2) (Table 1). The median index AI-ECG probability estimate for AF was 0.01514 (IQR 0.00470-0.05214). Characteristics of patients with and without a new AF diagnosis are compared in Table 1, and the AI-ECG output by decile for this study sample is described in Table 2.

Table 1
Patient baseline characteristics.

| Characteristic | New AF (n = 43) | No new AF (n = 1360) | Overall (n = 1403) | P value |
|---|---|---|---|---|
| Age, years | 66 (53, 72) | 48 (34, 63) | 49 (34, 64) | <0.001* |
| Sex, female | 23 (53.4%) | 917 (67.4%) | 940 (67.0%) | 0.069 |
| Race | | | | 0.763 |
| Black | 1 (2.3%) | 81 (6.0%) | 82 (5.8%) | |
| White | 40 (93.0%) | 1193 (87.7%) | 1233 (87.9%) | |
| Other | 2 (4.7%) | 86 (6.3%) | 88 (6.3%) | |
| Ethnicity | | | | 0.863 |
| Hispanic or Latino | 1 (2.3%) | 63 (4.6%) | 64 (4.6%) | |
| Not Hispanic or Latino | 42 (97.7%) | 1274 (93.7%) | 1316 (93.8%) | |
| Other | 0 | 23 (1.7%) | 23 (1.6%) | |
| Hypertension | 16 (37.2%) | 449 (33.0%) | 465 (33.1%) | 0.622 |
| Diabetes | 11 (25.6%) | 213 (15.7%) | 224 (16.0%) | 0.090 |
| Coronary artery disease | 14 (32.6%) | 212 (15.6%) | 226 (16.1%) | 0.006* |
| Heart failure | 5 (11.6%) | 98 (7.2%) | 103 (7.3%) | 0.239 |
| Stroke | 0 | 83 (6.1%) | 83 (5.9%) | 0.002* |
| CHA2DS2-VASc score | 2 (1, 3) | 1 (0, 2) | 1 (0, 2) | 0.044* |
| Index ECG AI-ECG AF model output | 0.05486 (0.03023, 0.20403) | 0.01466 (0.00457, 0.04935) | 0.01514 (0.00470, 0.05214) | <0.001* |

Values are median (25th percentile, 75th percentile) or n (%). Fisher's exact test or Welch t-test was performed for variables described in percentages, and age. Wilcoxon rank sum test for remaining variables described in quartiles.
* p < 0.05.

During the prespecified follow-up period of 365 days, 43/1403 (3.1%) patients received a new diagnosis of AF. Six hundred sixty-two patients in the follow-up group underwent Holter monitoring, with 23/662 (3.5%) receiving a new AF diagnosis. Twenty of the 741 patients (2.7%) who had repeat ECG testing without Holter monitoring were newly diagnosed with AF.

The AUC for all patients who received ECG or Holter monitor follow-up was 0.74 (95% CI 0.68-0.80) (Fig. 2). The optimum cutoff (maximizing the sum of sensitivity and specificity) was 0.02950, with a sensitivity of 79.1% (95% CI 65.1%-90.7%) and a specificity of 66.1% (95% CI 63.5%-68.5%).
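The AUC, the Youden-optimal cutoff, and the stratified bootstrap intervals can be sketched without any statistics library. This is a minimal illustration, not the study's pROC code: it computes the AUC as the Mann-Whitney probability that a case outranks a control, searches observed scores for the cutoff maximizing J = sensitivity + specificity - 1 (assuming score at or above the cutoff is positive), and forms percentile intervals by resampling cases and controls separately at a fixed cutoff. Cutoffs may differ slightly from R's pROC, which by default evaluates thresholds between consecutive observed values.

```python
import random


def auc_mann_whitney(scores, labels):
    """AUC as P(case score > control score), counting ties as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(cases) * len(controls))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is a positive test."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def youden_cutoff(scores, labels):
    """Observed score maximizing J = se + sp - 1; the lowest cutoff wins ties."""
    best = None
    for t in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, t)
        j = se + sp - 1
        if best is None or j > best[0]:
            best = (j, t, se, sp)
    return best[1], best[2], best[3]


def stratified_bootstrap_ci(scores, labels, threshold, reps=2000, level=0.95, seed=1):
    """Percentile CIs for sensitivity and specificity at a fixed threshold.

    Cases and controls are resampled separately (stratified), so every
    replicate keeps the observed numbers of patients with and without AF.
    """
    rng = random.Random(seed)
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    sens, spec = [], []
    for _ in range(reps):
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        sens.append(sum(s >= threshold for s in boot_cases) / len(boot_cases))
        spec.append(sum(s < threshold for s in boot_controls) / len(boot_controls))

    def percentile_interval(values):
        v = sorted(values)
        lo = v[int((1 - level) / 2 * (reps - 1))]
        hi = v[int((1 + level) / 2 * (reps - 1))]
        return lo, hi

    return percentile_interval(sens), percentile_interval(spec)
```

The pairwise AUC loop is O(cases × controls), about 58,000 comparisons for 43 cases and 1360 controls, which is fast enough for a sample of this size.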

There was no significant difference between the AUCs of the subgroups followed up with Holter monitoring or with ECG alone (p = 0.39). Likewise, the AUC of the study sample did not differ significantly from those of subgroups defined by age greater than 50 years, hypertension, coronary artery disease, or a CHA2DS2-VASc score greater than 1 (all p > 0.05). Sensitivity and specificity at each decile of the AI-ECG output distribution in the study sample are shown in Table 2.
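DeLong's method estimates the variance of an AUC from per-patient "structural components" [13]. The sketch below is the textbook O(m·n) form, not the fast algorithm of Sun and Xu [14] or pROC's implementation. It gives a normal-approximation CI for one AUC and an unpaired z-test for two non-overlapping patient groups, such as the Holter and ECG-only subgroups. Comparing a subgroup with the full sample that contains it is a paired (correlated) comparison that would also need the DeLong covariance term, which this sketch omits.

```python
import math
from statistics import NormalDist


def _psi(x, y):
    """Kernel: 1 if the case outranks the control, 1/2 for ties, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_auc_variance(scores, labels):
    """Return (AUC, DeLong variance); needs at least two cases and two controls."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    m, n = len(cases), len(controls)
    # Structural components: each case's (or control's) average kernel value.
    v10 = [sum(_psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / m for y in controls]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    return auc, s10 / m + s01 / n


def delong_ci(scores, labels, level=0.95):
    """Normal-approximation CI for one AUC, clipped to [0, 1]."""
    auc, var = delong_auc_variance(scores, labels)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var)
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)


def compare_independent_aucs(group_a, group_b):
    """Two-sided z-test for AUCs from two non-overlapping patient groups.

    Each group is a (scores, labels) pair. For independent samples the
    variances add; overlapping samples need the DeLong covariance term.
    """
    auc_a, var_a = delong_auc_variance(*group_a)
    auc_b, var_b = delong_auc_variance(*group_b)
    se = math.sqrt(var_a + var_b)
    if se == 0:
        raise ValueError("zero variance: AUCs cannot be compared")
    z = (auc_a - auc_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```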

4. Discussion

The AI-ECG model was able to predict incident AF over one year; however, our results suggest that the model may have limited use as a clinical screening tool in this population with a low incidence of AF. Sensitivity at the optimum cutoff (79.1%; 95% CI 65.1%-90.7%) was low for a screening test. Other thresholds that achieved a sensitivity of 90% or greater came with low specificity. For example, the fourth-decile cutoff (0.00946) could be considered to support a reduced need for specialty follow-up or Holter monitoring, as its sensitivity was 93.0% (95% CI 83.7%-100%); however, its specificity was only 41.0% (95% CI 38.6%-43.8%).
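Why low incidence limits screening value follows directly from Bayes' theorem: with sensitivity and specificity fixed, the positive predictive value (PPV) scales with prevalence. The sketch below plugs in the point estimates reported in the text and an observed incidence of 43/1403. It uses point estimates only, so it will not exactly match the bootstrap estimates in Table 2.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by Bayes' theorem for a given prevalence."""
    tp = sensitivity * prevalence              # true positives per patient tested
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Point estimates reported in the text; incidence is 43 new AF cases / 1403.
PREVALENCE = 43 / 1403
OPERATING_POINTS = {
    "optimum cutoff 0.02950": (0.791, 0.661),
    "fourth decile 0.00946": (0.930, 0.410),
}

for name, (se, sp) in OPERATING_POINTS.items():
    ppv, npv = predictive_values(se, sp, PREVALENCE)
    print(f"{name}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At the optimum cutoff this gives a PPV of roughly 7% and an NPV of roughly 99%. The same sensitivity and specificity at a hypothetical 20% prevalence would raise the PPV to about 37%, which illustrates how strongly the low incidence in this cohort drives the predictive values.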

Applying the original AI-ECG model to a patient population different from the one used in its development may result in suboptimal model performance [16], and our analysis suggests this possibility. Our sample was younger and predominantly female compared with the data set used to create the original model. The original AI model used a digital data platform that spanned both inpatient and outpatient settings, with a mean age of 60 years and a roughly equal distribution of sex. Our endpoint also differed from that of the original model. One year was selected as the endpoint in our evaluation because follow-up after an ED visit often occurred more than 30 days later, and several follow-up visits or a recurrence of symptoms may be needed to detect or suspect AF in this population.

As AI research and its clinical application continue to progress, we expect models designed for this purpose to evolve. The practicing clinician's role in applying these models is integral to their effective use, for both the provider and the patient.

Table 2
AI-ECG performance for predicting new AF within one year by threshold decile of index ECG AI outputs (N = 1403). Threshold score median = 0.01514 (IQR 0.00470-0.05214).

| Decile | AI-ECG probability threshold | Sensitivity (95% CI), % | Specificity (95% CI), % | PPV (95% CI), % | NPV (95% CI), % |
|---|---|---|---|---|---|
| 10% | 0.00162 | 95.4 (88.4-100) | 10.24 (8.6-11.8) | 3.2 (3.0-3.4) | 98.6 (96.4-100) |
| 20% | 0.00365 | 95.4 (88.4-100) | 20.5 (18.3-22.7) | 3.7 (3.4-3.9) | 99.3 (98.2-100) |
| 30% | 0.00583 | 95.4 (88.4-100) | 30.8 (28.4-33.2) | 4.2 (3.9-4.4) | 99.5 (98.8-100) |
| 40% | 0.00946 | 93.0 (83.7-100) | 41.0 (38.6-43.8) | 4.8 (4.3-5.1) | 99.5 (98.8-100) |
| 50% | 0.01514 | 86.1 (74.4-95.4) | 51.1 (48.4-53.8) | 5.3 (4.6-5.9) | 99.1 (98.5-99.7) |
| 60% | 0.02315 | 81.4 (69.8-93.0) | 61.3 (58.8-63.8) | 6.2 (5.3-7.1) | 99.0 (98.4-99.6) |
| 70% | 0.03809 | 55.8 (41.8-69.8) | 70.8 (68.4-73.2) | 5.7 (4.1-7.2) | 98.1 (97.4-98.7) |
| 80% | 0.07062 | 46.4 (32.6-60.5) | 80.8 (78.7-83.0) | 7.1 (4.8-0.4) | 98.0 (97.4-98.6) |
| 90% | 0.19008 | 24.6 (14.0-39.5) | 90.4 (88.8-92.0) | 7.8 (4.1-11.8) | 97.5 (97.1-98.0) |

Abbreviations: AI = artificial intelligence, ECG = electrocardiogram, IQR = interquartile range, PPV = positive predictive value, NPV = negative predictive value.

Fig. 2. Receiver operating characteristic (ROC) curve for the artificial intelligence algorithm and subsequent atrial fibrillation diagnosis (N = 1403), plotting sensitivity against 1 - specificity; AUC: 0.742 (0.676-0.808). AUC = area under the curve.

4.1. Limitations

New AF may have been underdetected in this analysis, as a repeat ECG or Holter monitor may not be sufficient to screen for AF. Cardiac etiologies other than AF are often considered in clinical decision making when repeat ECG and Holter monitoring are indicated. The low incidence of AF during the follow-up period influences the model performance analysis, including the positive and negative predictive values. Differences between this sample and the original population used in model development are described in the Discussion.

5. Conclusions

The AI-ECG AF algorithm continued to show statistically significant discrimination for incident AF but had limited clinical utility when applied as a screening tool to this ED population presenting with palpitations.

Financial support

Internal hospital funds were used to cover statisticians' hours and reports. The funding organization had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

This research did not receive any specific grants or funding from agencies in the public, commercial, or not-for-profit sectors.

Author contributions

AK, MA, DA, JS and PN conceived and designed the study. PN, ZA, and PF provided the algorithm and facilitated artificial intelligence data collection. CB and LW supervised the data collection and analysis. CB provided statistical advice on study design, and CB and AK analyzed the data; AK drafted the manuscript, and all authors contributed substantially to its revision. AK and MA take responsibility for the paper as a whole and contributed equally to this study.

Conflicts of interest

The authors declare that they have no relevant conflicts of interest.

CRediT authorship contribution statement

Ann E. Kaminski: Writing – review & editing, Writing – original draft, Supervision, Methodology, Investigation, Funding acquisition, Formal analysis, Conceptualization. Michael L. Albus: Writing – review & editing, Methodology, Investigation, Formal analysis, Conceptualization. Colleen T. Ball: Writing – review & editing, Methodology, Formal analysis, Data curation. Launia J. White: Writing – review & editing, Data curation, Conceptualization. Johnathan M. Sheele: Writing – review & editing, Supervision, Resources, Funding acquisition, Conceptualization. Zachi I. Attia: Writing – review & editing, Conceptualization. Paul A. Friedman: Writing – review & editing, Supervision, Investigation, Conceptualization. Demilade A. Adedinsewo: Writing – review & editing, Methodology, Investigation, Data curation, Conceptualization. Peter A. Noseworthy: Writing – review & editing, Supervision, Project administration, Methodology, Formal analysis, Conceptualization.

References

1. Dilaveris PE, Kennedy HL. Silent atrial fibrillation: epidemiology, diagnosis, and clinical impact. Clin Cardiol. 2017;40:413-8.
2. Kornej J, Borschel CS, Benjamin EJ, Schnabel RB. Epidemiology of atrial fibrillation in the 21st century: novel methods and new insights. Circ Res. 2020;127:4-20.
3. Kirchhof P. The future of atrial fibrillation management: integrated care and stratified therapy. Lancet. 2017;390:1873-87.
4. Lip GY, Hunter TD, Quiroz ME, Ziegler PD, Turakhia MP. Atrial fibrillation diagnosis timing, ambulatory ECG monitoring utilization, and risk of recurrent stroke. Circ Cardiovasc Qual Outcomes. 2017;10:e002864.
5. Henriksson KM, Farahmand B, Johansson S, Asberg S, Terent A, Edvardsson N. Survival after stroke-the impact of CHADS2 score and atrial fibrillation. Int J Cardiol. 2010;141:18-23.
6. Abi Khalil C, Haddad F, Al Suwaidi J. Investigating palpitations: the role of Holter monitoring and loop recorders. BMJ. 2017;358:3128.
7. Reed MJ, Grubb NR, Lang CC, et al. Multi-centre randomised controlled trial of a smart phone-based event recorder alongside standard care versus standard care for patients presenting to the emergency department with palpitations and pre-syncope-the IPED (investigation of palpitations in the ED) study: study protocol for a randomised controlled trial. Trials. 2018;19:1-9.
8. Attia ZI, Noseworthy PA, Lopez-Jimenez F, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet. 2019;394:861-7.
9. Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18:465-78.
10. Perez M, Dewey F, Marcus R, et al. Electrocardiographic predictors of atrial fibrillation. Am Heart J. 2009;158:622-8.
11. January CT, Wann LS, Calkins H, et al. 2019 AHA/ACC/HRS focused update of the 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association task force on clinical practice guidelines and the Heart Rhythm Society. J Am Coll Cardiol. 2019;74(1):104-32.
12. Lip GY, Nieuwlaat R, Pisters R, Lane DA, Crijns HJ. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on atrial fibrillation. Chest. 2010;137:263-72.
13. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837-45.
14. Sun X, Xu W. Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process Lett. 2014;21(11):1389-93.
15. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32-5.
16. Siontis K, Noseworthy P, Arghami A, et al. Use of artificial intelligence tools across different clinical settings: a cautionary tale. Circ Cardiovasc Qual Outcomes. 2021;14:e008153.