Whether inconclusive smartwatch readings are counted in diagnostic reporting substantially changes the measured accuracy of wearable ECGs for detecting atrial fibrillation (AF). A prospective study published in Heart Rhythm found that real-world testing frameworks yield markedly different accuracy estimates than traditional validation methods.
The study enrolled 296 adults at a teaching hospital in Ireland, each of whom underwent ECG testing with a consumer smartwatch. Recordings were analyzed with both the device's native algorithm and an AI-based neural network. Three reporting frameworks were compared: naive (excluding inconclusive results), pragmatic (counting inconclusive results as incorrect), and intention-to-diagnose (allowing up to three recording attempts).
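To make the difference between the three frameworks concrete, the sketch below scores a small invented cohort under each rule. It is a minimal illustration under stated assumptions, not the study's analysis code: the labels (`AF`, `normal`, `inconclusive`), the `Patient` record, and the functions `scored_call` and `sensitivity_specificity` are hypothetical. It assumes the naive and pragmatic frameworks score only the first recording, while intention-to-diagnose uses the first conclusive result within three attempts and counts a patient whose attempts are all inconclusive as misclassified. The cohort data are made up and do not reflect the study's results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical device output labels (illustrative, not a real device API).
AF, NORMAL, INCONCLUSIVE = "AF", "normal", "inconclusive"
FRAMEWORKS = ("naive", "pragmatic", "intention_to_diagnose")


@dataclass
class Patient:
    has_af: bool          # reference-standard diagnosis
    readings: List[str]   # device output for each recording attempt, in order


def scored_call(patient: Patient, framework: str, max_attempts: int = 3) -> Optional[str]:
    """Return the device call used for scoring, or None if the patient is excluded."""
    if framework not in FRAMEWORKS:
        raise ValueError(f"unknown framework: {framework}")
    # Assumption: only intention-to-diagnose allows repeat attempts.
    n = max_attempts if framework == "intention_to_diagnose" else 1
    for reading in patient.readings[:n]:
        if reading != INCONCLUSIVE:
            return reading  # first conclusive result is the one scored
    # Every considered attempt was inconclusive.
    if framework == "naive":
        return None  # dropped from both numerator and denominator
    return INCONCLUSIVE  # pragmatic / intention-to-diagnose: scored as an error


def sensitivity_specificity(patients: List[Patient], framework: str) -> Tuple[float, float]:
    """Assumes at least one scored patient with AF and one without."""
    tp = fn = tn = fp = 0
    for p in patients:
        call = scored_call(p, framework)
        if call is None:
            continue
        if p.has_af:
            if call == AF:
                tp += 1
            else:
                fn += 1  # a normal or inconclusive call misses AF
        else:
            if call == NORMAL:
                tn += 1
            else:
                fp += 1  # an AF or inconclusive call is not a correct rule-out
    return tp / (tp + fn), tn / (tn + fp)


# Invented cohort: 5 patients with AF, 6 without.
COHORT = [
    Patient(True, [AF]),
    Patient(True, [AF]),
    Patient(True, [INCONCLUSIVE, AF]),
    Patient(True, [NORMAL]),
    Patient(True, [INCONCLUSIVE, INCONCLUSIVE, INCONCLUSIVE]),
    Patient(False, [NORMAL]),
    Patient(False, [INCONCLUSIVE, NORMAL]),
    Patient(False, [NORMAL]),
    Patient(False, [AF]),
    Patient(False, [NORMAL]),
    Patient(False, [INCONCLUSIVE, INCONCLUSIVE, INCONCLUSIVE]),
]

for fw in FRAMEWORKS:
    sens, spec = sensitivity_specificity(COHORT, fw)
    print(f"{fw}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With this invented cohort, the naive rule gives the highest figures (sensitivity 2/3, specificity 3/4) because inconclusive recordings simply drop out, the pragmatic rule the lowest (2/5 and 1/2), and intention-to-diagnose falls in between (3/5 and 4/6), the same ordering the study reports.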
Under the naive approach, sensitivity and specificity were 96.1% and 97.9%, respectively, whereas the pragmatic approach yielded substantially lower values of 78.1% and 81.0%. The intention-to-diagnose framework produced intermediate results: 92.2% sensitivity and 91.0% specificity. The AI system achieved 98.4% sensitivity and 96.6% specificity and reduced inconclusive outputs by 92%. Repeatability was substantial for the smartwatch (κ = 0.77) and near-perfect for the AI system (κ = 0.96).
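The κ values measure agreement between repeated recordings beyond what chance alone would produce. Assuming κ here is Cohen's kappa computed over paired classifications from repeat recordings (the summary does not name the statistic), it is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each recording's label frequencies. The sketch below is illustrative only: `cohens_kappa`, `landis_koch_label`, and the paired labels are hypothetical, and the verbal bands follow the conventional Landis and Koch benchmarks (0.61 to 0.80 "substantial", 0.81 to 1.00 "almost perfect").

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(first: Sequence[str], second: Sequence[str]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    # Observed agreement: fraction of pairs with identical labels.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: products of each label's marginal counts, summed and normalized.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both sequences use one identical label; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa: float) -> str:
    """Map kappa to the conventional Landis and Koch agreement categories."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented paired classifications: two recordings per participant.
first_recording = ["AF", "AF", "normal", "normal", "normal", "inconclusive", "AF", "normal", "normal", "normal"]
second_recording = ["AF", "AF", "normal", "normal", "AF", "normal", "AF", "normal", "normal", "normal"]

k = cohens_kappa(first_recording, second_recording)
print(f"kappa={k:.3f} ({landis_koch_label(k)})")
print(landis_koch_label(0.77), "/", landis_koch_label(0.96))
```

The invented pairs agree for 8 of 10 participants (p_o = 0.8) with chance agreement p_e = 0.48, giving κ = 8/13 ≈ 0.615, just inside the "substantial" band; the reported κ = 0.77 and κ = 0.96 fall in the "substantial" and "almost perfect" bands, respectively.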
These findings highlight the need for intention-to-diagnose frameworks that account for inconclusive readings and permit repeat testing. Such frameworks provide a more realistic, clinically relevant measure of wearable ECG accuracy for AF detection.