[Paper] ECG-Lens: Benchmarking ML & DL Models on PTB-XL Dataset

Published: April 17, 2026 at 04:20 AM EDT
5 min read

Source: arXiv - 2604.15822v1

Overview

The paper ECG‑Lens presents a systematic benchmark of six machine‑learning pipelines (three classic classifiers and three deep‑learning architectures) on the large‑scale PTB‑XL 12‑lead electrocardiogram (ECG) dataset. By training the deep models directly on raw waveforms and augmenting the data with a Stationary Wavelet Transform, the authors show that a purpose‑built complex CNN can reach 80 % classification accuracy and a 0.90 ROC‑AUC, setting a new practical reference point for automated ECG analysis.

Key Contributions

  • Comprehensive benchmark of traditional ML (Decision Tree, Random Forest, Logistic Regression) vs. deep learning (simple CNN, LSTM, and the proposed “ECG‑Lens” complex CNN) on the same raw 12‑lead PTB‑XL data.
  • End‑to‑end raw‑signal training: No hand‑crafted feature extraction; the networks learn discriminative patterns directly from the waveform.
  • Wavelet‑based data augmentation using Stationary Wavelet Transform (SWT) to increase sample diversity while preserving clinically relevant morphology.
  • Multi‑metric evaluation (accuracy, precision, recall, F1, ROC‑AUC) providing a holistic view of model performance across imbalanced cardiac classes.
  • Open‑source benchmark code & trained weights (released with the paper) to accelerate reproducibility and downstream research.

Methodology

  1. Dataset – PTB‑XL, a publicly available collection of 21,837 12‑lead ECG recordings (10 seconds each) labeled with five diagnostic superclasses: NORM (normal), MI (myocardial infarction), STTC (ST/T change), CD (conduction disturbance), and HYP (hypertrophy).
  2. Pre‑processing – Signals are resampled to a common frequency, normalized, and split into training/validation/test sets (70/15/15).
  3. Data Augmentation – For each training record, an SWT decomposition is performed; selected sub‑bands are recombined with random scaling to generate synthetic variants that retain QRS complexes, P‑waves, and ST‑segments.
  4. Model families
    • Traditional ML: Feature vectors are built from time‑domain statistics (mean, variance, skewness) and frequency‑domain descriptors (power spectral density). These feed into Decision Tree, Random Forest, and Logistic Regression classifiers.
    • Deep Learning:
      • Simple CNN – 3 convolutional layers + global max‑pooling, trained on raw 12‑lead tensors.
      • LSTM – Two stacked LSTM layers capture temporal dependencies across the 10‑second trace.
      • ECG‑Lens (Complex CNN) – 7 convolutional blocks with residual connections, multi‑scale kernels (3, 5, 7), and squeeze‑and‑excitation modules to adaptively weight lead‑specific information. Ends with a fully‑connected head for multi‑class output.
  5. Training – Adam optimizer, cosine‑annealing learning‑rate schedule, early stopping on validation loss. Class imbalance is mitigated with weighted cross‑entropy.
  6. Evaluation – Confusion matrices, per‑class precision/recall, macro‑averaged F1, and ROC‑AUC (one‑vs‑rest) are reported.
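The augmentation in step 3 can be sketched in a few lines of NumPy. The snippet below is a minimal single‑level Haar stationary‑wavelet (à‑trous) decomposition with circular padding and a randomly rescaled detail band; the wavelet choice, decomposition level, and scaling range are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np

def haar_swt_augment(x, detail_scale=None, rng=None):
    """Augment a 1-D ECG lead via a single-level stationary (undecimated)
    Haar wavelet transform: split into approximation + detail bands,
    rescale the detail band, and recombine.

    With detail_scale=1.0 the reconstruction is exact.
    """
    if rng is None:
        rng = np.random.default_rng()
    if detail_scale is None:
        detail_scale = rng.uniform(0.8, 1.2)  # assumed scaling range
    x_prev = np.roll(x, 1)                    # circular boundary handling
    approx = 0.5 * (x + x_prev)               # low-pass (approximation) band
    detail = 0.5 * (x - x_prev)               # high-pass (detail) band
    return approx + detail_scale * detail     # synthetic variant

# Example: a 1000-sample synthetic lead (10 s at 100 Hz)
t = np.linspace(0.0, 10.0, 1000, endpoint=False)
lead = np.sin(2 * np.pi * 1.2 * t)            # ~72 bpm sinusoid stand-in
augmented = haar_swt_augment(lead)
```

Because the transform is undecimated, the augmented signal keeps the original length and sample alignment, which is what lets QRS complexes, P‑waves, and ST‑segments survive the perturbation.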

Results & Findings

| Model | Accuracy | Macro F1 | ROC‑AUC |
|---|---|---|---|
| Decision Tree | 58 % | 0.52 | 0.71 |
| Random Forest | 63 % | 0.58 | 0.77 |
| Logistic Regression | 61 % | 0.55 | 0.74 |
| Simple CNN | 71 % | 0.66 | 0.84 |
| LSTM | 73 % | 0.68 | 0.86 |
| ECG‑Lens (Complex CNN) | 80 % | 0.75 | 0.90 |
  • Deep models consistently outperformed classic ML, confirming that raw waveform learning captures richer morphology than handcrafted statistics.
  • ECG‑Lens achieved the best trade‑off across all metrics, especially ROC‑AUC, indicating strong discriminative power even for minority classes.
  • Wavelet augmentation contributed ~3‑4 % absolute gain in accuracy for the deep models, demonstrating its utility for limited‑size medical time‑series.
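The per‑class and macro‑averaged metrics behind a table like the one above can be reproduced from a confusion matrix alone. As a minimal NumPy sketch (the toy 3‑class matrix below is illustrative only, not the paper's results):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision/recall and macro-F1 from a confusion matrix,
    where cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.clip(cm.sum(axis=0), 1e-12, None)  # column sums = predicted counts
    recall = tp / np.clip(cm.sum(axis=1), 1e-12, None)     # row sums = true counts
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    return precision, recall, f1.mean()  # macro-F1: unweighted mean over classes

# Toy confusion matrix (rows = true class, cols = predicted class)
cm = [[50,  5,  5],
      [10, 30, 10],
      [ 0,  5, 45]]
prec, rec, macro_f1 = per_class_metrics(cm)
```

Macro averaging weights every class equally, which is why it is the appropriate F1 variant for the imbalanced PTB‑XL superclasses: a model that ignores minority classes is penalized even if overall accuracy stays high.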

Practical Implications

  • Rapid prototyping for health‑tech startups – The benchmark shows that a well‑designed CNN can be trained on publicly available data and reach clinically relevant performance without costly feature engineering.
  • Edge deployment – ECG‑Lens’s architecture, while deeper than a simple CNN, remains lightweight enough (≈1.2 M parameters) for inference on modern micro‑controllers or mobile devices, enabling point‑of‑care arrhythmia screening.
  • Model selection guidance – Teams can use the provided performance table to justify a shift from traditional ML pipelines (easier to interpret but less accurate) to deep CNNs when higher diagnostic sensitivity is required.
  • Data‑augmentation recipe – The SWT‑based augmentation can be plugged into existing pipelines to mitigate class imbalance, a common pain point in medical datasets.
  • Regulatory pathways – By benchmarking against a recognized standard (PTB‑XL) and reporting a full suite of metrics, developers gain a baseline that can be referenced in FDA/EMA submissions for AI‑based ECG analysis tools.

Limitations & Future Work

  • Dataset scope – PTB‑XL, while large, contains only 10‑second recordings from a single acquisition protocol; performance on longer Holter or wearable ECG streams remains untested.
  • Interpretability – The paper focuses on accuracy metrics; explainability methods (e.g., saliency maps, attention), which are crucial for clinical acceptance, are not explored.
  • Generalization to rare pathologies – Some diagnostic subclasses have very few examples; even the best model shows reduced recall on these, suggesting a need for targeted data collection or few‑shot learning techniques.
  • Real‑world validation – No external validation on a separate hospital dataset is presented; future work should assess domain shift and robustness to noise/artifact variations.

Bottom line: ECG‑Lens demonstrates that a thoughtfully engineered convolutional network, trained end‑to‑end on raw 12‑lead ECGs and bolstered by wavelet‑based augmentation, can set a new performance bar for automated cardiac diagnosis. For developers building AI‑driven health products, the paper offers a ready‑to‑use architecture, a reproducible benchmark, and practical insights on scaling from research to production.

Authors

  • Saloni Garg
  • Ukant Jadia
  • Amit Sagtani
  • Kamal Kant Hiran

Paper Information

  • arXiv ID: 2604.15822v1
  • Categories: cs.LG, cs.AI, cs.CE, cs.NE, eess.SP
  • Published: April 17, 2026
