[Paper] Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability

Published: May 7, 2026 at 11:58 PM EDT

Source: arXiv - 2605.07212v1

Overview

This paper uncovers a hidden source of unreliability in EEG‑based machine‑learning systems: the way raw signals are pre‑processed can dramatically flip a model’s predictions, even when the underlying deep‑learning architecture stays the same. By treating each preprocessing step as a “counterfactual intervention,” the authors quantify how much prediction outcomes depend on these choices and propose tools to measure, diagnose, and mitigate the problem.

Key Contributions

Formal counterfactual framework for EEG preprocessing, mapping the 2⁷ (128) possible pipeline configurations to a well‑defined intervention space.
Empirical evidence of instability: up to 42 % of trial‑level predictions change solely because of different preprocessing pipelines, across six diverse EEG datasets.
Walsh‑Hadamard decomposition of the pipeline space, showing that the effect of each preprocessing step is almost additive, enabling fast, step‑wise optimization.
Preprocessing Uncertainty (PU): a per‑trial metric that captures how sensitive a prediction is to preprocessing variations, complementing traditional model confidence scores.
Normalized Adaptive PGI (NA‑PGI): a graph‑structured regularizer that leverages the compositional relationships among pipelines to reduce prediction volatility.
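
To make the "graph‑structured" idea concrete, here is a minimal sketch of the pipeline graph: 128 binary configurations, with an edge wherever two pipelines differ in exactly one step. The penalty is a plain mean‑squared‑difference smoothness term chosen for illustration; the paper's actual NA‑PGI formula (its normalization and adaptive weighting) is not given in this summary, and the names build_pipeline_graph and smoothness_penalty are hypothetical.

```python
from itertools import product

N_STEPS = 7  # seven on/off preprocessing steps, as described above


def build_pipeline_graph(n_steps=N_STEPS):
    """Enumerate all 2**n_steps pipelines and the edges between single-step neighbours."""
    pipelines = list(product((0, 1), repeat=n_steps))
    index = {p: i for i, p in enumerate(pipelines)}
    edges = []
    for p in pipelines:
        for k in range(n_steps):
            if p[k] == 0:  # only switch a step on, so each edge is listed once
                q = p[:k] + (1,) + p[k + 1:]
                edges.append((index[p], index[q]))
    return pipelines, edges


def smoothness_penalty(probs, edges):
    """Mean squared difference between class-probability vectors across edges.

    probs[i] is the class-probability vector for one trial under pipeline i.
    Generic graph-smoothness term (an assumption), not the paper's NA-PGI.
    """
    total = 0.0
    for i, j in edges:
        total += sum((a - b) ** 2 for a, b in zip(probs[i], probs[j]))
    return total / len(edges)


pipelines, edges = build_pipeline_graph()
print(len(pipelines), len(edges))  # 128 nodes, 7 * 64 = 448 edges
```

A penalty of zero means the trial's predicted probabilities are identical under every pipeline; larger values mean single‑step changes move the prediction more.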

Methodology

Pipeline Definition: The authors selected seven common EEG preprocessing operations (e.g., band‑pass filtering, artifact rejection, re‑referencing, epoching). Each operation can be toggled on/off, yielding a binary vector that uniquely identifies a pipeline.
Counterfactual Intervention Space: By treating each binary vector as an intervention, they generated all 128 possible pipelines for every dataset.
Model Training & Evaluation: A standard convolutional neural network (CNN) for EEG decoding was trained once on raw data and then evaluated on each of the 128 preprocessed versions of the test set, keeping the model weights fixed.
Walsh‑Hadamard Decomposition: This transform splits the variation in predictions across the 128 pipelines into contributions from individual preprocessing steps and from interactions among steps. The result is nearly additive: single‑step terms dominate, and higher‑order interactions are negligible.
Preprocessing Uncertainty (PU): For each trial, PU is computed as the entropy of the prediction distribution across all pipelines, yielding a simple scalar that flags unstable instances.
NA‑PGI Regularizer: During training, a graph is built where nodes are pipelines and edges connect pipelines that differ by a single preprocessing step. The regularizer penalizes large prediction jumps across edges, encouraging smoothness over the pipeline graph.
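
The Walsh‑Hadamard step can be sketched for any scalar score recorded once per pipeline. The summary does not say exactly which quantity the authors decompose, so the per‑pipeline score here is an assumption, as are the function names and the toy weights. Pipelines are indexed by an integer whose bit k says whether step k is on; each coefficient then belongs to a subset of steps, and the squared coefficients of the non‑empty subsets add up to the variance of the score across pipelines.

```python
def walsh_hadamard(values):
    """Normalized fast Walsh-Hadamard transform of a list of length 2**n."""
    coeffs = list(values)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b
        h *= 2
    return [c / n for c in coeffs]


def variance_share_by_order(values):
    """Fraction of across-pipeline variance carried by terms of order 1, 2, ..."""
    coeffs = walsh_hadamard(values)
    n_steps = len(values).bit_length() - 1
    energy = [0.0] * (n_steps + 1)
    for subset, c in enumerate(coeffs):
        order = bin(subset).count("1")  # number of steps involved in this term
        energy[order] += c * c
    total = sum(energy[1:])  # order 0 is the mean and carries no variance
    if total == 0:
        return [0.0] * n_steps
    return [e / total for e in energy[1:]]


# Toy score that is purely additive in the 7 steps (made-up weights).
weights = [0.03, -0.01, 0.02, -0.04, 0.01, 0.0, -0.02]
scores = [sum(w for k, w in enumerate(weights) if (x >> k) & 1) for x in range(2 ** 7)]
shares = variance_share_by_order(scores)
print(shares[0])  # share from single-step terms; close to 1.0 for an additive score
```

For a score with a genuine interaction, such as one that is nonzero only when two steps are both on, part of the variance moves to the order‑2 share.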

Results & Findings

Prediction Flips: Across the six datasets (motor imagery, visual evoked potentials, speech perception, etc.), the proportion of trials whose predicted class changed when moving from one pipeline to another ranged from 12 % to 42 %.
Additivity: Walsh‑Hadamard analysis revealed that > 90 % of the variance could be explained by the sum of individual step effects; higher‑order interactions contributed < 5 %.
PU as a Diagnostic: Trials with high PU scores consistently corresponded to low model confidence and higher error rates, suggesting PU can be used to flag “risky” predictions in real‑time systems.
NA‑PGI Effectiveness: Adding the NA‑PGI regularizer reduced the average flip rate by ≈ 15 percentage points (e.g., from 38 % to 23 % on the most volatile dataset) without sacrificing overall accuracy.
Generalizability: The observed instability persisted across different model architectures (CNN, LSTM, Transformer) and across both subject‑dependent and subject‑independent training regimes.
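
The two headline quantities, flip rate and PU, are easy to compute once predictions are stored per trial and per pipeline. The sketch below assumes hard class labels and takes PU as the Shannon entropy (in bits) of the labels a trial receives across pipelines; the paper may define the prediction distribution differently, for example from softmax outputs. The data and function names are made up for illustration.

```python
import math
from collections import Counter


def preprocessing_uncertainty(labels):
    """Entropy (bits) of the predicted labels one trial receives across pipelines."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def flip_rate(preds, a, b):
    """Fraction of trials whose predicted class differs between pipelines a and b."""
    flips = sum(1 for trial in preds if trial[a] != trial[b])
    return flips / len(preds)


# preds[t][p] = predicted class of trial t under pipeline p (toy data, 4 pipelines)
preds = [
    [0, 0, 0, 0],  # stable trial
    [0, 1, 0, 1],  # prediction depends on the pipeline
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
print([preprocessing_uncertainty(trial) for trial in preds])
print(flip_rate(preds, 0, 1))
```

A trial that gets the same label under every pipeline has PU 0; a binary trial split evenly between two labels has PU 1 bit.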

Practical Implications

Robust BCI Deployments: Developers building brain‑computer interfaces should treat preprocessing as a hyperparameter space rather than a fixed step; tools like PU can be integrated into runtime monitoring to abort or request re‑acquisition when uncertainty spikes.
Standardized Reporting: The study highlights the need for papers and open‑source repos to explicitly document every preprocessing choice, enabling reproducibility and fair benchmarking.
Automated Pipeline Search: Because the effect of each step is near‑additive, simple greedy or Bayesian optimization over the binary pipeline space can quickly converge to a low‑PU configuration, saving time compared to exhaustive grid searches.
Regulatory & Clinical Settings: In medical EEG applications (e.g., seizure detection), reporting PU alongside each decision could help meet safety requirements by showing that the decision is not an artifact of a particular preprocessing choice.
Tooling Opportunities: The Walsh‑Hadamard decomposition and NA‑PGI regularizer can be packaged as plug‑ins for popular EEG libraries (MNE‑Python, Braindecode), giving developers out‑of‑the‑box stability improvements.
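
The greedy search mentioned above can be sketched as coordinate‑wise toggling: flip one step at a time and keep the flip only if a score improves. The score function is hypothetical (higher is better; it could be validation accuracy or a negative flip rate), and this is one simple search strategy, not the authors' procedure. When step effects are additive, a single pass finds the optimum and a second pass confirms it, so the search needs 15 evaluations instead of 128.

```python
def greedy_pipeline_search(score, n_steps=7):
    """Toggle one step at a time, keeping any toggle that raises score(config)."""
    config = [0] * n_steps
    best = score(tuple(config))
    evaluations = 1
    improved = True
    while improved:
        improved = False
        for k in range(n_steps):
            config[k] ^= 1  # try switching step k
            candidate = score(tuple(config))
            evaluations += 1
            if candidate > best:
                best = candidate
                improved = True
            else:
                config[k] ^= 1  # no gain, so revert the toggle
    return tuple(config), best, evaluations


# Toy additive score with made-up integer step effects.
effects = [3, -1, 2, -4, 1, 0, -2]


def toy_score(config):
    return sum(e for e, on in zip(effects, config) if on)


print(greedy_pipeline_search(toy_score))
```

With strong interactions between steps, this kind of search can stall in a local optimum, which is why the near‑additivity finding matters for it.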

Limitations & Future Work

Scope of Preprocessing Steps: The study examined seven common operations; other domain‑specific steps (e.g., source localization, ICA component selection) may exhibit different interaction patterns.
Fixed Model Weights: The analysis kept the neural network static while varying pipelines; jointly optimizing model parameters and preprocessing could further reduce instability.
Dataset Diversity: Although six datasets were used, all were laboratory‑controlled experiments. Real‑world noisy environments (e.g., wearable EEG) may amplify or alter the observed effects.
Computational Cost: Exhaustively evaluating all 128 pipelines can be prohibitive for very large datasets; future work could explore surrogate models to estimate PU without full enumeration.
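
One cheap alternative to full enumeration is to estimate PU from a random subset of pipelines. This is plain subsampling rather than the surrogate models the authors suggest, and it is only a sketch: predict is a hypothetical callable that returns the predicted label for one trial under a given pipeline configuration, and PU is again taken as label entropy in bits.

```python
import math
import random
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (bits) of a list of predicted labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def sampled_pu(predict, n_steps=7, n_samples=16, seed=0):
    """Estimate PU for one trial from a random subset of the 2**n_steps pipelines."""
    rng = random.Random(seed)
    n_total = 2 ** n_steps
    ids = rng.sample(range(n_total), min(n_samples, n_total))
    labels = []
    for i in ids:
        config = tuple((i >> k) & 1 for k in range(n_steps))  # bit k = step k on/off
        labels.append(predict(config))
    return label_entropy(labels)


# Toy trial whose label depends only on whether step 0 is applied.
print(sampled_pu(lambda config: config[0], n_samples=16))
print(sampled_pu(lambda config: config[0], n_samples=128))  # full enumeration
```

Entropy estimated from a small sample tends to come out lower than the full‑enumeration value, so a subsampled PU is better read as a screening signal than as an exact score.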

Bottom line: preprocessing isn’t just a “nice‑to‑have” data‑cleaning step; it’s a decisive factor that can flip your EEG model’s predictions. By measuring and regularizing this hidden source of uncertainty, developers can build more reliable, transparent, and deployable brain‑computer systems.

Authors

Dengzhe Hou
Zihao Wu
Lingyu Jiang
Zirui Li
Fangzhou Lin
Kazunori D. Yamada

Paper Information

arXiv ID: 2605.07212v1
Categories: cs.LG, cs.AI, cs.HC, cs.NE, eess.SP
Published: May 8, 2026
