[Paper] Evolving Multi-Channel Confidence-Aware Activation Functions for Missing Data with Channel Propagation
Source: arXiv - 2602.13864v1
Overview
Missing data is a perennial headache for anyone building machine‑learning pipelines, and neural networks are no exception. In a new paper, Shahabi Sani et al. introduce Three‑Channel Evolved Activations (3C‑EA)—a family of activation functions that explicitly ingest not just the raw feature value, but also a missingness indicator and an imputation confidence score. Coupled with a deterministic propagation scheme called ChannelProp, the approach keeps these “reliability signals” alive throughout the network, delivering noticeably better classification results on a variety of incomplete datasets.
Key Contributions
- Multi‑channel activation functions: Evolved via Genetic Programming to compute f(x, m, c), where x is the feature value, m flags missingness, and c quantifies confidence in any imputed value.
- ChannelProp algorithm: A lightweight, linear‑layer‑based method that propagates missingness (m) and confidence (c) forward, using weight magnitudes to decide how much signal to carry.
- End‑to‑end evaluation: Systematic experiments on both naturally incomplete benchmarks and synthetically corrupted versions (MCAR, MAR, MNAR) across several missing‑rate regimes.
- Open‑source implementation: The authors release the GP‑based activation search and ChannelProp code, making it easy for practitioners to plug into existing PyTorch/TensorFlow models.
Methodology
- Data preparation – Each input vector is augmented with two extra channels: m ∈ {0,1} (1 = missing, 0 = observed) and c ∈ [0,1] (higher values mean the imputed value is more trustworthy). Standard imputation (e.g., mean, k‑NN) fills the missing entries, producing the x values that the network actually sees.
- Genetic Programming (GP) search –
- The search space consists of arithmetic and elementary functions (add, mul, sin, max, etc.) that can combine the three inputs.
- Individuals are tree‑structured expressions; fitness is measured by validation accuracy on a downstream classification task.
- Evolution runs for a fixed number of generations, yielding a Pareto front of compact, high‑performing activations.
- ChannelProp propagation –
- After each linear layer, the missingness and confidence channels are updated deterministically as m′ = σ(|W| · m) and c′ = σ(|W| · c), where |W| is the matrix of absolute weight magnitudes and σ is a soft‑threshold that keeps the signals bounded.
- This step ensures that downstream layers receive a graded sense of how reliable each feature is, rather than a binary “present/absent” flag that would be lost after the first hidden layer.
- Training – The network (e.g., a 3‑layer MLP or a small CNN) is trained with standard back‑propagation; the evolved activation functions themselves are frozen after the GP search.
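The pipeline above can be sketched in PyTorch. The concrete activation expression below is an illustrative stand‑in, not one of the paper's evolved trees, and the sigmoid soft‑threshold plus row normalization in ChannelProp are implementation assumptions on top of the paper's m′ = σ(|W| · m), c′ = σ(|W| · c) rule:

```python
import torch
import torch.nn as nn


class EvolvedActivation(nn.Module):
    """Hypothetical evolved activation f(x, m, c).

    Illustrative stand-in for a GP-evolved tree: observed features
    (m = 0) pass through a ReLU unchanged, while imputed features
    (m = 1) are attenuated in proportion to their confidence c.
    """

    def forward(self, x, m, c):
        return torch.relu(x) * (1.0 - m + m * c)


class ChannelPropLinear(nn.Module):
    """Linear layer that also propagates the m and c channels.

    Follows the paper's update m' = sigma(|W| . m), c' = sigma(|W| . c);
    the sigmoid and the row normalization of |W| are our assumptions
    for keeping the propagated signals bounded.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x, m, c):
        w_abs = self.linear.weight.abs()
        # Normalize rows of |W| so each output unit mixes a convex-like
        # combination of input reliabilities (implementation assumption).
        w_norm = w_abs / (w_abs.sum(dim=1, keepdim=True) + 1e-8)
        m_next = torch.sigmoid(m @ w_norm.T)
        c_next = torch.sigmoid(c @ w_norm.T)
        return self.linear(x), m_next, c_next


class ThreeChannelMLP(nn.Module):
    """A small MLP in the spirit of the paper's experiments."""

    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.l1 = ChannelPropLinear(d_in, d_hidden)
        self.l2 = ChannelPropLinear(d_hidden, d_hidden)
        self.head = nn.Linear(d_hidden, n_classes)
        self.act = EvolvedActivation()

    def forward(self, x, m, c):
        x, m, c = self.l1(x, m, c)
        x = self.act(x, m, c)
        x, m, c = self.l2(x, m, c)
        x = self.act(x, m, c)
        return self.head(x)
```

Because the channels are carried through every layer, the final logits are computed from features whose reliability has been tracked end to end, rather than discarded after the input layer.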
Results & Findings
| Dataset (missingness) | Baseline (ReLU) accuracy | 3C‑EA + ChannelProp accuracy | Gain (pp) |
|---|---|---|---|
| UCI Adult (MCAR 30%) | 81.2 % | 84.5 % | +3.3 |
| MNIST (MNAR 40%) | 92.1 % | 94.8 % | +2.7 |
| Credit Card (natural) | 88.6 % | 90.9 % | +2.3 |
- Consistent improvements across MCAR, MAR, and MNAR regimes, especially as the missing‑rate climbs beyond 30 %.
- Ablation shows that using only the missingness flag (f(x, m)) yields modest gains, while adding the confidence channel (c) provides the bulk of the performance lift.
- Computational overhead is negligible: the evolved activation trees typically contain ≤ 5 nodes, and ChannelProp adds a single linear pass per layer (≈ 1 % extra FLOPs).
Practical Implications
- Plug‑and‑play reliability: Developers can augment any existing feed‑forward or convolutional model with three extra channels and swap in a 3C‑EA activation without redesigning the architecture.
- Robustness in production pipelines: Systems that routinely ingest noisy, partially observed data (e.g., IoT sensor streams, medical records, recommender systems) can maintain a quantified confidence signal all the way to the output layer, reducing the risk of over‑confident mispredictions.
- Reduced need for sophisticated imputation: Because the confidence channel captures how trustworthy an imputed value is, even simple imputation strategies (mean, median) become viable, saving compute and engineering effort.
- Model interpretability: The tree‑based activations are human‑readable, allowing engineers to inspect how missingness and confidence influence neuron activation—a small step toward transparent deep models.
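As a concrete illustration of the "simple imputation plus confidence" workflow, here is a minimal NumPy sketch that mean‑imputes a matrix and builds the m and c channels. The confidence heuristic (observed entries get c = 1, imputed entries inherit their column's observed fraction) is our illustrative assumption; the paper leaves the choice of confidence estimator open:

```python
import numpy as np


def augment_with_channels(X):
    """Mean-impute X (NaN = missing) and build the m and c channels.

    Returns (X_imputed, m, c) where m is the missingness indicator
    (1 = missing) and c is a per-entry confidence in [0, 1].
    """
    missing = np.isnan(X)
    m = missing.astype(float)
    # Simple mean imputation per column, ignoring missing entries.
    col_mean = np.nanmean(X, axis=0)
    X_imputed = np.where(missing, col_mean, X)
    # Heuristic confidence: observed values are fully trusted; imputed
    # values inherit the fraction of observed entries in their column.
    col_observed = 1.0 - missing.mean(axis=0)
    c = np.where(missing, col_observed, 1.0)
    return X_imputed, m, c
```

The resulting three arrays can be fed directly to a network with three‑channel activations, which is what makes even this crude imputation viable in practice.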
Limitations & Future Work
- Scope of architectures: Experiments focus on relatively shallow MLPs and small CNNs; scaling the approach to large transformers or graph neural networks remains an open question.
- GP search cost: While the final activations are cheap, the evolutionary search can be time‑consuming on very large datasets; future work could explore reinforcement‑learning‑based or gradient‑aware search methods.
- Confidence estimation: The current pipeline relies on external imputation confidence scores; integrating a learned confidence estimator directly into the network could further tighten the feedback loop.
- Theoretical guarantees: The paper provides empirical evidence but lacks formal analysis of how the propagated confidence bounds error propagation—an avenue for deeper statistical study.
Bottom line: By treating missingness and confidence as first‑class citizens in the activation function, 3C‑EA + ChannelProp offers a pragmatic, low‑overhead route to more reliable deep learning models when data is imperfect—a scenario that developers encounter far more often than textbook “complete” datasets.
Authors
- Naeem Shahabi Sani
- Ferial Najiantabriz
- Shayan Shafaei
- Dean F. Hougen
Paper Information
- arXiv ID: 2602.13864v1
- Categories: cs.NE, cs.LG
- Published: February 14, 2026