[Paper] Evolving Multi-Channel Confidence-Aware Activation Functions for Missing Data with Channel Propagation
Source: arXiv - 2602.13864v1
Overview
Missing data is a perennial headache for anyone building machine‑learning pipelines, and neural networks are no exception. In a new paper, Shahabi Sani et al. introduce Three‑Channel Evolved Activations (3C‑EA)—a family of activation functions that explicitly ingest not just the raw feature value, but also a missingness indicator and an imputation confidence score. Coupled with a deterministic propagation scheme called ChannelProp, the approach keeps these “reliability signals” alive throughout the network, delivering noticeably better classification results on a variety of incomplete datasets.
Key Contributions
- Multi‑channel activation functions: Evolved via Genetic Programming to compute f(x, m, c), where x is the feature value, m flags missingness, and c quantifies confidence in any imputed value.
- ChannelProp algorithm: A lightweight, linear‑layer‑based method that propagates missingness (m) and confidence (c) forward, using weight magnitudes to decide how much signal to carry.
- End‑to‑end evaluation: Systematic experiments on both naturally incomplete benchmarks and synthetically corrupted versions (MCAR, MAR, MNAR) across several missing‑rate regimes.
- Open‑source implementation: The authors release the GP‑based activation search and ChannelProp code, making it easy for practitioners to plug into existing PyTorch/TensorFlow models.
Methodology
- Data preparation – Each input vector is augmented with two extra channels: m ∈ {0,1} (1 = missing, 0 = observed) and c ∈ [0,1] (higher values mean the imputed value is more trustworthy). Standard imputation (e.g., mean, k‑NN) fills the missing entries, producing the x values that the network actually sees.
- Genetic Programming (GP) search –
- The search space consists of arithmetic and elementary functions (add, mul, sin, max, etc.) that can combine the three inputs.
- Individuals are tree‑structured expressions; fitness is measured by validation accuracy on a downstream classification task.
- Evolution runs for a fixed number of generations, yielding a Pareto front of compact, high‑performing activations.
- ChannelProp propagation –
- After each linear layer, the missingness and confidence channels are updated deterministically as m′ = σ(|W| · m) and c′ = σ(|W| · c), where |W| is the matrix of absolute weight magnitudes and σ is a soft‑threshold that keeps the signals bounded.
- This step ensures that downstream layers receive a graded sense of how reliable each feature is, rather than a binary “present/absent” flag that would be lost after the first hidden layer.
- Training – The network (e.g., a 3‑layer MLP or a small CNN) is trained with standard back‑propagation; the evolved activation functions themselves are frozen after the GP search.
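The pipeline above can be sketched in PyTorch. The concrete activation expression below is an illustrative stand‑in, not one of the paper's evolved trees, and the sigmoid soft‑threshold plus row normalization in ChannelProp are implementation assumptions on top of the paper's m′ = σ(|W| · m), c′ = σ(|W| · c) rule:

```python
import torch
import torch.nn as nn


class EvolvedActivation(nn.Module):
    """Hypothetical evolved activation f(x, m, c).

    Illustrative stand-in for a GP-evolved tree: observed features
    (m = 0) pass through a ReLU unchanged, while imputed features
    (m = 1) are attenuated in proportion to their confidence c.
    """

    def forward(self, x, m, c):
        return torch.relu(x) * (1.0 - m + m * c)


class ChannelPropLinear(nn.Module):
    """Linear layer that also propagates the m and c channels.

    Follows the paper's update m' = sigma(|W| . m), c' = sigma(|W| . c);
    the sigmoid and the row normalization of |W| are our assumptions
    for keeping the propagated signals bounded.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x, m, c):
        w_abs = self.linear.weight.abs()
        # Normalize rows of |W| so each output unit mixes a convex-like
        # combination of input reliabilities (implementation assumption).
        w_norm = w_abs / (w_abs.sum(dim=1, keepdim=True) + 1e-8)
        m_next = torch.sigmoid(m @ w_norm.T)
        c_next = torch.sigmoid(c @ w_norm.T)
        return self.linear(x), m_next, c_next


class ThreeChannelMLP(nn.Module):
    """A small MLP in the spirit of the paper's experiments."""

    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.l1 = ChannelPropLinear(d_in, d_hidden)
        self.l2 = ChannelPropLinear(d_hidden, d_hidden)
        self.head = nn.Linear(d_hidden, n_classes)
        self.act = EvolvedActivation()

    def forward(self, x, m, c):
        x, m, c = self.l1(x, m, c)
        x = self.act(x, m, c)
        x, m, c = self.l2(x, m, c)
        x = self.act(x, m, c)
        return self.head(x)
```

Because the channels are carried through every layer, the final logits are computed from features whose reliability has been tracked end to end, rather than discarded after the input layer.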
Results & Findings
| Dataset (missingness) | Baseline (ReLU) accuracy | 3C‑EA + ChannelProp accuracy | Gain (pp) |
|---|---|---|---|
| UCI Adult (MCAR 30%) | 81.2 % | 84.5 % | +3.3 |
| MNIST (MNAR 40%) | 92.1 % | 94.8 % | +2.7 |
| Credit Card (natural) | 88.6 % | 90.9 % | +2.3 |
- Consistent improvements across MCAR, MAR, and MNAR regimes, especially as the missing‑rate climbs beyond 30 %.
- Ablation shows that using only the missingness flag (f(x, m)) yields modest gains, while adding the confidence channel (c) provides the bulk of the performance lift.
- Computational overhead is negligible: the evolved activation trees typically contain ≤ 5 nodes, and ChannelProp adds a single linear pass per layer (≈ 1 % extra FLOPs).
Practical Implications
- Plug‑and‑play reliability: Developers can augment any existing feed‑forward or convolutional model with three extra channels and swap in a 3C‑EA activation without redesigning the architecture.
- Robustness in production pipelines: Systems that routinely ingest noisy, partially observed data (e.g., IoT sensor streams, medical records, recommender systems) can maintain a quantified confidence signal all the way to the output layer, reducing the risk of over‑confident mispredictions.
- Reduced need for sophisticated imputation: Because the confidence channel captures how trustworthy an imputed value is, even simple imputation strategies (mean, median) become viable, saving compute and engineering effort.
- Model interpretability: The tree‑based activations are human‑readable, allowing engineers to inspect how missingness and confidence influence neuron activation—a small step toward transparent deep models.
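As a concrete illustration of the "simple imputation plus confidence" workflow, here is a minimal NumPy sketch that mean‑imputes a matrix and builds the m and c channels. The confidence heuristic (observed entries get c = 1, imputed entries inherit their column's observed fraction) is our illustrative assumption; the paper leaves the choice of confidence estimator open:

```python
import numpy as np


def augment_with_channels(X):
    """Mean-impute X (NaN = missing) and build the m and c channels.

    Returns (X_imputed, m, c) where m is the missingness indicator
    (1 = missing) and c is a per-entry confidence in [0, 1].
    """
    missing = np.isnan(X)
    m = missing.astype(float)
    # Simple mean imputation per column, ignoring missing entries.
    col_mean = np.nanmean(X, axis=0)
    X_imputed = np.where(missing, col_mean, X)
    # Heuristic confidence: observed values are fully trusted; imputed
    # values inherit the fraction of observed entries in their column.
    col_observed = 1.0 - missing.mean(axis=0)
    c = np.where(missing, col_observed, 1.0)
    return X_imputed, m, c
```

The resulting three arrays can be fed directly to a network with three‑channel activations, which is what makes even this crude imputation viable in practice.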
Limitations & Future Work
- Scope of architectures: Experiments focus on relatively shallow MLPs and small CNNs; scaling the approach to large transformers or graph neural networks remains an open question.
- GP search cost: While the final activations are cheap, the evolutionary search can be time‑consuming on very large datasets; future work could explore reinforcement‑learning‑based or gradient‑aware search methods.
- Confidence estimation: The current pipeline relies on external imputation confidence scores; integrating a learned confidence estimator directly into the network could further tighten the feedback loop.
- Theoretical guarantees: The paper provides empirical evidence but lacks formal analysis of how the propagated confidence bounds error propagation—an avenue for deeper statistical study.
Bottom line: By treating missingness and confidence as first‑class citizens in the activation function, 3C‑EA + ChannelProp offers a pragmatic, low‑overhead route to more reliable deep learning models when data is imperfect—a scenario that developers encounter far more often than textbook “complete” datasets.
Authors
- Naeem Shahabi Sani
- Ferial Najiantabriz
- Shayan Shafaei
- Dean F. Hougen
Paper Information
- arXiv ID: 2602.13864v1
- Categories: cs.NE, cs.LG
- Published: February 14, 2026