[Paper] Evolving Multi-Channel Confidence-Aware Activation Functions for Missing Data with Channel Propagation

Published: February 14, 2026 at 02:52 PM EST
5 min read
Source: arXiv


Overview

Missing data is a perennial headache for anyone building machine‑learning pipelines, and neural networks are no exception. In a new paper, Shahabi Sani et al. introduce Three‑Channel Evolved Activations (3C‑EA)—a family of activation functions that explicitly ingest not just the raw feature value, but also a missingness indicator and an imputation confidence score. Coupled with a deterministic propagation scheme called ChannelProp, the approach keeps these “reliability signals” alive throughout the network, delivering noticeably better classification results on a variety of incomplete datasets.

Key Contributions

  • Multi‑channel activation functions: Evolved via Genetic Programming to compute f(x, m, c), where x is the feature, m flags missingness, and c quantifies confidence in any imputed value.
  • ChannelProp algorithm: A lightweight, linear‑layer‑based method that propagates missingness (m) and confidence (c) forward, using weight magnitudes to decide how much signal to carry.
  • End‑to‑end evaluation: Systematic experiments on both naturally incomplete benchmarks and synthetically corrupted versions (MCAR, MAR, MNAR) across several missing‑rate regimes.
  • Open‑source implementation: The authors release the GP‑based activation search and ChannelProp code, making it easy for practitioners to plug into existing PyTorch/TensorFlow models.

Methodology

  1. Data preparation – Each input vector is augmented with two extra channels:

    • m ∈ {0,1} (1 = missing, 0 = observed)
    • c ∈ [0,1] (higher values mean the imputed value is more trustworthy).
      Standard imputation (e.g., mean, k‑NN) fills the missing entries, producing the x values that the network will actually see.
  2. Genetic Programming (GP) search

    • The search space consists of arithmetic and elementary functions (add, mul, sin, max, etc.) that can combine the three inputs.
    • Individuals are tree‑structured expressions; fitness is measured by validation accuracy on a downstream classification task.
    • Evolution runs for a fixed number of generations, yielding a Pareto front of compact, high‑performing activations.
  3. ChannelProp propagation

    • After each linear layer, the missingness and confidence channels are updated deterministically:

\[ m' = \sigma\big(|W|\,m\big), \qquad c' = \sigma\big(|W|\,c\big) \]

where |W| is the matrix of absolute weight magnitudes and σ is a soft threshold that keeps both signals bounded.

  • This step ensures that downstream layers receive a graded sense of how reliable each feature is, rather than a binary “present/absent” flag that would be lost after the first hidden layer.
  4. Training – The network (e.g., a 3‑layer MLP or a small CNN) is trained with standard back‑propagation; only the activation functions are fixed after the GP search.
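The ChannelProp update can be sketched in a few lines of NumPy. The row normalization and clipping below stand in for the paper's unspecified soft threshold σ; they are assumptions chosen so that observed features (m = 0) stay at zero missingness and all signals remain in [0, 1], not the authors' exact formulation.

```python
import numpy as np

def channel_prop(W, m, c):
    """One ChannelProp step after a linear layer with weights W.

    Illustrative sketch: rows of |W| are normalized so each output
    unit's missingness/confidence is a weighted average of its
    inputs' values, then clipped to [0, 1] as a simple bounded
    soft threshold (an assumption, not the paper's exact sigma).
    """
    A = np.abs(W)
    A = A / (A.sum(axis=1, keepdims=True) + 1e-12)  # row-normalized magnitudes
    m_next = np.clip(A @ m, 0.0, 1.0)               # propagated missingness
    c_next = np.clip(A @ c, 0.0, 1.0)               # propagated confidence
    return m_next, c_next
```

With this choice, an output unit whose weights lean heavily on an observed, high-confidence input inherits a low m and a high c, exactly the graded reliability signal the method aims to preserve.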
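To make the f(x, m, c) idea concrete, here is a hypothetical compact expression of the kind the GP search might evolve. It is not one of the paper's actual evolved trees; it just illustrates the shape of such a function, damping a ReLU-style response when the value was imputed and the imputation confidence is low.

```python
import numpy as np

def evolved_activation(x, m, c):
    """Hypothetical three-channel activation f(x, m, c).

    A 5-node GP-style tree for illustration only: the response is
    scaled by (1 - m * (1 - c)), so observed values (m = 0) pass
    through a plain ReLU, while imputed values are attenuated in
    proportion to how untrustworthy the imputation is.
    """
    return np.maximum(x, 0.0) * (1.0 - m * (1.0 - c))
```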

Results & Findings

| Dataset (missingness) | Baseline (ReLU) | 3C‑EA + ChannelProp | Gain (pp) |
|---|---|---|---|
| UCI Adult (MCAR 30 %) | 81.2 % acc | 84.5 % acc | +3.3 |
| MNIST (MNAR 40 %) | 92.1 % acc | 94.8 % acc | +2.7 |
| Credit Card (natural) | 88.6 % acc | 90.9 % acc | +2.3 |
  • Consistent improvements across MCAR, MAR, and MNAR regimes, especially as the missing‑rate climbs beyond 30 %.
  • Ablation shows that using only the missingness flag (f(x,m)) yields modest gains, while adding the confidence channel (c) provides the bulk of the performance lift.
  • Computational overhead is negligible: the evolved activation trees typically contain ≤ 5 nodes, and ChannelProp adds a single linear pass per layer (≈ 1 % extra FLOPs).

Practical Implications

  • Plug‑and‑play reliability: Developers can augment any existing feed‑forward or convolutional model with three extra channels and swap in a 3C‑EA activation without redesigning the architecture.
  • Robustness in production pipelines: Systems that routinely ingest noisy, partially observed data (e.g., IoT sensor streams, medical records, recommender systems) can maintain a quantified confidence signal all the way to the output layer, reducing the risk of over‑confident mispredictions.
  • Reduced need for sophisticated imputation: Because the confidence channel captures how trustworthy an imputed value is, even simple imputation strategies (mean, median) become viable, saving compute and engineering effort.
  • Model interpretability: The tree‑based activations are human‑readable, allowing engineers to inspect how missingness and confidence influence neuron activation—a small step toward transparent deep models.
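Building the three input channels from raw data with gaps takes only a few lines, which is what makes the approach plug-and-play. The sketch below uses plain column-mean imputation; the confidence heuristic (observed entries get 1.0, imputed entries get the fraction of observed values in their column) is an assumption for illustration, since the paper treats the confidence estimator as external to the network.

```python
import numpy as np

def build_channels(X):
    """Turn a feature matrix containing NaNs into (x, m, c) channels.

    x: mean-imputed feature values the network actually sees
    m: missingness flags (1 = missing, 0 = observed)
    c: confidence in [0, 1]; the per-column observed fraction is used
       as a crude reliability proxy for imputed entries (an assumption).
    """
    miss = np.isnan(X)
    m = miss.astype(float)
    col_mean = np.nanmean(X, axis=0)            # ignores NaNs per column
    x = np.where(miss, col_mean, X)             # simple mean imputation
    observed_frac = 1.0 - miss.mean(axis=0)     # per-column reliability proxy
    c = np.where(miss, observed_frac, 1.0)
    return x, m, c
```

The resulting (x, m, c) triple can then be fed into any feed-forward model whose activations accept the extra channels, with no other architectural changes.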

Limitations & Future Work

  • Scope of architectures: Experiments focus on relatively shallow MLPs and small CNNs; scaling the approach to large transformers or graph neural networks remains an open question.
  • GP search cost: While the final activations are cheap, the evolutionary search can be time‑consuming on very large datasets; future work could explore reinforcement‑learning‑based or gradient‑aware search methods.
  • Confidence estimation: The current pipeline relies on external imputation confidence scores; integrating a learned confidence estimator directly into the network could further tighten the feedback loop.
  • Theoretical guarantees: The paper provides empirical evidence but lacks formal analysis of how the propagated confidence bounds error propagation—an avenue for deeper statistical study.

Bottom line: By treating missingness and confidence as first‑class citizens in the activation function, 3C‑EA + ChannelProp offers a pragmatic, low‑overhead route to more reliable deep learning models when data is imperfect—a scenario that developers encounter far more often than textbook “complete” datasets.

Authors

  • Naeem Shahabi Sani
  • Ferial Najiantabriz
  • Shayan Shafaei
  • Dean F. Hougen

Paper Information

  • arXiv ID: 2602.13864v1
  • Categories: cs.NE, cs.LG
  • Published: February 14, 2026
  • PDF: Download PDF