[Paper] Efficient Eye-based Emotion Recognition via Neural Architecture Search of Time-to-First-Spike-Coded Spiking Neural Networks
Source: arXiv - 2512.02459v1
Overview
The paper introduces TNAS‑ER, a neural‑architecture‑search (NAS) framework that automatically designs ultra‑efficient spiking neural networks (SNNs) for eye‑based emotion recognition. By combining a time‑to‑first‑spike (TTFS) coding scheme with an ANN‑assisted search strategy, the authors achieve state‑of‑the‑art accuracy while keeping latency and energy consumption low enough for deployment on tiny, battery‑powered eyewear devices.
Key Contributions
- First NAS for TTFS‑coded SNNs – Tailors the search space and evaluation metrics to the unique constraints of single‑spike neurons.
- ANN‑assisted search strategy – Uses a ReLU‑based ANN that shares an identity mapping with the TTFS SNN to provide fast, differentiable feedback during evolution.
- Dual‑objective fitness – Optimizes both weighted and unweighted average recall, directly targeting the imbalanced nature of emotion datasets.
- Real‑world hardware validation – Deploys the discovered architecture on neuromorphic hardware, reporting 48 ms inference latency and only 0.05 J per sample.
- Comprehensive experiments – Shows that TNAS‑ER outperforms hand‑crafted TTFS SNNs and conventional ANN baselines on multiple eye‑tracking emotion benchmarks.
Methodology
- Search Space Design – The authors define a set of building blocks (convolutional layers, pooling operators, spiking‑neuron parameters) that are compatible with TTFS coding, ensuring each neuron fires at most once (a candidate encoding is sketched after this list).
- ANN‑SNN Proxy – For every candidate TTFS SNN, a parallel ANN with ReLU activations is instantiated. Because the ANN shares the same weight matrices and an identity mapping with the SNN, its training loss can be back‑propagated quickly, giving a fast proxy score for the SNN’s potential performance (see the activation‑to‑spike‑time sketch below).
- Evolutionary NAS – An evolutionary algorithm (mutation, crossover, selection) explores the space. Each individual’s fitness is a weighted sum of two recall metrics, encouraging both overall accuracy and balanced per‑class performance (see the fitness and selection sketch below).
- TTFS Training – After the search converges, the best architecture is trained from scratch using a TTFS‑specific loss that penalizes late spikes, reinforcing the single‑spike behavior (see the loss sketch below).
- Hardware Mapping – The final network is quantized and mapped onto a neuromorphic accelerator that natively supports event‑driven computation, allowing the authors to measure real latency and energy (see the quantization sketch below).
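To make the search space concrete, below is a minimal sketch of how a candidate architecture could be encoded as a genotype. The block vocabulary (channel counts, kernel sizes, pooling choices, firing thresholds) and the `LayerGene` fields are illustrative assumptions, not the paper's exact genotype.

```python
import random
from dataclasses import dataclass

# Hypothetical block vocabulary; the paper's exact search space may differ.
CONV_CHANNELS = [16, 32, 64]
KERNEL_SIZES = [3, 5]
POOL_TYPES = ["max", "avg", "none"]
THRESHOLDS = [0.5, 1.0, 1.5]  # candidate TTFS firing thresholds

@dataclass
class LayerGene:
    channels: int
    kernel: int
    pool: str
    threshold: float  # a TTFS neuron fires once, when its potential crosses this

def random_candidate(depth: int = 4) -> list:
    """Sample a random genotype: one gene per layer."""
    return [
        LayerGene(
            channels=random.choice(CONV_CHANNELS),
            kernel=random.choice(KERNEL_SIZES),
            pool=random.choice(POOL_TYPES),
            threshold=random.choice(THRESHOLDS),
        )
        for _ in range(depth)
    ]
```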
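The ANN‑SNN proxy rests on a correspondence between ReLU activations and first‑spike times: a stronger activation maps to an earlier spike, and a zero activation to no spike at all. The linear map below, with an assumed coding window `T_MAX`, is one common way to realize such an identity mapping; the paper's exact correspondence may differ.

```python
import numpy as np

T_MAX = 1.0  # assumed coding window; a neuron that never fires is read as T_MAX

def relu_to_spike_time(a: np.ndarray) -> np.ndarray:
    """Map ReLU activations to first-spike times: larger activation -> earlier spike."""
    a = np.clip(a, 0.0, T_MAX)  # ReLU output, clipped to the coding window
    return T_MAX - a            # linear TTFS code; zero activation = no spike

def spike_time_to_relu(t: np.ndarray) -> np.ndarray:
    """Inverse map, used to read SNN spike times back as proxy activations."""
    return T_MAX - np.clip(t, 0.0, T_MAX)
```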
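A sketch of the dual‑objective fitness and a bare‑bones tournament‑selection loop, assuming scikit‑learn for the recall metrics. The mixing coefficient `alpha`, the tournament size `k`, and the `evaluate`, `mutate`, and `crossover` callables are placeholders rather than the paper's actual operators.

```python
import random
from sklearn.metrics import recall_score

def fitness(y_true, y_pred, alpha: float = 0.5) -> float:
    """Dual-objective fitness: a weighted sum of weighted and unweighted
    average recall (alpha = 0.5 is an assumed setting, not the paper's)."""
    war = recall_score(y_true, y_pred, average="weighted")  # support-weighted
    uar = recall_score(y_true, y_pred, average="macro")     # class-balanced
    return alpha * war + (1.0 - alpha) * uar

def evolve(population, evaluate, mutate, crossover, generations=20, k=3):
    """Bare-bones tournament-selection loop; `evaluate` would train the ReLU
    proxy for a candidate and return its fitness score."""
    for _ in range(generations):
        scored = [(evaluate(c), c) for c in population]
        nxt = []
        while len(nxt) < len(population):
            # Tournament selection: the best of k random candidates is a parent.
            p1 = max(random.sample(scored, k), key=lambda s: s[0])[1]
            p2 = max(random.sample(scored, k), key=lambda s: s[0])[1]
            nxt.append(mutate(crossover(p1, p2)))
        population = nxt
    return max(((evaluate(c), c) for c in population), key=lambda s: s[0])[1]
```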
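One standard way to train a TTFS network, in the spirit of the description above: treat negative first‑spike times as class logits, so the correct class is encouraged to fire earliest, and add a term penalizing late spikes. The PyTorch formulation and the weight `beta` are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def ttfs_loss(spike_times: torch.Tensor, labels: torch.Tensor, beta: float = 0.01):
    """Illustrative TTFS loss over first-spike times of shape (batch, classes)."""
    logits = -spike_times                 # earlier spike -> larger logit
    ce = F.cross_entropy(logits, labels)  # correct class should spike first
    late_penalty = spike_times.mean()     # discourages late firing everywhere
    return ce + beta * late_penalty
```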
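For the quantization step, a generic uniform symmetric int8 pass is sketched below as a stand‑in for whatever scheme the accelerator's toolchain actually applies; the paper does not state a bit width here, so int8 is an assumption.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Uniform symmetric int8 quantization of a weight tensor."""
    scale = max(float(np.abs(w).max()), 1e-8) / 127.0  # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale                                    # dequantize with q * scale
```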
Results & Findings
| Metric | Hand‑crafted TTFS SNN | Conventional ANN | TNAS‑ER (proposed) |
|---|---|---|---|
| Weighted Avg. Recall | 71.3 % | 78.5 % | 84.2 % |
| Unweighted Avg. Recall | 68.9 % | 75.1 % | 82.7 % |
| Inference Latency | 112 ms (neuromorphic) | 95 ms (GPU) | 48 ms (neuromorphic) |
| Energy per Sample | 0.18 J (neuromorphic) | 0.42 J (GPU) | 0.05 J (neuromorphic) |
The NAS‑found architecture lifts recognition performance by roughly 6–8 percentage points over the strongest baselines while roughly halving inference time and cutting energy use by more than a factor of three. Importantly, the single‑spike nature of TTFS coding means that most neurons stay idle during a forward pass, which is what drives the dramatic efficiency gains.
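That efficiency argument boils down to a simple event‑driven energy model: on neuromorphic hardware, energy scales with the number of synaptic events, and TTFS caps spikes at one per neuron. The toy comparison below uses assumed constants, not measurements from the paper.

```python
# Toy event-driven energy model; every number here is an assumption for
# illustration, not a figure from the paper.
E_PER_SYNOP_J = 1e-11      # assumed energy per synaptic event on neuromorphic HW
FANOUT = 100               # assumed average outgoing synapses per neuron
NEURONS = 500_000

# Rate coding: each neuron may spike many times over the coding window.
rate_energy = NEURONS * 20 * FANOUT * E_PER_SYNOP_J       # 20 spikes per neuron

# TTFS coding: at most one spike per neuron, and many neurons never fire.
ttfs_energy = NEURONS * 0.3 * 1 * FANOUT * E_PER_SYNOP_J  # 30% of neurons fire

print(f"rate: {rate_energy:.4f} J  ttfs: {ttfs_energy:.6f} J")
```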
Practical Implications
- Wearable Emotion‑Aware Interfaces – Smart glasses or AR headsets can now run emotion detection locally, eliminating the need for cloud off‑loading and preserving user privacy.
- Battery‑Life Extension – The 0.05 J‑per‑inference budget translates to weeks of operation on a typical 300 mAh smartwatch‑class battery at a modest sampling rate, before accounting for other system power draws (see the back‑of‑envelope estimate after this list).
- Scalable to Other Modalities – The ANN‑assisted NAS pipeline is modality‑agnostic; developers could apply it to speech, EEG, or multimodal emotion datasets with minimal changes.
- Edge‑First AI Toolchain – By exposing the search framework as an open‑source package, teams can automatically generate hardware‑friendly SNNs for any low‑power edge device, reducing engineering time.
- Neuromorphic Adoption – Demonstrates a concrete, high‑impact use case for neuromorphic chips, encouraging hardware vendors to provide better SDKs and tooling for developers.
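As a sanity check on the battery‑life claim, here is the back‑of‑envelope arithmetic, assuming a 3.7 V cell and one inference per minute, and ignoring every other power draw (camera, display, radio), which would dominate in practice.

```python
# Assumed: 3.7 V cell, one inference per minute, no other power draws.
capacity_mah, voltage = 300, 3.7
budget_j = capacity_mah / 1000 * 3600 * voltage  # ~3,996 J stored in the battery
per_inference_j = 0.05                           # figure reported above

inferences = budget_j / per_inference_j          # ~80,000 inferences
days = inferences / (60 * 24)                    # at one inference per minute
print(f"{inferences:,.0f} inferences -> {days:.0f} days")  # ~55 days, i.e. weeks
```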
Limitations & Future Work
- Dataset Scope – Experiments focus on eye‑tracking datasets collected in controlled lab settings; performance in wild, noisy environments remains to be validated.
- Search Cost – Although the ANN proxy speeds up evaluation, the evolutionary search still requires several GPU days for large search spaces, which may be prohibitive for small teams.
- Hardware Specificity – Energy and latency numbers are tied to a particular neuromorphic accelerator; portability to other platforms (e.g., Loihi, BrainChip) needs further benchmarking.
- Explainability – Single‑spike dynamics are less interpretable than conventional deep nets; future work could integrate saliency or spike‑timing analysis to aid debugging.
The authors suggest extending TNAS‑ER to multimodal emotion recognition (combining eye, facial, and speech cues) and exploring gradient‑based NAS methods that could further reduce search time.
Authors
- Qianhui Liu
- Jing Yang
- Miao Yu
- Trevor E. Carlson
- Gang Pan
- Haizhou Li
- Zhumin Chen
Paper Information
- arXiv ID: 2512.02459v1
- Categories: cs.NE
- Published: December 2, 2025
- PDF: https://arxiv.org/pdf/2512.02459v1