[Paper] Efficient Eye-based Emotion Recognition via Neural Architecture Search of Time-to-First-Spike-Coded Spiking Neural Networks
Source: arXiv - 2512.02459v1
Overview
The paper introduces TNAS‑ER, a neural‑architecture‑search (NAS) framework that automatically designs ultra‑efficient spiking neural networks (SNNs) for eye‑based emotion recognition. By combining a time‑to‑first‑spike (TTFS) coding scheme with an ANN‑assisted search strategy, the authors achieve state‑of‑the‑art accuracy while keeping latency and energy consumption low enough for deployment on tiny, battery‑powered eyewear devices.
Key Contributions
- First NAS for TTFS‑coded SNNs – Tailors the search space and evaluation metrics to the unique constraints of single‑spike neurons.
- ANN‑assisted search strategy – Uses a ReLU‑based ANN that shares an identity mapping with the TTFS SNN to provide fast, differentiable feedback during evolution.
- Dual‑objective fitness – Optimizes both weighted and unweighted average recall, directly targeting the imbalanced nature of emotion datasets.
- Real‑world hardware validation – Deploys the discovered architecture on neuromorphic hardware, reporting 48 ms inference latency and only 0.05 J per sample.
- Comprehensive experiments – Shows that TNAS‑ER outperforms hand‑crafted TTFS SNNs and conventional ANN baselines on multiple eye‑tracking emotion benchmarks.
Methodology
- Search Space Design – The authors define a set of building blocks (convolutional layers, pooling operators, spiking‑neuron parameters) that are compatible with TTFS coding, ensuring each neuron fires at most once (a candidate encoding is sketched after this list).
- ANN‑SNN Proxy – For every candidate TTFS SNN, a parallel ANN with ReLU activations is instantiated. Because the ANN shares the same weight matrices and an identity mapping with the SNN, its training loss can be back‑propagated quickly, giving a fast proxy score for the SNN’s potential performance (see the activation‑to‑spike‑time sketch below).
- Evolutionary NAS – An evolutionary algorithm (mutation, crossover, selection) explores the space. Each individual’s fitness is a weighted sum of two recall metrics, encouraging both overall accuracy and balanced per‑class performance (see the fitness and selection sketch below).
- TTFS Training – After the search converges, the best architecture is trained from scratch using a TTFS‑specific loss that penalizes late spikes, reinforcing the single‑spike behavior (see the loss sketch below).
- Hardware Mapping – The final network is quantized and mapped onto a neuromorphic accelerator that natively supports event‑driven computation, allowing the authors to measure real latency and energy (see the quantization sketch below).
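To make the search space concrete, below is a minimal sketch of how a candidate architecture could be encoded as a genotype. The block vocabulary (channel counts, kernel sizes, pooling choices, firing thresholds) and the `LayerGene` fields are illustrative assumptions, not the paper's exact genotype.

```python
import random
from dataclasses import dataclass

# Hypothetical block vocabulary; the paper's exact search space may differ.
CONV_CHANNELS = [16, 32, 64]
KERNEL_SIZES = [3, 5]
POOL_TYPES = ["max", "avg", "none"]
THRESHOLDS = [0.5, 1.0, 1.5]  # candidate TTFS firing thresholds

@dataclass
class LayerGene:
    channels: int
    kernel: int
    pool: str
    threshold: float  # a TTFS neuron fires once, when its potential crosses this

def random_candidate(depth: int = 4) -> list:
    """Sample a random genotype: one gene per layer."""
    return [
        LayerGene(
            channels=random.choice(CONV_CHANNELS),
            kernel=random.choice(KERNEL_SIZES),
            pool=random.choice(POOL_TYPES),
            threshold=random.choice(THRESHOLDS),
        )
        for _ in range(depth)
    ]
```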
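The ANN‑SNN proxy rests on a correspondence between ReLU activations and first‑spike times: a stronger activation maps to an earlier spike, and a zero activation to no spike at all. The linear map below, with an assumed coding window `T_MAX`, is one common way to realize such an identity mapping; the paper's exact correspondence may differ.

```python
import numpy as np

T_MAX = 1.0  # assumed coding window; a neuron that never fires is read as T_MAX

def relu_to_spike_time(a: np.ndarray) -> np.ndarray:
    """Map ReLU activations to first-spike times: larger activation -> earlier spike."""
    a = np.clip(a, 0.0, T_MAX)  # ReLU output, clipped to the coding window
    return T_MAX - a            # linear TTFS code; zero activation = no spike

def spike_time_to_relu(t: np.ndarray) -> np.ndarray:
    """Inverse map, used to read SNN spike times back as proxy activations."""
    return T_MAX - np.clip(t, 0.0, T_MAX)
```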
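A sketch of the dual‑objective fitness and a bare‑bones tournament‑selection loop, assuming scikit‑learn for the recall metrics. The mixing coefficient `alpha`, the tournament size `k`, and the `evaluate`, `mutate`, and `crossover` callables are placeholders rather than the paper's actual operators.

```python
import random
from sklearn.metrics import recall_score

def fitness(y_true, y_pred, alpha: float = 0.5) -> float:
    """Dual-objective fitness: a weighted sum of weighted and unweighted
    average recall (alpha = 0.5 is an assumed setting, not the paper's)."""
    war = recall_score(y_true, y_pred, average="weighted")  # support-weighted
    uar = recall_score(y_true, y_pred, average="macro")     # class-balanced
    return alpha * war + (1.0 - alpha) * uar

def evolve(population, evaluate, mutate, crossover, generations=20, k=3):
    """Bare-bones tournament-selection loop; `evaluate` would train the ReLU
    proxy for a candidate and return its fitness score."""
    for _ in range(generations):
        scored = [(evaluate(c), c) for c in population]
        nxt = []
        while len(nxt) < len(population):
            # Tournament selection: the best of k random candidates is a parent.
            p1 = max(random.sample(scored, k), key=lambda s: s[0])[1]
            p2 = max(random.sample(scored, k), key=lambda s: s[0])[1]
            nxt.append(mutate(crossover(p1, p2)))
        population = nxt
    return max(((evaluate(c), c) for c in population), key=lambda s: s[0])[1]
```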
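One standard way to train a TTFS network, in the spirit of the description above: treat negative first‑spike times as class logits, so the correct class is encouraged to fire earliest, and add a term penalizing late spikes. The PyTorch formulation and the weight `beta` are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def ttfs_loss(spike_times: torch.Tensor, labels: torch.Tensor, beta: float = 0.01):
    """Illustrative TTFS loss over first-spike times of shape (batch, classes)."""
    logits = -spike_times                 # earlier spike -> larger logit
    ce = F.cross_entropy(logits, labels)  # correct class should spike first
    late_penalty = spike_times.mean()     # discourages late firing everywhere
    return ce + beta * late_penalty
```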
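For the quantization step, a generic uniform symmetric int8 pass is sketched below as a stand‑in for whatever scheme the accelerator's toolchain actually applies; the paper does not state a bit width here, so int8 is an assumption.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Uniform symmetric int8 quantization of a weight tensor."""
    scale = max(float(np.abs(w).max()), 1e-8) / 127.0  # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale                                    # dequantize with q * scale
```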
Results & Findings
| Metric | Hand‑crafted TTFS SNN | Conventional ANN | TNAS‑ER (proposed) |
|---|---|---|---|
| Weighted Avg. Recall | 71.3 % | 78.5 % | 84.2 % |
| Unweighted Avg. Recall | 68.9 % | 75.1 % | 82.7 % |
| Inference Latency | 112 ms (neuromorphic) | 95 ms (GPU) | 48 ms (neuromorphic) |
| Energy per Sample | 0.18 J (neuromorphic) | 0.42 J (GPU) | 0.05 J (neuromorphic) |
The NAS‑found architecture lifts recognition performance by roughly 6–8 percentage points over the strongest baselines while roughly halving inference time and cutting energy use by more than a factor of three. Importantly, the single‑spike nature of TTFS coding means that most neurons stay idle during a forward pass, which is what drives the dramatic efficiency gains.
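That efficiency argument boils down to a simple event‑driven energy model: on neuromorphic hardware, energy scales with the number of synaptic events, and TTFS caps spikes at one per neuron. The toy comparison below uses assumed constants, not measurements from the paper.

```python
# Toy event-driven energy model; every number here is an assumption for
# illustration, not a figure from the paper.
E_PER_SYNOP_J = 1e-11      # assumed energy per synaptic event on neuromorphic HW
FANOUT = 100               # assumed average outgoing synapses per neuron
NEURONS = 500_000

# Rate coding: each neuron may spike many times over the coding window.
rate_energy = NEURONS * 20 * FANOUT * E_PER_SYNOP_J       # 20 spikes per neuron

# TTFS coding: at most one spike per neuron, and many neurons never fire.
ttfs_energy = NEURONS * 0.3 * 1 * FANOUT * E_PER_SYNOP_J  # 30% of neurons fire

print(f"rate: {rate_energy:.4f} J  ttfs: {ttfs_energy:.6f} J")
```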
Practical Implications
- Wearable Emotion‑Aware Interfaces – Smart glasses or AR headsets can now run emotion detection locally, eliminating the need for cloud off‑loading and preserving user privacy.
- Battery‑Life Extension – The 0.05 J‑per‑inference budget translates to weeks of operation on a typical 300 mAh smartwatch‑class battery at a modest sampling rate, before accounting for other system power draws (see the back‑of‑envelope estimate after this list).
- Scalable to Other Modalities – The ANN‑assisted NAS pipeline is modality‑agnostic; developers could apply it to speech, EEG, or multimodal emotion datasets with minimal changes.
- Edge‑First AI Toolchain – By exposing the search framework as an open‑source package, teams can automatically generate hardware‑friendly SNNs for any low‑power edge device, reducing engineering time.
- Neuromorphic Adoption – Demonstrates a concrete, high‑impact use case for neuromorphic chips, encouraging hardware vendors to provide better SDKs and tooling for developers.
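As a sanity check on the battery‑life claim, here is the back‑of‑envelope arithmetic, assuming a 3.7 V cell and one inference per minute, and ignoring every other power draw (camera, display, radio), which would dominate in practice.

```python
# Assumed: 3.7 V cell, one inference per minute, no other power draws.
capacity_mah, voltage = 300, 3.7
budget_j = capacity_mah / 1000 * 3600 * voltage  # ~3,996 J stored in the battery
per_inference_j = 0.05                           # figure reported above

inferences = budget_j / per_inference_j          # ~80,000 inferences
days = inferences / (60 * 24)                    # at one inference per minute
print(f"{inferences:,.0f} inferences -> {days:.0f} days")  # ~55 days, i.e. weeks
```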
Limitations & Future Work
- Dataset Scope – Experiments focus on eye‑tracking datasets collected in controlled lab settings; performance in wild, noisy environments remains to be validated.
- Search Cost – Although the ANN proxy speeds up evaluation, the evolutionary search still requires several GPU days for large search spaces, which may be prohibitive for small teams.
- Hardware Specificity – Energy and latency numbers are tied to a particular neuromorphic accelerator; portability to other platforms (e.g., Loihi, BrainChip) needs further benchmarking.
- Explainability – Single‑spike dynamics are less interpretable than conventional deep nets; future work could integrate saliency or spike‑timing analysis to aid debugging.
The authors suggest extending TNAS‑ER to multimodal emotion recognition (combining eye, facial, and speech cues) and exploring gradient‑based NAS methods that could further reduce search time.
Authors
- Qianhui Liu
- Jing Yang
- Miao Yu
- Trevor E. Carlson
- Gang Pan
- Haizhou Li
- Zhumin Chen
Paper Information
- arXiv ID: 2512.02459v1
- Categories: cs.NE
- Published: December 2, 2025
- PDF: https://arxiv.org/pdf/2512.02459v1