[Paper] Neuromorphic Eye Tracking for Low-Latency Pupil Detection
Source: arXiv - 2512.09969v1
Overview
The paper introduces a neuromorphic eye‑tracking pipeline that can locate a user’s pupil with sub‑5‑pixel error while consuming only a few milliwatts and delivering sub‑3 ms latency. By converting a state‑of‑the‑art event‑based eye‑tracking network into a spiking neural network (SNN) built from leaky‑integrate‑and‑fire (LIF) layers and depth‑wise separable convolutions, the authors demonstrate that high‑accuracy gaze estimation is possible on ultra‑low‑power hardware—an essential step for truly responsive AR/VR wearables.
Key Contributions
- Neuromorphic redesign of a top‑performing event‑based eye‑tracker – replaces heavy recurrent/attention blocks with lightweight LIF layers.
- Model compression – achieves a ~20× reduction in parameter count and an ~850× drop in theoretical compute (MACs) versus the closest ANN baseline.
- Latency‑power trade‑off – projected to run at 3.9–4.9 mW with ~3 ms end‑to‑end latency when processing a 1 kHz event stream.
- Accuracy close to specialized hardware – mean pupil‑center error of 3.7–4.1 px, comparable to the Retina neuromorphic system (3.24 px).
- Generalizable design pattern – shows how depth‑wise separable convolutions and LIF neurons can replace complex ANN modules without sacrificing performance.
Methodology
- Event‑based input – The system consumes asynchronous events from a Dynamic Vision Sensor (DVS) rather than conventional video frames, preserving microsecond‑level temporal detail and eliminating motion blur. A minimal event‑binning sketch follows this list.
- Network architecture – Starting from a high‑performing ANN eye‑tracker, the authors:
  - Swap recurrent and attention modules for stacks of LIF neurons that naturally process spikes over time.
  - Replace standard convolutions with depth‑wise separable convolutions, dramatically cutting parameters and multiply‑accumulate operations. An illustrative sketch of such a block is shown after this list.
- Training pipeline – The SNN is trained using surrogate gradient methods that approximate the non‑differentiable spiking function, allowing back‑propagation on the same labeled event datasets used for the ANN baseline.
- Efficiency estimation – Theoretical compute (MACs) is calculated for both ANN and SNN versions, and power/latency are projected using published neuromorphic accelerator specifications (e.g., Intel Loihi, BrainChip Akida). A back‑of‑the‑envelope MAC comparison follows the sketches below.
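To make the event‑based input concrete, here is a minimal binning sketch (not the paper's code): asynchronous DVS events are accumulated into a dense [T, 2, H, W] tensor with ON/OFF polarity channels so a convolutional SNN can consume them one time bin at a time. The bin count, resolution, and polarity encoding are assumptions for illustration.

```python
import numpy as np

def events_to_bins(x, y, t, p, H=64, W=64, n_bins=10):
    """x, y: pixel coordinates; t: timestamps; p: polarity encoded as {0, 1}."""
    vox = np.zeros((n_bins, 2, H, W), dtype=np.float32)
    t = t - t.min()                                              # shift window start to 0
    bin_idx = np.minimum((t / (t.max() + 1) * n_bins).astype(int), n_bins - 1)
    np.add.at(vox, (bin_idx, p, y, x), 1.0)                      # accumulate event counts per bin
    return vox

# Example with synthetic events over a 10 ms window (64x64 sensor crop, assumed sizes)
rng = np.random.default_rng(0)
n = 5000
vox = events_to_bins(rng.integers(0, 64, n), rng.integers(0, 64, n),
                     np.sort(rng.integers(0, 10_000, n)), rng.integers(0, 2, n))
print(vox.shape)  # (10, 2, 64, 64)
```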
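The core building block can be sketched as follows in PyTorch. This is a minimal, illustrative reconstruction under assumed kernel sizes, leak factor, and threshold, not the authors' released architecture: it pairs a depth‑wise separable convolution with an LIF layer, and the `SpikeFn` class shows how a surrogate gradient (here a fast‑sigmoid derivative) makes the non‑differentiable spike trainable with standard back‑propagation, as described in the training bullet above.

```python
import torch
import torch.nn as nn


class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, fast-sigmoid surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, mem_minus_thresh):
        ctx.save_for_backward(mem_minus_thresh)
        return (mem_minus_thresh > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output / (1.0 + x.abs()) ** 2   # d(spike)/dx approximated by 1 / (1 + |x|)^2


class LIF(nn.Module):
    """Leaky integrate-and-fire neurons unrolled over discrete time steps."""

    def __init__(self, beta=0.9, threshold=1.0):    # assumed leak factor and firing threshold
        super().__init__()
        self.beta, self.threshold = beta, threshold

    def forward(self, x_seq):                       # x_seq: [T, B, C, H, W] input current
        mem = torch.zeros_like(x_seq[0])
        spikes = []
        for x in x_seq:
            mem = self.beta * mem + x               # leaky integration of input current
            spk = SpikeFn.apply(mem - self.threshold)
            mem = mem - spk * self.threshold        # soft reset after a spike
            spikes.append(spk)
        return torch.stack(spikes)


class DWSepLIFBlock(nn.Module):
    """Depth-wise separable convolution followed by an LIF activation, applied per time step."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.lif = LIF()

    def forward(self, x_seq):                       # x_seq: [T, B, C, H, W]
        T, B = x_seq.shape[:2]
        y = self.pointwise(self.depthwise(x_seq.flatten(0, 1)))
        return self.lif(y.view(T, B, *y.shape[1:]))


block = DWSepLIFBlock(in_ch=2, out_ch=16)           # 2 input channels: ON/OFF event polarities
out = block(torch.rand(10, 1, 2, 64, 64))           # 10 time bins, batch of 1, 64x64 crop
print(out.shape)                                    # torch.Size([10, 1, 16, 64, 64])
```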
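To see where the compute reduction comes from, here is a back‑of‑the‑envelope MAC count for a single layer with assumed sizes (not the paper's exact figures): a depth‑wise separable convolution replaces the full K×K×C_in×C_out kernel with a depth‑wise pass plus a 1×1 point‑wise pass, and on event‑driven hardware only active spikes trigger synaptic operations, shrinking the effective count further.

```python
# Assumed layer shape: 64x64 feature map, 3x3 kernel, 32 -> 64 channels.
H = W = 64
K = 3
C_in, C_out = 32, 64

standard_macs = H * W * K * K * C_in * C_out             # standard convolution
dwsep_macs = H * W * (K * K * C_in + C_in * C_out)       # depthwise + pointwise

print(f"standard conv:        {standard_macs / 1e6:.1f} M MACs")
print(f"depthwise separable:  {dwsep_macs / 1e6:.1f} M MACs "
      f"({standard_macs / dwsep_macs:.1f}x fewer)")

# On neuromorphic hardware, synaptic updates fire only for active spikes,
# so effective operations also scale with activation sparsity (10% assumed here).
spike_rate = 0.10
print(f"effective SNN ops:    {dwsep_macs * spike_rate / 1e6:.2f} M")
```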
Results & Findings
| Model | Mean Pupil Error (px) | Params (M) | Theoretical MACs (M) | Estimated Power (mW) | Latency (ms) |
|---|---|---|---|---|---|
| Original ANN (baseline) | 3.5 | 2.1 | 1,800 | ~3,200 | 6 |
| Neuromorphic SNN (proposed) | 3.7–4.1 | 0.10 | 2.1 | 3.9–4.9 | ~3 |
| Retina hardware system | 3.24 | – | – | – | – |
- The SNN retains near‑state‑of‑the‑art accuracy while slashing model size by 20× and compute by ~850×.
- Power and latency estimates place the SNN comfortably within the milliwatt budget of battery‑powered AR glasses, with a response fast enough to support gaze‑contingent rendering (≈300 Hz effective update rate).
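As a quick sanity check, the headline reduction factors and the effective update rate follow directly from the rounded values in the table:

```python
ann_params, snn_params = 2.1e6, 0.10e6     # parameter counts from the table
ann_macs, snn_macs = 1_800e6, 2.1e6        # theoretical MACs from the table
latency_s = 3e-3                           # ~3 ms end-to-end latency

print(f"parameter reduction: {ann_params / snn_params:.0f}x")    # ~21x
print(f"compute reduction:   {ann_macs / snn_macs:.0f}x")        # ~857x
print(f"update rate:         {1 / latency_s:.0f} Hz")            # ~333 Hz
```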
Practical Implications
- AR/VR headsets – Real‑time gaze‑aware rendering can now be done on‑device without offloading to a GPU or cloud, reducing bandwidth, preserving privacy, and extending battery life.
- Assistive wearables – Low‑power eye‑tracking enables eye‑controlled interfaces for users with limited motor ability, even on compact form factors like smart glasses.
- Human‑computer interaction research – Researchers can prototype gaze‑driven UI concepts without needing expensive high‑speed cameras; the event‑based pipeline works robustly under rapid head motion.
- Edge AI hardware – The design aligns with existing neuromorphic chips (e.g., Intel Loihi, BrainChip Akida), making it straightforward to integrate into next‑generation edge processors that already support spiking inference.
Limitations & Future Work
- Hardware validation – Power and latency numbers are projected; real‑world measurements on a physical neuromorphic accelerator are needed to confirm the gains.
- Dataset diversity – Experiments focus on a single event‑based eye‑tracking benchmark; broader testing across lighting conditions, eye shapes, and occlusions would strengthen generalizability.
- Robustness to sensor noise – DVS sensors can produce noisy spikes in low light; future work could explore adaptive thresholding or noise‑aware training.
- Integration with full AR pipelines – Combining the SNN eye‑tracker with downstream gaze‑contingent rendering or foveated rendering modules remains an open systems‑engineering challenge.
Bottom line: By marrying event‑driven vision with spiking neural networks, this work shows that high‑precision, low‑latency eye tracking is no longer a power‑hungry afterthought—it can become a native capability of next‑generation wearable devices.
Authors
- Paul Hueber
- Luca Peres
- Florian Pitters
- Alejandro Gloriani
- Oliver Rhodes
Paper Information
- arXiv ID: 2512.09969v1
- Categories: cs.CV, cs.NE
- Published: December 10, 2025