[Paper] Sensor Generalization for Adaptive Sensing in Event-based Object Detection via Joint Distribution Training

Published: February 26, 2026 at 01:57 PM EST

Source: arXiv - 2602.23357v1

Overview

Event‑based cameras—sometimes called neuromorphic or dynamic vision sensors—capture changes in brightness asynchronously, delivering microsecond‑level latency, high dynamic range, and virtually no motion blur. The paper Sensor Generalization for Adaptive Sensing in Event‑based Object Detection via Joint Distribution Training digs into a practical problem: models trained on event data tend to overfit to the quirks of a specific sensor, limiting their usefulness across different devices or operating conditions. By systematically studying how intrinsic sensor parameters shape the event stream and by proposing a joint‑distribution training scheme, the authors show how to build object detectors that stay reliable when the underlying hardware changes.

Key Contributions

  • Comprehensive analysis of sensor‑level parameters (e.g., contrast threshold, refractory period, noise characteristics) and their impact on event‑based object detection performance.
  • Joint Distribution Training (JDT): a novel training paradigm that simultaneously optimizes the detector on data simulated from a distribution of sensor settings rather than a single fixed configuration.
  • Sensor‑agnostic benchmark: introduction of a cross‑sensor evaluation protocol that quantifies robustness when moving between real‑world event cameras (e.g., Prophesee, DAVIS, ATIS).
  • Open‑source toolkit for synthesizing event streams under arbitrary sensor parameterizations, enabling reproducible research and rapid prototyping.

Methodology

  1. Parameter Sensitivity Study – The authors first generate synthetic event streams from standard video datasets (e.g., COCO‑VID) while sweeping key sensor parameters. By feeding these streams to a baseline event‑based detector (a spiking‑CNN or a conventional CNN with event voxel grids), they measure detection AP (average precision) variations, pinpointing which parameters cause the biggest performance swings.
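The effect of such a sweep can be illustrated with a toy DVS model (a simplification, not the paper's ESIM‑based pipeline): each pixel emits an event whenever its log‑intensity change since the pixel's last event exceeds the contrast threshold, subject to a refractory period. The function name and parameter values here are illustrative only.

```python
import numpy as np

def simulate_events(frames, contrast_threshold, refractory_us=0, dt_us=1000):
    """Toy DVS model: emit an event when the log-intensity change at a
    pixel since its last event exceeds the contrast threshold."""
    log_frames = np.log(frames.astype(np.float64) + 1e-3)
    ref = log_frames[0].copy()                  # per-pixel reference level
    last_t = np.full(frames[0].shape, -np.inf)  # last event time per pixel
    events = []
    for i, lf in enumerate(log_frames[1:], start=1):
        t = i * dt_us
        diff = lf - ref
        fired = (np.abs(diff) >= contrast_threshold) & (t - last_t >= refractory_us)
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, x, y, 1 if diff[y, x] > 0 else -1))
        ref[fired] = lf[fired]   # reset reference where events fired
        last_t[fired] = t
    return events

# Sweep: lower thresholds fire more events for the same stimulus,
# which is exactly the kind of distribution shift the study quantifies.
rng = np.random.default_rng(0)
frames = np.clip(np.cumsum(rng.normal(0, 5, size=(20, 8, 8)), axis=0) + 128, 1, 255)
counts = {th: len(simulate_events(frames, th)) for th in (0.05, 0.1, 0.2, 0.4)}
```

Feeding streams generated at different thresholds to the same detector is what exposes the AP swings the authors measure.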

  2. Joint Distribution Training (JDT) – Instead of fixing a single sensor configuration during training, JDT samples a parameter vector from a predefined distribution (e.g., contrast threshold ∼ Uniform[0.1, 0.4] % of full scale). Each mini‑batch is augmented with events generated under a different sampled setting, forcing the network to learn features that are invariant to those variations. The loss remains the standard detection loss (classification + bounding‑box regression).
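A minimal sketch of the JDT sampling loop: the contrast‑threshold range follows the uniform example in the text, while the other parameter ranges and all names here are hypothetical stand‑ins for whatever the real simulator exposes.

```python
import random

# Hypothetical per-parameter distributions; the contrast-threshold range
# mirrors the Uniform[0.1, 0.4] example from the text, the rest are illustrative.
PARAM_DISTS = {
    "contrast_threshold": lambda: random.uniform(0.1, 0.4),  # % of full scale
    "refractory_us":      lambda: random.uniform(50, 500),
    "noise_rate_hz":      lambda: random.uniform(0.1, 5.0),
}

def sample_sensor_params():
    """Draw one sensor configuration for the next mini-batch."""
    return {name: dist() for name, dist in PARAM_DISTS.items()}

def jdt_batches(video_clips, num_batches):
    """Yield (clip, params) pairs; each batch gets a fresh sensor setting,
    so the detector never trains on a single fixed configuration."""
    for _ in range(num_batches):
        yield random.choice(video_clips), sample_sensor_params()

batches = list(jdt_batches(["clip_a", "clip_b"], num_batches=4))
```

The detection loss itself is untouched; only the event synthesis in front of the network varies per batch.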

  3. Cross‑Sensor Evaluation – Models trained with JDT are tested on real event data captured by three distinct cameras, each with its own factory‑calibrated parameters. Performance is compared against a baseline model trained on a single sensor’s data.
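The protocol reduces to comparing in‑sensor AP against the average AP over held‑out cameras. A small helper, fed with the averages reported in the results table (48.2 → 31.7 for the baseline, 46.5 → 44.1 for JDT), reproduces the ~85 % drop reduction claimed below; the function name is my own.

```python
def cross_sensor_drop(in_sensor_ap, cross_sensor_aps):
    """Return (average cross-sensor AP, drop relative to the training sensor)."""
    avg = sum(cross_sensor_aps) / len(cross_sensor_aps)
    return avg, avg - in_sensor_ap

# Averages from the results table (the paper averages over three cameras;
# here the reported average is passed directly).
base_avg, base_drop = cross_sensor_drop(48.2, [31.7])
jdt_avg, jdt_drop = cross_sensor_drop(46.5, [44.1])
reduction = 1 - abs(jdt_drop) / abs(base_drop)   # fraction of the drop recovered
```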

  4. Implementation Details – The detector uses a ResNet‑34 backbone adapted to event voxel grids (time‑surface representation). Training runs on a single RTX 3090 GPU; the synthetic event generator is built on the ESIM framework and runs on CPU in parallel with data loading.
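A common way to turn an event stream into a CNN‑ready tensor is a temporally interpolated voxel grid; the paper's time‑surface variant may differ in detail, so the following is a sketch of the general representation rather than the authors' exact code.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate event polarities into a (num_bins, H, W) grid, bilinearly
    weighting each event across its two nearest time bins — a common input
    representation for a conventional CNN backbone such as ResNet-34."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if not events:
        return grid
    ts = np.array([e[0] for e in events], dtype=np.float64)
    t0, span = ts.min(), max(ts.max() - ts.min(), 1e-9)
    for (t, x, y, p) in events:
        tb = (t - t0) / span * (num_bins - 1)   # fractional bin index
        lo = int(np.floor(tb))
        hi = min(lo + 1, num_bins - 1)
        w_hi = tb - lo
        grid[lo, y, x] += p * (1.0 - w_hi)
        grid[hi, y, x] += p * w_hi
    return grid

# Two events (t, x, y, polarity) land in the first and last time bin.
grid = events_to_voxel_grid([(0, 0, 0, 1), (100, 1, 1, -1)], 3, 2, 2)
```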

Results & Findings

Model                     | In‑sensor AP | Cross‑sensor AP (avg.) | Δ (drop)
Baseline (single‑sensor)  | 48.2 %       | 31.7 %                 | –16.5 %
JDT (proposed)            | 46.5 %       | 44.1 %                 | –2.4 %
  • Robustness gain: JDT reduces the cross‑sensor performance drop by ~85 % while incurring only a marginal loss on the original sensor.
  • Parameter impact ranking: Contrast threshold and refractory period dominate detection variance; noise level has a smaller effect.
  • Generalization to unseen sensors: When evaluated on a fourth camera not seen during training, JDT still outperforms the baseline by ~12 % AP.

Practical Implications

  • Device‑agnostic deployments – Developers can ship a single event‑based detection model to edge devices equipped with different cameras (e.g., drones, AR glasses) without per‑device fine‑tuning.
  • Reduced data collection costs – By training on a synthetic distribution of sensor settings, teams can avoid the expensive process of gathering labeled event data for every new hardware revision.
  • Adaptive sensing pipelines – The joint‑distribution approach can be extended to other downstream tasks (optical flow, SLAM), enabling robust perception stacks that automatically compensate for sensor drift or aging.
  • Tooling for rapid prototyping – The open‑source event simulator lets engineers experiment with “what‑if” scenarios (e.g., tighter contrast thresholds for low‑light operation) before committing to hardware changes.

Limitations & Future Work

  • Synthetic‑real gap: Although the authors calibrate the simulator against three real cameras, subtle hardware‑specific artifacts (e.g., pixel‑level non‑uniformities) are not fully captured, which may limit generalization to exotic sensors.
  • Compute overhead: Joint distribution training multiplies data loading time because each batch requires on‑the‑fly event synthesis; scaling to larger datasets may need more efficient GPU‑based simulators.
  • Task scope: The study focuses on 2‑D object detection; extending the methodology to 3‑D perception (e.g., event‑based depth estimation) remains an open question.

Bottom line: By treating sensor characteristics as a distribution rather than a fixed constant, this work paves the way for truly portable event‑camera AI—an exciting step toward low‑latency, high‑dynamic‑range vision systems that work reliably across the wild variety of hardware that developers encounter in the field.

Authors

  • Aheli Saha
  • René Schuster
  • Didier Stricker

Paper Information

  • arXiv ID: 2602.23357v1
  • Categories: cs.CV
  • Published: February 26, 2026
