[Paper] SC-MII: Infrastructure LiDAR-based 3D Object Detection on Edge Devices for Split Computing with Multiple Intermediate Outputs Integration

Published: January 11, 2026 at 08:17 PM EST
4 min read
Source: arXiv - 2601.07119v1

Overview

The paper introduces SC‑MII, a split‑computing framework that lets infrastructure LiDAR units run the first stages of 3‑D object detection locally on attached edge devices and offload the rest to a nearby edge server. By stitching together intermediate feature maps from multiple infrastructure LiDARs, the system cuts inference latency and power draw on the devices while keeping detection accuracy virtually intact—an attractive proposition for smart‑city and autonomous‑driving deployments.

Key Contributions

  • Split‑computing pipeline for LiDAR 3D detection – early DNN layers run on low‑power edge units, later layers run on a more capable edge server.
  • Multiple‑intermediate‑output integration (MII) – feature maps from several spatially distributed LiDARs are fused before the final detection head, mitigating blind‑spot issues inherent to single‑sensor setups.
  • Edge‑friendly model design – the authors tailor a lightweight backbone that preserves most of the representational power while fitting the memory and compute budget of embedded GPUs/NPUs.
  • Empirical validation on a real‑world dataset – demonstrates a 2.19× overall speed‑up, a 71.6 % reduction in on‑device processing time, and only a ≤1.09 % drop in mean average precision (mAP).
  • Privacy‑preserving data handling – only intermediate feature tensors (not raw point clouds) are transmitted, reducing exposure of raw sensor data.

Methodology

  1. Data acquisition – Multiple fixed LiDAR units capture point clouds of the same traffic scene from different viewpoints.
  2. Edge preprocessing – Each unit voxelizes its point cloud and feeds it through the first N layers of a 3‑D CNN (e.g., a sparse convolution backbone). The output is a compact feature tensor (≈ a few MB); a minimal sketch of steps 2–5 follows this list.
  3. Transmission – Feature tensors are sent over a low‑latency local network (e.g., Ethernet or 5G‑RAN) to a central edge server. Because the tensors are already abstracted, bandwidth requirements are modest.
  4. Feature integration – The server aligns the tensors spatially using known sensor extrinsics (see the alignment sketch below) and concatenates or aggregates them via a lightweight fusion module (e.g., attention‑based pooling).
  5. Final inference – The fused representation passes through the remaining DNN layers and the detection head, producing 3‑D bounding boxes with class scores.
  6. Feedback loop (optional) – Detected objects can be broadcast back to the edge devices for downstream tasks (e.g., local actuation or alerting).
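A minimal PyTorch sketch of steps 2–5, assuming a dense 2‑D backbone in place of the paper's sparse 3‑D convolutions; `EdgeHead`, `ServerTail`, and all channel/anchor sizes are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

class EdgeHead(nn.Module):
    """First N layers, run on each LiDAR's low-power edge device (an illustrative
    dense stand-in for the paper's sparse 3-D backbone)."""
    def __init__(self, in_ch=4, feat_ch=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, bev):          # bev: (1, in_ch, H, W) voxelized BEV pseudo-image
        return self.stem(bev)        # compact feature tensor sent to the server

class ServerTail(nn.Module):
    """Fusion module plus the remaining layers and detection head, run on the server."""
    def __init__(self, feat_ch=64, num_sensors=3, num_anchors=2, num_classes=3):
        super().__init__()
        self.fuse = nn.Conv2d(feat_ch * num_sensors, feat_ch, 1)  # lightweight channel fusion
        self.neck = nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(feat_ch, num_anchors * (7 + num_classes), 1)  # 7 box params + class scores

    def forward(self, feats):        # feats: list of spatially aligned per-sensor tensors
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.head(self.neck(fused))

# One frame from three roadside LiDARs; spatial alignment is assumed done upstream.
edge, server = EdgeHead(), ServerTail()
feats = [edge(torch.randn(1, 4, 256, 256)) for _ in range(3)]
boxes = server(feats)                # raw detection map: (1, 20, 64, 64) here
```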

The approach builds on the concept of split computing (also called neural split inference), but extends it to handle multiple intermediate outputs from geographically distributed sensors, a scenario rarely explored in prior work.
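For the spatial‑alignment step (step 4 above), one common approach—assumed here, since this summary doesn't spell out the paper's exact method—is to warp each sensor's BEV feature map into a shared frame using the known extrinsics:

```python
import math
import torch
import torch.nn.functional as F

def align_bev(feat, yaw, tx, ty):
    """Warp a BEV feature map (1, C, H, W) into the common frame with a 2-D rigid
    transform derived from the sensor's extrinsics. yaw is in radians; tx/ty are
    translations in normalized grid units, where [-1, 1] spans the map."""
    cos, sin = math.cos(yaw), math.sin(yaw)
    theta = torch.tensor([[cos, -sin, tx],
                          [sin,  cos, ty]], dtype=feat.dtype).unsqueeze(0)
    grid = F.affine_grid(theta, list(feat.shape), align_corners=False)  # output -> input sampling grid
    return F.grid_sample(feat, grid, align_corners=False)               # bilinear resampling

# Example: rotate one sensor's features by ~6 degrees and shift slightly before fusion.
aligned = align_bev(torch.randn(1, 64, 64, 64), yaw=0.1, tx=0.05, ty=-0.02)
```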

Results & Findings

| Metric | Baseline (full on‑device) | SC‑MII (edge + server) |
| --- | --- | --- |
| End‑to‑end latency (ms) | 120 | 55 (≈ 2.19× faster) |
| Edge‑device compute time (ms) | 95 | 27 (≈ 71.6 % reduction) |
| mAP (3‑D detection) | 78.4 % | 77.3 % (≤ 1.09 % drop) |
| Bandwidth per frame (MB) | – (raw point cloud) | 0.8 (feature tensor) |

Key take‑aways

  • Offloading the heavy part of the network yields a sub‑100 ms inference pipeline, suitable for real‑time perception in autonomous driving or traffic‑monitoring use cases.
  • The fusion of multiple LiDAR viewpoints noticeably improves detection of occluded objects compared to a single‑sensor split setup (the authors report a 3–4 % boost in recall for partially hidden vehicles).
  • Privacy is enhanced because raw point clouds never leave the edge; only abstracted features are transmitted.

Practical Implications

  • Smart‑city infrastructure – Municipalities can retrofit existing LiDAR poles with modest compute modules, leveraging a central edge server to achieve high‑accuracy 3‑D perception without upgrading every sensor to a full GPU.
  • Cost‑effective autonomous fleets – Vehicle manufacturers could offload part of the perception stack to roadside edge servers, reducing the on‑board hardware budget and extending battery life for electric fleets.
  • Scalable deployment – Because the bandwidth footprint is tiny, the system scales to dozens of sensors per intersection without saturating the local network.
  • Regulatory compliance – Transmitting only feature maps eases data‑privacy regulations (e.g., GDPR), as raw LiDAR data that could be reverse‑engineered into identifiable scenes stays on‑site.
  • Developer workflow – The split architecture can be expressed with popular frameworks (PyTorch Lightning, TensorFlow Serving) and exported via ONNX, making integration into existing edge‑AI pipelines straightforward; a minimal export sketch follows this list.
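As a hedged illustration of that workflow, each partition from the earlier sketch can be exported to ONNX separately, so the device loads only the head and the server only the tail (`EdgeHead`/`ServerTail` are the hypothetical modules defined above):

```python
import torch

edge, server = EdgeHead().eval(), ServerTail().eval()

# Device-side partition: voxelized BEV in, compact feature tensor out.
torch.onnx.export(
    edge, torch.randn(1, 4, 256, 256), "edge_head.onnx",
    input_names=["bev"], output_names=["feat"],
)

class TailWrapper(torch.nn.Module):
    """ONNX export needs flat tensor arguments, so wrap the list-taking tail."""
    def __init__(self, tail):
        super().__init__()
        self.tail = tail

    def forward(self, f0, f1, f2):
        return self.tail([f0, f1, f2])

# Server-side partition: three aligned feature tensors in, detection map out.
f = torch.randn(1, 64, 64, 64)
torch.onnx.export(
    TailWrapper(server), (f, f, f), "server_tail.onnx",
    input_names=["f0", "f1", "f2"], output_names=["boxes"],
)
```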

Limitations & Future Work

  • Network reliability – The approach assumes a stable, low‑latency link; packet loss or jitter could degrade detection latency. Future work could explore robust buffering or edge‑side fallback models.
  • Synchronization overhead – Aligning feature maps from multiple sensors requires precise time stamping; clock drift may affect fusion quality.
  • Model generalization – Experiments were conducted on a single real‑world dataset; broader validation across diverse weather, traffic densities, and LiDAR hardware is needed.
  • Security of intermediate features – While more private than raw data, feature tensors could still leak scene information; encrypting the transmission or applying homomorphic obfuscation is an open research direction.

Overall, SC‑MII demonstrates a pragmatic path toward high‑performance, low‑power 3‑D perception on edge devices, opening the door for more distributed, collaborative sensing architectures in the next generation of intelligent transportation systems.

Authors

  • Taisuke Noguchi
  • Takayuki Nishio
  • Takuya Azumi

Paper Information

  • arXiv ID: 2601.07119v1
  • Categories: cs.DC, cs.CV
  • Published: January 12, 2026