[Paper] ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning

Published: December 12, 2025
4 min read
Source: arXiv - 2512.11727v1

Overview

The paper introduces ECCO, a framework that makes continuous learning for live video streams far more efficient. By recognizing that cameras in the same area often see similar changes over time, ECCO groups them together and retrains a single shared model per group instead of one model per camera. This reduces both the compute load on GPUs and the bandwidth needed to ship training data, while actually improving accuracy.

Key Contributions

  • Cross‑camera grouping algorithm – a lightweight, online method that clusters cameras whose video streams exhibit correlated data drift.
  • Dynamic GPU allocator – a scheduler that flexibly divides GPU capacity among groups, balancing retraining quality and fairness.
  • Per‑camera transmission controller – adjusts frame sampling rates and coordinates bandwidth sharing based on the group’s GPU share.
  • Empirical validation – experiments on three real‑world datasets (object detection & classification) show 6.7‑18.1 % higher accuracy for the same resource budget, or the ability to support 3.3× more cameras at a fixed accuracy level.

Methodology

  1. Detecting Drift Correlation

    • Each camera continuously monitors simple statistics (e.g., feature distribution shifts) on its incoming frames.
    • A low‑overhead similarity metric is computed between cameras; when two streams drift in the same direction, they become candidates for grouping.
  2. Dynamic Group Formation

    • The grouping algorithm runs periodically, merging or splitting groups as drift patterns evolve.
    • Groups are kept small enough to avoid “one‑size‑fits‑all” degradation, yet large enough to reap sharing benefits.
  3. Resource‑aware Retraining

    • A central GPU allocator receives the current group list and their desired training workloads.
    • It assigns GPU time slices (or memory partitions) to each group, ensuring that groups with higher drift get more compute while still giving a baseline to all groups.
  4. Adaptive Frame Sampling & Bandwidth Sharing

    • Each camera’s transmission controller throttles the frame rate it sends to the training pipeline, proportional to the GPU share its group received.
    • Cameras can also borrow bandwidth from less‑active peers, smoothing network spikes.
  5. Continuous Learning Loop

    • Collected frames are used to fine‑tune the shared model for the group.
    • Updated model weights are pushed back to all cameras in the group, completing the loop.
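Steps 1 and 2 above can be sketched as a lightweight similarity check followed by greedy grouping. This is a minimal illustration, not the paper's actual algorithm: the choice of mean-feature drift vectors, cosine similarity, the 0.8 threshold, and all function names are assumptions made for the sketch.

```python
import numpy as np

def drift_vector(prev_feats: np.ndarray, curr_feats: np.ndarray) -> np.ndarray:
    # Per-camera drift: shift in mean feature activations between two
    # time windows (one plausible "simple statistic" per the paper).
    return curr_feats.mean(axis=0) - prev_feats.mean(axis=0)

def drift_similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    # Cosine similarity: +1 means two cameras drift in the same direction.
    denom = np.linalg.norm(d1) * np.linalg.norm(d2)
    return float(d1 @ d2 / denom) if denom > 0 else 0.0

def group_cameras(drifts: dict[str, np.ndarray], threshold: float = 0.8) -> list[list[str]]:
    # Greedy clustering: a camera joins the first existing group whose
    # representative (first member) drifts in a sufficiently similar direction;
    # otherwise it starts a new group.
    groups: list[list[str]] = []
    for cam, d in drifts.items():
        for g in groups:
            if drift_similarity(drifts[g[0]], d) >= threshold:
                g.append(cam)
                break
        else:
            groups.append([cam])
    return groups
```

A real deployment would rerun this periodically and also split groups whose members have diverged, as step 2 describes.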
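Steps 3 and 4 can likewise be sketched together: the allocator guarantees every group a baseline GPU slice and splits the remainder in proportion to measured drift, while each camera throttles its upload frame rate by its group's share. The baseline fraction, the proportional rule, and the function names below are assumptions for illustration, not the paper's actual scheduler.

```python
def allocate_gpu(group_drift: dict[str, float], total_gpu: float,
                 baseline: float = 0.05) -> dict[str, float]:
    # Every group gets a baseline slice (so low-drift groups stay up to date);
    # the remainder is divided in proportion to drift magnitude.
    # Assumes baseline * len(group_drift) < 1, i.e. few enough groups.
    n = len(group_drift)
    remaining = total_gpu - baseline * total_gpu * n
    total_drift = sum(group_drift.values()) or 1.0
    return {
        g: baseline * total_gpu + remaining * (drift / total_drift)
        for g, drift in group_drift.items()
    }

def frame_rate(gpu_share: float, total_gpu: float, max_fps: int = 30) -> int:
    # Throttle a camera's transmission rate in proportion to its group's
    # GPU share, keeping at least 1 fps so no stream goes fully dark.
    return max(1, round(max_fps * gpu_share / total_gpu))
```

Tying the two functions together captures the intent of steps 3-4: groups with stronger drift get more compute, and their cameras are allowed to ship proportionally more frames.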

Results & Findings

| Metric | Baseline (per-camera retraining) | ECCO (same resources) | ECCO (same accuracy) |
| --- | --- | --- | --- |
| Retraining accuracy gain | — | +6.7% to +18.1% | — |
| Supported concurrent cameras | 1× | — | 3.3× |
| GPU utilization | Often idle on many cameras | Near-full utilization across groups | Balanced |
| Network traffic | Linear with camera count | ~30% reduction (thanks to shared sampling) | — |

Key takeaways

  • Grouping cameras that drift together not only cuts cost but also yields better models because the shared dataset is richer.
  • The dynamic GPU allocator prevents “starvation” of high‑drift groups while still giving low‑drift groups enough compute to stay up‑to‑date.
  • Adaptive sampling keeps the bandwidth within realistic limits even when dozens of cameras are active.

Practical Implications

  • Scalable Edge Analytics – Operators of smart‑city cameras, retail stores, or industrial monitoring can now run continuous learning on hundreds of streams without a proportional increase in GPU clusters or network upgrades.
  • Cost Savings – By reusing compute and reducing uplink traffic, cloud‑based video analytics services can lower their infrastructure bills dramatically.
  • Simplified Deployment – ECCO’s grouping and resource allocation are fully automated; developers only need to plug in their existing lightweight DNNs.
  • Improved Model Freshness – Faster adaptation to lighting changes, seasonal variations, or new object appearances translates into higher detection/recognition reliability in production.

Limitations & Future Work

  • Assumption of Spatial Correlation – ECCO works best when cameras are geographically close; highly heterogeneous scenes (e.g., indoor vs. outdoor) may not benefit from grouping.
  • Group Size Upper Bound – Very large groups could dilute specific nuances; the paper suggests a heuristic cap but leaves optimal sizing as an open problem.
  • GPU‑Centric Allocation – The current scheduler focuses on GPU time; extending it to heterogeneous accelerators (TPUs, NPUs) or CPU‑only edge nodes is future work.
  • Security & Privacy – Sharing frames across cameras raises privacy concerns; integrating encryption or on‑device differential privacy is a suggested direction.

Overall, ECCO demonstrates that cross‑camera collaboration is a practical lever for making continuous video learning both affordable and more accurate, opening the door for truly large‑scale, adaptive video analytics deployments.

Authors

  • Yuze He
  • Ferdi Kossmann
  • Srinivasan Seshan
  • Peter Steenkiste

Paper Information

  • arXiv ID: 2512.11727v1
  • Categories: cs.DC, cs.LG, cs.NI
  • Published: December 12, 2025