[Paper] ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning

Published: December 12, 2025
4 min read
Source: arXiv - 2512.11727v1

Overview

The paper introduces ECCO, a framework that makes continuous learning for live video streams far more efficient. By recognizing that cameras in the same area often see similar changes over time, ECCO groups them together and retrains a single shared model per group instead of one model per camera. This reduces both the compute load on GPUs and the bandwidth needed to ship training data, while actually improving accuracy.

Key Contributions

  • Cross‑camera grouping algorithm – a lightweight, online method that clusters cameras whose video streams exhibit correlated data drift.
  • Dynamic GPU allocator – a scheduler that flexibly divides GPU capacity among groups, balancing retraining quality and fairness.
  • Per‑camera transmission controller – adjusts frame sampling rates and coordinates bandwidth sharing based on the group’s GPU share.
  • Empirical validation – experiments on three real‑world datasets (object detection & classification) show 6.7‑18.1 % higher accuracy for the same resource budget, or the ability to support 3.3× more cameras at a fixed accuracy level.

Methodology

  1. Detecting Drift Correlation

    • Each camera continuously monitors simple statistics (e.g., feature distribution shifts) on its incoming frames.
    • A low‑overhead similarity metric is computed between cameras; when two streams drift in the same direction, they become candidates for grouping.
  2. Dynamic Group Formation

    • The grouping algorithm runs periodically, merging or splitting groups as drift patterns evolve.
    • Groups are kept small enough to avoid “one‑size‑fits‑all” degradation, yet large enough to reap sharing benefits.
  3. Resource‑aware Retraining

    • A central GPU allocator receives the current group list and their desired training workloads.
    • It assigns GPU time slices (or memory partitions) to each group, ensuring that groups with higher drift get more compute while still giving a baseline to all groups.
  4. Adaptive Frame Sampling & Bandwidth Sharing

    • Each camera’s transmission controller throttles the frame rate it sends to the training pipeline, proportional to the GPU share its group received.
    • Cameras can also borrow bandwidth from less‑active peers, smoothing network spikes.
  5. Continuous Learning Loop

    • Collected frames are used to fine‑tune the shared model for the group.
    • Updated model weights are pushed back to all cameras in the group, completing the loop.
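Steps 1 and 2 above can be sketched as a lightweight similarity check followed by greedy grouping. This is a minimal illustration, not the paper's actual algorithm: the choice of mean-feature drift vectors, cosine similarity, the 0.8 threshold, and all function names are assumptions made for the sketch.

```python
import numpy as np

def drift_vector(prev_feats: np.ndarray, curr_feats: np.ndarray) -> np.ndarray:
    # Per-camera drift: shift in mean feature activations between two
    # time windows (one plausible "simple statistic" per the paper).
    return curr_feats.mean(axis=0) - prev_feats.mean(axis=0)

def drift_similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    # Cosine similarity: +1 means two cameras drift in the same direction.
    denom = np.linalg.norm(d1) * np.linalg.norm(d2)
    return float(d1 @ d2 / denom) if denom > 0 else 0.0

def group_cameras(drifts: dict[str, np.ndarray], threshold: float = 0.8) -> list[list[str]]:
    # Greedy clustering: a camera joins the first existing group whose
    # representative (first member) drifts in a sufficiently similar direction;
    # otherwise it starts a new group.
    groups: list[list[str]] = []
    for cam, d in drifts.items():
        for g in groups:
            if drift_similarity(drifts[g[0]], d) >= threshold:
                g.append(cam)
                break
        else:
            groups.append([cam])
    return groups
```

A real deployment would rerun this periodically and also split groups whose members have diverged, as step 2 describes.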
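Steps 3 and 4 can likewise be sketched together: the allocator guarantees every group a baseline GPU slice and splits the remainder in proportion to measured drift, while each camera throttles its upload frame rate by its group's share. The baseline fraction, the proportional rule, and the function names below are assumptions for illustration, not the paper's actual scheduler.

```python
def allocate_gpu(group_drift: dict[str, float], total_gpu: float,
                 baseline: float = 0.05) -> dict[str, float]:
    # Every group gets a baseline slice (so low-drift groups stay up to date);
    # the remainder is divided in proportion to drift magnitude.
    # Assumes baseline * len(group_drift) < 1, i.e. few enough groups.
    n = len(group_drift)
    remaining = total_gpu - baseline * total_gpu * n
    total_drift = sum(group_drift.values()) or 1.0
    return {
        g: baseline * total_gpu + remaining * (drift / total_drift)
        for g, drift in group_drift.items()
    }

def frame_rate(gpu_share: float, total_gpu: float, max_fps: int = 30) -> int:
    # Throttle a camera's transmission rate in proportion to its group's
    # GPU share, keeping at least 1 fps so no stream goes fully dark.
    return max(1, round(max_fps * gpu_share / total_gpu))
```

Tying the two functions together captures the intent of steps 3-4: groups with stronger drift get more compute, and their cameras are allowed to ship proportionally more frames.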

Results & Findings

| Metric | Baseline (per-camera retraining) | ECCO (same resources) | ECCO (same accuracy) |
| --- | --- | --- | --- |
| Retraining accuracy gain | — | +6.7% to +18.1% | — |
| Supported concurrent cameras | 1× | — | 3.3× |
| GPU utilization | Often idle on many cameras | Near-full utilization across groups | Balanced |
| Network traffic | Linear with camera count | ~30% reduction (thanks to shared sampling) | — |

Key takeaways

  • Grouping cameras that drift together not only cuts cost but also yields better models because the shared dataset is richer.
  • The dynamic GPU allocator prevents “starvation” of high‑drift groups while still giving low‑drift groups enough compute to stay up‑to‑date.
  • Adaptive sampling keeps the bandwidth within realistic limits even when dozens of cameras are active.

Practical Implications

  • Scalable Edge Analytics – Operators of smart‑city cameras, retail stores, or industrial monitoring can now run continuous learning on hundreds of streams without a proportional increase in GPU clusters or network upgrades.
  • Cost Savings – By reusing compute and reducing uplink traffic, cloud‑based video analytics services can lower their infrastructure bills dramatically.
  • Simplified Deployment – ECCO’s grouping and resource allocation are fully automated; developers only need to plug in their existing lightweight DNNs.
  • Improved Model Freshness – Faster adaptation to lighting changes, seasonal variations, or new object appearances translates into higher detection/recognition reliability in production.

Limitations & Future Work

  • Assumption of Spatial Correlation – ECCO works best when cameras are geographically close; highly heterogeneous scenes (e.g., indoor vs. outdoor) may not benefit from grouping.
  • Group Size Upper Bound – Very large groups could dilute specific nuances; the paper suggests a heuristic cap but leaves optimal sizing as an open problem.
  • GPU‑Centric Allocation – The current scheduler focuses on GPU time; extending it to heterogeneous accelerators (TPUs, NPUs) or CPU‑only edge nodes is future work.
  • Security & Privacy – Sharing frames across cameras raises privacy concerns; integrating encryption or on‑device differential privacy is a suggested direction.

Overall, ECCO demonstrates that cross‑camera collaboration is a practical lever for making continuous video learning both affordable and more accurate, opening the door for truly large‑scale, adaptive video analytics deployments.

Authors

  • Yuze He
  • Ferdi Kossmann
  • Srinivasan Seshan
  • Peter Steenkiste

Paper Information

  • arXiv ID: 2512.11727v1
  • Categories: cs.DC, cs.LG, cs.NI
  • Published: December 12, 2025