[Paper] RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics
Source: arXiv - 2512.24386v1
Overview
Live video analytics (LVA) powers everything from traffic‑monitoring dashboards to drone‑based inspection pipelines, but running state‑of‑the‑art vision models on every frame quickly becomes prohibitively expensive. The RedunCut paper proposes a smarter way to pick the "right‑sized" model for each video segment on the fly, cutting compute cost by up to 62 % while still meeting a user‑specified accuracy target.
Key Contributions
- Measurement‑driven sampling planner – a runtime component that decides, via a cost‑benefit analysis, whether it is worth sampling candidate models at all and, if so, how many to run, avoiding wasteful over‑sampling (the decision rule is sketched after this list).
- Lightweight data‑driven accuracy model – a fast predictor that estimates per‑segment accuracy for each candidate model size, improving the selection decision without needing ground‑truth labels.
- Robustness to diverse workloads – demonstrated on road‑vehicle, drone, and surveillance footage, covering multiple model families (e.g., YOLO, EfficientDet) and tasks (object detection, classification).
- Empirical savings of 14‑62 % in compute at fixed accuracy across all tested scenarios, even when only a short history of past runs is available or when video content drifts over time.
- No model retraining required – RedunCut works with existing black‑box models, making it drop‑in compatible with current LVA pipelines.
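The planner's cost‑benefit test can be read as a simple inequality. Below is a minimal sketch, assuming the decision reduces to comparing expected payoff against sampling overhead; the function and variable names are ours, and the paper's planner additionally chooses how many probe samples to take:

```python
def should_sample(expected_savings_per_frame, frames_remaining, sampling_cost):
    """Cost-benefit gate (illustrative, not the paper's exact formulation).

    expected_savings_per_frame: estimated compute saved per frame if a cheaper
        model turns out to be accurate enough (from recent measurements).
    frames_remaining: frames left in the current segment.
    sampling_cost: compute needed to probe the candidate models.
    """
    return expected_savings_per_frame * frames_remaining > sampling_cost
```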
Methodology
- Segment‑wise decision loop – The video stream is broken into short segments (e.g., a few seconds). For each segment RedunCut must pick a model size (small, medium, large, …).
- Planner stage – Before any sampling, a lightweight planner weighs the compute expected to be saved by switching to a cheaper model against the overhead of running a few candidate models to gather statistics. It uses recent runtime measurements (latency, confidence distributions) to decide the optimal number of samples.
- Sampling stage – If the planner decides sampling is worthwhile, RedunCut runs a small subset of candidate models on a few frames, collects confidence scores, and feeds them to the accuracy predictor.
- Accuracy predictor – Trained offline on a modest labeled dataset, this model learns the relationship between observable statistics (e.g., average confidence, entropy) and the true accuracy of each candidate model for the current video domain. It runs in microseconds, so it adds no noticeable overhead (an offline training sketch appears after the pipeline summary below).
- Model selection – The predictor outputs an estimated accuracy for each candidate; RedunCut then picks the smallest model that meets the user‑specified accuracy target. The chosen model processes the rest of the segment, and the loop repeats for the next segment (the full loop is sketched below).
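Putting the stages together, the per‑segment loop might be organized as follows. This is a minimal sketch under assumed interfaces: `candidates`, `predictor.estimate_accuracy`, `planner.should_sample`, and `planner.num_probe_frames` are illustrative names rather than the paper's API, and the real system probes only a subset of candidates:

```python
import numpy as np

def confidence_entropy(confidences, eps=1e-9):
    """A cheap, label-free uncertainty statistic over detection confidences."""
    p = np.clip(np.asarray(confidences, dtype=float), eps, 1.0)
    return float(-(p * np.log(p)).mean())

def process_segment(frames, candidates, predictor, planner, accuracy_target):
    """One pass of the segment-wise decision loop (illustrative sketch).

    candidates: models ordered cheapest-first, each exposing
        .run(frame) -> (outputs, confidences) and a relative .cost attribute.
    """
    chosen = candidates[-1]  # fall back to the largest model if sampling is skipped
    if planner.should_sample(len(frames)):
        probes = frames[: planner.num_probe_frames]
        for model in candidates:  # cheapest first, so the first hit is cheapest
            confs = [c for f in probes for c in model.run(f)[1]] or [0.0]
            feats = [float(np.mean(confs)), confidence_entropy(confs)]
            if predictor.estimate_accuracy(model, feats) >= accuracy_target:
                chosen = model
                break
    # The selected model processes the remainder of the segment.
    return [chosen.run(f)[0] for f in frames]
```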
The whole pipeline is designed to be measurement‑driven: every decision is grounded in actual runtime data rather than static heuristics, which lets the system adapt to changing lighting, motion, or scene composition.
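The paper specifies only that the predictor is lightweight and data‑driven; the sketch below shows what its offline training could look like under our own assumptions (confidence‑based features and a gradient‑boosted regressor are our choices, and the tiny inline dataset is a toy placeholder):

```python
from sklearn.ensemble import GradientBoostingRegressor

# Toy placeholder rows, one per (segment, candidate-model) pair:
# features = [mean confidence, confidence entropy, model-size index],
# label    = accuracy measured against ground truth during offline profiling.
X = [[0.91, 0.35, 0], [0.88, 0.52, 1], [0.73, 1.10, 0], [0.95, 0.21, 2]]
y = [0.84, 0.90, 0.69, 0.95]

predictor = GradientBoostingRegressor(n_estimators=100, max_depth=3)
predictor.fit(X, y)

# At runtime: estimate accuracy from observable statistics alone, no labels.
estimated_accuracy = predictor.predict([[0.90, 0.40, 1]])[0]
```

Any small regressor would do here; the point is that inference takes microseconds, so prediction adds negligible overhead to the loop.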
Results & Findings
| Dataset / Task | Accuracy Target | Compute Reduction vs. Baseline | Observations |
|---|---|---|---|
| Road‑vehicle (YOLOv5) – object detection | 90 % mAP | 62 % lower FLOPs | Sampling overhead stayed < 5 % of total cost |
| Drone footage (EfficientDet) – detection | 85 % mAP | 48 % lower FLOPs | Predictor remained accurate despite rapid viewpoint changes |
| Surveillance (ResNet‑50) – classification | 92 % top‑1 | 14 % lower FLOPs | Gains modest but consistent; planner avoided unnecessary sampling |
| Limited history (≤ 5 min) | 90 % mAP | 30‑55 % reduction | System quickly converged to reliable estimates |
| Concept drift (weather change) | 90 % mAP | ≈ 40 % reduction | Planner re‑evaluated sampling frequency, keeping cost low |
Overall, RedunCut stayed within ±1 % of the baseline's accuracy while delivering sizable compute savings across all tested scenarios.
Practical Implications
- Cost‑effective edge deployments – Operators of smart‑city cameras or drone fleets can run heavier models only when needed, extending battery life and reducing cloud‑ingress bandwidth.
- Simplified pipeline integration – Because RedunCut treats models as black boxes, existing inference services (TensorRT, ONNX Runtime, etc.) can be wrapped with the planner without modifying the models themselves (a wrapper sketch follows this list).
- Dynamic SLAs – Service providers can expose "accuracy‑as‑a‑service" contracts; RedunCut automatically throttles compute to meet the promised accuracy while minimizing spend.
- Rapid prototyping – Data scientists can experiment with new model families without re‑engineering the runtime; RedunCut will automatically discover the most cost‑effective size for each video domain.
- Scalable cloud billing – For SaaS video analytics platforms, per‑frame compute reductions translate directly into lower GPU hours and more predictable billing for customers.
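As an illustration of the black‑box integration point, the sketch below wraps pre‑exported ONNX models as opaque candidates for the loop sketched earlier. The file names, cost values, and the candidate interface are hypothetical; only the onnxruntime calls are real API:

```python
import onnxruntime as ort

class OnnxCandidate:
    """Wrap one pre-exported model size as an opaque, swappable candidate."""

    def __init__(self, path, cost):
        self.session = ort.InferenceSession(path)
        self.input_name = self.session.get_inputs()[0].name
        self.cost = cost  # relative compute cost, used for cheapest-first order

    def run(self, frame):
        # frame: preprocessed float32 tensor matching the model's input shape.
        # The model is never modified or retrained -- it stays a black box.
        # Mapping raw outputs to (detections, confidences) is model-specific
        # and omitted here.
        return self.session.run(None, {self.input_name: frame})

# Three sizes of the same detector, exported ahead of time (paths hypothetical).
candidates = [
    OnnxCandidate("detector_small.onnx", cost=1.0),
    OnnxCandidate("detector_medium.onnx", cost=2.6),
    OnnxCandidate("detector_large.onnx", cost=6.0),
]
```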
Limitations & Future Work
- Reliance on short‑term statistics – In highly erratic scenes (e.g., sudden flashes), the confidence‑based predictor may mis‑estimate accuracy, leading to occasional over‑aggressive model downsizing.
- Initial warm‑up cost – The planner needs a brief observation window to gather reliable measurements; during this period compute savings are modest.
- Model family granularity – RedunCut assumes a discrete set of pre‑trained model sizes; extending it to continuous scaling (e.g., dynamic channel pruning) is left for future research.
- Broader task coverage – Experiments focused on detection and classification; applying the same ideas to segmentation, pose estimation, or multimodal video‑audio pipelines remains an open avenue.
The authors suggest exploring adaptive learning of the accuracy predictor on‑the‑fly and integrating reinforcement‑learning‑based planners to further tighten the cost‑accuracy trade‑off.
Authors
- Gur‑Eyal Sela
- Kumar Krishna Agrawal
- Bharathan Balaji
- Joseph Gonzalez
- Ion Stoica
Paper Information
- arXiv ID: 2512.24386v1
- Categories: cs.CV, cs.DC
- Published: December 30, 2025
- PDF: https://arxiv.org/pdf/2512.24386v1