[Paper] Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now
Source: arXiv - 2512.02016v1
Overview
Recent advances in video generation have sparked excitement about using these models as “world simulators” that understand physics. This paper uncovers a surprising shortcoming: out‑of‑the‑box generators consistently make falling objects accelerate far more slowly than real‑world gravity would dictate. The authors devise a clever, scale‑free test to prove the issue isn’t just a frame‑rate or pixel‑scale artifact, and they show that a tiny, data‑efficient adaptor can dramatically close the gap.
Key Contributions
- Discovery of systematic gravity under‑estimation in popular video generators (effective g ≈ 1.8 m/s² vs. 9.81 m/s²).
- Unit‑free two‑object protocol that isolates physical reasoning from ambiguous video metrics, exposing violations of Galileo’s equivalence principle.
- Low‑rank adaptor fine‑tuning (only 100 single‑ball clips) that boosts the effective gravity to ~6.4 m/s² (≈ 65 % of real gravity).
- Zero‑shot generalization of the adaptor to more complex scenes (two‑ball drops, inclined planes) without additional training.
- Comprehensive analysis showing that simple temporal rescaling cannot fix the high‑variance gravity errors, confirming a genuine representational flaw.
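The paper does not publish the adaptor's exact architecture, but "low-rank adaptation at ≈ 0.1 % of the parameter count" follows the standard LoRA recipe. A minimal sketch of such a layer (our own illustrative code, not the authors'; the rank `r` and scaling `alpha` are assumed hyperparameters):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B (A x). Only A and B are trained, so the
    adaptor adds just a tiny fraction of the original parameter count."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # original weights stay frozen
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)      # adaptor starts as an identity update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))

layer = LoRALinear(nn.Linear(16, 16), r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 128 trainable weights vs. 272 in the frozen base layer
```

Because `B` is zero-initialized, the wrapped model behaves exactly like the original before fine-tuning, which is what makes such adaptors safe to train on only 100 clips.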
Methodology
- Baseline Evaluation – The authors generate videos of single balls dropping from various heights using several state‑of‑the‑art video generators. They fit a parabola to the vertical trajectory and compute an “effective gravity” (g_eff).
- Confound Check – To rule out scale or frame‑rate mismatches, they apply temporal rescaling (speed‑up/slow‑down) and re‑measure g_eff. The variance remains, indicating a deeper issue.
- Unit‑Free Two‑Object Test – They drop two balls from different heights in the same video. Physics predicts the timing ratio t₁²/t₂² = h₁/h₂ regardless of absolute scale, focal length, or the true value of g. By measuring the fall times, they directly test whether the model respects Galileo’s equivalence principle.
- Specialist Adaptor – A lightweight low‑rank adaptation layer (≈ 0.1 % of the original model’s parameters) is fine‑tuned on just 100 short clips of single‑ball drops. The adaptor learns to correct the internal dynamics without retraining the whole generator.
- Zero‑Shot Transfer – The adapted model is evaluated on unseen scenarios (two‑ball drops, inclined‑plane slides) to assess whether the learned correction generalizes beyond the fine‑tuning data.
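The parabola-fitting step above can be sketched in a few lines (a minimal illustration under our own assumptions; `effective_gravity` and the synthetic trajectory are not from the paper):

```python
import numpy as np

def effective_gravity(t, y):
    """Fit y(t) = y0 + v0*t + 0.5*g*t^2 to a tracked vertical trajectory
    and return the effective gravity g_eff in m/s^2.

    t : frame timestamps in seconds; y : downward displacement in metres.
    """
    # polyfit returns coefficients [a2, a1, a0] for a2*t^2 + a1*t + a0
    a2, _, _ = np.polyfit(t, y, deg=2)
    return 2.0 * a2  # y = 0.5*g*t^2 + ...  =>  g = 2*a2

# Synthetic check: a drop rendered at the paper's reported g_eff ≈ 1.8 m/s^2
t = np.arange(0, 2.0, 1 / 24)      # two seconds of frames at 24 fps
y = 0.5 * 1.8 * t**2               # ideal under-gravity trajectory
print(round(effective_gravity(t, y), 2))  # → 1.8
```

This fit also clarifies the confound check: rescaling time by a factor s multiplies g_eff by s², so a single global speed-up could repair a constant bias but cannot remove the per-clip variance the authors observe.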
Results & Findings
| Model / Setting | Effective g (m/s²) | % of real g |
|---|---|---|
| Baseline generators (average) | 1.81 | 18 % |
| After temporal rescaling | ~1.8–2.0 | — (no improvement) |
| After low‑rank adaptor (100 clips) | 6.43 | 65 % |
| Zero‑shot on two‑ball drops | ~6.0 (close to adaptor) | ~60 % |
| Zero‑shot on inclined planes | comparable improvement | still under‑estimates acceleration |
- The two‑object unit‑free test shows a systematic deviation from the expected timing ratio, confirming that the models do not encode Galileo’s equivalence principle.
- The adaptor’s gains are achieved with tiny data and minimal compute, suggesting that the underlying model already contains a latent capacity for correct physics that can be unlocked with targeted fine‑tuning.
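The unit-free test reduces to a simple ratio check, sketched below (our own helper under assumed names; not the authors' evaluation code):

```python
import math

def galileo_ratio_error(h1, h2, t1, t2):
    """Unit-free check of Galileo's principle: for ideal free fall,
    t1^2 / t2^2 == h1 / h2, independent of absolute scale, frame rate,
    or the actual value of g. Returns the relative deviation."""
    predicted = h1 / h2
    measured = (t1 / t2) ** 2
    return abs(measured - predicted) / predicted

# A model with *any* internally consistent gravity passes
# (heights 2:1, fall times sqrt(2):1) ...
print(round(galileo_ratio_error(2.0, 1.0, math.sqrt(2.0), 1.0), 12))  # → 0.0
# ... while inconsistent dynamics fail, e.g. equal fall times from unequal heights:
print(round(galileo_ratio_error(2.0, 1.0, 1.0, 1.0), 2))  # → 0.5
```

This is why the protocol isolates physical reasoning: even a generator running at the wrong g would score perfectly if its dynamics were self-consistent, so any deviation signals a genuine representational flaw.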
Practical Implications
| Area | Impact |
|---|---|
| Game Development & VR | More physically plausible AI‑generated animations could reduce manual rigging and improve immersion. |
| Robotics Simulation | Video generators could serve as cheap, visual world models for training perception‑action loops, provided they respect basic dynamics. |
| Content Creation Platforms | Tools like RunwayML or Adobe Firefly could offer “physics‑aware” video synthesis, preventing uncanny‑valley artifacts (e.g., floating objects). |
| Scientific Visualization | Researchers can trust generated videos for illustrative purposes only after applying a specialist adaptor, avoiding misleading depictions of motion. |
| Model Auditing | The unit‑free protocol offers a lightweight benchmark for any generative model that claims physical reasoning, enabling systematic QA before deployment. |
In short, the paper shows that current video generators are not ready to be trusted as physics engines, but a modest, data‑efficient tweak can bring them much closer—opening the door for practical, physics‑aware generative tools.
Limitations & Future Work
- Partial Correction – Even after adaptation, the effective gravity remains ~35 % below real Earth gravity; full fidelity is still out of reach.
- Scope of Physical Laws – The study focuses on gravity and simple kinematics; other forces (friction, collisions, fluid dynamics) remain untested.
- Adaptor Generality – While zero‑shot transfer worked for two‑ball drops and inclined planes, more complex multi‑object interactions may require additional fine‑tuning data.
- Model Diversity – Experiments were run on a limited set of video generators; broader coverage could reveal architecture‑specific biases.
Future research could explore multi‑task adapters that simultaneously correct several physical principles, investigate self‑supervised physics regularization during pre‑training, and develop standardized physics benchmarks for generative video models.
Authors
- Varun Varma Thozhiyoor
- Shivam Tripathi
- Venkatesh Babu Radhakrishnan
- Anand Bhattad
Paper Information
- arXiv ID: 2512.02016v1
- Categories: cs.CV
- Published: December 1, 2025