ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
Source: Dev.to
Overview
ZoeDepth estimates depth from a single image. It combines two training strategies that are usually pursued separately: relative depth estimation, which captures scene structure and transfers well across datasets, and metric depth estimation, which preserves real-world scale.
Method
- Dual learning: The model is first pre-trained on large, diverse datasets with a relative-depth objective, then fine-tuned with metric-depth supervision so that its predictions carry real-world scale.
- Dynamic specialist selection: Each domain gets its own lightweight prediction head (a "tiny specialist"), and an internal selector routes every input image to the head that best matches it, so one model can cover varied scenes.
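The two ideas above can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not ZoeDepth's implementation: the names `align_scale_shift`, `route`, and `HEADS`, and the depth ranges of the two heads, are hypothetical. The first function assumes a relative prediction is only defined up to an unknown scale and shift, which is why metric supervision is needed to pin down real distances. The second shows one simple way a selector could pick a specialist from per-domain scores.

```python
import math


def align_scale_shift(relative, metric):
    """Least-squares scale s and shift t such that s * relative + t ~ metric."""
    n = len(relative)
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    scale = cov / var_r  # assumes the relative depths are not all identical
    shift = mean_m - scale * mean_r
    return scale, shift


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical specialists: each maps a relative depth in [0, 1] to metres.
# The ranges are illustrative, not values taken from ZoeDepth.
HEADS = {
    "indoor": lambda d: 0.1 + 9.9 * d,
    "outdoor": lambda d: 1.0 + 79.0 * d,
}


def route(domain_logits, relative_depth):
    """Pick the head with the highest selector probability and apply it."""
    names = list(HEADS)
    probs = softmax(domain_logits)
    best = max(range(len(names)), key=lambda i: probs[i])
    name = names[best]
    return name, [HEADS[name](d) for d in relative_depth]


# Relative values need a scale and shift before they match metric ground truth.
scale, shift = align_scale_shift([0.0, 0.5, 1.0], [2.0, 3.0, 4.0])

# Selector scores favouring the first domain send the image to the indoor head.
head, metric_depth = route([2.0, -1.0], [0.0, 0.5, 1.0])
```

In this sketch the selector makes a hard choice (argmax). A soft mixture of heads weighted by the selector probabilities would be another reasonable design, at the cost of running every head.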
Generalization and Performance
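- According to the paper, the flagship model is pre-trained on 12 datasets with relative depth and fine-tuned on two datasets with metric depth, and it generalizes well to unseen indoor and outdoor environments.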
- The authors report strong zero-shot depth estimation: evaluated on datasets it was never trained on, with no additional fine-tuning, ZoeDepth outperforms prior methods.
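Comparisons like these are typically expressed with a few standard depth-estimation metrics. The sketch below computes three common ones (absolute relative error, root mean squared error, and the threshold accuracy often written δ1) on plain Python lists. The function name `depth_metrics` and the sample values are hypothetical, and the exact evaluation protocol used for ZoeDepth (depth caps, crops, masking) is not reproduced here.

```python
import math


def depth_metrics(pred, target):
    """Return AbsRel, RMSE and delta1 over pixels with valid positive depth."""
    pairs = [(p, t) for p, t in zip(pred, target) if p > 0 and t > 0]
    n = len(pairs)
    # Mean of |prediction - truth| / truth: lower is better.
    abs_rel = sum(abs(p - t) / t for p, t in pairs) / n
    # Root mean squared error in metres: lower is better.
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    # Fraction of pixels whose ratio to the truth is within 1.25: higher is better.
    delta1 = sum(1 for p, t in pairs if max(p / t, t / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Made-up predictions and ground truth, in metres.
scores = depth_metrics([1.0, 2.2, 6.0], [1.0, 2.0, 4.0])
```

In a zero-shot setting, the same function would be applied to predictions on a dataset the model never saw during training.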
Applications
- Photo‑editing effects that rely on depth cues.
- Space measurement tools for interior design or architecture.
- Robotics and navigation systems that need reliable depth perception.
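Measurement tools are a good example of why metric scale matters: once depth is in metres, pixels can be lifted into 3D and distances read off directly. The sketch below assumes a simple pinhole camera with known intrinsics. The function names and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are hypothetical and would come from camera calibration in practice.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth (metres) to a 3D camera-frame point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def distance_between_pixels(pixel_a, pixel_b, depth_a, depth_b, intrinsics):
    """Euclidean distance in metres between two pixels with known depth."""
    point_a = backproject(*pixel_a, depth_a, *intrinsics)
    point_b = backproject(*pixel_b, depth_b, *intrinsics)
    return math.dist(point_a, point_b)


# Hypothetical intrinsics (fx, fy, cx, cy) for a 640x480 image.
intrinsics = (500.0, 500.0, 320.0, 240.0)

# Two points on a wall 2 m away, 250 pixels apart horizontally.
width = distance_between_pixels((320, 240), (570, 240), 2.0, 2.0, intrinsics)
```

With only relative depth, the same computation would yield a distance in arbitrary units, which is not usable for measuring a room.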
Availability
The code and pretrained models are publicly released for community use.