[Paper] EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis

Published: December 2, 2025 at 12:01 PM EST
4 min read
Source: arXiv - 2512.02932v1

Overview

The paper introduces EGGS (Exchangeable 2D/3D Gaussian Splatting), a hybrid rendering pipeline that fuses 2‑D and 3‑D Gaussian splatting to produce novel‑view images with both high‑fidelity textures and accurate geometry. By letting the system swap each splat between a 2‑D and a 3‑D Gaussian on the fly, EGGS sidesteps the classic trade‑off between visual detail and multi‑view consistency that has limited prior real‑time novel‑view‑synthesis (NVS) methods.

Key Contributions

  • Hybrid representation that combines 2‑D and 3‑D Gaussians in a single scene model.
  • Hybrid Gaussian Rasterization: a unified CUDA‑based renderer that can rasterize both 2‑D and 3‑D splats in one pass.
  • Adaptive Type Exchange: a learning‑driven mechanism that dynamically decides whether a splat should be treated as 2‑D (texture‑focused) or 3‑D (geometry‑focused) during training and inference.
  • Frequency‑Decoupled Optimization: separates low‑frequency (shape) and high‑frequency (appearance) losses, letting each Gaussian type specialize where it excels.
  • Real‑time performance: the authors report training times comparable to pure 3‑DGS and inference speeds suitable for interactive AR/VR applications.

Methodology

  1. Scene Initialization – The pipeline starts from a set of multi‑view images and builds an initial cloud of Gaussian primitives, each equipped with position, covariance, color, and a “type flag” (2‑D or 3‑D).
  2. Hybrid Rasterizer – A custom CUDA kernel projects every Gaussian onto the image plane. 2‑D Gaussians are rasterized directly in screen space (like sprites), while 3‑D Gaussians are first transformed by the camera pose and then splatted. The rasterizer blends contributions with a differentiable compositing step, enabling gradient‑based learning (a minimal compositing sketch follows this list).
  3. Adaptive Type Exchange – During each optimization iteration, the network evaluates a type‑confidence score for each Gaussian. If a splat’s geometry error (measured against depth cues) is high, it is promoted to 3‑D; if its texture error (high‑frequency color loss) dominates, it is demoted to 2‑D. This exchange is fully differentiable and happens on the GPU (see the type‑exchange sketch after this list).
  4. Frequency‑Decoupled Loss – The loss function is split into:
    • Low‑frequency loss (e.g., depth consistency, smoothness) that primarily updates 3‑D Gaussians.
    • High‑frequency loss (e.g., perceptual color loss, edge sharpness) that mainly drives 2‑D Gaussians.
      By decoupling these terms, each Gaussian type can specialize without being pulled in opposite directions (the loss sketch after this list illustrates the split).
  5. Training & Inference – The entire pipeline runs end‑to‑end on a single GPU. Training converges in ~30 min for a typical indoor scene (≈100 k Gaussians), and rendering a 1080p frame takes ~15 ms, meeting real‑time thresholds.
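
The following is a minimal NumPy sketch of the front‑to‑back compositing in step 2 for a single pixel, assuming per‑splat depth, color, and opacity values have already been produced by projecting the 2‑D or 3‑D Gaussians. The function name, variable names, and early‑termination threshold are illustrative assumptions; the paper’s renderer is a unified CUDA kernel that rasterizes both splat types per tile on the GPU.

```python
import numpy as np

def composite_pixel(depths, colors, alphas):
    """Blend splat contributions for one pixel, front to back.

    depths: (N,) camera-space depths of the splats covering the pixel
    colors: (N, 3) RGB contributions after evaluating each Gaussian
    alphas: (N,) per-splat opacities after the Gaussian falloff
    """
    order = np.argsort(depths)           # nearest splat first
    out = np.zeros(3)
    transmittance = 1.0                  # fraction of light not yet absorbed
    for i in order:
        weight = alphas[i] * transmittance
        out += weight * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:         # early termination, common in tile-based rasterizers
            break
    return out
```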
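
To make step 3 concrete, here is a hedged Python sketch in which a per‑splat type flag is flipped when geometry or texture error dominates. The hard thresholds and error inputs are placeholders for illustration; in the paper the exchange is driven by a learned, fully differentiable type‑confidence score.

```python
from dataclasses import dataclass
import numpy as np

GAUSSIAN_2D, GAUSSIAN_3D = 0, 1

@dataclass
class HybridGaussian:
    position: np.ndarray    # (3,) world-space mean
    covariance: np.ndarray  # (3, 3) shape/extent
    color: np.ndarray       # (3,) RGB (or SH coefficients)
    gtype: int              # GAUSSIAN_2D (texture-focused) or GAUSSIAN_3D (geometry-focused)

def exchange_types(gaussians, geom_err, tex_err, geom_thresh=0.05, tex_thresh=0.05):
    """Promote splats to 3-D when geometry error dominates,
    demote them to 2-D when high-frequency texture error dominates."""
    for g, ge, te in zip(gaussians, geom_err, tex_err):
        if ge > geom_thresh and ge >= te:
            g.gtype = GAUSSIAN_3D
        elif te > tex_thresh and te > ge:
            g.gtype = GAUSSIAN_2D
    return gaussians
```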
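
Step 4 can be sketched as a PyTorch objective that separates a low‑pass image/depth term from a high‑frequency residual term. The specific filter (a box blur) and the loss weights are assumptions made for illustration, not the paper’s exact losses.

```python
import torch
import torch.nn.functional as F

def frequency_decoupled_loss(pred, target, pred_depth, target_depth,
                             w_low=1.0, w_high=1.0):
    """pred/target: (B, 3, H, W) rendered and ground-truth images.
    pred_depth/target_depth: (B, 1, H, W) rendered and reference depth maps."""
    blur = lambda x: F.avg_pool2d(x, kernel_size=5, stride=1, padding=2)

    # Low-frequency term (shape): blurred image difference plus depth
    # consistency, intended to primarily update the 3-D Gaussians.
    loss_low = F.l1_loss(blur(pred), blur(target)) + F.l1_loss(pred_depth, target_depth)

    # High-frequency term (appearance): residual detail left after removing
    # the blurred base, intended to primarily drive the 2-D Gaussians.
    loss_high = F.l1_loss(pred - blur(pred), target - blur(target))

    return w_low * loss_low + w_high * loss_high
```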

Results & Findings

| Metric | EGGS | 3‑DGS | 2‑DGS | NeRF‑based baseline |
|---|---|---|---|---|
| PSNR (novel view) | 31.8 dB | 30.2 dB | 28.9 dB | 30.5 dB |
| SSIM | 0.94 | 0.91 | 0.88 | 0.92 |
| Geometry error (RMSE) | 0.018 m | 0.032 m | 0.021 m | 0.025 m |
| Rendering time (1080p) | 15 ms | 12 ms | 18 ms | 120 ms |
| Training time (per scene) | 28 min | 30 min | 22 min | 4 h |

  • Visual quality: EGGS preserves fine textures (e.g., fabric patterns) while eliminating the ghosting artifacts typical of pure 3‑DGS.
  • Geometric fidelity: The hybrid model reduces depth drift, yielding cleaner edges around thin structures and better alignment across views.
  • Efficiency: Despite the added complexity of type exchange, the CUDA rasterizer keeps the runtime on par with the fastest existing splatting methods.

Practical Implications

  • AR/VR content pipelines: Developers can now generate high‑quality, low‑latency scene representations from a handful of captured images, cutting down on manual 3‑D modeling.
  • Robotics & autonomous driving: Accurate geometry is crucial for collision avoidance; EGGS can provide dense, view‑consistent depth maps in real time, useful for SLAM front‑ends.
  • Game engines & real‑time graphics: The hybrid splatting approach can be integrated as a plug‑in for Unity or Unreal, offering a fast “photo‑realistic proxy” for background environments without heavy polygon meshes.
  • Edge deployment: Because the method runs on a single GPU and avoids large MLPs (as in NeRF), it fits well on modern mobile GPUs or embedded NVIDIA Jetson devices.

Limitations & Future Work

  • Scalability to massive outdoor scenes: The current implementation caps the number of Gaussians at ~200 k; handling city‑scale environments will require hierarchical or streaming strategies.
  • Dynamic content: EGGS assumes a static scene; extending the type‑exchange mechanism to handle moving objects or changing lighting remains an open challenge.
  • Memory footprint: Storing both 2‑D and 3‑D attributes doubles per‑Gaussian memory compared to pure 3‑DGS, which could be a bottleneck on low‑VRAM devices.
  • Future directions suggested by the authors include: adaptive pruning of redundant Gaussians, integration with learned illumination models, and exploring multi‑modal inputs (e.g., LiDAR + RGB).

Authors

  • Yancheng Zhang
  • Guangyu Sun
  • Chen Chen

Paper Information

  • arXiv ID: 2512.02932v1
  • Categories: cs.CV, cs.AI
  • Published: December 2, 2025