[Paper] EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis

Published: December 2, 2025 at 12:01 PM EST
4 min read
Source: arXiv - 2512.02932v1

Overview

The paper introduces EGGS (Exchangeable 2D/3D Gaussian Splatting), a hybrid rendering pipeline that fuses 2‑D and 3‑D Gaussian splatting to produce novel‑view images with both high‑fidelity textures and accurate geometry. By letting the system swap each splat between a 2‑D and a 3‑D Gaussian on the fly, EGGS sidesteps the classic trade‑off between visual detail and multi‑view consistency that has limited prior real‑time novel‑view‑synthesis (NVS) methods.

Key Contributions

  • Hybrid representation that combines 2‑D and 3‑D Gaussians in a single scene model.
  • Hybrid Gaussian Rasterization: a unified CUDA‑based renderer that can rasterize both 2‑D and 3‑D splats in one pass.
  • Adaptive Type Exchange: a learning‑driven mechanism that dynamically decides whether a splat should be treated as 2‑D (texture‑focused) or 3‑D (geometry‑focused) during training and inference.
  • Frequency‑Decoupled Optimization: separates low‑frequency (shape) and high‑frequency (appearance) losses, letting each Gaussian type specialize where it excels.
  • Real‑time performance: the authors report training times comparable to pure 3‑DGS and inference speeds suitable for interactive AR/VR applications.

Methodology

  1. Scene Initialization – The pipeline starts from a set of multi‑view images and builds an initial cloud of Gaussian primitives, each equipped with position, covariance, color, and a “type flag” (2‑D or 3‑D).
  2. Hybrid Rasterizer – A custom CUDA kernel projects every Gaussian onto the image plane. 2‑D Gaussians are rasterized directly in screen space (like sprites), while 3‑D Gaussians are first transformed by the camera pose and then splatted. The rasterizer blends contributions with a differentiable compositing step, enabling gradient‑based learning (a minimal compositing sketch follows this list).
  3. Adaptive Type Exchange – During each optimization iteration, the network evaluates a type‑confidence score for each Gaussian. If a splat’s geometry error (measured against depth cues) is high, it is promoted to 3‑D; if its texture error (high‑frequency color loss) dominates, it is demoted to 2‑D. This exchange is fully differentiable and happens on the GPU (see the type‑exchange sketch after this list).
  4. Frequency‑Decoupled Loss – The loss function is split into:
    • Low‑frequency loss (e.g., depth consistency, smoothness) that primarily updates 3‑D Gaussians.
    • High‑frequency loss (e.g., perceptual color loss, edge sharpness) that mainly drives 2‑D Gaussians.
      By decoupling these terms, each Gaussian type can specialize without being pulled in opposite directions (the loss sketch after this list illustrates the split).
  5. Training & Inference – The entire pipeline runs end‑to‑end on a single GPU. Training converges in ~30 min for a typical indoor scene (≈100 k Gaussians), and rendering a 1080p frame takes ~15 ms, meeting real‑time thresholds.
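
The following is a minimal NumPy sketch of the front‑to‑back compositing in step 2 for a single pixel, assuming per‑splat depth, color, and opacity values have already been produced by projecting the 2‑D or 3‑D Gaussians. The function name, variable names, and early‑termination threshold are illustrative assumptions; the paper’s renderer is a unified CUDA kernel that rasterizes both splat types per tile on the GPU.

```python
import numpy as np

def composite_pixel(depths, colors, alphas):
    """Blend splat contributions for one pixel, front to back.

    depths: (N,) camera-space depths of the splats covering the pixel
    colors: (N, 3) RGB contributions after evaluating each Gaussian
    alphas: (N,) per-splat opacities after the Gaussian falloff
    """
    order = np.argsort(depths)           # nearest splat first
    out = np.zeros(3)
    transmittance = 1.0                  # fraction of light not yet absorbed
    for i in order:
        weight = alphas[i] * transmittance
        out += weight * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:         # early termination, common in tile-based rasterizers
            break
    return out
```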
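
To make step 3 concrete, here is a hedged Python sketch in which a per‑splat type flag is flipped when geometry or texture error dominates. The hard thresholds and error inputs are placeholders for illustration; in the paper the exchange is driven by a learned, fully differentiable type‑confidence score.

```python
from dataclasses import dataclass
import numpy as np

GAUSSIAN_2D, GAUSSIAN_3D = 0, 1

@dataclass
class HybridGaussian:
    position: np.ndarray    # (3,) world-space mean
    covariance: np.ndarray  # (3, 3) shape/extent
    color: np.ndarray       # (3,) RGB (or SH coefficients)
    gtype: int              # GAUSSIAN_2D (texture-focused) or GAUSSIAN_3D (geometry-focused)

def exchange_types(gaussians, geom_err, tex_err, geom_thresh=0.05, tex_thresh=0.05):
    """Promote splats to 3-D when geometry error dominates,
    demote them to 2-D when high-frequency texture error dominates."""
    for g, ge, te in zip(gaussians, geom_err, tex_err):
        if ge > geom_thresh and ge >= te:
            g.gtype = GAUSSIAN_3D
        elif te > tex_thresh and te > ge:
            g.gtype = GAUSSIAN_2D
    return gaussians
```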
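
Step 4 can be sketched as a PyTorch objective that separates a low‑pass image/depth term from a high‑frequency residual term. The specific filter (a box blur) and the loss weights are assumptions made for illustration, not the paper’s exact losses.

```python
import torch
import torch.nn.functional as F

def frequency_decoupled_loss(pred, target, pred_depth, target_depth,
                             w_low=1.0, w_high=1.0):
    """pred/target: (B, 3, H, W) rendered and ground-truth images.
    pred_depth/target_depth: (B, 1, H, W) rendered and reference depth maps."""
    blur = lambda x: F.avg_pool2d(x, kernel_size=5, stride=1, padding=2)

    # Low-frequency term (shape): blurred image difference plus depth
    # consistency, intended to primarily update the 3-D Gaussians.
    loss_low = F.l1_loss(blur(pred), blur(target)) + F.l1_loss(pred_depth, target_depth)

    # High-frequency term (appearance): residual detail left after removing
    # the blurred base, intended to primarily drive the 2-D Gaussians.
    loss_high = F.l1_loss(pred - blur(pred), target - blur(target))

    return w_low * loss_low + w_high * loss_high
```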

Results & Findings

| Metric | EGGS | 3‑DGS | 2‑DGS | NeRF‑based baseline |
|---|---|---|---|---|
| PSNR (novel view) | 31.8 dB | 30.2 dB | 28.9 dB | 30.5 dB |
| SSIM | 0.94 | 0.91 | 0.88 | 0.92 |
| Geometry error (RMSE) | 0.018 m | 0.032 m | 0.021 m | 0.025 m |
| Rendering time (1080p) | 15 ms | 12 ms | 18 ms | 120 ms |
| Training time (per scene) | 28 min | 30 min | 22 min | 4 h |

  • Visual quality: EGGS preserves fine textures (e.g., fabric patterns) while eliminating the ghosting artifacts typical of pure 3‑DGS.
  • Geometric fidelity: The hybrid model reduces depth drift, yielding cleaner edges around thin structures and better alignment across views.
  • Efficiency: Despite the added complexity of type exchange, the CUDA rasterizer keeps the runtime on par with the fastest existing splatting methods.

Practical Implications

  • AR/VR content pipelines: Developers can now generate high‑quality, low‑latency scene representations from a handful of captured images, cutting down on manual 3‑D modeling.
  • Robotics & autonomous driving: Accurate geometry is crucial for collision avoidance; EGGS can provide dense, view‑consistent depth maps in real time, useful for SLAM front‑ends.
  • Game engines & real‑time graphics: The hybrid splatting approach can be integrated as a plug‑in for Unity or Unreal, offering a fast “photo‑realistic proxy” for background environments without heavy polygon meshes.
  • Edge deployment: Because the method runs on a single GPU and avoids large MLPs (as in NeRF), it fits well on modern mobile GPUs or embedded NVIDIA Jetson devices.

Limitations & Future Work

  • Scalability to massive outdoor scenes: The current implementation caps the number of Gaussians at ~200 k; handling city‑scale environments will require hierarchical or streaming strategies.
  • Dynamic content: EGGS assumes a static scene; extending the type‑exchange mechanism to handle moving objects or changing lighting remains an open challenge.
  • Memory footprint: Storing both 2‑D and 3‑D attributes doubles per‑Gaussian memory compared to pure 3‑DGS, which could be a bottleneck on low‑VRAM devices.
  • Future directions suggested by the authors include: adaptive pruning of redundant Gaussians, integration with learned illumination models, and exploring multi‑modal inputs (e.g., LiDAR + RGB).

Authors

  • Yancheng Zhang
  • Guangyu Sun
  • Chen Chen

Paper Information

  • arXiv ID: 2512.02932v1
  • Categories: cs.CV, cs.AI
  • Published: December 2, 2025