[Paper] PI-Light: Physics-Inspired Diffusion for Full-Image Relighting

Published: January 29, 2026 at 01:55 PM EST
4 min read
Source: arXiv - 2601.22135v1

Overview

The paper presents π‑Light (PI‑Light), a two‑stage diffusion‑based framework that brings physics into the image‑relighting pipeline. By marrying a pretrained diffusion model with physics‑inspired constraints, the authors achieve realistic, full‑scene lighting edits without needing massive paired datasets, and they demonstrate strong generalization from synthetic training data to real‑world photographs.

Key Contributions

  • Batch‑aware attention: a novel attention mechanism that enforces consistency of intrinsic scene properties (e.g., albedo, geometry) across a batch of images, improving stability of relighting results.
  • Physics‑guided neural rendering module: integrates a differentiable light‑transport model into the diffusion process, encouraging physically plausible shading, specular highlights, and diffuse reflections.
  • Physics‑inspired loss functions: regularizers that steer the diffusion dynamics toward a physically meaningful solution space, boosting robustness on unseen real images.
  • Curated lighting dataset: a new collection of objects and indoor/outdoor scenes captured under controlled illumination, released as a benchmark for full‑image relighting research.
  • Efficient fine‑tuning recipe: demonstrates that a large pretrained diffusion model can be adapted to relighting tasks with modest compute, thanks to the physics‑driven constraints.
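The batch‑aware attention idea can be sketched with plain numpy: queries remain per‑image, but keys and values are pooled across the whole batch, so every token can attend to evidence from the other images of the same scene. This is a minimal illustrative sketch, not the paper's implementation; the function name `batch_aware_attention` and the single‑head, unprojected layout are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def batch_aware_attention(feats, wq, wk, wv):
    """feats: (B, N, D) tokens for B images of the same scene.
    Keys/values are flattened across the batch so each image's tokens
    attend to a shared pool, nudging intrinsic predictions to agree."""
    B, N, D = feats.shape
    q = feats @ wq                    # (B, N, D): queries stay per-image
    kv = feats.reshape(B * N, D)      # one token pool for the whole batch
    k = kv @ wk                       # (B*N, D)
    v = kv @ wv                       # (B*N, D)
    attn = softmax(q @ k.T / np.sqrt(D), axis=-1)  # (B, N, B*N)
    return attn @ v                   # (B, N, D): mixes evidence across images
```

Because the key/value pool is identical for every image in the batch, two images with identical features receive identical outputs, which is exactly the consistency the mechanism is meant to enforce.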

Methodology

  1. Two‑stage pipeline

    • Stage 1 – Intrinsic decomposition: The diffusion model predicts scene‑intrinsic maps (albedo, normal, depth) for each input image. Batch‑aware attention ensures these predictions stay coherent when processing multiple views of the same scene.
    • Stage 2 – Physics‑guided rendering: A lightweight neural renderer takes the intrinsic maps and a target lighting specification (e.g., direction, intensity) and computes the relit image using a differentiable rendering equation that respects energy conservation and the Lambert‑Phong reflectance model.
  2. Physics‑inspired losses

    • Energy‑preserving loss: penalizes deviations from the total reflected light predicted by the rendering equation.
    • Specular consistency loss: encourages the specular component to follow the micro‑facet distribution implied by the normals.
    • Temporal smoothness loss: when a batch contains a sequence of lighting conditions, this loss keeps the intrinsic maps stable across the sequence.
  3. Training & fine‑tuning

    • The model is first pretrained on a large synthetic corpus (e.g., rendered with Blender).
    • Fine‑tuning on the curated real‑lighting dataset uses the physics‑inspired losses to bridge the synthetic‑to‑real gap, requiring far fewer real images than a purely data‑driven approach.
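The rendering stage and the physics‑inspired losses above can be sketched in a few lines of numpy. This is a minimal single‑light, orthographic‑camera sketch of Lambert‑Phong shading under the stated energy‑conservation idea; the function names (`relight`, `energy_loss`, `temporal_smoothness`) and the constants (`shininess`, specular weight `ks`, camera along +z) are illustrative assumptions, not the paper's code.

```python
import numpy as np

def relight(albedo, normals, light_dir, light_rgb, shininess=32.0, ks=0.04):
    """Lambert-Phong shading of intrinsic maps.
    albedo: (H, W, 3); normals: (H, W, 3) unit vectors;
    light_dir: (3,) unit vector toward the light; light_rgb: (3,)."""
    ndotl = np.clip(normals @ light_dir, 0.0, 1.0)[..., None]     # diffuse term
    view = np.array([0.0, 0.0, 1.0])                              # camera on +z
    refl = 2.0 * (normals @ light_dir)[..., None] * normals - light_dir
    spec = np.clip(refl @ view, 0.0, 1.0)[..., None] ** shininess # Phong lobe
    return (albedo * ndotl + ks * spec) * light_rgb

def energy_loss(pred, albedo, normals, light_dir, light_rgb):
    """Penalize deviation from the light transport the renderer predicts."""
    target = relight(albedo, normals, light_dir, light_rgb)
    return float(np.mean((pred - target) ** 2))

def temporal_smoothness(intrinsic_seq):
    """Keep intrinsic maps stable across a sequence of lighting conditions.
    intrinsic_seq: (T, H, W, C) predicted maps for T lighting states."""
    diffs = np.diff(intrinsic_seq, axis=0)
    return float(np.mean(diffs ** 2))
```

As a sanity check, a flat surface facing a head-on light yields full diffuse plus the specular peak, and a prediction that exactly matches the renderer incurs zero energy loss.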

Results & Findings

  • Visual quality: π‑Light reproduces crisp specular highlights on metals, soft diffuse shading on fabrics, and correct shadow boundaries, outperforming prior diffusion‑only relighting methods.
  • Quantitative metrics: On the new benchmark, the method improves PSNR by ~2.3 dB and SSIM by ~0.04 over the strongest baseline, while reducing the mean angular error of estimated lighting vectors by 15%.
  • Generalization: When tested on out‑of‑distribution real photographs (e.g., indoor scenes captured with a phone camera), π‑Light maintains realistic lighting changes, whereas baseline models often produce color bleeding or unrealistic highlight shapes.
  • Efficiency: Fine‑tuning converges in ~8 hours on a single RTX 3090, a fraction of the time required to train a comparable end‑to‑end relighting network from scratch.

Practical Implications

  • Content creation pipelines: Artists can quickly re‑light existing renders or photographs without re‑capturing the scene, enabling rapid iteration for games, VFX, and AR/VR assets.
  • Mobile photo editing: The lightweight rendering stage can be ported to on‑device inference, allowing developers to add “relight” filters to camera apps that respect material properties.
  • Synthetic data generation: π‑Light can be used to augment training datasets with varied lighting conditions while preserving physical realism, benefiting downstream tasks like object detection or pose estimation.
  • Robotics & autonomous driving: Simulating realistic illumination changes (e.g., dusk, streetlights) on captured dash‑cam footage can improve robustness of perception models to lighting variations.

Limitations & Future Work

  • Material model simplifications: The current renderer assumes a Lambert‑Phong reflectance model, which may struggle with complex BRDFs such as anisotropic reflectance or subsurface scattering.
  • Lighting representation: Only simple directional and point lights are supported; extending to area lights or environment maps would broaden applicability.
  • Batch‑aware attention scaling: While effective for modest batch sizes, the attention mechanism becomes memory‑intensive for very large image collections, suggesting a need for more scalable alternatives.
  • Future directions: The authors propose integrating learned BRDFs, exploring hierarchical lighting encodings, and applying the physics‑inspired diffusion paradigm to video relighting with temporal consistency guarantees.

Authors

  • Zhexin Liang
  • Zhaoxi Chen
  • Yongwei Chen
  • Tianyi Wei
  • Tengfei Wang
  • Xingang Pan

Paper Information

  • arXiv ID: 2601.22135v1
  • Categories: cs.CV
  • Published: January 29, 2026