[Paper] NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy

Published: February 17, 2026, 01:05 PM EST
4 min read
Source: arXiv - 2602.15775v1

Overview

The paper presents NeRFscopy, a self‑supervised pipeline that brings neural radiance fields (NeRF) into the world of endoscopy. By turning a single‑camera video of soft, moving tissue into a controllable 3‑D model, the authors aim to give clinicians and surgeons richer visual cues while keeping the hardware requirements minimal.

Key Contributions

  • Deformable NeRF for medical video – Extends the classic static‑scene NeRF formulation with a time‑varying deformation field, enabling reconstruction of continuously moving tissue.
  • SE(3)‑based deformation parametrisation – Uses a sequence of 6‑DoF rigid transforms to model local tissue motion, keeping the optimisation tractable and interpretable.
  • Fully self‑supervised learning – No pre‑trained models, templates, or external markers are required; the system learns directly from the raw endoscopic video.
  • Robust colour‑consistency losses – Novel photometric terms that handle illumination changes and specular highlights typical of endoscopic lighting.
  • State‑of‑the‑art view synthesis – Demonstrates superior novel‑view rendering quality compared with existing dynamic‑scene NeRF and classical SLAM baselines on several challenging in‑vivo datasets.
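The SE(3) parametrisation highlighted above can be illustrated with the standard exponential map from a 6-DoF twist vector to a rigid 4×4 transform. This is a minimal sketch of the general technique, not the authors' code; the function names, the (rotation, translation) split of the twist, and the use of a single transform per point are assumptions for illustration.

```python
import numpy as np

def se3_exp(twist):
    """Map a 6-DoF twist (omega, v) to a 4x4 SE(3) matrix via the exponential map.

    twist: 6 floats -- first 3 are the axis-angle rotation (omega),
    last 3 the translation part (v). Illustrative helper, not from the paper.
    """
    omega, v = twist[:3], twist[3:]
    theta = np.linalg.norm(omega)
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-8:
        # Small-angle limit: rotation ~ identity, translation ~ v
        R, V = np.eye(3), np.eye(3)
    else:
        K = K / theta  # unit-axis skew matrix
        # Rodrigues' rotation formula
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
        # Left Jacobian of SO(3), maps v into the translation column
        V = (np.eye(3) + (1 - np.cos(theta)) / theta * K
             + (theta - np.sin(theta)) / theta * (K @ K))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def warp_points(points, twist):
    """Warp canonical 3-D points into an observed frame with one SE(3) transform."""
    T = se3_exp(twist)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return (homo @ T.T)[:, :3]
```

Because the twist lives in a 6-dimensional vector space, it can be optimised directly with gradient descent, which is what keeps this parametrisation tractable.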

Methodology

  1. Canonical radiance field – The pipeline first learns a static NeRF that represents the tissue in a “canonical” pose (i.e., an undeformed reference frame).
  2. Deformation field – For each video frame, a separate SE(3) transformation field warps points from the canonical space to the observed pose, capturing both global camera motion and local tissue deformation.
  3. Self‑supervision – The model is trained by rendering the warped radiance field back into the image plane and comparing it to the actual video frame. Losses include:
    • Photometric loss – pixel‑wise colour difference, robustified against specularities.
    • Temporal smoothness – penalises abrupt changes in the SE(3) parameters across consecutive frames.
    • Depth‑consistency regularisation – encourages plausible geometry without any depth sensor.
  4. Optimisation loop – Alternates between updating the canonical NeRF weights (a multi‑layer MLP) and refining the per‑frame SE(3) parameters using gradient descent on the combined loss.
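The three self‑supervision terms in step 3 can be sketched as plain functions. The Huber‑style robustifier and the loss weights below are illustrative placeholders under assumed array shapes, not the paper's exact formulation.

```python
import numpy as np

def robust_photometric_loss(rendered, observed, delta=0.1):
    """Huber-style colour loss: quadratic for small errors, linear for large
    ones, so specular highlights do not dominate the gradient. The Huber form
    is an illustrative choice; the paper's exact robustifier may differ."""
    err = np.abs(rendered - observed)
    quad = 0.5 * err**2 / delta
    lin = err - 0.5 * delta
    return np.mean(np.where(err <= delta, quad, lin))

def temporal_smoothness_loss(se3_params):
    """Penalise abrupt changes in per-frame SE(3) parameters (T frames x 6)."""
    diffs = np.diff(se3_params, axis=0)
    return np.mean(diffs**2)

def total_loss(rendered, observed, se3_params, depths,
               w_photo=1.0, w_smooth=0.1, w_depth=0.01):
    """Weighted sum of the three self-supervision terms; the weights are
    placeholder values, not taken from the paper."""
    # Depth regulariser: encourage locally smooth rendered depth
    depth_reg = np.mean(np.diff(depths, axis=-1)**2)
    return (w_photo * robust_photometric_loss(rendered, observed)
            + w_smooth * temporal_smoothness_loss(se3_params)
            + w_depth * depth_reg)
```

A perfect rendering with constant per‑frame transforms and smooth depth drives all three terms to zero, which is the fixed point the alternating optimisation in step 4 works toward.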

The whole pipeline runs on a single GPU and requires only the monocular video as input, making it practical for existing endoscopic rigs.

Results & Findings

  • Novel view synthesis – Quantitative metrics (PSNR, SSIM) improve by 15‑25 % over the best competing dynamic‑NeRF method on a publicly released colonoscopy dataset.
  • Geometric fidelity – Reconstructed surfaces capture fine mucosal folds and peristaltic motion, verified against a limited set of intra‑operative optical‑flow ground truth.
  • Robustness to lighting – The colour‑consistency terms successfully handle the rapid illumination swings caused by the endoscope’s moving light source.
  • Speed – After training (≈ 30 min on an RTX 3090 for a 10‑second clip), rendering a novel view takes < 0.1 s, enabling near‑real‑time preview.
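PSNR, one of the metrics reported above, has a standard definition that can be computed directly. This sketch assumes images scaled to [0, 1]; it shows only how the metric is defined, not the paper's evaluation code.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val].

    Higher is better; identical images give infinite PSNR.
    """
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val**2 / mse)
```

Because PSNR is a log of a ratio, a 15–25 % improvement corresponds to a substantial drop in mean squared reconstruction error.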

Practical Implications

  • Enhanced intra‑operative navigation – Surgeons could query arbitrary viewpoints of the tissue in real time, helping to spot lesions hidden behind folds.
  • Improved diagnostic imaging – Radiologists can generate 3‑D reconstructions from routine endoscopic recordings without extra hardware, aiding lesion measurement and documentation.
  • Training and simulation – High‑fidelity, patient‑specific virtual endoscopy environments become feasible, supporting skill acquisition and pre‑operative rehearsal.
  • Integration with AI pipelines – The implicit 3‑D representation can serve as a common backbone for downstream tasks such as polyp detection, tissue classification, or robotic tool path planning.

Limitations & Future Work

  • Rigid‑body deformation model – SE(3) captures only locally rigid motions; highly elastic deformations (e.g., extreme peristalsis) may be under‑represented.
  • Scale to long procedures – Training time grows linearly with video length; future work could explore hierarchical or streaming NeRF updates.
  • Clinical validation – Current experiments are limited to ex‑vivo phantoms and a small in‑vivo dataset; larger multi‑center studies are needed to assess diagnostic impact.
  • Hardware constraints – While no extra sensors are required, real‑time deployment on standard operating‑room workstations will demand further optimisation or model compression.

NeRFscopy opens a promising path toward turning everyday endoscopic footage into interactive 3‑D models, bridging the gap between cutting‑edge neural rendering research and practical medical imaging tools.

Authors

  • Laura Salort-Benejam
  • Antonio Agudo

Paper Information

  • arXiv ID: 2602.15775v1
  • Categories: cs.CV
  • Published: February 17, 2026