[Paper] NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy

Published: February 17, 2026, 01:05 PM EST
4 min read
Source: arXiv - 2602.15775v1

Overview

The paper presents NeRFscopy, a self‑supervised pipeline that brings neural radiance fields (NeRF) into the world of endoscopy. By turning a single‑camera video of soft, moving tissue into a controllable 3‑D model, the authors aim to give clinicians and surgeons richer visual cues while keeping the hardware requirements minimal.

Key Contributions

  • Deformable NeRF for medical video – Extends the classic static‑scene NeRF formulation with a time‑varying deformation field, enabling reconstruction of continuously moving tissue.
  • SE(3)‑based deformation parametrisation – Uses a sequence of 6‑DoF rigid transforms to model local tissue motion, keeping the optimisation tractable and interpretable.
  • Fully self‑supervised learning – No pre‑trained models, templates, or external markers are required; the system learns directly from the raw endoscopic video.
  • Robust colour‑consistency losses – Novel photometric terms that handle illumination changes and specular highlights typical of endoscopic lighting.
  • State‑of‑the‑art view synthesis – Demonstrates superior novel‑view rendering quality compared with existing dynamic‑scene NeRF and classical SLAM baselines on several challenging in‑vivo datasets.
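The SE(3) parametrisation highlighted above can be illustrated with the standard exponential map from a 6-DoF twist vector to a rigid 4×4 transform. This is a minimal sketch of the general technique, not the authors' code; the function names, the (rotation, translation) split of the twist, and the use of a single transform per point are assumptions for illustration.

```python
import numpy as np

def se3_exp(twist):
    """Map a 6-DoF twist (omega, v) to a 4x4 SE(3) matrix via the exponential map.

    twist: 6 floats -- first 3 are the axis-angle rotation (omega),
    last 3 the translation part (v). Illustrative helper, not from the paper.
    """
    omega, v = twist[:3], twist[3:]
    theta = np.linalg.norm(omega)
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-8:
        # Small-angle limit: rotation ~ identity, translation ~ v
        R, V = np.eye(3), np.eye(3)
    else:
        K = K / theta  # unit-axis skew matrix
        # Rodrigues' rotation formula
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
        # Left Jacobian of SO(3), maps v into the translation column
        V = (np.eye(3) + (1 - np.cos(theta)) / theta * K
             + (theta - np.sin(theta)) / theta * (K @ K))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def warp_points(points, twist):
    """Warp canonical 3-D points into an observed frame with one SE(3) transform."""
    T = se3_exp(twist)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return (homo @ T.T)[:, :3]
```

Because the twist lives in a 6-dimensional vector space, it can be optimised directly with gradient descent, which is what keeps this parametrisation tractable.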

Methodology

  1. Canonical radiance field – The pipeline first learns a static NeRF that represents the tissue in a “canonical” pose (i.e., an undeformed reference frame).
  2. Deformation field – For each video frame, a separate SE(3) transformation field warps points from the canonical space to the observed pose, capturing both global camera motion and local tissue deformation.
  3. Self‑supervision – The model is trained by rendering the warped radiance field back into the image plane and comparing it to the actual video frame. Losses include:
    • Photometric loss – pixel‑wise colour difference, robustified against specularities.
    • Temporal smoothness – penalises abrupt changes in the SE(3) parameters across consecutive frames.
    • Depth‑consistency regularisation – encourages plausible geometry without any depth sensor.
  4. Optimisation loop – Alternates between updating the canonical NeRF weights (a multi‑layer MLP) and refining the per‑frame SE(3) parameters using gradient descent on the combined loss.
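The three self‑supervision terms in step 3 can be sketched as plain functions. The Huber‑style robustifier and the loss weights below are illustrative placeholders under assumed array shapes, not the paper's exact formulation.

```python
import numpy as np

def robust_photometric_loss(rendered, observed, delta=0.1):
    """Huber-style colour loss: quadratic for small errors, linear for large
    ones, so specular highlights do not dominate the gradient. The Huber form
    is an illustrative choice; the paper's exact robustifier may differ."""
    err = np.abs(rendered - observed)
    quad = 0.5 * err**2 / delta
    lin = err - 0.5 * delta
    return np.mean(np.where(err <= delta, quad, lin))

def temporal_smoothness_loss(se3_params):
    """Penalise abrupt changes in per-frame SE(3) parameters (T frames x 6)."""
    diffs = np.diff(se3_params, axis=0)
    return np.mean(diffs**2)

def total_loss(rendered, observed, se3_params, depths,
               w_photo=1.0, w_smooth=0.1, w_depth=0.01):
    """Weighted sum of the three self-supervision terms; the weights are
    placeholder values, not taken from the paper."""
    # Depth regulariser: encourage locally smooth rendered depth
    depth_reg = np.mean(np.diff(depths, axis=-1)**2)
    return (w_photo * robust_photometric_loss(rendered, observed)
            + w_smooth * temporal_smoothness_loss(se3_params)
            + w_depth * depth_reg)
```

A perfect rendering with constant per‑frame transforms and smooth depth drives all three terms to zero, which is the fixed point the alternating optimisation in step 4 works toward.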

The whole pipeline runs on a single GPU and requires only the monocular video as input, making it practical for existing endoscopic rigs.

Results & Findings

  • Novel view synthesis – Quantitative metrics (PSNR, SSIM) improve by 15‑25 % over the best competing dynamic‑NeRF method on a publicly released colonoscopy dataset.
  • Geometric fidelity – Reconstructed surfaces capture fine mucosal folds and peristaltic motion, verified against a limited set of intra‑operative optical‑flow ground truth.
  • Robustness to lighting – The colour‑consistency terms successfully handle the rapid illumination swings caused by the endoscope’s moving light source.
  • Speed – After training (≈ 30 min on an RTX 3090 for a 10‑second clip), rendering a novel view takes < 0.1 s, enabling near‑real‑time preview.
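PSNR, one of the metrics reported above, has a standard definition that can be computed directly. This sketch assumes images scaled to [0, 1]; it shows only how the metric is defined, not the paper's evaluation code.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val].

    Higher is better; identical images give infinite PSNR.
    """
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val**2 / mse)
```

Because PSNR is a log of a ratio, a 15–25 % improvement corresponds to a substantial drop in mean squared reconstruction error.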

Practical Implications

  • Enhanced intra‑operative navigation – Surgeons could query arbitrary viewpoints of the tissue in real time, helping to spot lesions hidden behind folds.
  • Improved diagnostic imaging – Radiologists can generate 3‑D reconstructions from routine endoscopic recordings without extra hardware, aiding lesion measurement and documentation.
  • Training and simulation – High‑fidelity, patient‑specific virtual endoscopy environments become feasible, supporting skill acquisition and pre‑operative rehearsal.
  • Integration with AI pipelines – The implicit 3‑D representation can serve as a common backbone for downstream tasks such as polyp detection, tissue classification, or robotic tool path planning.

Limitations & Future Work

  • Rigid‑body deformation model – SE(3) captures only locally rigid motions; highly elastic deformations (e.g., extreme peristalsis) may be under‑represented.
  • Scale to long procedures – Training time grows linearly with video length; future work could explore hierarchical or streaming NeRF updates.
  • Clinical validation – Current experiments are limited to ex‑vivo phantoms and a small in‑vivo dataset; larger multi‑center studies are needed to assess diagnostic impact.
  • Hardware constraints – While no extra sensors are required, real‑time deployment on standard operating‑room workstations will demand further optimisation or model compression.

NeRFscopy opens a promising path toward turning everyday endoscopic footage into interactive 3‑D models, bridging the gap between cutting‑edge neural rendering research and practical medical imaging tools.

Authors

  • Laura Salort-Benejam
  • Antonio Agudo

Paper Information

  • arXiv ID: 2602.15775v1
  • Categories: cs.CV
  • Published: February 17, 2026