[Paper] Surflo: Consistent 3D Surface Flow Model with Global State

Published: June 11, 2026 at 01:48 PM EDT

Source: arXiv

Overview

Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps that grow linearly with input count, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens (one global state) and decodes oriented 3D surface points by independently transporting them from noise onto the surface via flow matching. This frees the output from any fixed grid or token budget: the same latent yields from a few thousand to a million points in a single forward pass. To suppress the local inconsistencies inherent to independent per-point decoding, an inference-time guidance term correlates nearby points by injecting a photometric gradient during ODE integration. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods that require hundreds of views, and is the only feed-forward approach to combine a global latent with arbitrary-resolution decoding.

Key Contributions

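The abstract highlights three contributions:

A single global state: a variable number of unposed RGB views is compressed into K latent tokens.
Arbitrary-resolution decoding: oriented 3D surface points are transported independently from noise onto the surface via flow matching, so the same latent yields from a few thousand to a million points in a single forward pass.
Inference-time guidance: a photometric gradient injected during ODE integration correlates nearby points and suppresses local inconsistencies.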

Methodology

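Surflo encodes a variable number of unposed RGB views into K latent tokens that act as one global state. From this state it decodes oriented 3D surface points by moving each point independently from noise onto the surface via flow matching. Because points are decoded independently, nearby points can disagree locally; an inference-time guidance term counters this by injecting a photometric gradient during ODE integration. Please refer to the full paper for architecture and training details.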
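
The decoding idea from the Overview (each point starts as noise and is carried onto the surface by integrating an ODE) can be illustrated with a small sketch. This is not the paper's implementation. It assumes a hand-written toy velocity field that targets the unit sphere in place of the learned, latent-conditioned network, a plain Euler integrator, and a hypothetical `guidance` callable standing in for the photometric gradient, which is added to the velocity here purely as an assumption. All function and parameter names are illustrative.

```python
import numpy as np


def toy_velocity(x, t):
    """Toy stand-in for a learned velocity field (assumption, not Surflo's network).

    Each point moves in a straight line toward its projection on the unit
    sphere, timed so that it arrives at t = 1.
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    target = x / np.maximum(norms, 1e-12)
    return (target - x) / (1.0 - t)


def decode_points(num_points, num_steps=20, guidance=None, guidance_scale=0.0, seed=0):
    """Transport `num_points` noise samples onto the toy surface with Euler steps.

    `guidance` is a hypothetical hook: a callable (x, t) -> array shaped like x,
    added to the velocity at every step. The additive form is an assumption.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_points, 3))  # every point starts as Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so (1 - t) never reaches zero
        v = toy_velocity(x, t)  # evaluated per point, independently of other points
        if guidance is not None:
            v = v + guidance_scale * guidance(x, t)
        x = x + dt * v
    # For the toy sphere the outward direction serves as the surface normal.
    normals = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return x, normals


# The point count is chosen at call time; nothing else changes.
coarse_points, coarse_normals = decode_points(2_000)
dense_points, dense_normals = decode_points(100_000)
```

Because every point is integrated independently, the output resolution is simply the number of noise samples drawn, which mirrors the arbitrary-resolution property described in the abstract. The same independence is why a term that couples nearby points, such as the guidance hook above, is useful.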

Practical Implications

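According to the abstract, Surflo matches or surpasses feed-forward baselines on surface metrics and runs an order of magnitude faster than optimization-based methods that require hundreds of views. Because the output is not tied to a fixed grid or token budget, the number of decoded points can be chosen at inference time, from a few thousand to a million, using the same latent.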

Authors

Antoine Guédon
Shu Nakamura
Nicolas Dufour
Jiahui Lei
Ko Nishino
Angjoo Kanazawa

Paper Information

arXiv ID: 2606.13644v1
Categories: cs.CV
Published: June 11, 2026
