[Paper] TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation
Source: arXiv - 2602.12157v1
Overview
TexSpot tackles a long‑standing pain point in 3‑D graphics: generating high‑quality, view‑consistent textures for arbitrary meshes. By introducing a new “Texlet” representation that blends the flexibility of point‑based textures with the compactness of UV maps, the authors build a diffusion‑based enhancer that can polish textures produced by existing multi‑view pipelines while preserving geometric fidelity.
Key Contributions
- Texlet representation: A spatially‑uniform latent token for each surface point that stores a local 2‑D texture patch, learned via a joint 2‑D/3‑D encoder pipeline.
- Cascaded 3‑D‑to‑2‑D decoder: Reconstructs high‑resolution texture patches from Texlet latents, enabling a compact yet expressive texture space.
- Diffusion transformer for enhancement: Trains a diffusion model conditioned on Texlets to refine textures generated by any multi‑view diffusion method, improving consistency across viewpoints.
- Comprehensive evaluation: Demonstrates superior visual fidelity, geometric consistency, and robustness compared with state‑of‑the‑art 3‑D texture generation and enhancement techniques.
Methodology
1. Texlet Construction
- Sample a uniform set of points on the mesh surface.
- For each point, extract a small 2‑D texture patch (e.g., 32×32 pixels) from the initial texture.
- Encode each patch with a lightweight 2‑D CNN encoder to obtain a local latent vector.
- Feed all local vectors into a shared 3‑D encoder (e.g., PointNet++ style) that injects global shape context, producing the final Texlet latent for that point.
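The construction steps above can be sketched in a toy form. This is not the authors' implementation: the 2‑D CNN is replaced with a linear projection and the PointNet‑style encoder with a shared layer plus max‑pooling, purely to keep the sketch runnable; all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_patch(patch, W):
    """Stand-in for the lightweight 2-D CNN encoder: flatten + linear map.
    (A linear projection is used here only to keep the sketch self-contained.)"""
    return np.tanh(patch.reshape(-1) @ W)

def texlet_latents(patches, W_local, W_global):
    # Per-point local latents, one per 32x32 texture patch.
    local = np.stack([encode_patch(p, W_local) for p in patches])   # (N, d)
    # PointNet-style global context: shared transform + max-pool over points.
    global_feat = np.max(np.tanh(local @ W_global), axis=0)         # (d,)
    # Final Texlet = local latent fused with broadcast global shape context.
    n, d = local.shape
    return np.concatenate([local, np.broadcast_to(global_feat, (n, d))], axis=1)

N, d = 256, 64
patches = rng.standard_normal((N, 32, 32, 3))           # toy RGB patches
W_local = rng.standard_normal((32 * 32 * 3, d)) * 0.01
W_global = rng.standard_normal((d, d)) * 0.1
texlets = texlet_latents(patches, W_local, W_global)
print(texlets.shape)  # (256, 128)
```

The key structural idea survives the simplification: each point keeps its own local texture code, while a pooled global feature injects mesh-level context into every Texlet.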
2. 3‑D‑to‑2‑D Decoding
- A cascade of decoders first expands the global latent into a coarse 2‑D feature map, then refines it into the full‑resolution texture patch.
- This design keeps memory usage low while allowing the model to reconstruct fine‑grained details.
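A minimal sketch of the coarse-to-fine idea, under assumed dimensions (8×8 coarse stage, 32×32 output): the learned decoders are replaced with linear maps and nearest-neighbour upsampling, so this illustrates only the cascade structure, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def decode_texlet(latent, W_coarse, W_refine, out_res=32):
    """Two-stage cascade: latent -> coarse 8x8 feature map -> refined 32x32 patch."""
    coarse = np.tanh(latent @ W_coarse).reshape(8, 8, 3)        # coarse stage
    # Nearest-neighbour upsample to full resolution, then a per-pixel
    # refinement (stand-in for the learned refinement decoder).
    up = coarse.repeat(out_res // 8, axis=0).repeat(out_res // 8, axis=1)
    return np.tanh(up @ W_refine)                               # (32, 32, 3)

d = 128
latent = rng.standard_normal(d)
W_coarse = rng.standard_normal((d, 8 * 8 * 3)) * 0.05
W_refine = rng.standard_normal((3, 3)) * 0.5
patch = decode_texlet(latent, W_coarse, W_refine)
print(patch.shape)  # (32, 32, 3)
```

The memory argument follows from the structure: the expensive full-resolution tensor exists only at the final stage, while everything upstream operates on the compact latent or the coarse map.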
3. Diffusion‑Based Enhancement
- A transformer‑style diffusion model receives the noisy Texlet latents and learns to denoise them conditioned on the underlying geometry.
- The diffusion process iteratively refines the latent space, which is then decoded back into high‑quality texture patches.
- Because the diffusion operates on the compact Texlet space, the method is fast and scalable to high‑resolution meshes.
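The denoising loop can be sketched with a standard DDPM-style ancestral sampler over the Texlet latents. The schedule and update rule below are the generic DDPM formulation, not taken from the paper; the geometry-conditioned transformer is replaced by a trivial placeholder function so the loop runs end to end.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy DDPM noise schedule over the compact Texlet latent space.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, geometry_feat):
    """Placeholder for the geometry-conditioned diffusion transformer:
    predicts the noise component of x_t (here: a fixed linear guess)."""
    return 0.1 * (x_t - geometry_feat)

def enhance(x_T, geometry_feat):
    x = x_T
    for t in reversed(range(T)):
        eps = denoiser(x, t, geometry_feat)
        # DDPM posterior-mean update (ancestral sampling step).
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

latents = rng.standard_normal((256, 128))   # noisy Texlets (N points, d dims)
geom = rng.standard_normal(128)             # per-mesh geometry conditioning
clean = enhance(latents, geom)
print(clean.shape)  # (256, 128)
```

Because each step touches only an (N, d) latent array rather than a full-resolution texture, the per-step cost is independent of the final texture resolution, which is the efficiency claim the bullet above makes.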
4. Training & Integration
- The system is trained end‑to‑end on a curated dataset of meshes with ground‑truth textures.
- At inference, TexSpot can be plugged after any multi‑view diffusion generator (e.g., DreamFusion‑style pipelines) to boost the final texture quality.
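The plug-in integration described above amounts to a three-stage post-processor. The sketch below is purely illustrative: every class, method, and name is hypothetical (the paper does not publish this API), and the stubs exist only to make the control flow concrete.

```python
class StubEnhancer:
    """Hypothetical TexSpot-style enhancer interface (names are illustrative)."""
    def encode(self, mesh, texture):
        # Texture -> per-point Texlet latents.
        return [("texlet", p) for p in texture]
    def denoise(self, texlets):
        # Latent-space diffusion refinement (identity here).
        return texlets
    def decode(self, mesh, texlets):
        # Texlet latents -> refined texture patches.
        return [p for _, p in texlets]

def texture_pipeline(mesh, generate, enhancer):
    coarse = generate(mesh)                  # any multi-view texture generator
    texlets = enhancer.encode(mesh, coarse)  # lift into Texlet space
    return enhancer.decode(mesh, enhancer.denoise(texlets))

out = texture_pipeline("mesh", lambda m: [1, 2, 3], StubEnhancer())
print(out)  # [1, 2, 3]
```

The design point is that the generator and the enhancer only share the coarse texture, so any multi-view pipeline can sit in the `generate` slot without retraining TexSpot.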
Results & Findings
- Visual fidelity: User studies and PSNR/SSIM metrics show a 15‑20 % improvement over the best prior point‑based and UV‑based methods.
- View consistency: Renderings from drastically different camera angles exhibit far fewer seams and color shifts, confirming the spatial uniformity of Texlets.
- Resolution scalability: TexSpot successfully generates textures up to 4K resolution without exploding memory, thanks to the latent compression.
- Robustness: The diffusion enhancer tolerates noisy or incomplete initial textures (e.g., from low‑sample multi‑view diffusion) and still converges to clean results.
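For reference, the PSNR figure quoted above follows the standard definition, 10·log10(MAX²/MSE); the snippet below computes it with that textbook formula (this is not the authors' evaluation code).

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio between two images/textures in [0, max_val]."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1                    # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(ref, noisy), 1))    # 20.0
```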
Practical Implications
- Game & VR asset pipelines: Artists can feed coarse textures from rapid prototyping tools into TexSpot to obtain production‑grade, view‑consistent textures without manual UV unwrapping.
- 3‑D content marketplaces: Automated up‑scaling of user‑submitted meshes becomes feasible, reducing the need for manual retouching.
- AR/VR streaming: Because TexSpot works on a compact latent representation, it can be integrated into edge‑computing scenarios where bandwidth is limited but high‑quality textures are required.
- Cross‑modal generation: The Texlet space could serve as a bridge for text‑to‑3‑D pipelines, enabling language‑driven texture refinement without re‑training a full diffusion model for each new asset.
Limitations & Future Work
- Dependence on point density: Extremely sparse point samplings still limit the finest texture details; adaptive sampling strategies could mitigate this.
- Training data bias: The model is trained on synthetic datasets with relatively clean geometry; performance on noisy real‑world scans may degrade.
- Real‑time constraints: While more efficient than full‑resolution diffusion, the iterative diffusion steps still add latency, suggesting future work on accelerated denoising (e.g., distilled diffusion or GAN‑based shortcuts).
- Extension to dynamic meshes: Current formulation assumes static geometry; extending Texlets to handle deformable or animated surfaces is an open direction.
Authors
- Ziteng Lu
- Yushuang Wu
- Chongjie Ye
- Yuda Qiu
- Jing Shao
- Xiaoyang Guo
- Jiaqing Zhou
- Tianlei Hu
- Kun Zhou
- Xiaoguang Han
Paper Information
- arXiv ID: 2602.12157v1
- Categories: cs.CV, cs.GR
- Published: February 12, 2026