[Paper] TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation

Published: February 12, 2026 at 11:37 AM EST
4 min read
Source: arXiv - 2602.12157v1

Overview

TexSpot tackles a long‑standing pain point in 3‑D graphics: generating high‑quality, view‑consistent textures for arbitrary meshes. By introducing a new “Texlet” representation that blends the flexibility of point‑based textures with the compactness of UV maps, the authors build a diffusion‑based enhancer that can polish textures produced by existing multi‑view pipelines while preserving geometric fidelity.

Key Contributions

  • Texlet representation: A spatially‑uniform latent token for each surface point that stores a local 2‑D texture patch, learned via a joint 2‑D/3‑D encoder pipeline.
  • Cascaded 3‑D‑to‑2‑D decoder: Reconstructs high‑resolution texture patches from Texlet latents, enabling a compact yet expressive texture space.
  • Diffusion transformer for enhancement: Trains a diffusion model conditioned on Texlets to refine textures generated by any multi‑view diffusion method, improving consistency across viewpoints.
  • Comprehensive evaluation: Demonstrates superior visual fidelity, geometric consistency, and robustness compared with state‑of‑the‑art 3‑D texture generation and enhancement techniques.

Methodology

  1. Texlet Construction

    • Sample a uniform set of points on the mesh surface.
    • For each point, extract a small 2‑D texture patch (e.g., 32×32 pixels) from the initial texture.
    • Encode each patch with a lightweight 2‑D CNN encoder → local latent vector.
    • Feed all local vectors into a shared 3‑D encoder (e.g., PointNet++ style) that injects global shape context, producing the final Texlet latent for that point.
  2. 3‑D‑to‑2‑D Decoding

    • A cascade of decoders first expands the global latent into a coarse 2‑D feature map, then refines it into the full‑resolution texture patch.
    • This design keeps memory usage low while allowing the model to reconstruct fine‑grained details.
  3. Diffusion‑Based Enhancement

    • A transformer‑style diffusion model receives the noisy Texlet latents and learns to denoise them conditioned on the underlying geometry.
    • The diffusion process iteratively refines the latent space, which is then decoded back into high‑quality texture patches.
    • Because the diffusion operates on the compact Texlet space, the method is fast and scalable to high‑resolution meshes.
  4. Training & Integration

    • The system is trained end‑to‑end on a curated dataset of meshes with ground‑truth textures.
    • At inference, TexSpot can be plugged after any multi‑view diffusion generator (e.g., DreamFusion‑style pipelines) to boost the final texture quality.
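Steps 1 and 2 above can be sketched end to end. The following is a minimal NumPy stand-in, not the paper's implementation: the patch size, latent widths, and the random linear "weights" (`W_enc`, `W_pt`, `W_mix`, `W_coarse`, `W_fine`) are all illustrative assumptions, with `tanh` projections standing in for the learned 2-D CNN, the PointNet++-style context encoder, and the cascaded decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual dimensions are not stated in this summary.
N, PATCH, C, D_LOCAL, D_TEXLET = 256, 32, 16, 64, 128

# Toy inputs: uniformly sampled surface points and their initial texture patches.
xyz = rng.normal(size=(N, 3))
patches = rng.random(size=(N, PATCH, PATCH, 3))

# Random stand-in weights for the learned encoders/decoders.
W_enc = rng.normal(scale=0.02, size=(PATCH * PATCH * 3, D_LOCAL))
W_pt = rng.normal(scale=0.1, size=(D_LOCAL + 3, D_TEXLET))
W_mix = rng.normal(scale=0.1, size=(2 * D_TEXLET, D_TEXLET))
W_coarse = rng.normal(scale=0.1, size=(D_TEXLET, 8 * 8 * C))
W_fine = rng.normal(scale=0.1, size=(C, 3))

def encode_patches(patches):
    # Stage 1: per-patch 2-D encoding (stand-in for the lightweight CNN encoder).
    flat = patches.reshape(N, -1)                     # (N, 32*32*3)
    return np.tanh(flat @ W_enc)                      # (N, D_LOCAL) local latents

def make_texlets(local, xyz):
    # Stage 2: PointNet-style injection of global shape context.
    feats = np.tanh(np.concatenate([local, xyz], axis=1) @ W_pt)  # (N, D_TEXLET)
    g = feats.max(axis=0, keepdims=True)                          # pooled global feature
    mixed = np.concatenate([feats, np.broadcast_to(g, feats.shape)], axis=1)
    return np.tanh(mixed @ W_mix)                                 # (N, D_TEXLET) Texlets

def decode_texlet(t):
    # Cascaded 3-D-to-2-D decode: coarse 8x8 feature map, then refine to the full patch.
    coarse = np.tanh(t @ W_coarse).reshape(8, 8, C)
    up = coarse.repeat(4, axis=0).repeat(4, axis=1)   # nearest-neighbour upsample to 32x32
    return np.clip(up @ W_fine, 0.0, 1.0)             # (32, 32, 3) texture patch

texlets = make_texlets(encode_patches(patches), xyz)
patch0 = decode_texlet(texlets[0])
print(texlets.shape, patch0.shape)                    # (256, 128) (32, 32, 3)
```

The point of the two-stage encode is visible even in this toy version: per-point latents are cheap to compute, while the pooled global feature ties every Texlet to the overall shape.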
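Step 3's enhancement stage follows the shape of a standard DDPM reverse process over the compact latent space. The sketch below uses a toy `denoiser` as a stand-in for the paper's geometry-conditioned diffusion transformer; the noise schedule and all dimensions are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical DDPM-style schedule over the compact Texlet latent space.
T, N, D = 50, 256, 128
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, cond):
    # Stand-in for the geometry-conditioned diffusion transformer: it should
    # predict the noise present at step t. Here it is just a deterministic
    # function of its inputs so the loop runs end to end.
    return 0.1 * np.tanh(x_t + cond)

def sample_texlets(cond):
    """Reverse diffusion over Texlet latents, conditioned on geometry features."""
    x = rng.normal(size=(N, D))                       # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, t, cond)
        a, ab = alphas[t], alpha_bars[t]
        x = (x - (1.0 - a) / np.sqrt(1.0 - ab) * eps) / np.sqrt(a)
        if t > 0:                                     # add noise except at the final step
            x += np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x                                          # denoised Texlet latents

cond = rng.normal(size=(N, D))                        # stand-in geometry conditioning
refined = sample_texlets(cond)
print(refined.shape)                                  # (256, 128)
```

Because each denoising step touches only an N-by-D latent array rather than full-resolution texture maps, the loop stays cheap regardless of the final texture resolution, which is the scalability argument the paper makes.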

Results & Findings

  • Visual fidelity: User studies and PSNR/SSIM metrics show a 15‑20 % improvement over the best prior point‑based and UV‑based methods.
  • View consistency: Renderings from drastically different camera angles exhibit far fewer seams and color shifts, confirming the spatial uniformity of Texlets.
  • Resolution scalability: TexSpot generates textures up to 4K resolution without excessive memory growth, thanks to the latent compression.
  • Robustness: The diffusion enhancer tolerates noisy or incomplete initial textures (e.g., from low‑sample multi‑view diffusion) and still converges to clean results.
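For reference, the PSNR figure quoted above can be computed as follows. This is the generic definition, not code from the paper, and SSIM is omitted for brevity; the images here are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Synthetic example: a clean render vs. the same render with Gaussian noise.
clean = rng.random(size=(64, 64, 3))
noisy = np.clip(clean + rng.normal(scale=0.05, size=clean.shape), 0.0, 1.0)

print(round(psnr(clean, noisy), 1))  # higher is better; roughly 26 dB for sigma = 0.05
```

A 15-20% gain on a log-scale metric like PSNR is substantial: each additional 3 dB halves the mean squared error relative to the reference.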

Practical Implications

  • Game & VR asset pipelines: Artists can feed coarse textures from rapid prototyping tools into TexSpot to obtain production‑grade, view‑consistent textures without manual UV unwrapping.
  • 3‑D content marketplaces: Automated up‑scaling of user‑submitted meshes becomes feasible, reducing the need for manual retouching.
  • AR/VR streaming: Because TexSpot works on a compact latent representation, it can be integrated into edge‑computing scenarios where bandwidth is limited but high‑quality textures are required.
  • Cross‑modal generation: The Texlet space could serve as a bridge for text‑to‑3‑D pipelines, enabling language‑driven texture refinement without re‑training a full diffusion model for each new asset.

Limitations & Future Work

  • Dependence on point density: Extremely sparse point samplings still limit the finest texture details; adaptive sampling strategies could mitigate this.
  • Training data bias: The model is trained on synthetic datasets with relatively clean geometry; performance on noisy real‑world scans may degrade.
  • Real‑time constraints: While more efficient than full‑resolution diffusion, the iterative diffusion steps still add latency, suggesting future work on accelerated denoising (e.g., distilled diffusion or GAN‑based shortcuts).
  • Extension to dynamic meshes: Current formulation assumes static geometry; extending Texlets to handle deformable or animated surfaces is an open direction.

Authors

  • Ziteng Lu
  • Yushuang Wu
  • Chongjie Ye
  • Yuda Qiu
  • Jing Shao
  • Xiaoyang Guo
  • Jiaqing Zhou
  • Tianlei Hu
  • Kun Zhou
  • Xiaoguang Han

Paper Information

  • arXiv ID: 2602.12157v1
  • Categories: cs.CV, cs.GR
  • Published: February 12, 2026
  • PDF: Download PDF