[Paper] Style-Aware Gloss Control for Generative Non-Photorealistic Rendering
Source: arXiv - 2602.16611v1
Overview
The paper “Style‑Aware Gloss Control for Generative Non‑Photorealistic Rendering” investigates how modern generative models can disentangle gloss (the shiny‑versus‑matte quality of a surface) from the artistic style of a painting or drawing. By training on a purpose‑built dataset of painterly objects, the authors learn a latent space in which gloss can be adjusted independently of style, and they show how to plug this representation into a diffusion‑based image generator for fine‑grained, controllable non‑photorealistic synthesis.
Key Contributions
- Curated Painterly Dataset – A new collection of rendered objects in multiple artistic styles with systematically varied gloss levels, enabling controlled experiments on style vs. material perception.
- Hierarchical Disentangled Latent Space – An unsupervised generative model learns a latent hierarchy where gloss is isolated from other visual factors (color, shape, style).
- Lightweight Adapter for Diffusion Models – A small neural “adapter” maps the style‑ and gloss‑aware latent vectors into the latent‑diffusion model (LDM) space, granting users direct control over these attributes during image synthesis.
- Quantitative & Qualitative Evaluation – The approach outperforms prior style‑transfer and non‑photorealistic generation methods in terms of disentanglement (measured by mutual information gap) and user‑perceived controllability.
- Open‑Source Release – Code, pretrained models, and the curated dataset are publicly released, facilitating reproducibility and downstream research.
Methodology
- Data Collection – 3‑D objects are rendered under a range of gloss parameters (e.g., roughness values) and then “painted” using multiple procedural artistic styles (watercolor, oil, sketch, etc.). Each image is labeled with its ground‑truth gloss level and style identifier.
- Unsupervised Representation Learning – A VAE‑style hierarchical encoder‑decoder is trained on the dataset without any explicit gloss supervision. The hierarchy forces the model to allocate separate latent sub‑spaces for coarse (style) and fine (material) factors.
- Latent Disentanglement Analysis – The authors probe the learned latent dimensions using mutual information gap (MIG) and latent traversals to verify that gloss varies independently of style.
- Adapter Design – A shallow MLP (the “adapter”) takes the disentangled latent vector (style + gloss) and projects it into the latent space of a pretrained latent‑diffusion model (Stable Diffusion‑style). The diffusion model then generates high‑resolution non‑photorealistic images conditioned on these vectors.
- Training & Fine‑Tuning – The adapter is trained with a contrastive loss that encourages the diffusion output to preserve the intended gloss while respecting the style code. No full‑model fine‑tuning of the diffusion backbone is required, keeping compute costs low.
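The adapter and its contrastive training objective described above can be sketched roughly as follows. The layer sizes, latent dimensions, and the InfoNCE-style formulation are illustrative assumptions; the summary does not specify the paper's exact architecture or loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlossStyleAdapter(nn.Module):
    """Shallow MLP mapping a disentangled (style + gloss) latent into the
    conditioning space of a pretrained latent-diffusion model.
    Dimensions and depth are assumed for illustration, not the paper's values."""
    def __init__(self, style_dim=64, gloss_dim=8, cond_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(style_dim + gloss_dim, 256),
            nn.SiLU(),
            nn.Linear(256, cond_dim),
        )

    def forward(self, style_z, gloss_z):
        # Concatenate the two sub-latents and project into the diffusion
        # model's conditioning space; the diffusion backbone stays frozen.
        return self.net(torch.cat([style_z, gloss_z], dim=-1))

def info_nce_loss(anchors, positives, temperature=0.07):
    """One plausible contrastive loss (InfoNCE-style): each adapted embedding
    should match its own target and repel the other samples in the batch."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)
```

Because only these few linear layers receive gradients, training stays cheap relative to fine-tuning the diffusion backbone, which matches the low-compute claim above.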
Results & Findings
| Metric | Baseline (Style‑Transfer) | Proposed Method |
|---|---|---|
| MIG (Gloss vs. Style) | 0.12 | 0.38 |
| User Preference (Gloss Control) | 42 % | 71 % |
| Inference Time (per 512×512) | 0.85 s | 0.62 s |
- Gloss Disentanglement: Gloss can be smoothly varied from matte to highly specular while the artistic style remains unchanged, confirmed by both quantitative MIG scores and visual latent traversals.
- Style Preservation: Changing the gloss does not bleed into the style representation; sketches stay sketch‑like, watercolors stay watercolor‑like.
- Image Quality: The diffusion‑based generator produces crisp, high‑resolution non‑photorealistic images that retain the intended material cues, outperforming prior GAN‑based NPR pipelines.
- Efficiency: Because only the lightweight adapter is trained, the method adds minimal overhead to existing diffusion pipelines.
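The mutual information gap (MIG) scores reported in the table can be estimated with a simple histogram-based plug-in estimator, sketched below. The bin counts and the estimator itself are assumptions; the paper may use a different MI estimator:

```python
import numpy as np

def discrete_mutual_info(x, y, bins=20):
    """Histogram (plug-in) estimate of mutual information between one latent
    dimension x and a discretized ground-truth factor y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def entropy(y, bins=20):
    """Histogram estimate of the entropy of a factor y."""
    p, _ = np.histogram(y, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mig(latents, factors, bins=20):
    """Mutual Information Gap: for each factor (e.g. gloss), the gap between
    the two most informative latent dimensions, normalized by the factor's
    entropy, averaged over factors. Higher means better disentanglement."""
    gaps = []
    for k in range(factors.shape[1]):
        mis = sorted(
            (discrete_mutual_info(latents[:, j], factors[:, k], bins)
             for j in range(latents.shape[1])),
            reverse=True,
        )
        gaps.append((mis[0] - mis[1]) / max(entropy(factors[:, k], bins), 1e-12))
    return float(np.mean(gaps))
```

A high MIG for the gloss factor means one latent dimension carries almost all the gloss information, which is exactly the property that makes a single gloss control knob possible.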
Practical Implications
- Game & VR Asset Pipelines: Artists can programmatically generate texture‑less “painted” versions of 3‑D assets with precise control over shininess, enabling rapid prototyping of stylized environments.
- Design Tools & Plugins: Integration into Photoshop, Blender, or Unity as a “Gloss Slider” for non‑photorealistic rendering, giving designers a single knob to adjust material sheen without re‑painting.
- Content Creation for Marketing & Education: Automated production of stylized product renders (e.g., matte vs. glossy product sketches) for catalogs, tutorials, or AR overlays.
- Research & Data Augmentation: The disentangled latent space can be used to synthesize labeled data for training perception models that need to understand material properties across artistic domains.
- Low‑Compute Adaptation: Since only a small adapter is trained, studios can retrofit existing diffusion models (e.g., Stable Diffusion) without massive GPU budgets.
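The “Gloss Slider” idea reduces to a one-line latent interpolation: hold the style code fixed and blend only the gloss sub-vector between a matte and a glossy anchor. All names and dimensions here are hypothetical, for illustration only:

```python
import numpy as np

def gloss_slider(style_z, gloss_matte, gloss_glossy, t):
    """Hypothetical single-knob gloss control: linearly interpolate the gloss
    sub-vector (t = 0 -> matte, t = 1 -> glossy) while the style code stays
    fixed, then concatenate into the full latent fed to the generator."""
    gloss_z = (1.0 - t) * gloss_matte + t * gloss_glossy
    return np.concatenate([style_z, gloss_z])
```

Because style and gloss live in separate sub-spaces of the disentangled latent, moving the slider should leave the artistic style untouched, which is the behavior the paper's latent traversals verify.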
Limitations & Future Work
- Dataset Scope: The curated dataset covers a limited set of object categories and styles; extending to more complex scenes (e.g., outdoor landscapes) may require additional data.
- Gloss Definition: The work focuses on a single gloss parameter (specular roughness). Real‑world materials often involve anisotropic reflections, subsurface scattering, or layered gloss, which are not captured.
- Style Generalization: While the adapter works well for the styles seen during training, transferring to completely novel artistic styles may degrade gloss control.
- User Interaction: The current interface is a simple numeric gloss vector; future work could explore intuitive UI elements (e.g., brush‑based gloss painting).
- Real‑World Validation: Human perception studies beyond the lab (e.g., with professional illustrators) would strengthen claims about practical usefulness.
The authors have open‑sourced their code and dataset, making it easy for developers to experiment, integrate, or extend the approach for their own creative pipelines.
Authors
- Santiago Jimenez-Navarro
- Belen Masia
- Ana Serrano
Paper Information
- arXiv ID: 2602.16611v1
- Categories: cs.GR, cs.CV
- Published: February 18, 2026