[Paper] Style-Aware Gloss Control for Generative Non-Photorealistic Rendering
Source: arXiv - 2602.16611v1
Overview
The paper “Style‑Aware Gloss Control for Generative Non‑Photorealistic Rendering” investigates how modern generative models can disentangle gloss (the shiny‑versus‑matte quality of a surface) from the artistic style of a painting or drawing. By training on a purpose‑built dataset of painterly objects, the authors learn a latent space in which gloss can be adjusted independently of style, and they show how to plug this representation into a diffusion‑based image generator for fine‑grained, controllable non‑photorealistic synthesis.
Key Contributions
- Curated Painterly Dataset – A new collection of rendered objects in multiple artistic styles with systematically varied gloss levels, enabling controlled experiments on style vs. material perception.
- Hierarchical Disentangled Latent Space – An unsupervised generative model learns a latent hierarchy where gloss is isolated from other visual factors (color, shape, style).
- Lightweight Adapter for Diffusion Models – A small neural “adapter” maps the style‑ and gloss‑aware latent vectors into the latent‑diffusion model (LDM) space, granting users direct control over these attributes during image synthesis.
- Quantitative & Qualitative Evaluation – The approach outperforms prior style‑transfer and non‑photorealistic generation methods in terms of disentanglement (measured by mutual information gap) and user‑perceived controllability.
- Open‑Source Release – Code, pretrained models, and the curated dataset are publicly released, facilitating reproducibility and downstream research.
Methodology
- Data Collection – 3‑D objects are rendered under a range of gloss parameters (e.g., roughness values) and then “painted” using multiple procedural artistic styles (watercolor, oil, sketch, etc.). Each image is labeled with its ground‑truth gloss level and style identifier.
- Unsupervised Representation Learning – A VAE‑style hierarchical encoder‑decoder is trained on the dataset without any explicit gloss supervision. The hierarchy forces the model to allocate separate latent sub‑spaces for coarse (style) and fine (material) factors.
- Latent Disentanglement Analysis – The authors probe the learned latent dimensions using mutual information gap (MIG) and latent traversals to verify that gloss varies independently of style.
- Adapter Design – A shallow MLP (the “adapter”) takes the disentangled latent vector (style + gloss) and projects it into the latent space of a pretrained latent‑diffusion model (Stable Diffusion‑style). The diffusion model then generates high‑resolution non‑photorealistic images conditioned on these vectors.
- Training & Fine‑Tuning – The adapter is trained with a contrastive loss that encourages the diffusion output to preserve the intended gloss while respecting the style code. No full‑model fine‑tuning of the diffusion backbone is required, keeping compute costs low.
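The adapter and its contrastive training objective described above can be sketched roughly as follows. The layer sizes, latent dimensions, and the InfoNCE-style formulation are illustrative assumptions; the summary does not specify the paper's exact architecture or loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlossStyleAdapter(nn.Module):
    """Shallow MLP mapping a disentangled (style + gloss) latent into the
    conditioning space of a pretrained latent-diffusion model.
    Dimensions and depth are assumed for illustration, not the paper's values."""
    def __init__(self, style_dim=64, gloss_dim=8, cond_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(style_dim + gloss_dim, 256),
            nn.SiLU(),
            nn.Linear(256, cond_dim),
        )

    def forward(self, style_z, gloss_z):
        # Concatenate the two sub-latents and project into the diffusion
        # model's conditioning space; the diffusion backbone stays frozen.
        return self.net(torch.cat([style_z, gloss_z], dim=-1))

def info_nce_loss(anchors, positives, temperature=0.07):
    """One plausible contrastive loss (InfoNCE-style): each adapted embedding
    should match its own target and repel the other samples in the batch."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)
```

Because only these few linear layers receive gradients, training stays cheap relative to fine-tuning the diffusion backbone, which matches the low-compute claim above.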
Results & Findings
| Metric | Baseline (Style‑Transfer) | Proposed Method |
|---|---|---|
| MIG (Gloss vs. Style) | 0.12 | 0.38 |
| User Preference (Gloss Control) | 42 % | 71 % |
| Inference Time (per 512×512) | 0.85 s | 0.62 s |
- Gloss Disentanglement: Gloss can be smoothly varied from matte to highly specular while the artistic style remains unchanged, confirmed by both quantitative MIG scores and visual latent traversals.
- Style Preservation: Changing the gloss does not bleed into the style representation; sketches stay sketch‑like, watercolors stay watercolor‑like.
- Image Quality: The diffusion‑based generator produces crisp, high‑resolution non‑photorealistic images that retain the intended material cues, outperforming prior GAN‑based NPR pipelines.
- Efficiency: Because only the lightweight adapter is trained, the method adds minimal overhead to existing diffusion pipelines.
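The mutual information gap (MIG) scores reported in the table can be estimated with a simple histogram-based plug-in estimator, sketched below. The bin counts and the estimator itself are assumptions; the paper may use a different MI estimator:

```python
import numpy as np

def discrete_mutual_info(x, y, bins=20):
    """Histogram (plug-in) estimate of mutual information between one latent
    dimension x and a discretized ground-truth factor y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def entropy(y, bins=20):
    """Histogram estimate of the entropy of a factor y."""
    p, _ = np.histogram(y, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mig(latents, factors, bins=20):
    """Mutual Information Gap: for each factor (e.g. gloss), the gap between
    the two most informative latent dimensions, normalized by the factor's
    entropy, averaged over factors. Higher means better disentanglement."""
    gaps = []
    for k in range(factors.shape[1]):
        mis = sorted(
            (discrete_mutual_info(latents[:, j], factors[:, k], bins)
             for j in range(latents.shape[1])),
            reverse=True,
        )
        gaps.append((mis[0] - mis[1]) / max(entropy(factors[:, k], bins), 1e-12))
    return float(np.mean(gaps))
```

A high MIG for the gloss factor means one latent dimension carries almost all the gloss information, which is exactly the property that makes a single gloss control knob possible.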
Practical Implications
- Game & VR Asset Pipelines: Artists can programmatically generate texture‑less “painted” versions of 3‑D assets with precise control over shininess, enabling rapid prototyping of stylized environments.
- Design Tools & Plugins: Integration into Photoshop, Blender, or Unity as a “Gloss Slider” for non‑photorealistic rendering, giving designers a single knob to adjust material sheen without re‑painting.
- Content Creation for Marketing & Education: Automated production of stylized product renders (e.g., matte vs. glossy product sketches) for catalogs, tutorials, or AR overlays.
- Research & Data Augmentation: The disentangled latent space can be used to synthesize labeled data for training perception models that need to understand material properties across artistic domains.
- Low‑Compute Adaptation: Since only a small adapter is trained, studios can retrofit existing diffusion models (e.g., Stable Diffusion) without massive GPU budgets.
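The “Gloss Slider” idea reduces to a one-line latent interpolation: hold the style code fixed and blend only the gloss sub-vector between a matte and a glossy anchor. All names and dimensions here are hypothetical, for illustration only:

```python
import numpy as np

def gloss_slider(style_z, gloss_matte, gloss_glossy, t):
    """Hypothetical single-knob gloss control: linearly interpolate the gloss
    sub-vector (t = 0 -> matte, t = 1 -> glossy) while the style code stays
    fixed, then concatenate into the full latent fed to the generator."""
    gloss_z = (1.0 - t) * gloss_matte + t * gloss_glossy
    return np.concatenate([style_z, gloss_z])
```

Because style and gloss live in separate sub-spaces of the disentangled latent, moving the slider should leave the artistic style untouched, which is the behavior the paper's latent traversals verify.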
Limitations & Future Work
- Dataset Scope: The curated dataset covers a limited set of object categories and styles; extending to more complex scenes (e.g., outdoor landscapes) may require additional data.
- Gloss Definition: The work focuses on a single gloss parameter (specular roughness). Real‑world materials often involve anisotropic reflections, subsurface scattering, or layered gloss, which are not captured.
- Style Generalization: While the adapter works well for the styles seen during training, transferring to completely novel artistic styles may degrade gloss control.
- User Interaction: The current interface is a simple numeric gloss vector; future work could explore intuitive UI elements (e.g., brush‑based gloss painting).
- Real‑World Validation: Human perception studies beyond the lab (e.g., with professional illustrators) would strengthen claims about practical usefulness.
The authors have open‑sourced their code and dataset, making it easy for developers to experiment, integrate, or extend the approach for their own creative pipelines.
Authors
- Santiago Jimenez-Navarro
- Belen Masia
- Ana Serrano
Paper Information
- arXiv ID: 2602.16611v1
- Categories: cs.GR, cs.CV
- Published: February 18, 2026