[Paper] GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation
Source: arXiv - 2605.06641v1
Overview
The paper introduces GlazyBench, which the authors present as the first large‑scale dataset dedicated to AI‑assisted ceramic glaze design. It compiles more than 23,000 real‑world glaze recipes together with their measured post‑firing properties and reference images. The dataset supports multimodal models that predict material properties from ingredient lists and render realistic glaze visuals, two tasks that have traditionally relied on costly trial‑and‑error in the studio.
Key Contributions
- A curated benchmark dataset (23,148 glaze formulations) linking raw material proportions, measured properties (color, transparency, gloss, etc.), and high‑resolution photos of the fired glaze.
- Two benchmark tasks:
  - Property prediction – infer quantitative surface attributes from a textual/structured recipe.
  - Image generation – synthesize a faithful visual representation conditioned on the predicted properties.
- Baseline implementations spanning classic ML (Random Forest, XGBoost), large language models (LLMs fine‑tuned on the recipe‑property mapping), and state‑of‑the‑art generative models (Stable Diffusion, DALL‑E‑3, ControlNet‑style conditioning).
- Comprehensive evaluation protocol (MAE, R² for regression; FID, CLIP‑Score for image quality) that can serve as a reference point for future research.
- Open‑source release of the dataset, code, and trained baselines to encourage reproducibility and community contributions.
Methodology
- Data collection & cleaning – The authors aggregated glaze recipes from open‑source pottery forums, commercial formulation sheets, and academic publications. Each entry was normalized to a common set of 45 raw material categories (e.g., silica, feldspar, metal oxides) and paired with lab‑measured properties (L*a*b* color coordinates, opacity, gloss) and a calibrated photograph of the fired tile.
- Property prediction pipeline – Recipes are encoded as sparse vectors (material → weight %) and fed to several regressors:
  - Traditional: Gradient Boosted Trees (XGBoost) and Random Forests.
  - Neural: A simple feed‑forward network and a transformer‑style encoder that treats the recipe as a token sequence.
  - LLM‑based: GPT‑4‑style models prompted with “Given the following ingredients, predict the final color (L*a*b*) and transparency.”
- Image generation pipeline – Two strategies were explored:
  - Direct diffusion: Conditioning a latent diffusion model on the predicted property vector (concatenated to the text embedding).
  - ControlNet: Using the property vector to drive a control map that guides a pretrained Stable Diffusion model, ensuring the output respects the target color and opacity.
- Evaluation – Regression performance is reported via Mean Absolute Error (MAE) and coefficient of determination (R²). Generated images are assessed with Fréchet Inception Distance (FID) for realism and CLIP‑Score for semantic alignment with the target properties.
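The recipe encoding and the regression metrics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the five material names are a hypothetical stand‑in for the 45 categories, and normalizing each recipe so its amounts sum to 100 % is an assumption about how the weight percentages are defined.

```python
from typing import Dict, List, Sequence

# Hypothetical vocabulary: the benchmark uses 45 material categories,
# only a few illustrative names are listed here.
MATERIALS: List[str] = ["silica", "feldspar", "kaolin", "whiting", "iron_oxide"]


def encode_recipe(recipe: Dict[str, float],
                  vocab: Sequence[str] = MATERIALS) -> List[float]:
    """Map {material: amount} to a fixed-length vector of weight percentages."""
    unknown = set(recipe) - set(vocab)
    if unknown:
        raise ValueError(f"materials not in vocabulary: {sorted(unknown)}")
    total = sum(recipe.values())
    if total <= 0:
        raise ValueError("recipe must have a positive total amount")
    # Assumption: amounts are rescaled so the vector sums to 100.
    # Materials absent from the recipe stay at 0.0, so most entries are zero.
    return [100.0 * recipe.get(name, 0.0) / total for name in vocab]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    vector = encode_recipe({"silica": 30, "feldspar": 50, "whiting": 20})
    print(vector)  # [30.0, 50.0, 0.0, 20.0, 0.0]

    # Made-up lightness (L*) targets and predictions, for illustration only.
    targets, predictions = [50, 60, 70], [52, 57, 70]
    print(mean_absolute_error(targets, predictions))
    print(r2_score(targets, predictions))
```

A fixed vocabulary keeps every recipe the same length, which is what tree‑based and feed‑forward regressors expect. A token‑sequence encoder would instead consume the (material, amount) pairs directly.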
Results & Findings
- Property prediction: Gradient Boosted Trees achieved the lowest MAE on L*a*b* (≈ 3.2) and opacity (≈ 4 %). LLM prompts performed competitively on color but lagged on opacity, suggesting that raw numeric regression still outperforms language‑model inference for fine‑grained material properties.
- Image generation: The ControlNet‑augmented diffusion model reduced FID by ~15 % compared to vanilla Stable Diffusion, and its CLIP‑Score improved by 0.08, indicating better adherence to the predicted color palette and translucency. However, subtle texture cues (e.g., surface sheen) remain difficult to capture.
- Cross‑task synergy: When the property predictor’s output was fed directly into the image generator, the end‑to‑end pipeline achieved respectable visual fidelity, but propagated errors (for example, mis‑predicted opacity) noticeably degraded the realism of the generated images.
- Overall takeaway: The benchmark is tractable enough to produce meaningful baselines, yet challenging enough to leave ample room for improvement, especially in handling the high‑dimensional, chemically constrained space of glaze formulations.
Practical Implications
- Rapid prototyping for ceramic artists – A developer could integrate the property‑prediction models into a design tool, allowing artists to tweak ingredient ratios and instantly see the predicted color and opacity, cutting down on costly kiln runs.
- E‑commerce and customization platforms – Manufacturers of pottery supplies could offer “virtual glaze try‑on” features, letting customers preview how a new glaze will look on their products before purchase.
- Materials‑by‑AI pipelines – The dataset and baseline models provide a template for other niche material domains (e.g., glass, enamel, polymer coatings) where formulation‑to‑property data is scarce.
- Educational tools – Interactive notebooks that expose the transformer‑based recipe encoder can teach chemistry students about the quantitative impact of metal oxides on glaze outcomes.
- Integration with existing AI stacks – Since the baselines rely on widely used libraries (scikit‑learn, PyTorch, Hugging Face Diffusers), developers can plug the models into CI pipelines, cloud functions, or even mobile apps with minimal friction.
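For the instant color preview mentioned in the first item above, a design tool would need to turn a predicted L*a*b* triple into a displayable color. The sketch below uses the standard CIELAB → XYZ → sRGB conversion. It assumes the predicted coordinates are referenced to the D65 white point, which the summary does not state. The function names are hypothetical, and opacity and gloss are not handled.

```python
from typing import Tuple

# Assumption: L*a*b* values are relative to the D65 reference white.
D65_WHITE = (0.95047, 1.0, 1.08883)


def _lab_f_inverse(t: float) -> float:
    """Inverse of the CIELAB companding function."""
    delta = 6.0 / 29.0
    return t ** 3 if t > delta else 3.0 * delta ** 2 * (t - 4.0 / 29.0)


def _srgb_gamma(c: float) -> float:
    """Clip a linear RGB channel to [0, 1] and apply the sRGB transfer curve."""
    c = min(max(c, 0.0), 1.0)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055


def lab_to_srgb(L: float, a: float, b: float) -> Tuple[int, int, int]:
    """Convert CIELAB coordinates to 8-bit sRGB (out-of-gamut values are clipped)."""
    fy = (L + 16.0) / 116.0
    fx = fy + a / 500.0
    fz = fy - b / 200.0
    x, y, z = (w * _lab_f_inverse(f) for w, f in zip(D65_WHITE, (fx, fy, fz)))
    # XYZ (D65) to linear sRGB.
    r_lin = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g_lin = -0.9689 * x + 1.8758 * y + 0.0415 * z
    b_lin = 0.0557 * x - 0.2040 * y + 1.0570 * z
    return tuple(round(255 * _srgb_gamma(c)) for c in (r_lin, g_lin, b_lin))


def to_hex(rgb: Tuple[int, int, int]) -> str:
    """Format an 8-bit RGB triple as a CSS-style hex string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    # A made-up predicted color, for illustration only.
    print(to_hex(lab_to_srgb(55.0, 20.0, 35.0)))
```

A flat swatch like this conveys only the predicted color; rendering translucency or surface sheen is the job of the image‑generation models.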
Limitations & Future Work
- Dataset bias – The collected recipes are heavily skewed toward traditional earthenware and stoneware glazes; exotic or experimental formulations are under‑represented, limiting model generalization.
- Property scope – Only a handful of surface metrics (color, opacity, gloss) are captured; mechanical properties (e.g., durability, thermal shock resistance) are absent but crucial for industrial adoption.
- Image realism ceiling – Current diffusion models struggle with fine‑scale surface texture and specular highlights that are perceptually important for glaze evaluation.
- Error propagation – The two‑stage pipeline amplifies prediction mistakes; end‑to‑end multimodal training (jointly optimizing property regression and image synthesis) is a promising direction.
- Explainability – While tree‑based models offer feature importance, deep models remain black boxes; future work could explore attention visualizations or counterfactual analysis to help artisans understand why a certain ingredient drives a color shift.
By addressing these gaps, the community can move toward truly AI‑driven material design workflows that are both scientifically rigorous and artistically empowering.
Authors
- Ziyu Zhai
- Siyou Li
- Juexi Shao
- Juntao Yu
Paper Information
- arXiv ID: 2605.06641v1
- Categories: cs.AI, cs.CV
- Published: May 7, 2026