[Paper] Enhancing Authorship Attribution with Synthetic Paintings
Source: arXiv - 2603.04343v1
Overview
The paper investigates whether synthetic paintings generated with modern text‑to‑image diffusion models can fill the data gap that has long hampered AI‑driven authorship attribution. By fine‑tuning Stable Diffusion via DreamBooth on a handful of real works, the authors create realistic “fake” paintings and blend them with the original dataset. The resulting hybrid training set boosts the accuracy of classifiers that identify the author of a given artwork—an advance that could make AI‑based art authentication more reliable in real‑world, data‑scarce scenarios.
Key Contributions
- Synthetic data pipeline: Demonstrates how DreamBooth‑fine‑tuned Stable Diffusion can produce high‑fidelity paintings that preserve the stylistic nuances of a target artist.
- Hybrid training strategy: Introduces a straightforward recipe for mixing real and synthetic images to improve downstream classification performance.
- Empirical validation: Shows consistent gains in ROC‑AUC and overall accuracy across multiple artist‑pair experiments, confirming that synthetic samples act as effective regularizers.
- Open‑source reproducibility: Provides code, model checkpoints, and a curated dataset split, enabling other researchers and developers to replicate and extend the work.
Methodology
- Data collection: The authors start with a modest corpus of digitized paintings (≈ 200–300 per artist) from public museum archives.
- DreamBooth fine‑tuning: For each artist, a Stable Diffusion model is fine‑tuned on the artist’s real works plus a few textual prompts that capture the artist’s name and style. This yields a personalized generator capable of producing new images that “look like” the artist.
- Synthetic image generation: The fine‑tuned model generates 1,000–2,000 synthetic paintings per artist, using diverse prompts to encourage variation while staying within the learned style distribution.
- Hybrid dataset assembly: Real and synthetic images are combined in several ratios (e.g., 1:1, 1:2) to form training sets; a held‑out real‑only test set remains untouched for evaluation.
- Classification model: A standard ResNet‑50 backbone (pre‑trained on ImageNet) is fine‑tuned on the hybrid sets to predict the artist label.
- Evaluation metrics: ROC‑AUC, top‑1 accuracy, and confusion matrices are reported to assess both discriminative power and generalization.
The pipeline is deliberately simple—no exotic architectures or adversarial training—so developers can plug it into existing computer‑vision workflows with minimal friction.
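The prompt‑diversification step can be illustrated with a small helper. The template strings and the `sks` rare‑token convention (a common DreamBooth practice) are assumptions for illustration, not details quoted from the paper:

```python
import itertools

def build_prompts(artist_token, n):
    """Cross subjects with stylistic modifiers to get varied prompts that
    all stay anchored to the fine-tuned artist token (e.g. "sks painting")."""
    subjects = ["a landscape", "a portrait", "a still life", "a harbor scene"]
    styles = ["loose brushwork", "muted palette", "strong impasto"]
    combos = itertools.product(subjects, styles)
    return [f"{subj} with {style}, in the style of {artist_token}"
            for subj, style in itertools.islice(combos, n)]

prompts = build_prompts("sks painting", 6)
```

Each prompt keeps the learned artist token fixed while varying subject and style modifiers, which is one simple way to encourage variation without drifting out of the learned distribution.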
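The hybrid‑dataset assembly step can be sketched in plain Python. The function name `mix_hybrid` and its arguments are hypothetical, not from the paper's released code:

```python
import random

def mix_hybrid(real_paths, synthetic_paths, synth_ratio=1.0, seed=0):
    """Combine real and synthetic image paths at a synthetic:real ratio.

    synth_ratio=1.0 corresponds to the paper's 1:1 setting, 2.0 to 1:2.
    Synthetic images are randomly subsampled to hit the target ratio.
    """
    rng = random.Random(seed)
    n_synth = min(len(synthetic_paths), int(len(real_paths) * synth_ratio))
    chosen = rng.sample(synthetic_paths, n_synth)
    # Tag each path with its source so training code can audit the mix later.
    dataset = [(p, "real") for p in real_paths] + [(p, "synthetic") for p in chosen]
    rng.shuffle(dataset)
    return dataset

# Example: 300 real works per artist, 600 synthetic candidates, mixed 1:1.
real = [f"real_{i}.jpg" for i in range(300)]
fake = [f"synth_{i}.jpg" for i in range(600)]
hybrid = mix_hybrid(real, fake, synth_ratio=1.0)
```

The held‑out real‑only test set would be split off before this step, so no synthetic image can leak into evaluation.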
Results & Findings
| Training set | ROC‑AUC | Top‑1 Accuracy |
|---|---|---|
| Real only | 0.78 | 71 % |
| Real + Synthetic (1:1) | 0.86 | 78 % |
| Real + Synthetic (1:2) | 0.84 | 76 % |
- Adding synthetic paintings consistently lifts ROC‑AUC by roughly 6–8 points (0.78 → 0.84–0.86).
- The best performance appears when synthetic data roughly matches the amount of real data (1:1 ratio), suggesting diminishing returns beyond that point.
- Error analysis shows that the classifier becomes less prone to over‑fitting on idiosyncratic brush‑stroke artifacts present only in the limited real set.
In short, synthetic images act as a powerful data‑augmentation tool, improving both discrimination and robustness.
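For a binary artist pair, ROC‑AUC values like those in the table can be computed directly from classifier scores via the pairwise‑ranking (Mann–Whitney) formulation. This helper is illustrative, not the paper's evaluation code:

```python
def roc_auc(scores, labels):
    """ROC-AUC for binary labels (1 = target artist), computed as the
    probability that a positive example outscores a negative one,
    with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of the two artists gives 1.0.
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # → 1.0
```

In practice one would use a library routine (e.g. scikit‑learn's `roc_auc_score`), but the rank formulation makes clear what a move from 0.78 to 0.86 means: more positive–negative pairs ranked correctly.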
Practical Implications
- Art market & authentication services: Companies that verify provenance can augment scarce provenance‑verified images with synthetic ones, reducing false positives and false negatives without needing massive labeled collections.
- Cultural heritage digitization: Museums digitizing small collections can train reliable style‑recognition models without waiting for large crowdsourced labeling efforts.
- Developer tooling: The approach can be wrapped into a plug‑and‑play library—feed a few labeled images, get a synthetic generator, and train a classifier—all using familiar PyTorch or TensorFlow APIs.
- Beyond paintings: The same hybrid strategy could be applied to any domain where data is scarce but generative models exist (e.g., historical document classification, medical imaging with synthetic lesions).
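A plug‑and‑play wrapper of the kind envisioned above might look like the following skeleton. The class and method names (`HybridAttributor`, `fit`, the injected `generate_fn` and `train_fn`) are invented for illustration; the diffusion generator and classifier trainer are stubbed out as callables:

```python
from collections.abc import Callable

class HybridAttributor:
    """Skeleton of a hybrid real+synthetic attribution pipeline.

    generate_fn stands in for a DreamBooth-fine-tuned generator: given an
    artist name and a count, it returns that many synthetic samples.
    train_fn stands in for fine-tuning a classifier (e.g. a ResNet-50).
    """
    def __init__(self, generate_fn: Callable, train_fn: Callable):
        self.generate_fn = generate_fn
        self.train_fn = train_fn

    def fit(self, real_by_artist: dict, synth_ratio: float = 1.0):
        data, labels = [], []
        for artist, works in real_by_artist.items():
            # Generate synthetic samples proportional to the real count.
            synth = self.generate_fn(artist, int(len(works) * synth_ratio))
            data += list(works) + list(synth)
            labels += [artist] * (len(works) + len(synth))
        return self.train_fn(data, labels)

# Stubbed usage: a fake generator and a trainer that just reports sizes.
model = HybridAttributor(
    generate_fn=lambda artist, n: [f"{artist}_synth_{i}" for i in range(n)],
    train_fn=lambda data, labels: {"n_samples": len(data)},
)
result = model.fit({"Monet": ["m1", "m2"], "Degas": ["d1", "d2"]})
```

Keeping the generator and trainer as injected callables means the same wrapper works with any diffusion backend or classifier framework.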
Limitations & Future Work
- Synthetic realism ceiling: While DreamBooth captures high‑level style, subtle material cues (canvas texture, craquelure) are still missing, which may matter for forensic experts.
- Artist similarity bias: The method works best when the target artists have distinct visual vocabularies; highly overlapping styles may still confuse the classifier.
- Scalability to many artists: Fine‑tuning a separate diffusion model per artist becomes costly as the number of classes grows. Future work could explore multi‑artist conditioning or latent‑space interpolation to share generators.
- Human evaluation: The paper relies on quantitative metrics; a user study with art historians would strengthen claims about “authentic‑looking” synthetic works.
Overall, the study opens a practical pathway for developers to leverage generative AI as a data‑augmentation ally in the niche but high‑stakes field of artwork authorship attribution.
Authors
- Clarissa Loures
- Caio Hosken
- Luan Oliveira
- Gianlucca Zuin
- Adriano Veloso
Paper Information
- arXiv ID: 2603.04343v1
- Categories: cs.CV, cs.LG
- Published: March 4, 2026