[Paper] Reducing Domain Gap with Diffusion-Based Domain Adaptation for Cell Counting

Published: December 12, 2025 at 01:19 PM EST
4 min read
Source: arXiv - 2512.11763v1

Overview

Synthetic microscopy images are a cheap way to train deep‑learning models for tasks like cell counting, but the visual gap between computer‑generated and real microscope data often limits performance. This paper adapts a diffusion‑based style‑transfer framework, Inversion‑Based Style Transfer (InST), to bridge that gap, turning plain synthetic images into realistic‑looking samples that still preserve the underlying cell layout. The result: models pretrained on these “styled” images achieve substantially lower counting errors than those trained on conventional synthetic data, and even edge out a real‑data‑only baseline.

Key Contributions

  • Cross‑domain style transfer for microscopy – repurposes the InST framework (originally for artistic style transfer) to inject realistic fluorescence textures into synthetic cell images.
  • Latent‑space Adaptive Instance Normalization (AdaIN) + stochastic diffusion inversion – a novel combination that preserves cell geometry while randomizing visual appearance.
  • Extensive benchmarking – pre‑trains EfficientNet‑B0 on three data sources (hard‑coded synthetic, Cell200‑s, and InST‑styled synthetic) and fine‑tunes on real data, showing up to 37 % MAE reduction vs. hard‑coded synthetic and 52 % reduction vs. Cell200‑s.
  • Synergy with lightweight domain‑adaptation tricks – adding DACS + CutMix on top of InST‑styled data yields further gains, showing that the method complements existing adaptation pipelines rather than replacing them.
  • Open‑source release – full code, pretrained models, and data generation scripts are publicly available, enabling immediate reproducibility.
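
For readers unfamiliar with AdaIN: it is the standard adaptive instance normalization from style transfer, applied here to diffusion latent codes rather than CNN feature maps. The formula below is the textbook definition (not taken from the paper), with x a synthetic latent and y a real‑style latent:

$$\mathrm{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)$$

where μ(·) and σ(·) denote channel‑wise mean and standard deviation computed over spatial positions.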

Methodology

  1. Synthetic base generation – start from a conventional cell‑simulation pipeline that produces clean binary masks and corresponding grayscale images (no realistic texture).
  2. Style source collection – gather a modest set of real fluorescence microscopy frames (no annotations needed).
  3. Diffusion model inversion – feed each synthetic image into a pretrained diffusion model and run stochastic inversion to map it into the model’s latent space.
  4. Latent‑space AdaIN – compute channel‑wise mean/variance statistics of the real‑style latent codes and apply Adaptive Instance Normalization to the synthetic latent codes, effectively swapping texture statistics while keeping spatial structure (see the sketch after this list).
  5. Re‑generation – run the diffusion decoder to synthesize a new image that now carries the real‑style texture but retains the original cell layout.
  6. Training pipeline – pre‑train EfficientNet‑B0 on the styled synthetic set, then fine‑tune on a small real‑labeled subset. Optional DACS (Domain Adaptation via Cross‑domain mixed Sampling) and CutMix augmentations are added during fine‑tuning; this stage is sketched below.
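
A minimal sketch of steps 3–5, assuming the synthetic image and a real‑style frame have already been mapped to diffusion latent codes of shape (channels, height, width). All names are illustrative rather than the authors' code, and random tensors stand in for the latents that stochastic inversion would produce.

```python
import torch

def adain(content_latent: torch.Tensor, style_latent: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Swap channel-wise mean/std of the content latent for those of the style latent."""
    # Channel-wise statistics over the spatial dimensions.
    c_mean = content_latent.mean(dim=(-2, -1), keepdim=True)
    c_std = content_latent.std(dim=(-2, -1), keepdim=True) + eps
    s_mean = style_latent.mean(dim=(-2, -1), keepdim=True)
    s_std = style_latent.std(dim=(-2, -1), keepdim=True) + eps
    # Normalize the content latent, then re-scale/shift with the style statistics.
    return s_std * (content_latent - c_mean) / c_std + s_mean

# Stand-ins for the latents that stochastic diffusion inversion (step 3) would
# produce from a synthetic image and a real fluorescence frame.
synthetic_latent = torch.randn(4, 64, 64)   # carries the cell layout
real_style_latent = torch.randn(4, 64, 64)  # carries real texture statistics

styled_latent = adain(synthetic_latent, real_style_latent)
# Step 5 would pass styled_latent through the diffusion decoder to re-generate
# an image with realistic texture but the original cell arrangement.
```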

The whole pipeline runs on a single GPU in a few hours, making it practical for labs without massive compute budgets.
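
Below is a minimal sketch of step 6 under simplifying assumptions: toy random tensors stand in for the InST‑styled synthetic set and the small labeled real set, count regression is trained with an L1 loss (i.e., MAE), and the optional DACS/CutMix augmentations are omitted. Hyperparameters are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import efficientnet_b0

def make_counter() -> nn.Module:
    model = efficientnet_b0(weights=None)
    # Replace the 1000-way classifier head with a single regression output (cell count).
    model.classifier[1] = nn.Linear(model.classifier[1].in_features, 1)
    return model

def run_epochs(model, loader, epochs, lr, device):
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # L1 on counts corresponds to optimizing MAE directly
    for _ in range(epochs):
        for images, counts in loader:
            images, counts = images.to(device), counts.to(device).float()
            optimizer.zero_grad()
            loss_fn(model(images).squeeze(1), counts).backward()
            optimizer.step()
    return model

# Toy stand-ins: random images with random counts. A real run would load the
# InST-styled synthetic images and the small labeled real subset here.
styled_synthetic_loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 200, (16,))),
    batch_size=4)
real_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 200, (8,))),
    batch_size=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = make_counter()
model = run_epochs(model, styled_synthetic_loader, epochs=2, lr=1e-3, device=device)  # pre-train
model = run_epochs(model, real_loader, epochs=1, lr=1e-4, device=device)              # fine-tune
```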

Results & Findings

Training data                        MAE (cell count)
Hard‑coded synthetic only            41.3
Cell200‑s synthetic                  53.7
Real data only                       27.7
InST‑styled synthetic                25.9
InST + DACS + CutMix (fine‑tune)     23.4

  • 37 % MAE drop compared to the baseline synthetic pipeline.
  • 52 % MAE drop versus the public Cell200‑s dataset, turning a previously underperforming synthetic source into the best pre‑training material.
  • Even against a strong real‑only baseline, pre‑training on InST‑styled synthetic data comes out ahead (25.9 vs. 27.7 MAE).
  • Adding lightweight domain‑adaptation (DACS + CutMix) on top of InST pushes performance further, showing the method is compatible with existing tricks rather than a replacement.
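
As a sanity check, the headline percentages follow directly from the table (a quick calculation, not additional results from the paper):

```python
# MAE values copied from the table above.
baselines = {"hard-coded synthetic": 41.3, "Cell200-s synthetic": 53.7}
inst_mae = 25.9
for name, mae in baselines.items():
    print(f"{name}: {(mae - inst_mae) / mae:.1%} lower MAE with InST-styled pre-training")
# hard-coded synthetic: 37.3% lower MAE with InST-styled pre-training
# Cell200-s synthetic: 51.8% lower MAE with InST-styled pre-training
```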

Practical Implications

  • Reduced labeling cost – labs can generate thousands of realistic training images from a handful of unlabeled real frames, slashing the need for exhaustive manual counting.
  • Faster model iteration – pre‑training on styled synthetic data yields a strong initialization, so fine‑tuning on a small real set converges in fewer epochs, saving compute time.
  • Plug‑and‑play for other microscopy tasks – the same InST pipeline can be repurposed for segmentation, phenotype classification, or drug‑response prediction where texture realism matters.
  • Edge‑device readiness – the downstream model (EfficientNet‑B0) is lightweight enough for deployment on embedded systems (e.g., on‑instrument analysis modules), and the synthetic data generation can be done offline.
  • Open‑source toolkit – developers can integrate the provided scripts into CI pipelines, automatically refreshing synthetic datasets as new real samples become available.
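
As one concrete deployment path (an assumption, not something described in the paper), the trained EfficientNet‑B0 counter could be exported to ONNX for on‑instrument inference; the file name and input size below are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

# Rebuild the counter head as in the training sketch; in practice, load the
# fine-tuned weights before exporting.
model = efficientnet_b0(weights=None)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 1)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # example input shape (assumed)
torch.onnx.export(model, dummy_input, "cell_counter.onnx",
                  input_names=["image"], output_names=["count"])
```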

Limitations & Future Work

  • Dependence on a diffusion model – the quality of style transfer hinges on the pretrained diffusion backbone; training a domain‑specific diffusion model could improve results but adds overhead.
  • Partial content preservation – while cell layout is largely retained, extreme morphological variations (e.g., overlapping clusters) sometimes get smoothed out during inversion.
  • Single‑architecture evaluation – results are reported only for one cell‑counting backbone (EfficientNet‑B0); broader testing on transformer‑based or segmentation‑centric models would solidify generality.
  • Scalability to 3‑D microscopy – the current pipeline works on 2‑D slices; extending to volumetric data will require 3‑D diffusion models and memory‑efficient inversion strategies.

Future research directions include training task‑specific diffusion models, exploring multi‑style conditioning (e.g., different staining protocols), and integrating self‑supervised pre‑training to further reduce the need for any labeled real data.

Authors

  • Mohammad Dehghanmanshadi
  • Wallapak Tavanapong

Paper Information

  • arXiv ID: 2512.11763v1
  • Categories: cs.CV
  • Published: December 12, 2025