[Paper] Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples

Published: April 22, 2026 at 01:49 PM EDT
4 min read
Source: arXiv

Overview

Batch effects—systematic technical variations that are unrelated to the biological signal—are the biggest obstacle to deploying deep‑learning models on real‑world biomedical images. The authors introduce Control‑Stabilized Adaptive Risk Minimization via Batch Normalization (CS‑ARM‑BN), a meta‑learning adaptation technique that leverages the “negative control” images that are automatically captured in every experimental batch. On the massive JUMP‑CP drug‑discovery dataset, CS‑ARM‑BN restores near‑training‑domain performance, effectively closing the domain gap that has plagued prior approaches.

Key Contributions

  • In‑context adaptation using control samples: Turns the ubiquitous unperturbed reference images into a stable context for domain adaptation.
  • CS‑ARM‑BN algorithm: Combines meta‑learning (Adaptive Risk Minimization) with batch‑normalization‑based statistics derived from control samples.
  • Empirical breakthrough: Achieves 0.935 ± 0.018 accuracy on out‑of‑domain batches, compared with a drop to 0.862 ± 0.060 for standard ResNets.
  • Robustness to extreme shifts: Demonstrates that when new batches come from a different lab, the method remains stable thanks to the always‑available controls.
  • Open‑source validation pipeline: Provides code and training scripts that can be plugged into existing PyTorch/TensorFlow workflows.

Methodology

  1. Problem framing: Treat each experimental batch as a separate “task” with its own distribution shift (batch effect).
  2. Negative control samples: Every batch contains a set of reference images (e.g., untreated cells). These are assumed to share the same underlying biology across batches and thus act as anchors.
  3. Meta‑learning loop (ARM):
    • Inner loop: Fine‑tune a base model on the current batch using only the control samples to estimate batch‑specific statistics (mean/variance).
    • Outer loop: Update the shared model parameters so that after the inner adaptation the model performs well on the labeled (perturbed) images of that batch.
  4. Batch Normalization (BN) integration: The BN layers are re‑parameterized to accept the control‑derived statistics, allowing the model to “normalize away” batch effects without altering the learned feature extractor.
  5. Training pipeline: Standard ResNet‑50 backbone, Adam optimizer, and a modest number of meta‑training epochs (≈10–15) suffice because the control samples provide a strong signal.

The whole procedure can be wrapped in a few lines of code and runs on a single GPU for datasets the size of JUMP‑CP.
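The control-derived normalization of step 4 can be sketched in a few lines. The snippet below is an illustrative reconstruction in NumPy, not the authors' released code; the helper names `control_bn_stats` and `normalize_with_controls` are made up for this sketch.

```python
import numpy as np

def control_bn_stats(controls: np.ndarray):
    """Per-channel mean/variance estimated from negative-control
    images of shape (N, C, H, W)."""
    return controls.mean(axis=(0, 2, 3)), controls.var(axis=(0, 2, 3))

def normalize_with_controls(x, mean, var, gamma, beta, eps=1e-5):
    """BatchNorm-style transform using control-derived statistics,
    so technical variation shared by controls and perturbed images
    is normalized away."""
    m = mean.reshape(1, -1, 1, 1)
    s = np.sqrt(var.reshape(1, -1, 1, 1) + eps)
    return gamma.reshape(1, -1, 1, 1) * (x - m) / s + beta.reshape(1, -1, 1, 1)
```

In a full network the same substitution would be applied at every BN layer, leaving the learned convolutional weights untouched.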

Results & Findings

| Model | In‑domain accuracy | Out‑of‑domain accuracy |
| --- | --- | --- |
| Standard ResNet‑50 | 0.939 ± 0.005 | 0.862 ± 0.060 |
| Foundation model + Typical Variation Normalization | ≈0.90 | ≈0.88 (still a gap) |
| CS‑ARM‑BN (proposed) | — | 0.935 ± 0.018 |
  • The gap between training and new‑batch performance shrinks from ~8 % to <1 %.
  • When the new batch originates from a completely different laboratory (larger covariate shift), CS‑ARM‑BN remains stable, whereas vanilla meta‑learning diverges.
  • Ablation studies show that removing the control‑sample statistics from BN degrades performance back to the baseline, confirming their central role.

Practical Implications

  • Drug‑discovery pipelines: Researchers can train a single model on historical plates and reliably apply it to new screening runs without re‑training or costly domain‑specific calibration.
  • Clinical imaging: Hospitals that acquire microscopy data with slightly different hardware or staining protocols can adopt the same model, using the routine control slides as adaptation anchors.
  • MLOps integration: CS‑ARM‑BN fits into existing CI/CD for ML; the adaptation step is a lightweight forward‑pass that updates BN statistics on‑the‑fly, making it suitable for real‑time inference services.
  • Cost reduction: Eliminates the need for large labeled re‑annotation campaigns whenever a batch‑effect change occurs, saving both time and expert labor.
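Because the adaptation is a single, gradient-free statistics pass, it is cheap enough to run per incoming plate in a serving pipeline. The sketch below shows that workflow on one BN layer; the class name `ControlAdaptedBN` is hypothetical and the code is a NumPy illustration of the idea, not the paper's implementation.

```python
import numpy as np

class ControlAdaptedBN:
    """A single BN layer whose statistics are refreshed from the
    negative controls of each incoming plate. Illustrative sketch."""

    def __init__(self, channels: int, eps: float = 1e-5):
        self.gamma = np.ones(channels)   # learned scale (frozen at serving time)
        self.beta = np.zeros(channels)   # learned shift (frozen at serving time)
        self.mean = np.zeros(channels)
        self.var = np.ones(channels)
        self.eps = eps

    def adapt(self, controls: np.ndarray) -> None:
        # Gradient-free update: one pass over the plate's control images.
        self.mean = controls.mean(axis=(0, 2, 3))
        self.var = controls.var(axis=(0, 2, 3))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        m = self.mean.reshape(1, -1, 1, 1)
        s = np.sqrt(self.var.reshape(1, -1, 1, 1) + self.eps)
        g = self.gamma.reshape(1, -1, 1, 1)
        b = self.beta.reshape(1, -1, 1, 1)
        return g * (x - m) / s + b

# Per-plate serving loop (names are placeholders):
#   bn.adapt(plate_controls)
#   predictions = classifier_head(bn(plate_images))
```

The key design point is that only the normalization statistics change between plates; the trained weights are never touched, so no retraining or calibration run is needed.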

Limitations & Future Work

  • Dependence on control quality: If the negative controls are noisy, mislabeled, or missing, the adaptation may fail.
  • Scalability to ultra‑large models: The current experiments use ResNet‑50; extending to billion‑parameter foundation models may require more sophisticated BN handling.
  • Generalization beyond imaging: The method assumes a clear set of control samples; applying it to modalities without such built‑in references (e.g., genomics) will need new strategies.
  • Future directions:
    • Explore alternative normalization schemes (e.g., LayerNorm) that could be more robust to scarce controls.
    • Combine CS‑ARM‑BN with self‑supervised pre‑training to further reduce labeled data requirements.
    • Open a benchmark suite for batch‑effect adaptation across multiple biomedical imaging domains.

Authors

  • Ana Sanchez-Fernandez
  • Thomas Pinetz
  • Werner Zellinger
  • Günter Klambauer

Paper Information

  • arXiv ID: 2604.20824v1
  • Categories: cs.LG, q-bio.QM
  • Published: April 22, 2026