[Paper] Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples

Published: April 22, 2026 at 01:49 PM EDT
4 min read
Source: arXiv

Overview

Batch effects—systematic technical variations that are unrelated to the biological signal—are the biggest obstacle to deploying deep‑learning models on real‑world biomedical images. The authors introduce Control‑Stabilized Adaptive Risk Minimization via Batch Normalization (CS‑ARM‑BN), a meta‑learning adaptation technique that leverages the “negative control” images that are automatically captured in every experimental batch. On the massive JUMP‑CP drug‑discovery dataset, CS‑ARM‑BN restores near‑training‑domain performance, effectively closing the domain gap that has plagued prior approaches.

Key Contributions

  • In‑context adaptation using control samples: Turns the ubiquitous unperturbed reference images into a stable context for domain adaptation.
  • CS‑ARM‑BN algorithm: Combines meta‑learning (Adaptive Risk Minimization) with batch‑normalization‑based statistics derived from control samples.
  • Empirical breakthrough: Achieves 0.935 ± 0.018 accuracy on out‑of‑domain batches, compared with a drop to 0.862 ± 0.060 for standard ResNets.
  • Robustness to extreme shifts: Demonstrates that when new batches come from a different lab, the method remains stable thanks to the always‑available controls.
  • Open‑source validation pipeline: Provides code and training scripts that can be plugged into existing PyTorch/TensorFlow workflows.

Methodology

  1. Problem framing: Treat each experimental batch as a separate “task” with its own distribution shift (batch effect).
  2. Negative control samples: Every batch contains a set of reference images (e.g., untreated cells). These are assumed to share the same underlying biology across batches and thus act as anchors.
  3. Meta‑learning loop (ARM):
    • Inner loop: Fine‑tune a base model on the current batch using only the control samples to estimate batch‑specific statistics (mean/variance).
    • Outer loop: Update the shared model parameters so that after the inner adaptation the model performs well on the labeled (perturbed) images of that batch.
  4. Batch Normalization (BN) integration: The BN layers are re‑parameterized to accept the control‑derived statistics, allowing the model to “normalize away” batch effects without altering the learned feature extractor.
  5. Training pipeline: Standard ResNet‑50 backbone, Adam optimizer, and a modest number of meta‑training epochs (≈10–15) suffice because the control samples provide a strong signal.

The whole procedure can be wrapped in a few lines of code and runs on a single GPU for datasets the size of JUMP‑CP.
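The control-derived normalization of step 4 can be sketched in a few lines. The snippet below is an illustrative reconstruction in NumPy, not the authors' released code; the helper names `control_bn_stats` and `normalize_with_controls` are made up for this sketch.

```python
import numpy as np

def control_bn_stats(controls: np.ndarray):
    """Per-channel mean/variance estimated from negative-control
    images of shape (N, C, H, W)."""
    return controls.mean(axis=(0, 2, 3)), controls.var(axis=(0, 2, 3))

def normalize_with_controls(x, mean, var, gamma, beta, eps=1e-5):
    """BatchNorm-style transform using control-derived statistics,
    so technical variation shared by controls and perturbed images
    is normalized away."""
    m = mean.reshape(1, -1, 1, 1)
    s = np.sqrt(var.reshape(1, -1, 1, 1) + eps)
    return gamma.reshape(1, -1, 1, 1) * (x - m) / s + beta.reshape(1, -1, 1, 1)
```

In a full network the same substitution would be applied at every BN layer, leaving the learned convolutional weights untouched.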

Results & Findings

| Model | In‑domain accuracy | Out‑of‑domain accuracy |
| --- | --- | --- |
| Standard ResNet‑50 | 0.939 ± 0.005 | 0.862 ± 0.060 |
| Foundation model + Typical Variation Normalization | ≈0.90 | ≈0.88 (still a gap) |
| CS‑ARM‑BN (proposed) | — | 0.935 ± 0.018 |
  • The gap between training and new‑batch performance shrinks from ~8 % to <1 %.
  • When the new batch originates from a completely different laboratory (larger covariate shift), CS‑ARM‑BN remains stable, whereas vanilla meta‑learning diverges.
  • Ablation studies show that removing the control‑sample statistics from BN degrades performance back to the baseline, confirming their central role.

Practical Implications

  • Drug‑discovery pipelines: Researchers can train a single model on historical plates and reliably apply it to new screening runs without re‑training or costly domain‑specific calibration.
  • Clinical imaging: Hospitals that acquire microscopy data with slightly different hardware or staining protocols can adopt the same model, using the routine control slides as adaptation anchors.
  • MLOps integration: CS‑ARM‑BN fits into existing CI/CD for ML; the adaptation step is a lightweight forward‑pass that updates BN statistics on‑the‑fly, making it suitable for real‑time inference services.
  • Cost reduction: Eliminates the need for large labeled re‑annotation campaigns whenever a batch‑effect change occurs, saving both time and expert labor.
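Because the adaptation is a single, gradient-free statistics pass, it is cheap enough to run per incoming plate in a serving pipeline. The sketch below shows that workflow on one BN layer; the class name `ControlAdaptedBN` is hypothetical and the code is a NumPy illustration of the idea, not the paper's implementation.

```python
import numpy as np

class ControlAdaptedBN:
    """A single BN layer whose statistics are refreshed from the
    negative controls of each incoming plate. Illustrative sketch."""

    def __init__(self, channels: int, eps: float = 1e-5):
        self.gamma = np.ones(channels)   # learned scale (frozen at serving time)
        self.beta = np.zeros(channels)   # learned shift (frozen at serving time)
        self.mean = np.zeros(channels)
        self.var = np.ones(channels)
        self.eps = eps

    def adapt(self, controls: np.ndarray) -> None:
        # Gradient-free update: one pass over the plate's control images.
        self.mean = controls.mean(axis=(0, 2, 3))
        self.var = controls.var(axis=(0, 2, 3))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        m = self.mean.reshape(1, -1, 1, 1)
        s = np.sqrt(self.var.reshape(1, -1, 1, 1) + self.eps)
        g = self.gamma.reshape(1, -1, 1, 1)
        b = self.beta.reshape(1, -1, 1, 1)
        return g * (x - m) / s + b

# Per-plate serving loop (names are placeholders):
#   bn.adapt(plate_controls)
#   predictions = classifier_head(bn(plate_images))
```

The key design point is that only the normalization statistics change between plates; the trained weights are never touched, so no retraining or calibration run is needed.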

Limitations & Future Work

  • Dependence on control quality: If the negative controls are noisy, mislabeled, or missing, the adaptation may fail.
  • Scalability to ultra‑large models: The current experiments use ResNet‑50; extending to billion‑parameter foundation models may require more sophisticated BN handling.
  • Generalization beyond imaging: The method assumes a clear set of control samples; applying it to modalities without such built‑in references (e.g., genomics) will need new strategies.
  • Future directions:
    • Explore alternative normalization schemes (e.g., LayerNorm) that could be more robust to scarce controls.
    • Combine CS‑ARM‑BN with self‑supervised pre‑training to further reduce labeled data requirements.
    • Open a benchmark suite for batch‑effect adaptation across multiple biomedical imaging domains.

Authors

  • Ana Sanchez-Fernandez
  • Thomas Pinetz
  • Werner Zellinger
  • Günter Klambauer

Paper Information

  • arXiv ID: 2604.20824v1
  • Categories: cs.LG, q-bio.QM
  • Published: April 22, 2026