[Paper] Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation

Published: May 6, 2026 at 01:33 PM EDT

Source: arXiv - 2605.05164v1

Overview

The paper proposes BatMIL, a Geometry‑Aware State Space Model for representing whole‑slide histopathology images (WSIs). By embedding patch features simultaneously in Euclidean and hyperbolic spaces and processing them with a linear‑time state‑space sequence model, the authors report more accurate slide‑level predictions while keeping computation tractable for gigapixel data.

Key Contributions

Dual‑geometry embedding: Introduces a hybrid Euclidean‑hyperbolic representation that captures both local cellular details (Euclidean) and hierarchical tissue organization (hyperbolic).
Linear‑complexity sequence encoder: Leverages the Structured State Space (S4) model to encode thousands of patch embeddings with O(N) time and memory, where N is the number of patches.
Chunk‑level Mixture‑of‑Experts (MoE): Splits the patch sequence into contiguous regional “chunks” and dynamically routes each chunk to a specialized expert subnetwork, increasing expressiveness while avoiding redundant computation.
Comprehensive evaluation: Benchmarks BatMIL on seven WSI datasets covering six cancer types, consistently beating state‑of‑the‑art Multiple Instance Learning (MIL) baselines.
Open‑source implementation: Provides code and pretrained models, facilitating reproducibility and downstream integration.
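To see why the O(N) claim matters at gigapixel scale, the arithmetic below compares the memory of a naive N × N self‑attention score matrix with the per‑position outputs of a linear state‑space scan. N = 100,000 matches the slide size quoted in the speed results; the hidden size d = 512, 16‑bit storage, and a single attention head are illustrative assumptions. Practical Transformer MIL models (e.g., TransMIL) use approximate attention such as Nyström, so the quadratic figure is a worst case for naive attention rather than a measured cost.

```python
# Back-of-envelope memory comparison for one slide with N patch embeddings.
# Assumptions (illustrative, not from the paper): N = 100_000 patches,
# hidden size d = 512, 2-byte (fp16) values, one attention head.
N = 100_000
d = 512
BYTES_PER_VALUE = 2

# Naive self-attention materializes an N x N score matrix: O(N^2) memory.
attention_scores = N * N
attention_bytes = attention_scores * BYTES_PER_VALUE

# A state-space scan keeps one d-dimensional output per position
# (plus a fixed-size state): memory grows linearly in N.
ssm_values = N * d
ssm_bytes = ssm_values * BYTES_PER_VALUE

print(f"attention score matrix: {attention_scores:.1e} entries, {attention_bytes / 1e9:.0f} GB")
print(f"linear-scan outputs:    {ssm_values:.1e} entries, {ssm_bytes / 1e9:.2f} GB")
```

Under these assumptions the score matrix alone needs about 20 GB, versus roughly 0.1 GB for the scan outputs.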

Methodology

Patch extraction & initial embedding – The WSI is tiled into thousands of non‑overlapping patches; each patch is passed through a standard CNN backbone (e.g., ResNet‑50) to obtain a feature vector.
Dual‑space projection – The same vector is projected into:
- Euclidean space for fine‑grained morphology, using a linear layer.
- Hyperbolic space (Poincaré ball) for hierarchical relationships, using a Möbius linear map.
Sequence modeling with S4 – The ordered sequence of dual‑space embeddings is fed to an S4 layer, a state‑space model that captures long‑range dependencies at a cost linear in the number of patches, whereas standard Transformer self‑attention scales quadratically.
Chunk‑level MoE routing – The sequence is split into contiguous “chunks” (e.g., 64 patches). A lightweight gating network predicts a distribution over a set of expert subnetworks; each chunk is processed by its most relevant expert, allowing region‑specific feature refinement.
Slide‑level aggregation & classification – The expert‑refined outputs are combined by attention‑weighted pooling into a slide‑level representation, which a fully‑connected head then classifies.
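The dual‑space projection step can be sketched in pure Python. This assumes the common recipe from hyperbolic neural networks (Ganea et al., 2018): lift the backbone feature onto the Poincaré ball with the exponential map at the origin, then apply a Möbius matrix‑vector product. The function names, toy weight shapes, and the fixed curvature c = 1 are illustrative assumptions, not the authors' code.

```python
import math

def matvec(W, x):
    """Plain matrix-vector product; W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def expmap0(v, c):
    """Exponential map at the origin: tangent vector -> Poincare ball of curvature -c."""
    n = norm(v)
    if n == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [scale * t for t in v]

def logmap0(y, c):
    """Logarithmic map at the origin (inverse of expmap0)."""
    n = norm(y)
    if n == 0.0:
        return list(y)
    scale = math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [scale * t for t in y]

def mobius_matvec(W, x, c):
    """Mobius matrix-vector product for a point x on the ball (equals expmap0(W @ logmap0(x)))."""
    x_norm = norm(x)
    Wx = matvec(W, x)
    Wx_norm = norm(Wx)
    if x_norm == 0.0 or Wx_norm == 0.0:
        return [0.0] * len(Wx)
    sqrt_c = math.sqrt(c)
    radius = math.tanh(Wx_norm / x_norm * math.atanh(sqrt_c * x_norm)) / sqrt_c
    return [radius * t / Wx_norm for t in Wx]

def dual_space_projection(feature, W_euc, W_hyp, c=1.0):
    """Hypothetical dual-branch projection of one patch feature vector."""
    z_euclidean = matvec(W_euc, feature)            # ordinary linear layer
    x_ball = expmap0(feature, c)                    # lift the feature onto the ball
    z_hyperbolic = mobius_matvec(W_hyp, x_ball, c)  # Mobius "linear" map
    return z_euclidean, z_hyperbolic

# Toy usage: a 3-d "backbone feature" projected to 2-d in each geometry.
feature = [0.3, -0.2, 0.5]
W_euc = [[0.5, 0.1, -0.2], [0.0, 0.3, 0.4]]
W_hyp = [[0.2, -0.1, 0.3], [0.1, 0.4, 0.0]]
z_e, z_h = dual_space_projection(feature, W_euc, W_hyp, c=1.0)
print(z_e, z_h)
```

Implementations usually clip or rescale features before the exponential map so that points do not land numerically on the ball's boundary, where artanh diverges.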

The whole pipeline is end‑to‑end differentiable, enabling joint learning of the dual embeddings, the S4 encoder, and the MoE routing.
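The sequence half of the pipeline (state‑space encoding, chunk‑level MoE routing, attention pooling) can be sketched on a single feature channel. Several pieces are substitutions and should be read as assumptions: a small real‑valued diagonal SSM with zero‑order‑hold discretization stands in for S4, which actually uses a structured (diagonal‑plus‑low‑rank, HiPPO‑initialized) state matrix and a convolutional kernel during training; the gate is one scalar weight per expert applied to the chunk mean; routing is hard top‑1, following the "most relevant expert" description, with the output scaled by the gate probability; pooling uses the tanh attention form from attention‑based MIL. All names and parameter values are hypothetical.

```python
import math
import random

def ssm_scan(u, a, b, c_out, dt=0.1):
    """Run a scalar sequence u through a diagonal linear state-space model.

    Per state i: x_i' = a_i * x_i + b_i * u, output y = sum_i c_i * x_i
    (a_i must be nonzero; negative values keep the system stable).
    Zero-order hold: a_bar = exp(dt * a_i), b_bar = (a_bar - 1) / a_i * b_i.
    One pass over the sequence: O(len(u) * len(a)) time, O(len(a)) state memory.
    """
    a_bar = [math.exp(dt * ai) for ai in a]
    b_bar = [(ab - 1.0) / ai * bi for ab, ai, bi in zip(a_bar, a, b)]
    x = [0.0] * len(a)
    ys = []
    for u_k in u:
        x = [ab * xi + bb * u_k for ab, xi, bb in zip(a_bar, x, b_bar)]
        ys.append(sum(ci * xi for ci, xi in zip(c_out, x)))
    return ys

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def chunk_top1_moe(seq, gate_w, experts, chunk_size=64):
    """Split seq into contiguous chunks and send each chunk to its top-scoring expert."""
    out, routes = [], []
    for start in range(0, len(seq), chunk_size):
        chunk = seq[start:start + chunk_size]
        summary = sum(chunk) / len(chunk)                # mean-pool the chunk
        probs = softmax([w * summary for w in gate_w])   # one gate score per expert
        k = max(range(len(experts)), key=probs.__getitem__)
        routes.append(k)
        out.extend(probs[k] * experts[k](v) for v in chunk)
    return out, routes

def attention_pool(seq, w1, w2):
    """Attention-MIL style pooling: score_i = w2 * tanh(w1 * h_i), weights = softmax."""
    weights = softmax([w2 * math.tanh(w1 * h) for h in seq])
    return sum(a * h for a, h in zip(weights, seq)), weights

# Toy usage on 300 random "patch embeddings" (one channel).
random.seed(0)
patches = [random.gauss(0.0, 1.0) for _ in range(300)]
encoded = ssm_scan(patches, a=[-0.5, -2.0], b=[1.0, 1.0], c_out=[0.7, 0.3])
experts = [lambda v: 2.0 * v, lambda v: math.tanh(v), lambda v: -v]  # toy experts
refined, routes = chunk_top1_moe(encoded, gate_w=[1.0, -1.0, 0.5], experts=experts)
slide_value, attn = attention_pool(refined, w1=1.5, w2=1.0)
print(len(routes), round(sum(attn), 6))  # 300 patches in chunks of 64 -> 5 chunks
```

Because each chunk runs through exactly one expert, adding experts increases capacity without increasing per‑chunk compute, which is the usual motivation for sparse MoE routing.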

Results & Findings

Dataset (cancer type)	Baseline MIL accuracy (e.g., CLAM)	BatMIL accuracy	Gain (percentage points)
Camelyon16 (Breast)	84.2 %	89.7 %	+5.5
TCGA‑LUAD (Lung)	78.1 %	83.4 %	+5.3
TCGA‑COAD (Colon)	81.5 %	86.9 %	+5.4
… (4 more)	—	—	—
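The gains in the last column are absolute differences in percentage points (for example, 89.7 minus 84.2 gives 5.5), not relative improvements. The short script below computes both for the three listed datasets; relative to each baseline, the gains are somewhat larger.

```python
# Accuracy values (%) copied from the table: (baseline MIL, BatMIL).
results = {
    "Camelyon16": (84.2, 89.7),
    "TCGA-LUAD": (78.1, 83.4),
    "TCGA-COAD": (81.5, 86.9),
}

gains = {}
for name, (baseline, batmil) in results.items():
    absolute_pp = batmil - baseline                # percentage-point difference
    relative_pct = 100.0 * absolute_pp / baseline  # improvement relative to the baseline
    gains[name] = (absolute_pp, relative_pct)
    print(f"{name}: +{absolute_pp:.1f} pp absolute, +{relative_pct:.1f}% relative")
```

This prints relative gains of about 6.5%, 6.8%, and 6.6%.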

Speed: Processing a 100k‑patch slide takes ~0.9 s on a single RTX 3090, about 2× faster than a Transformer‑based MIL model of comparable accuracy.
Ablation: Removing the hyperbolic branch drops accuracy by ~3 %; swapping S4 for a vanilla LSTM reduces performance by ~2 % and increases runtime by ~1.8×.
Interpretability: Attention maps derived from the hyperbolic embeddings highlight macro‑architectural regions (e.g., tumor nests), while Euclidean attention focuses on cellular details, offering a richer visual explanation.

Practical Implications

Scalable pathology pipelines: Developers can integrate BatMIL into digital pathology platforms to obtain slide‑level diagnoses without prohibitive GPU memory footprints.
Better triage for pathologists: Higher‑accuracy predictions and region‑level attention maps can prioritize slides that need expert review, reducing workload.
Transferable to other gigapixel domains: The dual‑geometry + S4 + MoE recipe is applicable to satellite imagery, large‑scale document analysis, or any task that requires aggregating millions of local descriptors.
Edge‑friendly deployment: Because S4 inference scales linearly with the number of patches, it can run on modest GPUs or even high‑end CPU servers, making both on‑premises and cloud‑based pathology services practical.

Limitations & Future Work

Hyperbolic curvature tuning: The current implementation uses a fixed curvature; learning curvature per dataset could further improve hierarchical modeling.
Chunk granularity sensitivity: Performance varies with chunk size; an adaptive chunking strategy based on tissue heterogeneity is left for future exploration.
Limited modality testing: Experiments focus on H&E‑stained slides; extending to multiplexed immunofluorescence or radiology‑pathology multimodal data remains an open direction.
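The curvature limitation can be made concrete. In the Poincaré‑ball model every hyperbolic operation depends on the curvature parameter c, so the same pair of embeddings sits at different distances under different curvatures. The sketch below computes the standard Poincaré geodesic distance via Möbius addition and shows one common way a curvature could be made learnable (softplus of an unconstrained parameter). The softplus choice, the toy points, and all names are assumptions, not part of BatMIL as described.

```python
import math

def mobius_add(x, y, c):
    """Mobius addition x (+)_c y on the Poincare ball of curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def poincare_distance(x, y, c):
    """Geodesic distance d_c(x, y) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+)_c y||)."""
    diff = mobius_add([-a for a in x], y, c)
    n = math.sqrt(sum(t * t for t in diff))
    return 2.0 / math.sqrt(c) * math.atanh(math.sqrt(c) * n)

def curvature_from_raw(raw):
    """Hypothetical learnable curvature: softplus keeps c strictly positive."""
    return math.log1p(math.exp(raw))

x, y = [0.1, 0.2], [0.4, -0.3]
for c in (0.1, 1.0, curvature_from_raw(0.5)):
    print(f"c = {c:.3f}: d = {poincare_distance(x, y, c):.4f}")
```

As c approaches 0 the distance tends to 2‖x − y‖ (the Euclidean limit), so a learned c would let each dataset choose how strongly hierarchy is emphasized.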

Overall, BatMIL demonstrates that geometry‑aware representations combined with efficient sequence modeling can push computational pathology toward more accurate, interpretable, and scalable solutions.

Authors

Enhui Chai
Sicheng Chen
Tianyi Zhang
Chad Wong
Kecheng Huang
Zeyu Liu
Fei Xia

Paper Information

arXiv ID: 2605.05164v1
Categories: cs.CV, cs.AI
Published: May 6, 2026