[Paper] TopoLoRA-SAM: Topology-Aware Parameter-Efficient Adaptation of Foundation Segmenters for Thin-Structure and Cross-Domain Binary Semantic Segmentation

Published: January 5, 2026 at 12:03 PM EST
4 min read
Source: arXiv - 2601.02273v1

Overview

The paper introduces TopoLoRA-SAM, a lightweight, topology-aware adaptation layer that repurposes the large-scale Segment Anything Model (SAM) for binary semantic segmentation tasks such as retinal vessel, polyp, and SAR sea-land detection. By freezing SAM's massive Vision Transformer (ViT) backbone and training only a few percent of its parameters, the authors achieve state-of-the-art accuracy on thin-structure and noisy-modality datasets while keeping compute and memory footprints low.

Key Contributions

  • Parameter-efficient adaptation: Injects Low-Rank Adaptation (LoRA) modules into the frozen ViT encoder, training only ~5.2 % of SAM's parameters (~4.9 M); a parameter-count sketch follows this list.
  • Topology‑aware supervision: Adds an optional differentiable clDice loss that explicitly penalizes topological errors, crucial for thin structures like blood vessels.
  • Hybrid adapter design: Combines LoRA with a lightweight spatial convolutional adapter to capture both global context (via ViT) and local detail (via convolutions).
  • Comprehensive benchmarking: Evaluates on five diverse binary segmentation datasets (retinal vessels, polyps, SAR sea/land) and outperforms strong baselines (U‑Net, DeepLabV3+, SegFormer, Mask2Former).
  • Open‑source implementation: Provides reproducible code and pretrained adapters, enabling rapid experimentation.
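
For intuition on that parameter budget, the sketch below counts the extra weights a rank-r LoRA pair adds to a single d_out × d_in projection (r·(d_in + d_out) parameters) and relates the total to a frozen ViT-B-sized backbone. The hidden size, number of adapted projections, and rank are illustrative assumptions rather than the paper's configuration, and the reported ~4.9 M also includes the convolutional adapter, so these numbers are for scaling intuition only.

```python
# Back-of-the-envelope LoRA parameter count (illustrative assumptions only:
# the paper's exact rank and set of adapted projections are not given here).

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Extra trainable weights for one adapted projection: A (d_out x r) plus B (d_in x r)."""
    return rank * (d_in + d_out)

# Assumed ViT-B-like encoder: 12 blocks, hidden size 768, LoRA on the
# q/k/v/output projections of every attention block (4 square matrices each).
hidden, blocks, adapted_per_block, rank = 768, 12, 4, 8

trainable = blocks * adapted_per_block * lora_params(hidden, hidden, rank)
backbone = 86_000_000  # rough ViT-B parameter count, kept frozen

print(f"LoRA adds ~{trainable / 1e6:.2f} M trainable params "
      f"(~{100 * trainable / backbone:.2f}% of the frozen backbone)")
```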

Methodology

  1. Freeze the SAM backbone: The pre‑trained ViT encoder and mask decoder remain unchanged, preserving SAM’s zero‑shot knowledge.
  2. Insert LoRA adapters: For each linear projection in the ViT, a low-rank pair of matrices (ΔW = A Bᵀ) is added. During training only A and B are updated, drastically reducing the number of trainable weights; a minimal sketch follows this list.
  3. Add a spatial convolutional adapter: A small 3 × 3 convolution block sits after the ViT output, injecting locality that pure transformer layers may miss.
  4. Topology-aware loss (optional): The differentiable clDice metric measures the overlap between skeletonized predictions and skeletonized ground truth, encouraging preservation of thin, elongated structures. The total loss is binary cross-entropy + Dice + λ·clDice (when used); a sketch of this composite loss also follows the list.
  5. Training pipeline: Fine‑tune only the adapters on the target dataset using standard SGD/Adam optimizers; the rest of SAM stays frozen, so GPU memory usage is comparable to training a modest CNN.
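
To make steps 1-3 and 5 concrete, here is a minimal PyTorch-style sketch (not the authors' released code): a frozen linear projection wrapped with a trainable low-rank pair, a residual 3 × 3 convolutional adapter applied to the encoder's feature map, and an optimizer built only over parameters that still require gradients. The rank, scaling factor, and module names are assumptions for illustration.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear projection W plus a trainable low-rank update ΔW = A Bᵀ."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)            # step 1: keep SAM's weights frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.zeros(d_out, rank))   # zero init -> ΔW = 0 at the start
        self.B = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.scale = alpha / rank

    def forward(self, x):                                 # x: (..., d_in)
        delta = (x @ self.B) @ self.A.t()                 # low-rank path; only A and B learn
        return self.base(x) + self.scale * delta


class SpatialConvAdapter(nn.Module):
    """Residual 3x3 convolution block adding locality after the ViT encoder output."""

    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat):                              # feat: (B, C, H, W)
        return feat + self.block(feat)                    # residual keeps the frozen path intact


def adapter_parameters(model: nn.Module):
    """Step 5: only parameters that still require gradients reach the optimizer."""
    return [p for p in model.parameters() if p.requires_grad]

# optimizer = torch.optim.AdamW(adapter_parameters(model), lr=1e-4)
```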
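
Step 4 can be sketched in the same spirit: a soft-skeleton clDice term in the style of the original differentiable clDice formulation (Shit et al.), combined with binary cross-entropy and Dice as in the composite loss above. The skeletonization iteration count, smoothing constants, and λ are illustrative choices rather than the paper's settings.

```python
import torch
import torch.nn.functional as F


def soft_erode(img):                                   # min-pooling via negated max-pooling
    return -F.max_pool2d(-img, kernel_size=3, stride=1, padding=1)


def soft_dilate(img):
    return F.max_pool2d(img, kernel_size=3, stride=1, padding=1)


def soft_skeleton(img, iters: int = 10):
    """Differentiable morphological skeleton in the soft-clDice style."""
    skel = F.relu(img - soft_dilate(soft_erode(img)))
    for _ in range(iters):
        img = soft_erode(img)
        opened = soft_dilate(soft_erode(img))
        skel = skel + F.relu(img - opened) * (1.0 - skel)
    return skel


def soft_cldice(pred, target, iters: int = 10, eps: float = 1e-6):
    """pred, target: probabilities / binary masks of shape (B, 1, H, W)."""
    sp, st = soft_skeleton(pred, iters), soft_skeleton(target, iters)
    tprec = ((sp * target).sum() + eps) / (sp.sum() + eps)   # topology precision
    tsens = ((st * pred).sum() + eps) / (st.sum() + eps)     # topology sensitivity
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)


def total_loss(logits, target, lam: float = 0.5):
    """BCE + Dice + λ·clDice, mirroring the composite loss described in step 4."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    inter = (prob * target).sum()
    dice = 1.0 - (2.0 * inter + 1.0) / (prob.sum() + target.sum() + 1.0)
    return bce + dice + lam * soft_cldice(prob, target)
```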

Results & Findings

| Dataset | Metric (Dice) | TopoLoRA-SAM | Best Baseline |
|---|---|---|---|
| DRIVE (retina) | 0.82 | 0.84 | 0.81 (Mask2Former) |
| STARE (retina) | 0.80 | 0.83 | 0.78 |
| CHASE_DB1 (retina) | 0.78 | 0.82 | 0.74 |
| Kvasir-SEG (polyp) | 0.91 | 0.92 | 0.90 |
| SL-SSDD (SAR) | 0.88 | 0.89 | 0.86 |
  • Parameter efficiency: Only 5.2 % of SAM’s parameters are updated, yet the average Dice improvement over baselines is +2.3 %.
  • Thin‑structure boost: On CHASE_DB1, the clDice‑augmented version reduces broken‑vessel errors by ~30 % compared to a vanilla LoRA‑only variant.
  • Cross‑domain robustness: The same adapter set works across optical, endoscopic, and radar modalities without any architectural changes.
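
For reference, the Dice score reported in the table above is the standard overlap measure 2|P ∩ G| / (|P| + |G|) computed on binarized masks; a plain NumPy sketch (the 0.5 threshold is an assumption, not a value from the paper):

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) on binarized masks."""
    p = (pred >= threshold).astype(np.float64)
    g = (target > 0).astype(np.float64)
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else float(2.0 * (p * g).sum() / denom)
```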

Practical Implications

  • Rapid domain adaptation: Developers can take a pre‑trained SAM model and, with a few hours of fine‑tuning on a modest GPU, obtain a specialist binary segmenter for medical imaging, remote sensing, or industrial inspection.
  • Lower compute cost: Because the backbone stays frozen, training memory and time are comparable to training a small CNN, making it feasible on consumer‑grade hardware or in CI pipelines.
  • Plug‑and‑play for thin structures: The topology‑aware loss can be toggled on/off, allowing teams to prioritize structural fidelity (e.g., vessel tracing, road network extraction) without redesigning the network.
  • Unified codebase: With the open-source adapters, teams can maintain a single SAM-based inference service and swap in task-specific adapters on the fly, simplifying deployment and versioning; see the adapter-swapping sketch after this list.
  • Potential for continual learning: Since only adapters are updated, new domains can be added incrementally without risking catastrophic forgetting of previously learned tasks.
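
A minimal sketch of the adapter-swapping workflow mentioned above, assuming the adapters are the only parameters with requires_grad=True: persist just their state dict per task and load it into the shared frozen SAM backbone at serving time. The helper names and file names are hypothetical, not part of the paper's released code.

```python
import torch
import torch.nn as nn


def adapter_state_dict(model: nn.Module) -> dict:
    """Collect only the trainable (adapter) tensors; the frozen backbone is excluded."""
    trainable = {name for name, p in model.named_parameters() if p.requires_grad}
    return {name: t for name, t in model.state_dict().items() if name in trainable}


def save_adapter(model: nn.Module, path: str) -> None:
    torch.save(adapter_state_dict(model), path)        # a few MB instead of a full SAM checkpoint


def load_adapter(model: nn.Module, path: str) -> None:
    # strict=False leaves the frozen backbone weights untouched
    model.load_state_dict(torch.load(path, map_location="cpu"), strict=False)


# Hypothetical usage: one frozen SAM backbone, one small adapter file per task.
# save_adapter(model, "vessels_adapter.pt")
# load_adapter(model, "polyp_adapter.pt")   # swap tasks without reloading the backbone
```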

Limitations & Future Work

  • Binary focus: The current framework targets binary masks; extending to multi‑class segmentation would require redesigning the adapter heads and loss weighting.
  • Dependency on SAM’s pretraining bias: If the target domain is far from SAM’s training distribution (e.g., hyperspectral imagery), the frozen backbone may limit performance despite adapters.
  • Topology loss overhead: Computing clDice adds a modest runtime cost during training; optimizing its implementation for large‑scale datasets is an open challenge.
  • Future directions: The authors suggest exploring adapter stacking for hierarchical tasks, integrating prompt engineering (e.g., point or box prompts) to further reduce annotation effort, and evaluating on 3‑D volumetric data such as OCT or CT scans.

Authors

  • Salim Khazem

Paper Information

  • arXiv ID: 2601.02273v1
  • Categories: cs.CV, cs.AI, cs.LG
  • Published: January 5, 2026