[Paper] LSP-DETR: Efficient and Scalable Nuclei Segmentation in Whole Slide Images

Published: January 6, 2026 at 11:35 AM EST
3 min read
Source: arXiv - 2601.03163v1

Overview

The paper presents LSP‑DETR, a new end‑to‑end framework for nuclei instance segmentation in gigapixel Whole‑Slide Images (WSIs). By marrying a lightweight transformer with a star‑convex polygon representation, the authors achieve fast, scalable segmentation without the patch‑wise processing and heavy post‑processing that have limited previous methods.

Key Contributions

  • Linear‑complexity transformer that can ingest much larger image patches than conventional DETR‑style models, keeping compute roughly constant.
  • Star‑convex polygon encoding of each nucleus, enabling a compact yet expressive shape description.
  • Radial distance loss that naturally separates overlapping nuclei, eliminating the need for explicit overlap annotations or handcrafted post‑processing steps.
  • Fully end‑to‑end training (no separate detection → segmentation pipelines), simplifying deployment.
  • State‑of‑the‑art speed/accuracy trade‑off: >5× faster than the next‑fastest method while matching or surpassing segmentation quality on benchmark datasets (PanNuke, MoNuSeg).

Methodology

  1. Input handling – Instead of slicing WSIs into tiny patches, LSP‑DETR processes relatively large crops (e.g., 1024 × 1024 px) using a transformer encoder whose attention is approximated with linear‑complexity kernels (e.g., Performer or Linformer). This keeps memory usage low even for high‑resolution inputs.

  2. Object representation – Each nucleus is modeled as a star‑convex polygon defined by a set of radial distances from a central point to the polygon vertices. This representation captures irregular nuclear shapes with far fewer parameters than a full mask.

  3. Prediction head – The transformer decoder outputs a fixed‑size set of queries. For each query, the network predicts:

    • a confidence score,
    • the centroid coordinates, and
    • a vector of radial distances (one per predefined angle).
  4. Loss function – The radial distance loss combines an L1 term on the predicted radii with a novel overlap‑aware term that penalizes inconsistent ordering of radii for neighboring nuclei. Because the loss is defined per‑radius, the model learns to shrink overlapping regions without any explicit overlap masks.

  5. Training & inference – The system is trained end‑to‑end on standard nuclei datasets. At inference time, the predicted polygons are rasterized into binary masks on the fly, yielding the final segmentation map. No extra clustering, watershed, or morphological cleanup is required.
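Step 1's efficiency hinges on reordering the attention computation. As a minimal NumPy sketch (not the authors' implementation): applying a positive feature map φ to queries and keys lets φ(Q)·(φ(K)ᵀV) replace softmax(QKᵀ)V, so cost grows linearly in the number of tokens instead of quadratically. The ReLU-based φ below is an illustrative stand-in; Performer uses random features.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention in O(n) time (illustrative sketch).

    Instead of materializing the (n, n) matrix softmax(Q K^T), apply a
    positive feature map phi and reorder the products:
        phi(Q) @ (phi(K)^T @ V)   # cost O(n * d * d_v), linear in n
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # simple positive map (assumption)
    Qp, Kp = phi(Q), phi(K)                    # (n, d) each
    kv = Kp.T @ V                              # (d, d_v) summary, independent of n
    z = Qp @ Kp.sum(axis=0)                    # per-query normalizer, shape (n,)
    return (Qp @ kv) / z[:, None]              # (n, d_v)

# A 1024-token sequence: the (1024, 1024) attention matrix is never formed.
rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)                # shape (1024, 64)
```

Because the per-query weights are positive and sum to one, the output stays a convex combination of the value rows, just as in softmax attention.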
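The star-convex encoding of step 2 can be sketched directly: step outward from the centroid along each of a fixed set of angles until the mask ends. Function name and the pixel-stepping boundary search are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def radial_encode(mask, n_rays=32):
    """Encode a binary nucleus mask as (centroid, radial distances).

    Sketch of the star-convex representation: one distance per fixed
    angle, found by stepping pixel-by-pixel from the centroid until
    leaving the mask (or the image).
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                      # centroid
    angles = np.linspace(0.0, 2 * np.pi, n_rays, endpoint=False)
    radii = np.zeros(n_rays)
    h, w = mask.shape
    for i, a in enumerate(angles):
        r = 0
        while True:
            y = int(round(cy + (r + 1) * np.sin(a)))
            x = int(round(cx + (r + 1) * np.cos(a)))
            if not (0 <= y < h and 0 <= x < w) or not mask[y, x]:
                break                                  # ray left the nucleus
            r += 1
        radii[i] = r
    return (cy, cx), radii

# A synthetic disk of radius 10: every ray should measure roughly 10 px.
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 100
center, radii = radial_encode(disk, n_rays=16)
```

For a 32-ray encoding this is 34 numbers per nucleus (centroid plus radii), versus thousands of pixels for a full mask.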
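Step 4's loss sits inside a DETR-style set-prediction setup: query outputs must first be matched one-to-one to ground-truth nuclei before a per-ray error is computed. The sketch below shows only the Hungarian matching plus the L1 radii term; the paper's overlap-aware ordering penalty is not reproduced here, and the function name is illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matched_radial_l1(pred_radii, true_radii):
    """Hungarian-matched L1 radial loss (sketch).

    pred_radii: (N_queries, K) predicted distances, one per predefined angle.
    true_radii: (N_nuclei, K) ground-truth distances.
    Matches predictions to targets to minimize total cost, then averages
    the per-ray L1 error over the matched pairs.
    """
    # cost[i, j] = mean |pred_i - true_j| over the K rays
    cost = np.abs(pred_radii[:, None, :] - true_radii[None, :, :]).mean(axis=-1)
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return cost[rows, cols].mean()

# Matching makes the loss invariant to query order: a permuted copy of the
# targets incurs zero loss.
true = np.arange(24, dtype=float).reshape(3, 8)
loss_perm = matched_radial_l1(true[[2, 0, 1]], true)
```

Because the loss is defined per radius, shrinking a few rays where two nuclei abut is enough to separate them, which is what removes the need for explicit overlap masks.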
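The on-the-fly rasterization of step 5 inverts the encoding: a pixel belongs to the nucleus if its distance to the centroid is within the radius of the nearest ray at that pixel's angle. A minimal sketch (nearest-ray lookup is an assumption; interpolating between rays would be smoother):

```python
import numpy as np

def rasterize(center, radii, shape):
    """Turn (centroid, radial distances) back into a binary mask (sketch)."""
    cy, cx = center
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    ang = np.arctan2(yy - cy, xx - cx) % (2 * np.pi)   # angle of each pixel
    dist = np.hypot(yy - cy, xx - cx)                  # distance of each pixel
    k = len(radii)
    idx = np.rint(ang / (2 * np.pi) * k).astype(int) % k  # nearest predefined ray
    return dist <= np.asarray(radii)[idx]

# Constant radii reproduce a disk: ~pi * 10^2 ≈ 314 pixels.
mask = rasterize((32.0, 32.0), [10.0] * 16, (64, 64))
```

Since this is pure array arithmetic, all predicted polygons in a crop can be rasterized in one vectorized pass, which is why no watershed or morphological cleanup is needed afterwards.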

Results & Findings

| Dataset | mAP (seg) | Inference time (per 1024 × 1024 crop) | Speed-up vs. runner-up |
|---------|-----------|---------------------------------------|------------------------|
| PanNuke | 0.71      | 45 ms                                 | 5.3×                   |
| MoNuSeg | 0.78      | 38 ms                                 | 5.1×                   |
  • Accuracy: LSP‑DETR matches or exceeds the best published instance segmentation scores, especially on challenging overlapping nuclei.
  • Efficiency: Linear‑complexity attention reduces GPU memory footprints, allowing larger crops and fewer forward passes.
  • Generalization: Models trained on one tissue type transfer well to unseen organs, indicating robust feature learning.

Practical Implications

  • Accelerated pathology pipelines – Pathology labs can run nuclei segmentation on whole‑slide scans in near‑real time, enabling rapid downstream analyses (e.g., tumor grading, biomarker quantification).
  • Simplified deployment – The single‑stage, end‑to‑end nature means fewer moving parts (no patch stitching, no post‑processing scripts), reducing engineering overhead and potential sources of bugs.
  • Edge‑friendly inference – Because the transformer’s attention scales linearly, the model can run on modest GPUs or even high‑end CPUs, enabling on‑premise deployment or cost‑effective cloud services.
  • Extensible to other instance‑segmentation tasks – The star‑convex polygon + radial loss paradigm can be adapted to segment other small, densely packed objects (e.g., cells in microscopy, particles in materials science).

Limitations & Future Work

  • Shape bias – The representation assumes each nucleus is star‑convex with respect to its centroid; highly concave or multi‑lobed structures may be poorly captured.
  • Fixed angular resolution – The number of radial rays is a hyper‑parameter: too few rays limit shape fidelity, while too many increase prediction overhead.
  • Training data dependency – While generalization is strong, extreme domain shifts (e.g., different staining protocols) still require fine‑tuning.
  • Future directions – The authors suggest exploring adaptive ray sampling, integrating self‑supervised pre‑training on unlabeled WSIs, and extending the framework to 3‑D histology volumes.

Authors

  • Matěj Pekár
  • Vít Musil
  • Rudolf Nenutil
  • Petr Holub
  • Tomáš Brázdil

Paper Information

  • arXiv ID: 2601.03163v1
  • Categories: cs.CV
  • Published: January 6, 2026