[Paper] Mask-HybridGNet: Graph-based segmentation with emergent anatomical correspondence from pixel-level supervision

Published: February 24, 2026 at 01:29 PM EST
Source: arXiv - 2602.21179v1

Overview

Mask‑HybridGNet is a new framework that lets you train graph‑based medical image segmenters using only ordinary pixel‑wise masks—the kind of annotations that are already abundant in public datasets. By doing so, it sidesteps the long‑standing bottleneck of needing manually placed anatomical landmarks with exact point‑to‑point correspondence across patients. The result is a model that not only produces accurate segmentations but also learns a stable, implicit anatomical atlas, opening the door to downstream tasks such as temporal tracking and population‑level shape analysis.

Key Contributions

  • Pixel‑mask‑to‑graph training pipeline – eliminates the need for handcrafted landmark annotations while preserving the benefits of graph‑structured outputs.
  • Chamfer‑distance supervision + edge regularization – aligns variable‑length ground‑truth contours with a fixed‑size landmark graph and enforces smooth, evenly spaced landmarks.
  • Differentiable rasterization layer – bridges the graph representation back to a pixel mask for end‑to‑end learning with standard segmentation losses.
  • Emergent anatomical correspondence – the model automatically learns consistent landmark locations across subjects, effectively building an atlas without explicit supervision.
  • Broad experimental validation – tested on chest X‑rays, cardiac ultrasound, cardiac MRI, and fetal ultrasound, achieving performance on par with state‑of‑the‑art pixel‑based networks while guaranteeing topological consistency.
  • Atlas extraction from any pretrained mask model – the framework can retrofit existing segmentation networks to produce structured, correspondence‑aware outputs.
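The Chamfer supervision and edge regularization can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the Chamfer variant averages unsquared Euclidean nearest‑neighbor distances in both directions, and the regularizer only penalizes the variance of edge lengths around a closed ring (the regularizer described under Methodology also penalizes angle deviations, which this sketch omits).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def chamfer_distance(pred: List[Point], target: List[Point]) -> float:
    """Symmetric Chamfer distance between two point sets of any sizes.

    pred:   the fixed-size set of N predicted landmarks.
    target: M points sampled from the ground-truth mask contour (M may differ from N).
    """
    pred_to_target = sum(min(_dist(p, t) for t in target) for p in pred) / len(pred)
    target_to_pred = sum(min(_dist(t, p) for p in pred) for t in target) / len(target)
    return pred_to_target + target_to_pred


def edge_length_regularizer(pred: List[Point]) -> float:
    """Hypothetical simplified regularizer: variance of edge lengths around a closed ring.

    Returns 0.0 when consecutive landmarks are evenly spaced.
    """
    n = len(pred)
    lengths = [_dist(pred[i], pred[(i + 1) % n]) for i in range(n)]
    mean = sum(lengths) / n
    return sum((length - mean) ** 2 for length in lengths) / n


# Four landmarks on a unit square vs. an 8-point contour sampled along the same square.
landmarks = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
contour = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.5),
           (1.0, 1.0), (0.5, 1.0), (0.0, 1.0), (0.0, 0.5)]
print(chamfer_distance(landmarks, contour))  # 0.25: the four contour midpoints are 0.5 from the nearest landmark
print(edge_length_regularizer(landmarks))    # 0.0: all four edges have length 1
```

The Chamfer term only measures nearest‑neighbor proximity, so it does not by itself enforce even spacing of the landmarks along the contour; the edge term adds that pressure.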

Methodology

  1. Fixed‑topology graph definition – A pre‑specified graph (e.g., a closed polyline for a heart chamber) defines the number of landmarks and their connectivity.
  2. Hybrid encoder‑decoder – An image encoder extracts deep features; a graph decoder predicts the 2‑D coordinates of each landmark.
  3. Chamfer loss – Computes the bidirectional nearest‑neighbor distance between the predicted landmark set and points sampled from the ground‑truth mask contour, allowing a variable‑length contour to supervise a fixed‑size graph.
  4. Edge‑based regularizer – Penalizes large deviations in edge length and angle between neighboring landmarks, encouraging smooth, evenly spaced points along the anatomy.
  5. Differentiable rasterizer – Converts the predicted landmark polygon back into a binary mask; this rasterized mask is then compared to the original mask with a standard Dice/CE loss.
  6. End‑to‑end training – The Chamfer, regularization, and rasterization losses are summed, enabling the whole pipeline to be optimized with gradient descent using only the pixel masks.
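Steps 5 and 6 can be illustrated with a generic soft rasterizer: each pixel's value is a sigmoid of its signed distance to the predicted polygon (positive inside), so the mask changes smoothly as the landmarks move. This is a minimal pure‑Python sketch under that assumption, not the paper's rasterization layer; the function names, the temperature `tau`, and the soft Dice form are illustrative choices. In a real pipeline the same formula would be written in an autodiff framework so that gradients flow back to the landmark coordinates.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Mask = List[List[float]]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _inside(p: Point, poly: List[Point]) -> bool:
    """Even-odd ray-casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside


def soft_rasterize(poly: List[Point], height: int, width: int, tau: float = 0.5) -> Mask:
    """Soft mask with values in (0, 1); a smaller tau gives sharper edges."""
    n = len(poly)
    mask = []
    for row in range(height):
        mask_row = []
        for col in range(width):
            p = (col + 0.5, row + 0.5)  # pixel centre: x = column, y = row
            d = min(_segment_distance(p, poly[i], poly[(i + 1) % n]) for i in range(n))
            signed = d if _inside(p, poly) else -d
            mask_row.append(_sigmoid(signed / tau))
        mask.append(mask_row)
    return mask


def soft_dice_loss(pred: Mask, target: Mask, eps: float = 1e-6) -> float:
    """1 - soft Dice overlap between a soft prediction and a binary target."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


# Ground truth: a 4x4 block of ones inside an 8x8 image.
gt = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)] for r in range(8)]
square = [(2.0, 2.0), (6.0, 2.0), (6.0, 6.0), (2.0, 6.0)]
print(soft_dice_loss(soft_rasterize(square, 8, 8, tau=0.05), gt))   # close to 0
shifted = [(x + 2.0, y) for x, y in square]
print(soft_dice_loss(soft_rasterize(shifted, 8, 8, tau=0.05), gt))  # about 0.5 (half the overlap lost)
```

Summing this loss with the Chamfer and regularization terms gives the single objective described in step 6. As `tau` shrinks, the soft mask approaches the hard polygon mask, but gradients also become concentrated in a thinner band around the contour.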

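Because the graph’s adjacency matrix stays fixed throughout training, every output has the intended connectivity: each contour is a closed loop with no broken segments. The fixed adjacency does not by itself rule out self‑intersections, since those depend on where the predicted landmarks land; in practice they are discouraged by the smoothness and spacing terms of the edge regularizer.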
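A fixed closed‑polyline topology can be written down once as an adjacency matrix and reused for every prediction. The sketch below, with hypothetical helper names, builds that ring adjacency and adds a simple geometric check for whether a predicted contour crosses itself. The adjacency fixes which landmarks are linked, while the check inspects the predicted coordinates, which can be useful for auditing outputs at validation time.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ring_adjacency(n: int) -> List[List[int]]:
    """Fixed adjacency matrix of a closed polyline: node i links to i-1 and i+1 (mod n)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        adj[i][j] = adj[j][i] = 1
    return adj


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): +1, -1 or 0."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Proper crossing test; collinear touching is ignored in this sketch."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def self_intersects(points: List[Point], adj: List[List[int]]) -> bool:
    """True if any two non-adjacent edges of the graph cross each other."""
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            (i, j), (k, l) = edges[a], edges[b]
            if len({i, j, k, l}) < 4:
                continue  # edges sharing an endpoint are neighbours, not a crossing
            if _segments_cross(points[i], points[j], points[k], points[l]):
                return True
    return False


adj = ring_adjacency(4)
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
bow_tie = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]  # same adjacency, crossed geometry
print(self_intersects(square, adj))   # False
print(self_intersects(bow_tie, adj))  # True
```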

Results & Findings

Modality (structure) | Dice | Difference vs. SOTA pixel models
Chest X‑ray (lung fields) | 0.93 | +0.01
Cardiac ultrasound (RV) | 0.88 | -0.02
Cardiac MRI (LV) | 0.95 | +0.00
Fetal ultrasound (head) | 0.90 | +0.03

  • Segmentation quality is on par with leading CNN/Transformer‑based pixel segmenters: Dice differences range from -0.02 to +0.03 across the four tasks.
  • Topological guarantees: All predicted contours are closed and non‑self‑intersecting, something pixel‑only methods can violate without post‑processing.
  • Correspondence consistency: Visualizing landmark indices across a cohort shows that, e.g., landmark 7 always lands near the apex of the left ventricle, confirming emergent atlas formation.
  • Runtime: Inference adds ~15 ms per slice compared to a pure pixel model—negligible for most clinical pipelines.
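One simple way to put a number on the correspondence consistency described above is to measure, for each landmark index, how far the subjects' landmarks scatter around their cohort mean. This statistic is a hypothetical illustration (the source reports a visual check, not this metric), and it assumes all shapes share one landmark count and a common coordinate frame.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def per_landmark_spread(cohort: List[List[Point]]) -> List[float]:
    """Mean distance of each subject's k-th landmark to the cohort centroid of landmark k.

    Assumes every shape has the same number of landmarks and that all shapes are
    already expressed in a common coordinate frame (e.g. after alignment).
    """
    n_subjects = len(cohort)
    spreads = []
    for k in range(len(cohort[0])):
        cx = sum(shape[k][0] for shape in cohort) / n_subjects
        cy = sum(shape[k][1] for shape in cohort) / n_subjects
        total = sum(math.hypot(shape[k][0] - cx, shape[k][1] - cy) for shape in cohort)
        spreads.append(total / n_subjects)
    return spreads


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
relabelled = square[1:] + square[:1]  # same contour, indices shifted by one
print(per_landmark_spread([square, square]))      # [0.0, 0.0, 0.0, 0.0]
print(per_landmark_spread([square, relabelled]))  # [0.5, 0.5, 0.5, 0.5]
```

Both cohorts contain geometrically identical contours, so a pixel metric such as Dice cannot tell them apart; only an index‑level statistic exposes the inconsistent labelling.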

Practical Implications

  • Rapid development of anatomy‑aware tools – Developers can plug Mask‑HybridGNet into existing segmentation pipelines, training it on the masks those pipelines already use, and obtain a structured representation suitable for downstream analysis (e.g., shape statistics, disease progression tracking).
  • Cross‑modal and longitudinal studies – Because landmarks are consistently indexed, you can align scans across time points or modalities without building a separate registration step.
  • Regulatory friendliness – Fixed topology and guaranteed connectivity simplify validation and compliance checks for medical‑device software.
  • Legacy data leverage – Any dataset that only provides masks (the majority of public medical imaging repositories) can be turned into a correspondence‑rich resource, enabling the creation of population atlases without extra annotation cost.
  • Potential for non‑medical domains – The same idea could be applied to any segmentation task where shape consistency matters (e.g., satellite imagery road networks, industrial part inspection).
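Because landmark k refers to the same anatomical point in every prediction, aligning two scans reduces to a closed‑form least‑squares fit on matched points, with no correspondence search. The sketch below uses one standard choice, a 2‑D similarity Procrustes fit written with complex numbers; the source does not specify how alignment or atlas construction is done, the function names are hypothetical, and `mean_shape` is a deliberately crude atlas that aligns every shape to a single reference in one pass rather than running full generalized Procrustes analysis.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def align_similarity(source: List[Point], target: List[Point]) -> List[Point]:
    """Least-squares 2-D similarity transform (rotation, uniform scale, translation).

    source[i] is assumed to correspond to target[i]; consistently indexed landmarks
    provide exactly this correspondence. Writing points as complex numbers turns the
    fit y ~ a*x + b into a closed-form solution.
    """
    xs = [complex(x, y) for x, y in source]
    ys = [complex(x, y) for x, y in target]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    den = sum(abs(x - mx) ** 2 for x in xs)
    if den == 0:
        raise ValueError("source points must not all coincide")
    # a encodes rotation and scale; b is the translation.
    a = sum((x - mx).conjugate() * (y - my) for x, y in zip(xs, ys)) / den
    b = my - a * mx
    return [((a * x + b).real, (a * x + b).imag) for x in xs]


def mean_shape(shapes: List[List[Point]], reference: List[Point]) -> List[Point]:
    """Crude atlas: align every shape to one reference, then average index by index."""
    aligned = [align_similarity(s, reference) for s in shapes]
    n = len(aligned)
    return [(sum(s[k][0] for s in aligned) / n, sum(s[k][1] for s in aligned) / n)
            for k in range(len(reference))]


reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# The same square rotated by 90 degrees, scaled by 2 and shifted by (5, 5).
follow_up = [(5.0, 5.0), (5.0, 7.0), (3.0, 7.0), (3.0, 5.0)]
print(align_similarity(follow_up, reference))  # matches reference up to rounding error
print(mean_shape([reference, follow_up], reference))
```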

Limitations & Future Work

  • Fixed graph topology – The current design assumes a known number of landmarks and a pre‑defined connectivity pattern; highly variable anatomies may need adaptive graph structures.
  • 2‑D focus – Experiments are limited to 2‑D slices; extending to full 3‑D volumes will require more memory‑efficient graph decoders and possibly hierarchical graph representations.
  • Dependence on mask quality – Noisy or coarse masks can propagate errors into the learned atlas; future work could incorporate uncertainty modeling or semi‑supervised refinement.
  • Atlas interpretability – While landmarks become consistent, the paper does not provide a quantitative evaluation of the anatomical meaning of each index; a follow‑up study could map indices to clinical landmarks explicitly.

Overall, Mask‑HybridGNet demonstrates that you can get the best of both worlds—high‑quality pixel segmentation and a structured, correspondence‑aware representation—using only the data that’s already plentiful in the medical imaging community. This opens a practical pathway for developers to build smarter, more reliable health‑tech applications without the prohibitive cost of manual landmark annotation.

Authors

  • Nicolás Gaggion
  • Maria J. Ledesma‑Carbayo
  • Stergios Christodoulidis
  • Maria Vakalopoulou
  • Enzo Ferrante

Paper Information

  • arXiv ID: 2602.21179v1
  • Categories: cs.CV
  • Published: February 24, 2026