[Paper] DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI

Published: May 6, 2026 at 01:54 AM EDT
Source: arXiv - 2605.04518v1

Overview

The paper introduces DALight‑3D, a streamlined 3D U‑Net architecture for segmenting brain tumors from multi‑modal MRI scans. It replaces standard 3‑D convolutions with depthwise‑separable variants and adds identifier‑conditioned normalization, cross‑slice attention, and an adaptive skip‑fusion block. The authors report a higher mean Dice score than three classic 3‑D segmentation networks while using about 30 % fewer parameters than the Residual 3‑D U‑Net baseline (2.22 M vs. 3.20 M).

Key Contributions

  • Depthwise‑separable 3‑D convolutions to cut parameter count and FLOPs without sacrificing representational power.
  • Identifier‑conditioned normalization (ICN) that injects modality‑specific information (e.g., T1, T2, FLAIR) directly into the normalization layers.
  • Cross‑slice attention (CSA) that lets the network reason across the axial dimension, improving context for thin tumor structures.
  • Adaptive skip‑fusion block (SSFB) that learns how much to trust encoder features at each decoder level, replacing the static concatenation used in vanilla U‑Net.
  • Comprehensive ablation study confirming each component’s contribution to the final performance.
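
To make the first contribution concrete, the short sketch below counts the weights in a standard 3‑D convolution and in its depthwise‑separable replacement. The layer width (32 input channels, 64 output channels) and the 3×3×3 kernel are illustrative assumptions, not values taken from the paper.

```python
def conv3d_params(c_in: int, c_out: int, k: int = 3, bias: bool = False) -> int:
    """Weights in a standard 3-D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3 + (c_out if bias else 0)


def separable_conv3d_params(c_in: int, c_out: int, k: int = 3, bias: bool = False) -> int:
    """One depthwise k x k x k filter per input channel, then a pointwise 1x1x1 mix."""
    depthwise = c_in * k ** 3 + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer width, chosen only for illustration.
standard = conv3d_params(32, 64)             # 32 * 64 * 27
separable = separable_conv3d_params(32, 64)  # 32 * 27 + 32 * 64
print(standard, separable, round(standard / separable, 1))  # 55296 2912 19.0
```

The per‑layer saving is far larger than the whole‑model difference, which also depends on layer widths and on the parameters added by the other modules.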

Methodology

DALight‑3D follows the classic encoder‑decoder “U‑shape” but with four major redesigns:

  1. Encoder blocks use 3‑D depthwise‑separable convolutions (a depthwise filter per channel followed by a pointwise 1×1×1 convolution). This reduces the number of learnable weights dramatically while still capturing 3‑D spatial patterns.

  2. Identifier‑conditioned normalization replaces standard batch/instance norm. For each MRI modality, a small learned vector (the “identifier”) is added to the scaling and bias parameters of the norm layer, allowing the network to adapt its statistics per modality without extra heavy branches.

  3. Cross‑slice attention operates on the feature map after the encoder’s bottleneck. It computes attention weights across the slice dimension (the third axis) so that information from neighboring slices can be re‑weighted, helping the model capture elongated or thin tumor regions that may span only a few slices.

  4. Adaptive skip‑fusion block (SSFB) replaces the fixed concatenation of encoder‑decoder features. Instead, a lightweight gating network learns a per‑channel weighting, effectively deciding how much of the high‑resolution encoder signal to pass to the decoder at each scale.
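
Items 2 and 4 both come down to per‑channel modulation, which a minimal NumPy sketch can illustrate. Everything here is an assumption made for illustration: the use of instance normalization, the idea that each feature map carries a single modality index, the gate built from global average pooling, and all function and parameter names. The paper's actual layers may differ.

```python
import numpy as np


def identifier_conditioned_norm(x, modality, gamma, beta, d_gamma, d_beta, eps=1e-5):
    """Instance-normalize x of shape (C, D, H, W), then apply a per-modality affine.

    gamma, beta: shared per-channel scale and bias, shape (C,).
    d_gamma, d_beta: hypothetical per-modality offsets ("identifiers"), shape (M, C).
    """
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # The identifier is added to the shared scale and bias.
    scale = (gamma + d_gamma[modality]).reshape(-1, 1, 1, 1)
    shift = (beta + d_beta[modality]).reshape(-1, 1, 1, 1)
    return scale * x_hat + shift


def gated_skip_fusion(enc, dec, w, b):
    """Scale encoder features by a per-channel gate, then concatenate with dec.

    enc, dec: feature maps of shape (C, D, H, W) at the same resolution.
    w: gate weights of shape (C, 2C); b: gate bias of shape (C,). Both hypothetical.
    """
    pooled = np.concatenate([enc.mean(axis=(1, 2, 3)), dec.mean(axis=(1, 2, 3))])
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled + b)))  # sigmoid, values in (0, 1)
    return np.concatenate([gate.reshape(-1, 1, 1, 1) * enc, dec], axis=0)


rng = np.random.default_rng(0)
C, M = 4, 3  # hypothetical sizes: 4 channels, 3 modalities
x = rng.normal(size=(C, 2, 8, 8))
d_gamma = 0.1 * rng.normal(size=(M, C))
d_beta = 0.1 * rng.normal(size=(M, C))
y = identifier_conditioned_norm(x, 1, np.ones(C), np.zeros(C), d_gamma, d_beta)
fused = gated_skip_fusion(x, y, rng.normal(size=(C, 2 * C)), np.zeros(C))
print(y.shape, fused.shape)  # (4, 2, 8, 8) (8, 2, 8, 8)
```

With all gate weights and biases at zero the gate equals 0.5 for every channel. A gate near 1 passes the encoder features through unchanged, which recovers the plain concatenation of a vanilla U‑Net.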

Training uses a combined Dice and cross‑entropy loss on the Medical Segmentation Decathlon (MSD) Task01 BrainTumour dataset, with the same optimizer, learning‑rate schedule, and data augmentations as the baseline models for a fair comparison.
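
As a rough illustration of such a combined loss, the NumPy sketch below adds a soft Dice term to a cross‑entropy term for a single volume. The equal weighting of the two terms, the smoothing constant, and the function name are assumptions; the summary does not state how the paper weights or smooths them.

```python
import numpy as np


def dice_ce_loss(probs, target, eps=1e-6):
    """Soft Dice loss plus cross-entropy for one volume (equal weights assumed).

    probs: softmax outputs, shape (K, D, H, W), summing to 1 over K.
    target: integer labels in [0, K), shape (D, H, W).
    """
    k = probs.shape[0]
    onehot = np.moveaxis(np.eye(k)[target], -1, 0)  # (K, D, H, W)

    axes = (1, 2, 3)
    intersection = (probs * onehot).sum(axis=axes)
    denom = probs.sum(axis=axes) + onehot.sum(axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)  # one score per class
    dice_loss = 1.0 - dice.mean()

    # Probability assigned to the true class at each voxel.
    p_true = (probs * onehot).sum(axis=0)
    ce = -np.log(np.clip(p_true, eps, 1.0)).mean()
    return dice_loss + ce


target = np.array([[[0, 1], [1, 0]]])             # toy 1 x 2 x 2 label volume
perfect = np.moveaxis(np.eye(2)[target], -1, 0)   # one-hot "prediction"
uniform = np.full((2, 1, 2, 2), 0.5)              # maximally unsure prediction
print(dice_ce_loss(perfect, target), dice_ce_loss(uniform, target))
```

A perfect prediction drives both terms to zero, while the uniform prediction gives a Dice loss of about 0.5 plus a cross‑entropy of ln 2.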

Results & Findings

Model                 Params    Mean Dice (50 epochs)
Residual 3‑D U‑Net    3.20 M    0.710
V‑Net                 ≈3.5 M    0.698
Attention U‑Net       ≈3.1 M    0.704
DALight‑3D            2.22 M    0.727

  • Higher Dice with fewer parameters: DALight‑3D achieves the highest mean Dice of the four models while using about 30 % fewer parameters than the Residual 3‑D U‑Net (2.22 M vs. 3.20 M).
  • Ablation insights: Removing any of the four core components (SepConv, ICN, CSA, SSFB) drops the Dice by 0.01–0.03, confirming that each piece contributes meaningfully.
  • Training efficiency: Fewer parameters translate to faster epoch times and lower GPU memory footprints, making the model viable on mid‑range hardware (e.g., a single RTX 3060).
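
The size comparison can be checked directly from the reported parameter counts. The snippet below is plain arithmetic on the figures in the results above; the helper name is arbitrary, and two of the baseline counts are only given approximately.

```python
def reduction_pct(small: float, large: float) -> float:
    """Percentage by which `small` is below `large`."""
    return 100.0 * (1.0 - small / large)


# Parameter counts in millions, as listed in the results.
baselines = {
    "Residual 3-D U-Net": 3.20,
    "V-Net": 3.5,            # reported as approximate
    "Attention U-Net": 3.1,  # reported as approximate
}
dalight = 2.22

for name, params in baselines.items():
    print(f"{name}: {reduction_pct(dalight, params):.1f}% fewer parameters")
# Residual 3-D U-Net: 30.6% fewer parameters
# V-Net: 36.6% fewer parameters
# Attention U-Net: 28.4% fewer parameters
```

The 30 % figure therefore matches the Residual 3‑D U‑Net comparison most closely; the saving relative to the other two baselines is somewhat higher or lower.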

Practical Implications

  • Edge‑ready clinical tools: Hospitals with limited compute resources can run DALight‑3D on commodity GPUs or even high‑end CPUs, enabling near‑real‑time tumor delineation during radiology workflows.
  • Rapid prototyping for AI‑augmented radiology: Developers can integrate the model into existing pipelines (e.g., MONAI, NVIDIA Clara) without worrying about massive memory overheads.
  • Multi‑modal flexibility: The identifier‑conditioned normalization makes it straightforward to add or drop MRI sequences (e.g., adding diffusion‑weighted images) without redesigning the network.
  • Potential for other 3‑D tasks: The same lightweight building blocks could be repurposed for lung nodule detection, cardiac segmentation, or even non‑medical volumetric data (e.g., voxelized LiDAR scans).

Limitations & Future Work

  • Benchmark scope: Experiments are limited to the MSD brain‑tumor task; performance on other tumor types or imaging modalities remains untested.
  • Training schedule: The reported results use a relatively short 50‑epoch schedule. Longer training could widen or narrow the gap between DALight‑3D and the larger baselines, but the authors did not explore this.
  • Inference latency: While parameter count is lower, the depthwise‑separable 3‑D convolutions are not yet fully optimized in all deep‑learning libraries, which could affect real‑world latency.
  • Future directions: The authors suggest extending the cross‑slice attention to a full 3‑D self‑attention module, exploring mixed‑precision training for even smaller footprints, and validating on multi‑institutional datasets to assess generalization.

Authors

  • Nand Kumar Mishra
  • Dhruv Mishra
  • Dr Manu Pratap Singh

Paper Information

  • arXiv ID: 2605.04518v1
  • Categories: cs.CV, cs.LG, cs.NE
  • Published: May 6, 2026