[Paper] Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge Devices
Source: arXiv - 2512.08751v1
Overview
A new study tackles two hot topics in AI‑driven healthcare: privacy‑preserving federated learning and model compression for edge devices. By introducing a skewness‑guided pruning technique for multimodal Swin Transformers, the authors show that skin‑lesion classifiers can be shrunk by more than a third without sacrificing diagnostic accuracy, opening the door to on‑device dermatology assistants that respect patient data confidentiality.
Key Contributions
- Skewness‑based pruning criterion: Uses the statistical skewness of attention and MLP layer outputs to decide which heads/neurons to drop, a novel, data‑driven way to trim Transformers (see the sketch after this list).
- Multimodal Swin Transformer adaptation: Extends the Swin architecture to fuse visual skin images with auxiliary clinical metadata (e.g., patient age, lesion location).
- Federated learning integration: Implements the pruning pipeline within a horizontal FL setup, allowing hospitals or clinics to collaboratively train a shared model while keeping raw images local.
- Edge‑ready compression: Achieves a ~36 % reduction in model size and a comparable (~33 %) cut in FLOPs on a compact Swin variant, with no measurable drop in classification accuracy on standard skin‑lesion benchmarks.
- Comprehensive evaluation: Provides ablation studies comparing skewness‑guided pruning against magnitude‑based and random pruning, demonstrating superior trade‑offs.
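To make the criterion concrete, here is a minimal PyTorch sketch of skewness‑guided unit selection: it scores each unit (attention head or MLP neuron) by the third standardized moment of its recorded activations and prunes the lowest‑|skewness| units up to a budget. The function names, the activation‑tensor layout, and the example numbers are illustrative assumptions, not the authors' implementation.

```python
import torch

def activation_skewness(acts: torch.Tensor) -> torch.Tensor:
    """Third standardized moment of each unit's activation distribution.

    acts: (num_samples, num_units) activations recorded on a validation
    batch, e.g. one column per attention head or per MLP neuron.
    """
    mean = acts.mean(dim=0, keepdim=True)
    std = acts.std(dim=0, keepdim=True).clamp_min(1e-8)
    z = (acts - mean) / std
    return (z ** 3).mean(dim=0)

def skewness_keep_mask(acts: torch.Tensor, prune_fraction: float) -> torch.Tensor:
    """Boolean mask: True = keep the unit, False = prune it.

    Units with the smallest absolute skewness (near-symmetric outputs,
    deemed less informative) are pruned first, up to the given budget.
    """
    score = activation_skewness(acts).abs()
    num_prune = int(prune_fraction * score.numel())
    prune_idx = torch.argsort(score)[:num_prune]   # least-skewed units
    keep = torch.ones_like(score, dtype=torch.bool)
    keep[prune_idx] = False
    return keep

# Example: prune 30 % of 12 attention heads by skewness.
head_outputs = torch.randn(512, 12)                # (validation samples, heads)
keep_mask = skewness_keep_mask(head_outputs, prune_fraction=0.3)
```

Because the paper keeps one pruning mask for all participants, a mask computed this way would have to be agreed globally (or derived from aggregated statistics) so that every client removes the same heads and neurons.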
Methodology
- Base architecture – A compact Swin Transformer processes dermoscopic images while a parallel MLP ingests patient‑level metadata; the two streams are merged before the final classifier (a minimal fusion sketch appears after this list).
- Collecting activation statistics – After a few federated rounds, each client records the output distribution of every Multi‑Head Self‑Attention (MHSA) head and each MLP neuron across a validation batch.
- Computing skewness – For each distribution, the third standardized moment (skewness) is calculated. Heads/neurons with low absolute skewness are deemed less informative (their outputs are near‑symmetric and thus less discriminative).
- Pruning decision – A global pruning budget (e.g., 30 % of heads, 20 % of MLP units) is allocated to the lowest‑skewness components. The remaining sub‑network is re‑initialized and fine‑tuned locally.
- Federated training loop – Standard FedAvg aggregates the pruned sub‑models from all clients. Because the pruning mask is identical across participants, every client's parameter tensors keep the same shapes and can be averaged directly (see the aggregation sketch after this list).
- Edge deployment – The final compressed model is exported to ONNX/TFLite for inference on smartphones or dedicated medical edge hardware.
Results & Findings
| Metric | Unpruned Swin (baseline) | Skewness‑pruned Swin |
|---|---|---|
| Model size | 48 MB | 31 MB (≈ 36 % reduction) |
| FLOPs (per image) | 2.1 G | 1.4 G (≈ 33 % drop) |
| Accuracy (AUROC) on ISIC‑2018 | 0.923 | 0.923 (±0.001) |
| Sensitivity @ 95 % specificity | 0.78 | 0.78 |
| Communication overhead (per round) | 48 MB | 31 MB |
- No accuracy loss: The AUROC remained statistically unchanged despite the reduction in model size.
- Better than baselines: Random pruning caused a 2–3 % AUROC drop; magnitude‑based pruning saved only ~20 % of parameters before performance degraded.
- Stable convergence: The federated training curve for the pruned model matched the unpruned one after ~10 communication rounds.
Practical Implications
- On‑device dermatology assistants: Clinics can run a high‑performing skin‑lesion classifier on a smartphone or a low‑power edge gateway, enabling real‑time triage without internet connectivity.
- Reduced bandwidth & storage: Smaller model checkpoints mean faster OTA updates and lower data‑plan costs for remote health workers.
- Privacy‑first collaboration: Hospitals can jointly improve their AI without ever moving patient images, complying with GDPR/HIPAA constraints.
- Generalizable recipe: The skewness‑guided pruning framework can be plugged into other Transformer‑based vision models (e.g., ViT, DeiT) and other modalities (e.g., radiology, pathology).
Limitations & Future Work
- Skewness stability: The metric is computed on a limited validation set; noisy estimates could lead to sub‑optimal pruning in highly heterogeneous data environments.
- Static pruning mask: Once the mask is set, it remains fixed for the rest of training. Adaptive or iterative pruning could capture evolving feature importance.
- Edge hardware variance: The study evaluated inference on a single ARM‑based platform; performance on ultra‑low‑power microcontrollers remains untested.
- Broader clinical validation: Experiments were limited to public ISIC datasets; prospective trials on multi‑center clinical data are needed to confirm real‑world robustness.
Bottom line: By marrying a clever statistical pruning rule with federated learning, this work demonstrates that state‑of‑the‑art multimodal Transformers can be made lightweight enough for edge deployment while preserving patient privacy—a promising step toward AI‑augmented dermatology in the field.
Authors
- Kuniko Paxton
- Koorosh Aslansefat
- Dhavalkumar Thakker
- Yiannis Papadopoulos
Paper Information
- arXiv ID: 2512.08751v1
- Categories: cs.CV, cs.DC
- Published: December 9, 2025