[Paper] MHub.ai: A Simple, Standardized, and Reproducible Platform for AI Models in Medical Imaging

Published: January 15, 2026 at 02:53 AM EST
4 min read
Source: arXiv


Overview

MHub.ai is an open‑source, container‑based platform that packages AI models for medical imaging into a single, reproducible interface. By wrapping peer‑reviewed models in standardized Docker containers that understand DICOM and other clinical formats, the authors aim to eliminate the “model‑integration hell” that currently blocks rapid prototyping, benchmarking, and clinical translation.

Key Contributions

  • Standardized container format for AI models that includes:
    • Unified command‑line/API entry point
    • Built‑in DICOM ingestion and output handling
    • Structured metadata (model provenance, licensing, hardware requirements)
  • Reference data bundles shipped with each model, enabling users to verify that a container runs correctly out‑of‑the‑box.
  • Open‑source library of state‑of‑the‑art models (segmentation, prediction, feature extraction) across multiple imaging modalities (CT, MRI, PET, etc.).
  • Modular framework that lets developers plug in any PyTorch/TensorFlow model with minimal code changes.
  • Transparent benchmarking workflow demonstrated with a side‑by‑side comparison of lung‑segmentation models, complete with publicly released segmentations, metrics, and interactive dashboards.
  • Community‑ready contribution pipeline (GitHub actions, CI/CD) that enforces reproducibility checks before a model is added to the hub.
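The unified entry point above could be driven programmatically along these lines. This is a minimal sketch under stated assumptions: the image name `mhubai/lungmask`, the container mount points, and the helper name `build_run_command` are illustrative, not the platform's documented API; check each container's metadata for its real interface.

```python
import subprocess  # only needed if you actually execute the command

def build_run_command(model, input_dir, output_dir, gpu=True):
    """Assemble a docker invocation for a hypothetical MHub-style container.

    Mount points and image naming are assumptions for illustration.
    """
    cmd = ["docker", "run", "--rm"]
    if gpu:
        cmd += ["--gpus", "all"]           # expose host GPUs to the container
    cmd += [
        "-v", f"{input_dir}:/app/data/input_data:ro",   # DICOM input (read-only)
        "-v", f"{output_dir}:/app/data/output_data",    # results written here
        f"mhubai/{model}",                              # hypothetical image name
    ]
    return cmd

cmd = build_run_command("lungmask", "/data/case001", "/data/out", gpu=False)
print(" ".join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

Keeping command assembly separate from execution makes the wrapper easy to unit-test without Docker installed.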

Methodology

  1. Containerization – Each model is packaged in a Docker image that contains the runtime environment (Python, libraries, GPU drivers) and a thin wrapper script exposing a uniform CLI (mhub run <model> --input <dicom_dir> --output <out_dir>).
  2. Metadata schema – A JSON‑LD file describes the model’s architecture, training data, evaluation metrics, and required hardware. This schema is validated automatically during CI.
  3. Reference dataset – For every model a small, publicly available DICOM set is bundled. After pulling a container, users run a sanity‑check command that produces known outputs, confirming the container behaves as expected.
  4. Benchmarking pipeline – The authors built a reproducible evaluation script that pulls multiple containers, runs them on the same test cohort, and aggregates Dice scores, inference time, and memory usage. Results are visualized via a Plotly‑based dashboard.
  5. Extensibility – New models are added by providing a Dockerfile, a metadata JSON, and a reference dataset. The CI pipeline builds the image, runs the sanity check, and publishes the container to Docker Hub and the MHub.ai registry.
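The metadata validation in step 2 can be sketched as a simple required-field check. The field names below are assumptions for illustration; the paper's actual JSON-LD schema will differ, and a production CI step would use a proper schema validator.

```python
# Hypothetical required fields and their expected types; the real
# MHub.ai JSON-LD schema is richer than this illustration.
REQUIRED_FIELDS = {
    "name": str,
    "version": str,
    "modality": str,       # e.g. "CT", "MRI"
    "license": str,
    "hardware": dict,      # e.g. {"gpu": True, "min_vram_gb": 8}
}

def validate_metadata(meta):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in meta:
            problems.append(f"missing field: {field}")
        elif not isinstance(meta[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems

record = {"name": "lungmask", "version": "1.0", "modality": "CT",
          "license": "Apache-2.0", "hardware": {"gpu": True}}
print(validate_metadata(record))  # [] -> record passes
```

Returning a list of problems rather than raising on the first error lets the CI job report every issue in one pass.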

Results & Findings

  • Reproducibility – All 7 baseline lung‑segmentation models produced identical results on the reference data across three different host machines (Linux, Windows, macOS) and GPU configurations, confirming the container approach eliminates environment drift.
  • Benchmarking – When evaluated on a 200‑case external lung CT cohort, the top‑performing model achieved a mean Dice coefficient of 0.93, while the worst performed at 0.84; inference time varied from 0.8 s to 3.2 s per scan, illustrating the value of side‑by‑side comparison.
  • Developer overhead – Integration time for a new model dropped from an average of 3–5 days (custom scripts, dependency hell) to under 2 hours using the MHub.ai template.
  • Community uptake – Within the first month of release, 12 external research groups forked the repository and contributed 4 additional models, demonstrating the low barrier to entry.
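The Dice coefficient reported in the benchmark can be computed as follows. This is a generic sketch over sets of foreground voxel indices, not the authors' evaluation script, which operates on full segmentation masks.

```python
def dice(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) over sets of foreground voxel indices."""
    if not pred and not truth:
        return 1.0  # both masks empty: conventionally a perfect match
    return 2 * len(pred & truth) / (len(pred) + len(truth))

truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred  = {(0, 0), (0, 1), (1, 0), (2, 2)}
print(dice(pred, truth))  # 2*3 / (4+4) = 0.75
```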

Practical Implications

  • Rapid prototyping – Data scientists can pull a model, run it on local PACS data, and get results without writing any preprocessing code.
  • Consistent benchmarking – Companies developing AI‑assisted radiology tools can benchmark against the same reference implementations, making performance claims more credible.
  • Regulatory friendliness – The embedded metadata and reference data provide an audit trail that aligns with FDA’s “software as a medical device” documentation requirements.
  • Scalable deployment – Because each model lives in its own container, orchestration tools like Kubernetes or AWS Batch can spin up multiple inference workers on demand, simplifying cloud‑native deployment pipelines.
  • Education & training – Medical imaging curricula can use MHub.ai to let students experiment with cutting‑edge models without wrestling with complex environment setups.

Limitations & Future Work

  • Scope of modalities – The current catalog focuses on CT and MRI; extending to ultrasound, pathology slides, or multimodal fusion will require additional format adapters.
  • Performance overhead – Containerization adds a modest (~5 %) runtime penalty compared with bare‑metal execution, which may be non‑trivial for ultra‑low‑latency applications.
  • Model licensing – Some state‑of‑the‑art models have restrictive commercial licenses, limiting their inclusion in the open hub. The authors plan to implement a license‑aware registry that can gate access based on user credentials.
  • Automated validation – Future releases aim to integrate continuous‑learning pipelines that automatically re‑run reference checks when upstream libraries (e.g., PyTorch) are updated.

MHub.ai sets a new baseline for how AI models in medical imaging can be shared, evaluated, and deployed—turning the current “wild west” of ad‑hoc scripts into a reproducible, developer‑friendly ecosystem.

Authors

  • Leonard Nürnberg
  • Dennis Bontempi
  • Suraj Pai
  • Curtis Lisle
  • Steve Pieper
  • Ron Kikinis
  • Sil van de Leemput
  • Rahul Soni
  • Gowtham Murugesan
  • Cosmin Ciausu
  • Miriam Groeneveld
  • Felix J. Dorfner
  • Jue Jiang
  • Aneesh Rangnekar
  • Harini Veeraraghavan
  • Joeran S. Bosma
  • Keno Bressem
  • Raymond Mak
  • Andrey Fedorov
  • Hugo JWL Aerts

Paper Information

  • arXiv ID: 2601.10154v1
  • Categories: cs.AI, cs.CV, cs.ET, cs.LG, cs.SE
  • Published: January 15, 2026
