[Paper] MUSE: Multi-Tenant Model Serving With Seamless Model Updates
Source: arXiv - 2602.11776v1
Overview
MUSE (Multi‑Tenant Model Serving) tackles a hidden pain point in “Score‑as‑a‑Service” platforms: every time a fraud‑detection model is retrained, the score distribution shifts, forcing each client to manually re‑tune their decision thresholds. The authors present a serving framework that decouples model scores from client‑specific thresholds, allowing model updates to roll out in minutes instead of weeks, while still supporting hundreds of tenants on the same infrastructure.
Key Contributions
- Two‑level score transformation that maps any newly‑trained model’s raw scores onto a stable, reference distribution, keeping client thresholds valid across updates.
- Dynamic intent‑based routing that lets multiple tenants share the same underlying model instance, maximizing GPU/CPU utilization without sacrificing isolation.
- Production‑grade deployment at Feedzai handling >1,000 events/sec and >55 B events per year across dozens of tenants, with millisecond‑scale latency and high‑availability guarantees.
- Operational impact study showing a reduction of model‑deployment lead time from weeks to minutes and an estimated multi‑million‑dollar reduction in fraud‑related losses.
Methodology
- Reference Distribution Definition – The team selects a “canonical” score distribution (e.g., a calibrated logistic output) that all clients agree to use as a baseline.
- Two‑Level Mapping
- Level 1: The freshly trained model produces raw scores.
- Level 2: A lightweight, per‑model transformation (typically a monotonic piecewise‑linear function) reshapes these raw scores to match the reference distribution. Because the mapping is monotonic, the ordering of predictions stays intact, preserving model performance.
- Intent‑Based Routing Layer – Incoming events carry a tenant identifier and optional “intent” metadata (e.g., fraud‑type). The router forwards the request to the appropriate shared model instance, applying the tenant’s stored threshold on the already‑transformed score.
- Continuous Deployment Pipeline – New models are automatically registered, the transformation parameters are recomputed on a small validation set, and the updated model is hot‑swapped without downtime.
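The Level‑2 mapping can be illustrated with a quantile‑to‑quantile piecewise‑linear fit. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the function name, the knot count, and the use of NumPy's `interp` are all choices made here for illustration; the only property the paper requires is that the mapping be monotonic, so prediction ordering is preserved.

```python
import numpy as np

def fit_score_mapping(raw_scores, reference_scores, n_knots=11):
    """Fit a monotonic piecewise-linear map from a new model's raw
    score distribution onto the agreed reference distribution.

    Hypothetical sketch: knots are placed at matching empirical
    quantiles of the two distributions, so equal quantile levels
    map onto each other.
    """
    probs = np.linspace(0.0, 1.0, n_knots)
    # Knot x-positions: empirical quantiles of the new model's raw scores
    # (in practice computed on a small validation set, per the pipeline step).
    raw_knots = np.quantile(raw_scores, probs)
    # Knot y-positions: the same quantile levels of the reference distribution.
    ref_knots = np.quantile(reference_scores, probs)

    def transform(scores):
        # np.interp with sorted, matching knots is monotone non-decreasing,
        # so the ranking of predictions (and thus model performance) is kept.
        return np.interp(scores, raw_knots, ref_knots)

    return transform
```

Because only the knot arrays change per model, recomputing the transformation on retrain is cheap, which is what makes the automated hot‑swap in the deployment pipeline feasible.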
The approach avoids any client‑side code changes; thresholds remain calibrated to the reference distribution, which never moves.
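The routing layer described above can be sketched as follows. All class and field names here are hypothetical (the paper does not publish an API), and intent metadata is reduced to a single stored model key per tenant for brevity; the point is that the tenant's threshold is applied only after the score has been mapped onto the reference distribution, so the threshold never needs recalibration.

```python
from dataclasses import dataclass

@dataclass
class Tenant:
    tenant_id: str
    threshold: float   # calibrated once against the stable reference distribution
    model_key: str     # which shared model instance serves this tenant's intent

class IntentRouter:
    """Forward each event to the appropriate shared model instance,
    then apply the tenant's stored threshold to the already-transformed
    score (illustrative sketch, not the production implementation)."""

    def __init__(self, models, tenants):
        # models: model_key -> callable(event) -> score already mapped
        # onto the reference distribution by the Level-2 transformation.
        self._models = models
        self._tenants = {t.tenant_id: t for t in tenants}

    def score(self, tenant_id, event):
        tenant = self._tenants[tenant_id]
        transformed = self._models[tenant.model_key](event)
        return {"score": transformed, "flagged": transformed >= tenant.threshold}
```

Since thresholds reference a distribution that never moves, hot‑swapping the callable behind a `model_key` is invisible to every tenant routed through it.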
Results & Findings
| Metric | Before MUSE | After MUSE |
|---|---|---|
| Model update latency | ~2 weeks (manual recalibration) | ~5 minutes (automated hot‑swap) |
| Avg. per‑event latency | 3.2 ms | 2.8 ms |
| Throughput | ≈ 800 eps | ≈ 1,200 eps |
| Fraud loss reduction (estimated) | — | $3–5 M/yr |
| Ops effort for threshold updates | ≈ 200 hrs/yr | ≈ 10 hrs/yr |
The stable reference distribution eliminated the need for per‑tenant threshold re‑tuning after each model retrain, while the shared‑model architecture kept resource usage low. The system sustained >99.99 % availability over a full year of production traffic.
Practical Implications
- Faster Model Innovation – Data science teams can iterate daily without worrying about a cascade of client‑side updates.
- Lower Ops Cost – Automating the threshold‑recalibration step cuts down on manual QA and support tickets.
- Improved Fraud Resilience – Rapid rollout of updated models means the platform can react to emerging attack patterns in near‑real‑time, directly translating to lower financial loss.
- Scalable SaaS Architecture – The intent‑based routing and score‑transformation pattern can be reused for any multi‑tenant ML service (e.g., credit scoring, recommendation engines) where downstream business logic depends on a calibrated score.
- Simplified Client Integration – Clients keep their existing threshold logic; they only need to point their API endpoint to the MUSE gateway, reducing integration friction.
Limitations & Future Work
- Monotonic Mapping Assumption – The current transformation is limited to monotonic functions; non‑monotonic calibration (e.g., handling multi‑modal score distributions) is not supported.
- Reference Distribution Choice – Selecting a universal reference distribution that works well for all tenants can be challenging, especially when tenants have wildly different risk appetites.
- Model Diversity – MUSE assumes that a single model can serve many tenants; highly specialized models may still require separate instances, reducing the sharing benefit.
- Future Directions – The authors suggest exploring adaptive, tenant‑specific transformation layers (e.g., small neural nets) and extending the framework to multi‑class or regression tasks beyond binary classification.
Bottom line: MUSE demonstrates that with a clever score‑normalization layer and smart routing, multi‑tenant ML platforms can eliminate a major operational bottleneck, delivering faster, cheaper, and more reliable model updates—an approach that any SaaS‑focused ML team should consider.
Authors
- Cláudio Correia
- Alberto E. A. Ferreira
- Lucas Martins
- Miguel P. Bento
- Sofia Guerreiro
- Ricardo Ribeiro Pereira
- Ana Sofia Gomes
- Jacopo Bono
- Hugo Ferreira
- Pedro Bizarro
Paper Information
- arXiv ID: 2602.11776v1
- Categories: cs.LG, cs.DC
- Published: February 12, 2026