[Paper] MUSE: Multi-Tenant Model Serving With Seamless Model Updates
Source: arXiv - 2602.11776v1
Overview
MUSE (Multi‑Tenant Model Serving) tackles a hidden pain point in “Score‑as‑a‑Service” platforms: every time a fraud‑detection model is retrained, the score distribution shifts, forcing each client to manually re‑tune their decision thresholds. The authors present a serving framework that decouples model scores from client‑specific thresholds, allowing model updates to roll out in minutes instead of weeks, while still supporting hundreds of tenants on the same infrastructure.
Key Contributions
- Two‑level score transformation that maps any newly‑trained model’s raw scores onto a stable, reference distribution, keeping client thresholds valid across updates.
- Dynamic intent‑based routing that lets multiple tenants share the same underlying model instance, maximizing GPU/CPU utilization without sacrificing isolation.
- Production‑grade deployment at Feedzai handling >1,000 events/sec and >55 B events per year across dozens of tenants, with millisecond‑scale latency and high‑availability guarantees.
- Operational impact study showing a reduction of model‑deployment lead time from weeks to minutes and an estimated multi‑million‑dollar reduction in fraud‑related losses.
Methodology
- Reference Distribution Definition – The team selects a “canonical” score distribution (e.g., a calibrated logistic output) that all clients agree to use as a baseline.
- Two‑Level Mapping
- Level 1: The freshly trained model produces raw scores.
- Level 2: A lightweight, per‑model transformation (typically a monotonic piecewise‑linear function) reshapes these raw scores to match the reference distribution. Because the mapping is monotonic, the ordering of predictions stays intact, preserving model performance.
- Intent‑Based Routing Layer – Incoming events carry a tenant identifier and optional “intent” metadata (e.g., fraud‑type). The router forwards the request to the appropriate shared model instance, applying the tenant’s stored threshold on the already‑transformed score.
- Continuous Deployment Pipeline – New models are automatically registered, the transformation parameters are recomputed on a small validation set, and the updated model is hot‑swapped without downtime.
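The Level‑2 mapping can be illustrated with a quantile‑to‑quantile piecewise‑linear fit. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the function name, the knot count, and the use of NumPy's `interp` are all choices made here for illustration; the only property the paper requires is that the mapping be monotonic, so prediction ordering is preserved.

```python
import numpy as np

def fit_score_mapping(raw_scores, reference_scores, n_knots=11):
    """Fit a monotonic piecewise-linear map from a new model's raw
    score distribution onto the agreed reference distribution.

    Hypothetical sketch: knots are placed at matching empirical
    quantiles of the two distributions, so equal quantile levels
    map onto each other.
    """
    probs = np.linspace(0.0, 1.0, n_knots)
    # Knot x-positions: empirical quantiles of the new model's raw scores
    # (in practice computed on a small validation set, per the pipeline step).
    raw_knots = np.quantile(raw_scores, probs)
    # Knot y-positions: the same quantile levels of the reference distribution.
    ref_knots = np.quantile(reference_scores, probs)

    def transform(scores):
        # np.interp with sorted, matching knots is monotone non-decreasing,
        # so the ranking of predictions (and thus model performance) is kept.
        return np.interp(scores, raw_knots, ref_knots)

    return transform
```

Because only the knot arrays change per model, recomputing the transformation on retrain is cheap, which is what makes the automated hot‑swap in the deployment pipeline feasible.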
The approach avoids any client‑side code changes; thresholds remain calibrated to the reference distribution, which never moves.
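The routing layer described above can be sketched as follows. All class and field names here are hypothetical (the paper does not publish an API), and intent metadata is reduced to a single stored model key per tenant for brevity; the point is that the tenant's threshold is applied only after the score has been mapped onto the reference distribution, so the threshold never needs recalibration.

```python
from dataclasses import dataclass

@dataclass
class Tenant:
    tenant_id: str
    threshold: float   # calibrated once against the stable reference distribution
    model_key: str     # which shared model instance serves this tenant's intent

class IntentRouter:
    """Forward each event to the appropriate shared model instance,
    then apply the tenant's stored threshold to the already-transformed
    score (illustrative sketch, not the production implementation)."""

    def __init__(self, models, tenants):
        # models: model_key -> callable(event) -> score already mapped
        # onto the reference distribution by the Level-2 transformation.
        self._models = models
        self._tenants = {t.tenant_id: t for t in tenants}

    def score(self, tenant_id, event):
        tenant = self._tenants[tenant_id]
        transformed = self._models[tenant.model_key](event)
        return {"score": transformed, "flagged": transformed >= tenant.threshold}
```

Since thresholds reference a distribution that never moves, hot‑swapping the callable behind a `model_key` is invisible to every tenant routed through it.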
Results & Findings
| Metric | Before MUSE | After MUSE |
|---|---|---|
| Model update latency | ~2 weeks (manual recalibration) | ~5 minutes (automated hot‑swap) |
| Avg. per‑event latency | 3.2 ms | 2.8 ms |
| Throughput | ≈ 800 eps | ≈ 1,200 eps |
| Fraud loss reduction (estimated) | — | $3–5 M/yr |
| Ops effort for threshold updates | ≈ 200 hrs/yr | ≈ 10 hrs/yr |
The stable reference distribution eliminated the need for per‑tenant threshold re‑tuning after each model retrain, while the shared‑model architecture kept resource usage low. The system sustained >99.99 % availability over a full year of production traffic.
Practical Implications
- Faster Model Innovation – Data science teams can iterate daily without worrying about a cascade of client‑side updates.
- Lower Ops Cost – Automating the threshold‑recalibration step cuts down on manual QA and support tickets.
- Improved Fraud Resilience – Rapid rollout of updated models means the platform can react to emerging attack patterns in near‑real‑time, directly translating to lower financial loss.
- Scalable SaaS Architecture – The intent‑based routing and score‑transformation pattern can be reused for any multi‑tenant ML service (e.g., credit scoring, recommendation engines) where downstream business logic depends on a calibrated score.
- Simplified Client Integration – Clients keep their existing threshold logic; they only need to point their API endpoint to the MUSE gateway, reducing integration friction.
Limitations & Future Work
- Monotonic Mapping Assumption – The current transformation is limited to monotonic functions; non‑monotonic calibration (e.g., handling multi‑modal score distributions) is not supported.
- Reference Distribution Choice – Selecting a universal reference distribution that works well for all tenants can be challenging, especially when tenants have wildly different risk appetites.
- Model Diversity – MUSE assumes that a single model can serve many tenants; highly specialized models may still require separate instances, reducing the sharing benefit.
- Future Directions – The authors suggest exploring adaptive, tenant‑specific transformation layers (e.g., small neural nets) and extending the framework to multi‑class or regression tasks beyond binary classification.
Bottom line: MUSE demonstrates that with a clever score‑normalization layer and smart routing, multi‑tenant ML platforms can eliminate a major operational bottleneck, delivering faster, cheaper, and more reliable model updates—an approach that any SaaS‑focused ML team should consider.
Authors
- Cláudio Correia
- Alberto E. A. Ferreira
- Lucas Martins
- Miguel P. Bento
- Sofia Guerreiro
- Ricardo Ribeiro Pereira
- Ana Sofia Gomes
- Jacopo Bono
- Hugo Ferreira
- Pedro Bizarro
Paper Information
- arXiv ID: 2602.11776v1
- Categories: cs.LG, cs.DC
- Published: February 12, 2026