[Paper] Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

Published: February 3, 2026, 08:28 PM EST
3 min read
Source: arXiv - 2602.04120v1

Overview

The paper introduces Explainability‑as‑a‑Service (XaaS), a new architecture that treats AI explanations as a standalone system service rather than a hard‑wired part of each model. By decoupling inference from explanation generation, XaaS makes it feasible to deliver high‑quality, low‑latency explanations on heterogeneous edge and IoT devices—something that traditional, “coupled” XAI approaches struggle with.

Key Contributions

  • Decoupled XAI service – separates inference and explanation, allowing explanations to be requested, cached, and verified on demand.
  • Distributed explanation cache – uses semantic similarity to retrieve existing explanations, cutting redundant computation across devices.
  • Lightweight verification protocol – guarantees that cached or newly generated explanations faithfully reflect the underlying model’s reasoning.
  • Adaptive explanation engine – dynamically selects the most suitable explanation method based on device resources and user needs.
  • Real‑world validation – demonstrated on manufacturing QC, autonomous vehicle perception, and healthcare diagnostics, achieving ~38 % latency reduction while preserving explanation quality.
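The decoupling in the first contribution can be sketched as a thin edge client that runs inference locally and requests explanations on demand. This is a minimal illustration, not the paper's actual interface: all names here (`XaaSClient`, `InMemoryXaaS`, `ExplanationResponse`) and the placeholder "explanation" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ExplanationResponse:
    explanation: dict   # method output, e.g. feature attributions
    from_cache: bool    # reused from the distributed cache?
    verified: bool      # passed the verification protocol?


class InMemoryXaaS:
    """Toy stand-in for the central XaaS layer with a trivial exact-match cache."""

    def __init__(self):
        self.cache = {}

    def explain(self, x):
        key = tuple(x)
        if key in self.cache:  # explanation already computed elsewhere
            return ExplanationResponse(self.cache[key], True, True)
        # Placeholder "explanation": index of the largest input feature.
        expl = {"top_feature": max(range(len(x)), key=lambda i: x[i])}
        self.cache[key] = expl
        return ExplanationResponse(expl, False, True)


class XaaSClient:
    """Thin edge-side client: inference stays local, explanation
    generation is delegated to the service layer."""

    def __init__(self, service):
        self.service = service

    def predict(self, model, x):
        return model(x)                 # cheap, on-device

    def explain(self, x):
        return self.service.explain(x)  # heavy XAI work, off-device
```

A second request for the same input is then served from the cache (`from_cache=True`) without re-running the explainer, which is the on-demand reuse the contribution describes.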

Methodology

  1. Service‑Oriented Architecture – Edge nodes run a thin inference client that forwards explanation requests to a central XaaS layer.
  2. Semantic Similarity Retrieval – When an explanation is needed, XaaS first checks a distributed cache. If a semantically similar input already has an explanation, it is reused, avoiding re‑running expensive XAI algorithms.
  3. Verification Protocol – A lightweight checksum‑style check validates that a cached explanation still aligns with the current model parameters, ensuring trustworthiness.
  4. Adaptive Engine – Based on CPU, memory, and latency budgets, the engine picks from a toolbox of XAI techniques (e.g., SHAP, LIME, Grad‑CAM) to generate explanations that fit the device’s constraints.
  5. Evaluation – The authors deployed the system on three edge‑AI scenarios, measuring latency, bandwidth, and explanation fidelity against baseline coupled XAI implementations.
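Steps 2 and 3 can be sketched together: embed the input, search the cache by cosine similarity, and accept a hit only if its model fingerprint still matches the deployed parameters. The 0.95 threshold, the SHA-256 fingerprint, and all names below are illustrative assumptions; the paper does not specify these details here.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def model_checksum(params):
    """Cheap fingerprint of model parameters: a cached entry created
    under old parameters carries an old fingerprint and is rejected."""
    h = hashlib.sha256()
    for p in params:
        h.update(repr(round(p, 6)).encode())
    return h.hexdigest()


class ExplanationCache:
    """Semantic-similarity cache with checksum-style verification."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # (embedding, explanation, checksum)

    def put(self, embedding, explanation, checksum):
        self.entries.append((embedding, explanation, checksum))

    def lookup(self, embedding, current_checksum):
        """Return a cached explanation only if a semantically similar
        input exists AND its model fingerprint still matches."""
        for emb, expl, chk in self.entries:
            if cosine(embedding, emb) >= self.threshold and chk == current_checksum:
                return expl
        return None  # cache miss: fall back to running an XAI method
```

In this sketch a near-duplicate input (e.g. a sensor frame almost identical to a cached one) hits the cache, while any change to the model parameters invalidates every stale entry via the fingerprint mismatch.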

Results & Findings

| Scenario | Latency Reduction | Explanation Fidelity* | Cache Hit Rate |
| --- | --- | --- | --- |
| Manufacturing QC | 40 % | 0.92 (vs. 0.94 baseline) | 68 % |
| Autonomous Vehicles | 35 % | 0.90 (vs. 0.93 baseline) | 62 % |
| Healthcare Diagnostics | 39 % | 0.91 (vs. 0.95 baseline) | 71 % |

*Fidelity measured by agreement between explanations and ground‑truth model reasoning.

The results show that XaaS consistently cuts inference‑plus‑explanation latency by 35–40 % while keeping explanation fidelity within a few hundredths of the coupled baselines. The cache dramatically reduces duplicate work, especially in repetitive edge workloads (e.g., near‑identical sensor frames).

Practical Implications

  • Scalable Edge Deployments – Companies can roll out AI models to thousands of sensors or edge gateways without worrying that each node must run heavyweight XAI algorithms locally.
  • Resource‑Constrained Devices – Low‑power microcontrollers can still obtain explanations because the heavy lifting is off‑loaded to the XaaS layer, with only lightweight verification done locally.
  • Regulatory Compliance – Industries with strict audit requirements (medical devices, autonomous driving) can meet explainability mandates without sacrificing real‑time performance.
  • Developer Productivity – Teams can plug any existing model into the XaaS API and instantly gain a suite of explanation options, reducing the need to re‑engineer XAI pipelines for each new edge use case.
  • Cost Savings – By reusing explanations from the cache, network bandwidth and compute costs are lowered, which is especially valuable in bandwidth‑limited IoT deployments.
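The resource‑aware selection behind the second bullet (step 4 of the methodology) can be sketched as a budget check over a cost table. The method costs below are made‑up illustrative numbers, not measurements from the paper, and the function name is a hypothetical one.

```python
# Candidate XAI methods ordered from most to least expensive.
# (name, cpu_cost, mem_mb, latency_ms) -- illustrative numbers only.
METHODS = [
    ("shap",     1.00, 512, 900),
    ("lime",     0.60, 256, 400),
    ("grad_cam", 0.25, 128, 120),
]


def select_method(cpu_budget, mem_budget_mb, latency_budget_ms):
    """Return the most expensive method that fits every budget,
    relying on METHODS being sorted from costliest to cheapest."""
    for name, cpu, mem, lat in METHODS:
        if cpu <= cpu_budget and mem <= mem_budget_mb and lat <= latency_budget_ms:
            return name
    return None  # nothing fits: serve from cache only, or defer
```

An edge gateway with slack resources would get SHAP‑quality attributions, while a constrained microcontroller falls through to a cheaper method or to cache‑only service, matching the off‑loading trade‑off described above.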

Limitations & Future Work

  • Cache Staleness – If the underlying model is updated frequently, cached explanations may become outdated faster than the verification protocol can detect, requiring more aggressive cache invalidation strategies.
  • Security & Privacy – Transmitting raw inputs to a central explanation service could expose sensitive data; the authors note the need for encrypted or federated explanation mechanisms.
  • Method Selection Overhead – While the adaptive engine reduces runtime cost, the decision logic itself adds a small overhead that may be noticeable on ultra‑low‑power nodes.
  • Future Directions – The authors plan to explore on‑device learning of similarity metrics, integrate differential privacy into the explanation pipeline, and extend XaaS to support multimodal models (e.g., audio‑visual AI).

Authors

  • Samaresh Kumar Singh
  • Joyjit Roy

Paper Information

  • arXiv ID: 2602.04120v1
  • Categories: cs.LG, cs.AI, cs.DC, cs.SE
  • Published: February 4, 2026