[Paper] Verifying Machine Learning Interpretability Requirements through Provenance

Published: April 23, 2026 at 08:22 AM EDT
4 min read
Source: arXiv - 2604.21599v1

Overview

The paper “Verifying Machine Learning Interpretability Requirements through Provenance” tackles a persistent pain point in ML engineering: how to prove that a model satisfies an interpretability requirement. By treating model‑ and data‑lineage information (provenance) as a measurable artifact, the authors turn an otherwise vague non‑functional requirement into a set of concrete functional checks that can be automated and audited.

Key Contributions

  • Provenance‑driven verification framework – a systematic way to capture, store, and query model and data provenance for interpretability checks.
  • Mapping interpretability NFRs to quantifiable functional requirements (FRs) – defines concrete metrics (e.g., feature‑importance stability, data‑slice coverage) derived from provenance.
  • Tool‑agnostic provenance schema – compatible with popular ML pipelines (TensorFlow, PyTorch, Scikit‑Learn) and experiment‑tracking/data‑versioning tools (DVC, MLflow); a sketch of what such a record might look like follows this list.
  • Case‑study validation – demonstrates the approach on a real‑world image‑classification model, showing how provenance data can be used to certify compliance with a pre‑defined interpretability policy.
  • Guidelines for integrating provenance capture into CI/CD for ML – practical steps for teams to embed verification into existing DevOps workflows.
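
To make the tool‑agnostic schema contribution concrete, here is a minimal sketch of what a provenance record could look like in Python. The types and field names (`DatasetSnapshot`, `content_hash`, and so on) are illustrative assumptions, not the paper's actual schema; they only echo the artifact kinds the paper says get logged.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical provenance record types, loosely following the artifacts the
# paper says the pipeline logs: dataset snapshots, model artifacts, and
# explanation artifacts. All names here are illustrative assumptions.

@dataclass
class DatasetSnapshot:
    content_hash: str                 # hash of the raw data snapshot
    preprocessing_steps: List[str]    # ordered preprocessing applied

@dataclass
class ModelArtifact:
    version: str
    architecture: str
    hyperparameters: Dict[str, float]
    random_seed: int

@dataclass
class ExplanationArtifact:
    model_version: str
    method: str                           # e.g., "shap" or "lime"
    feature_importance: Dict[str, float]  # feature name -> attribution score

@dataclass
class ProvenanceRecord:
    """One node in the provenance graph linking data, model, and explanation."""
    dataset: DatasetSnapshot
    model: ModelArtifact
    explanations: List[ExplanationArtifact] = field(default_factory=list)
```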

Methodology

  1. Define Interpretability Requirements – The authors start by expressing an interpretability NFR (e.g., “the model must provide stable feature attributions across retraining”) in natural language.
  2. Derive Functional Requirements – Each NFR is broken down into measurable FRs such as:
    • Attribution Consistency: variance of SHAP/LIME scores across model versions.
    • Data‑Slice Coverage: proportion of training data slices (by label, demographic, etc.) that have associated explanations.
  3. Capture Provenance – During model development, the pipeline logs:
    • Dataset snapshots (hashes, preprocessing steps).
    • Model artifacts (architecture, hyper‑parameters, random seeds).
    • Explanation artifacts (feature‑importance vectors, saliency maps).
      All logs are stored in a queryable provenance store (e.g., a graph database).
  4. Verification Engine – A lightweight service queries the provenance store, computes the FR metrics, and checks them against thresholds defined in the original NFR (a sketch of such checks follows this list).
  5. Feedback Loop – If a verification fails, the engine surfaces the exact provenance records that caused the violation, enabling developers to pinpoint the root cause (e.g., a data drift episode or a nondeterministic training run).
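
The paper does not include the verification engine's source, but the two example FR metrics above are simple enough to sketch. The following is a minimal, hypothetical implementation, assuming per‑version feature‑importance dictionaries like the `ExplanationArtifact` sketch earlier; the 0.05 variance threshold is the paper's example, while the coverage threshold is an assumption.

```python
import statistics
from typing import Dict, List, Set

def attribution_consistency(importances_by_version: List[Dict[str, float]]) -> float:
    """Worst-case per-feature variance of attribution scores across versions.

    Mirrors the paper's 'variance of SHAP/LIME scores across model versions'
    FR, assuming every version reports a score for every feature.
    """
    features = importances_by_version[0].keys()
    return max(
        statistics.pvariance([imp[f] for imp in importances_by_version])
        for f in features
    )

def data_slice_coverage(all_slices: Set[str], explained_slices: Set[str]) -> float:
    """Fraction of data slices (by label, demographic, ...) with explanations."""
    return len(explained_slices & all_slices) / len(all_slices)

def verify(importances_by_version: List[Dict[str, float]],
           all_slices: Set[str], explained_slices: Set[str],
           max_variance: float = 0.05,     # paper's example threshold
           min_coverage: float = 0.9) -> bool:  # illustrative assumption
    """Check both FRs against the thresholds derived from the NFR."""
    ok = True
    variance = attribution_consistency(importances_by_version)
    if variance >= max_variance:
        print(f"FAIL attribution consistency: {variance:.4f} >= {max_variance}")
        ok = False
    coverage = data_slice_coverage(all_slices, explained_slices)
    if coverage < min_coverage:
        print(f"FAIL data-slice coverage: {coverage:.2%} < {min_coverage:.0%}")
        ok = False
    return ok
```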

Results & Findings

  • Quantifiable Interpretability – The authors expressed three common interpretability NFRs as FRs with clear numeric thresholds (e.g., attribution variance < 0.05).
  • High Detection Rate – In the case study, the verification engine caught 4 out of 5 intentional violations (e.g., removing a preprocessing step that broke feature‑importance stability).
  • Low Overhead – Provenance capture added < 7 % runtime overhead and < 12 % storage increase for typical image‑classification pipelines.
  • Auditability – The provenance graph allowed a post‑hoc audit that reconstructed exactly which training data and code version produced a given explanation, satisfying an internal compliance audit without extra effort.

Practical Implications

  • Regulatory Readiness – Teams building models for regulated domains (healthcare, finance) can now produce evidence that interpretability requirements have been met, easing audits and certifications.
  • CI/CD Integration – By plugging the verification engine into existing ML CI pipelines, developers get immediate feedback (“build fails: attribution consistency below threshold”), turning interpretability into a first‑class quality gate (a minimal gate script sketch follows this list).
  • Debugging & Root‑Cause Analysis – Provenance records pinpoint the exact data slice or code change that caused an interpretability breach, reducing mean‑time‑to‑resolution for model‑explainability bugs.
  • Cross‑Team Collaboration – Data scientists, ML engineers, and product owners can agree on concrete interpretability metrics, aligning expectations and reducing “interpretability” debates to measurable SLAs.
  • Reusable Artefacts – The provenance schema is portable across projects, enabling organizations to build a shared “interpretability ledger” that can be queried for compliance across the entire model portfolio.
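
As a rough illustration of the CI/CD integration point, a gate script might read a metrics file emitted earlier in the pipeline and fail the build on a violation. Everything here (the file name, the metric keys, the coverage bound) is an assumption for illustration; only the 0.05 variance threshold comes from the paper's example.

```python
#!/usr/bin/env python3
"""Hypothetical CI gate: fail the build when an interpretability FR is violated.

Reads a metrics JSON produced earlier in the pipeline (an assumed artifact,
e.g. written by the verification engine) and compares it to NFR thresholds.
"""
import json
import sys

THRESHOLDS = {
    "attribution_variance": ("max", 0.05),  # paper's example threshold
    "data_slice_coverage": ("min", 0.90),   # illustrative assumption
}

def main(path: str = "interpretability_metrics.json") -> int:
    with open(path) as f:
        metrics = json.load(f)
    failed = False
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics[name]
        ok = value <= bound if kind == "max" else value >= bound
        if not ok:
            print(f"build fails: {name} = {value} violates {kind} bound {bound}")
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```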

Limitations & Future Work

  • Scope of Interpretability – The framework focuses on post‑hoc explanation methods (SHAP, LIME, saliency maps). It does not yet cover intrinsic interpretability techniques such as rule‑based models or attention‑based explanations.
  • Threshold Selection – Determining appropriate numeric thresholds for FRs still requires domain expertise; the paper does not provide an automated way to set them.
  • Scalability to Massive Datasets – While overhead is modest for medium‑scale experiments, the authors note that provenance storage could become a bottleneck for petabyte‑scale training pipelines.
  • User Study – The paper lacks a formal user study measuring how developers interact with the verification engine in day‑to‑day workflows.
  • Future Directions – Planned extensions include:
    1. Automated threshold calibration using statistical process control (see the sketch after this list).
    2. Integration with model‑card standards for broader NFR coverage.
    3. Distributed provenance storage solutions to handle large‑scale production environments.
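
To make the first future direction concrete: a Shewhart‑style control limit derived from historical metric values is one plausible reading of “automated threshold calibration using statistical process control”. The sketch below is an assumption about how that could look, not the authors' design.

```python
import statistics
from typing import List

def spc_upper_control_limit(history: List[float], k: float = 3.0) -> float:
    """Derive an upper control limit from historical metric values.

    Classic Shewhart-style rule: mean + k * standard deviation. Future runs
    of, say, the attribution-variance metric that exceed this limit would be
    flagged as violations. An illustration of the future-work idea only.
    """
    return statistics.fmean(history) + k * statistics.stdev(history)

# Example: calibrate a threshold from past runs of the attribution-variance
# metric (the values here are made up for the demo).
past_variances = [0.021, 0.018, 0.025, 0.019, 0.022]
print(f"calibrated threshold: {spc_upper_control_limit(past_variances):.4f}")
```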

Authors

  • Lynn Vonderhaar
  • Juan Couder
  • Daryela Cisneros
  • Omar Ochoa

Paper Information

  • arXiv ID: 2604.21599v1
  • Categories: cs.SE, cs.LG
  • Published: April 23, 2026