[Paper] Verifying Machine Learning Interpretability Requirements through Provenance
Source: arXiv - 2604.21599v1
Overview
The paper “Verifying Machine Learning Interpretability Requirements through Provenance” tackles a persistent pain point in ML engineering: how to prove that a model satisfies an interpretability requirement. By treating model‑ and data‑lineage information (provenance) as a measurable artifact, the authors turn an otherwise vague non‑functional requirement into a set of concrete functional checks that can be automated and audited.
Key Contributions
- Provenance‑driven verification framework – a systematic way to capture, store, and query model and data provenance for interpretability checks.
- Mapping interpretability non‑functional requirements (NFRs) to quantifiable functional requirements (FRs) – defines concrete metrics (e.g., feature‑importance stability, data‑slice coverage) derived from provenance.
- Tool‑agnostic provenance schema – compatible with popular ML pipelines (TensorFlow, PyTorch, Scikit‑Learn) and version‑control systems (DVC, MLflow).
- Case‑study validation – demonstrates the approach on a real‑world image‑classification model, showing how provenance data can be used to certify compliance with a pre‑defined interpretability policy.
- Guidelines for integrating provenance capture into CI/CD for ML – practical steps for teams to embed verification into existing DevOps workflows.
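To make the tool‑agnostic schema concrete, here is a minimal sketch of what a provenance record could look like. The field names and helper function are illustrative assumptions, not the paper's actual schema; only the captured categories (dataset snapshots, model artifacts, explanation artifacts) come from the paper.

```python
import hashlib
import json

# Hypothetical provenance record builder; field names are illustrative,
# not the paper's actual schema.
def make_provenance_record(dataset_bytes: bytes, model_config: dict,
                           explanation: list) -> dict:
    return {
        "dataset": {
            # Hash the snapshot so any later change to the data is detectable.
            "sha256": hashlib.sha256(dataset_bytes).hexdigest(),
            "preprocessing": ["resize(224,224)", "normalize"],
        },
        "model": {
            "architecture": model_config.get("architecture"),
            "hyperparameters": model_config.get("hyperparameters", {}),
            "random_seed": model_config.get("seed"),
        },
        "explanation": {
            "method": "SHAP",
            "feature_importance": explanation,
        },
    }

record = make_provenance_record(
    b"raw training data bytes",
    {"architecture": "resnet18", "hyperparameters": {"lr": 0.001}, "seed": 42},
    [0.31, 0.12, 0.57],
)
print(json.dumps(record, indent=2))
```

In practice such records would be written to the queryable provenance store (the paper suggests a graph database) rather than printed.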
Methodology
- Define Interpretability Requirements – The authors start by expressing an interpretability NFR (e.g., “the model must provide stable feature attributions across retraining”) in natural language.
- Derive Functional Requirements – Each NFR is broken down into measurable FRs such as:
- Attribution Consistency: variance of SHAP/LIME scores across model versions.
- Data‑Slice Coverage: proportion of training data slices (by label, demographic, etc.) that have associated explanations.
- Capture Provenance – During model development, the pipeline logs:
- Dataset snapshots (hashes, preprocessing steps).
- Model artifacts (architecture, hyper‑parameters, random seeds).
- Explanation artifacts (feature‑importance vectors, saliency maps).
All logs are stored in a queryable provenance store (e.g., a graph database).
- Verification Engine – A lightweight service queries the provenance store, computes the FR metrics, and checks them against thresholds defined in the original NFR.
- Feedback Loop – If a verification fails, the engine surfaces the exact provenance records that caused the violation, enabling developers to pinpoint the root cause (e.g., a data drift episode or a nondeterministic training run).
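The attribution‑consistency check at the heart of the verification engine can be sketched as follows. The semantics assumed here (per‑feature variance of importance scores across model versions, compared against the paper's example threshold of 0.05) are one plausible reading; the paper's exact metric definition may differ.

```python
import statistics

# Sketch of an attribution-consistency check: per-feature variance of
# importance scores across model versions, compared to a threshold.
# The 0.05 default mirrors the paper's example; the exact metric
# definition is an assumption.
def attribution_consistent(versions: list[list[float]],
                           threshold: float = 0.05) -> tuple[bool, list[float]]:
    """versions: one feature-importance vector per model version."""
    per_feature = list(zip(*versions))  # group each feature's scores together
    variances = [statistics.pvariance(scores) for scores in per_feature]
    return max(variances) < threshold, variances

# Three retrained versions with near-identical attributions pass the check...
ok, _ = attribution_consistent([[0.30, 0.60], [0.31, 0.59], [0.29, 0.61]])
# ...while a feature whose importance drifts wildly fails it.
bad, _ = attribution_consistent([[0.30, 0.60], [0.80, 0.10], [0.10, 0.85]])
```

On a failure, the engine would surface the provenance records behind the offending model versions, as described in the feedback loop above.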
Results & Findings
- Quantifiable Interpretability – The authors expressed three common interpretability NFRs as FRs with clear numeric thresholds (e.g., attribution variance < 0.05).
- High Detection Rate – In the case study, the verification engine caught 4 out of 5 intentional violations (e.g., removing a preprocessing step that broke feature‑importance stability).
- Low Overhead – Provenance capture added < 7 % runtime overhead and < 12 % storage increase for typical image‑classification pipelines.
- Auditability – The provenance graph allowed a post‑hoc audit that reconstructed exactly which training data and code version produced a given explanation, satisfying an internal compliance audit without extra effort.
Practical Implications
- Regulatory Readiness – Teams building models for regulated domains (healthcare, finance) can now produce evidence that interpretability requirements have been met, easing audits and certifications.
- CI/CD Integration – By plugging the verification engine into existing ML CI pipelines, developers get immediate feedback (“build fails: attribution consistency below threshold”), turning interpretability into a first‑class quality gate.
- Debugging & Root‑Cause Analysis – Provenance records pinpoint the exact data slice or code change that caused an interpretability breach, reducing mean‑time‑to‑resolution for model‑explainability bugs.
- Cross‑Team Collaboration – Data scientists, ML engineers, and product owners can agree on concrete interpretability metrics, aligning expectations and reducing “interpretability” debates to measurable SLAs.
- Reusable Artefacts – The provenance schema is portable across projects, enabling organizations to build a shared “interpretability ledger” that can be queried for compliance across the entire model portfolio.
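A CI/CD quality gate built on these metrics might look like the sketch below. The engine's real interface is not specified in the paper; the metric names, thresholds, and function signature here are hypothetical, and in practice the metric values would be computed from the provenance store.

```python
import sys

# Illustrative CI quality gate: fail the build when any functional
# requirement derived from an interpretability NFR is violated.
# Metric names and thresholds are hypothetical examples; lower is
# better for both metrics used here.
def interpretability_gate(metrics: dict, thresholds: dict) -> list[str]:
    """Return a list of human-readable violations (empty list = pass)."""
    violations = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None or value > limit:
            violations.append(f"{name}={value} exceeds limit {limit}")
    return violations

metrics = {"attribution_variance": 0.08, "unexplained_slice_fraction": 0.02}
thresholds = {"attribution_variance": 0.05, "unexplained_slice_fraction": 0.10}
problems = interpretability_gate(metrics, thresholds)
if problems:
    print("build fails:", "; ".join(problems))
    # sys.exit(1)  # uncomment to actually fail the CI job
```

A non‑zero exit code is what turns this from a report into a first‑class quality gate in most CI systems.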
Limitations & Future Work
- Scope of Interpretability – The framework focuses on post‑hoc explanation methods (SHAP, LIME, saliency maps). It does not yet cover intrinsic interpretability techniques such as rule‑based models or attention‑based explanations.
- Threshold Selection – Determining appropriate numeric thresholds for FRs still requires domain expertise; the paper does not provide an automated way to set them.
- Scalability to Massive Datasets – While overhead is modest for medium‑scale experiments, the authors note that provenance storage could become a bottleneck for petabyte‑scale training pipelines.
- User Study – The paper lacks a formal user study measuring how developers interact with the verification engine in day‑to‑day workflows.
- Future Directions – Planned extensions include:
- Automated threshold calibration using statistical process control.
- Integration with model‑card standards for broader NFR coverage.
- Distributed provenance storage solutions to handle large‑scale production environments.
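In its simplest form, the proposed statistical‑process‑control calibration could derive a threshold from historical metric values via a standard 3‑sigma upper control limit. This is one plausible reading of the future‑work item, not the paper's method.

```python
import statistics

# Simple SPC-style calibration: mean + k*sigma over historical healthy
# readings gives an upper control limit. Shown as one plausible
# interpretation of "automated threshold calibration"; the paper does
# not specify the procedure.
def calibrate_threshold(history: list[float], sigmas: float = 3.0) -> float:
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return mean + sigmas * sd

# Attribution-variance readings from past builds that passed review:
history = [0.010, 0.012, 0.011, 0.013, 0.009]
limit = calibrate_threshold(history)  # roughly 0.015 for this data
```

New builds whose metric exceeds the calibrated limit would then be flagged, replacing hand‑picked thresholds with ones grounded in the project's own history.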
Authors
- Lynn Vonderhaar
- Juan Couder
- Daryela Cisneros
- Omar Ochoa
Paper Information
- arXiv ID: 2604.21599v1
- Categories: cs.SE, cs.LG
- Published: April 23, 2026