[Paper] Verifying Machine Learning Interpretability Requirements through Provenance
Source: arXiv - 2604.21599v1
Overview
The paper “Verifying Machine Learning Interpretability Requirements through Provenance” tackles a persistent pain point in ML engineering: how to prove that a model satisfies an interpretability requirement. By treating model‑ and data‑lineage information (provenance) as a measurable artifact, the authors turn an otherwise vague non‑functional requirement into a set of concrete functional checks that can be automated and audited.
Key Contributions
- Provenance‑driven verification framework – a systematic way to capture, store, and query model and data provenance for interpretability checks.
- Mapping interpretability non‑functional requirements (NFRs) to quantifiable functional requirements (FRs) – defines concrete metrics (e.g., feature‑importance stability, data‑slice coverage) derived from provenance.
- Tool‑agnostic provenance schema – compatible with popular ML pipelines (TensorFlow, PyTorch, Scikit‑Learn) and version‑control systems (DVC, MLflow).
- Case‑study validation – demonstrates the approach on a real‑world image‑classification model, showing how provenance data can be used to certify compliance with a pre‑defined interpretability policy.
- Guidelines for integrating provenance capture into CI/CD for ML – practical steps for teams to embed verification into existing DevOps workflows.
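To make the tool‑agnostic schema concrete, here is a minimal sketch of what a provenance record could look like. The field names and helper function are illustrative assumptions, not the paper's actual schema; only the captured categories (dataset snapshots, model artifacts, explanation artifacts) come from the paper.

```python
import hashlib
import json

# Hypothetical provenance record builder; field names are illustrative,
# not the paper's actual schema.
def make_provenance_record(dataset_bytes: bytes, model_config: dict,
                           explanation: list) -> dict:
    return {
        "dataset": {
            # Hash the snapshot so any later change to the data is detectable.
            "sha256": hashlib.sha256(dataset_bytes).hexdigest(),
            "preprocessing": ["resize(224,224)", "normalize"],
        },
        "model": {
            "architecture": model_config.get("architecture"),
            "hyperparameters": model_config.get("hyperparameters", {}),
            "random_seed": model_config.get("seed"),
        },
        "explanation": {
            "method": "SHAP",
            "feature_importance": explanation,
        },
    }

record = make_provenance_record(
    b"raw training data bytes",
    {"architecture": "resnet18", "hyperparameters": {"lr": 0.001}, "seed": 42},
    [0.31, 0.12, 0.57],
)
print(json.dumps(record, indent=2))
```

In practice such records would be written to the queryable provenance store (the paper suggests a graph database) rather than printed.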
Methodology
- Define Interpretability Requirements – The authors start by expressing an interpretability NFR (e.g., “the model must provide stable feature attributions across retraining”) in natural language.
- Derive Functional Requirements – Each NFR is broken down into measurable FRs such as:
- Attribution Consistency: variance of SHAP/LIME scores across model versions.
- Data‑Slice Coverage: proportion of training data slices (by label, demographic, etc.) that have associated explanations.
- Capture Provenance – During model development, the pipeline logs:
- Dataset snapshots (hashes, preprocessing steps).
- Model artifacts (architecture, hyper‑parameters, random seeds).
- Explanation artifacts (feature‑importance vectors, saliency maps).
All logs are stored in a queryable provenance store (e.g., a graph database).
- Verification Engine – A lightweight service queries the provenance store, computes the FR metrics, and checks them against thresholds defined in the original NFR.
- Feedback Loop – If a verification fails, the engine surfaces the exact provenance records that caused the violation, enabling developers to pinpoint the root cause (e.g., a data drift episode or a nondeterministic training run).
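The attribution‑consistency check at the heart of the verification engine can be sketched as follows. The semantics assumed here (per‑feature variance of importance scores across model versions, compared against the paper's example threshold of 0.05) are one plausible reading; the paper's exact metric definition may differ.

```python
import statistics

# Sketch of an attribution-consistency check: per-feature variance of
# importance scores across model versions, compared to a threshold.
# The 0.05 default mirrors the paper's example; the exact metric
# definition is an assumption.
def attribution_consistent(versions: list[list[float]],
                           threshold: float = 0.05) -> tuple[bool, list[float]]:
    """versions: one feature-importance vector per model version."""
    per_feature = list(zip(*versions))  # group each feature's scores together
    variances = [statistics.pvariance(scores) for scores in per_feature]
    return max(variances) < threshold, variances

# Three retrained versions with near-identical attributions pass the check...
ok, _ = attribution_consistent([[0.30, 0.60], [0.31, 0.59], [0.29, 0.61]])
# ...while a feature whose importance drifts wildly fails it.
bad, _ = attribution_consistent([[0.30, 0.60], [0.80, 0.10], [0.10, 0.85]])
```

On a failure, the engine would surface the provenance records behind the offending model versions, as described in the feedback loop above.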
Results & Findings
- Quantifiable Interpretability – The authors expressed three common interpretability NFRs as FRs with clear numeric thresholds (e.g., attribution variance < 0.05).
- High Detection Rate – In the case study, the verification engine caught 4 out of 5 intentional violations (e.g., removing a preprocessing step that broke feature‑importance stability).
- Low Overhead – Provenance capture added < 7 % runtime overhead and < 12 % storage increase for typical image‑classification pipelines.
- Auditability – The provenance graph allowed a post‑hoc audit that reconstructed exactly which training data and code version produced a given explanation, satisfying an internal compliance audit without extra effort.
Practical Implications
- Regulatory Readiness – Teams building models for regulated domains (healthcare, finance) can now produce evidence that interpretability requirements have been met, easing audits and certifications.
- CI/CD Integration – By plugging the verification engine into existing ML CI pipelines, developers get immediate feedback (“build fails: attribution consistency below threshold”), turning interpretability into a first‑class quality gate.
- Debugging & Root‑Cause Analysis – Provenance records pinpoint the exact data slice or code change that caused an interpretability breach, reducing mean‑time‑to‑resolution for model‑explainability bugs.
- Cross‑Team Collaboration – Data scientists, ML engineers, and product owners can agree on concrete interpretability metrics, aligning expectations and reducing “interpretability” debates to measurable SLAs.
- Reusable Artefacts – The provenance schema is portable across projects, enabling organizations to build a shared “interpretability ledger” that can be queried for compliance across the entire model portfolio.
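A CI/CD quality gate built on these metrics might look like the sketch below. The engine's real interface is not specified in the paper; the metric names, thresholds, and function signature here are hypothetical, and in practice the metric values would be computed from the provenance store.

```python
import sys

# Illustrative CI quality gate: fail the build when any functional
# requirement derived from an interpretability NFR is violated.
# Metric names and thresholds are hypothetical examples; lower is
# better for both metrics used here.
def interpretability_gate(metrics: dict, thresholds: dict) -> list[str]:
    """Return a list of human-readable violations (empty list = pass)."""
    violations = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None or value > limit:
            violations.append(f"{name}={value} exceeds limit {limit}")
    return violations

metrics = {"attribution_variance": 0.08, "unexplained_slice_fraction": 0.02}
thresholds = {"attribution_variance": 0.05, "unexplained_slice_fraction": 0.10}
problems = interpretability_gate(metrics, thresholds)
if problems:
    print("build fails:", "; ".join(problems))
    # sys.exit(1)  # uncomment to actually fail the CI job
```

A non‑zero exit code is what turns this from a report into a first‑class quality gate in most CI systems.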
Limitations & Future Work
- Scope of Interpretability – The framework focuses on post‑hoc explanation methods (SHAP, LIME, saliency maps). It does not yet cover intrinsic interpretability techniques such as rule‑based models or attention‑based explanations.
- Threshold Selection – Determining appropriate numeric thresholds for FRs still requires domain expertise; the paper does not provide an automated way to set them.
- Scalability to Massive Datasets – While overhead is modest for medium‑scale experiments, the authors note that provenance storage could become a bottleneck for petabyte‑scale training pipelines.
- User Study – The paper lacks a formal user study measuring how developers interact with the verification engine in day‑to‑day workflows.
- Future Directions – Planned extensions include:
- Automated threshold calibration using statistical process control.
- Integration with model‑card standards for broader NFR coverage.
- Distributed provenance storage solutions to handle large‑scale production environments.
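In its simplest form, the proposed statistical‑process‑control calibration could derive a threshold from historical metric values via a standard 3‑sigma upper control limit. This is one plausible reading of the future‑work item, not the paper's method.

```python
import statistics

# Simple SPC-style calibration: mean + k*sigma over historical healthy
# readings gives an upper control limit. Shown as one plausible
# interpretation of "automated threshold calibration"; the paper does
# not specify the procedure.
def calibrate_threshold(history: list[float], sigmas: float = 3.0) -> float:
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return mean + sigmas * sd

# Attribution-variance readings from past builds that passed review:
history = [0.010, 0.012, 0.011, 0.013, 0.009]
limit = calibrate_threshold(history)  # roughly 0.015 for this data
```

New builds whose metric exceeds the calibrated limit would then be flagged, replacing hand‑picked thresholds with ones grounded in the project's own history.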
Authors
- Lynn Vonderhaar
- Juan Couder
- Daryela Cisneros
- Omar Ochoa
Paper Information
- arXiv ID: 2604.21599v1
- Categories: cs.SE, cs.LG
- Published: April 23, 2026