[Paper] AIBoMGen: Generating an AI Bill of Materials for Secure, Transparent, and Compliant Model Training
Source: arXiv - 2601.05703v1
Overview
The paper introduces AIBoMGen, a prototype platform that automatically creates a cryptographically‑signed AI Bill of Materials (AIBOM) for every model‑training run. By capturing datasets, model hyper‑parameters, code versions, and the exact compute environment, AIBoMGen gives developers a tamper‑evident record that can be used to prove compliance with emerging AI regulations such as the EU AI Act.
Key Contributions
- AIBOM Specification – Extends the well‑known Software Bill of Materials (SBOM) concept to cover AI‑specific artifacts (training data, model weights, preprocessing pipelines, hardware details); an illustrative manifest sketch follows this list.
- Automated Generation Pipeline – AIBoMGen hooks into the training workflow and produces a signed AIBOM without manual effort.
- Root‑of‑Trust Architecture – The training platform acts as a neutral third‑party observer, using cryptographic hashes, digital signatures, and in‑toto attestations to guarantee integrity.
- Tamper‑Detection Guarantees – Demonstrates that any post‑training modification of model files, data, or environment metadata is reliably detected.
- Negligible Overhead – Empirical evaluation shows < 2 % runtime impact, making the approach practical for large‑scale training pipelines.
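To make the AIBOM artifact concrete, here is a minimal sketch of the kind of manifest such a specification could describe. The field names and layout are illustrative assumptions, not the paper's actual schema.

```python
# Illustrative AIBOM manifest as a Python dict. Field names and values are
# hypothetical; the paper defines its own JSON schema.
aibom_manifest = {
    "model": {"name": "sentiment-classifier", "version": "1.0.0"},
    "datasets": [
        {"uri": "s3://corpus/train.parquet", "sha256": "ab12..."},  # placeholder digest
    ],
    "code": {"repo": "https://example.com/train.git", "commit": "3f2c9a1"},  # example values
    "hyperparameters": {"learning_rate": 3e-4, "epochs": 10, "batch_size": 64},
    "environment": {"os": "Ubuntu 22.04", "gpu": "NVIDIA A100", "cuda_driver": "535.104"},
    "artifacts": [
        {"path": "model.safetensors", "sha256": "cd34..."},  # placeholder digest
    ],
}
```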
Methodology
- Instrumentation Layer – A lightweight agent is attached to the training orchestrator (e.g., Kubernetes, Airflow). It records:
  - Input datasets (hashes, provenance URLs)
  - Code repository commits and dependency manifests
  - Hyper‑parameters, model architecture, and training scripts
  - Runtime environment (OS, driver versions, GPU/CPU specs)
- Artifact Hashing & Collection – Each captured artifact is hashed (SHA‑256) and stored in a temporary ledger.
- In‑toto Attestation – The collected hashes are wrapped in an in‑toto statement, which carries a cryptographic signature from the platform’s private key (the “root of trust”); a code sketch of the hashing and attestation steps follows this section.
- AIBOM Assembly – The attestation, together with a human‑readable JSON/YAML manifest, forms the final AIBOM.
- Verification API – Downstream consumers (model registries, auditors, CI pipelines) can fetch the AIBOM and verify signatures and hashes against the actual artifacts, ensuring nothing was altered after training.
The whole flow is triggered automatically for every training job, requiring no extra steps from data scientists.
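As a rough illustration of the hashing and attestation steps above, the sketch below digests a set of artifact files and wraps them in a simplified in‑toto‑style statement signed with an Ed25519 key. This is a minimal sketch using Python's hashlib and the cryptography package, assuming file‑based artifacts and a locally held platform key; AIBoMGen's actual statement layout, predicate type, and key management may differ.

```python
import hashlib
import json
from pathlib import Path

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_statement(artifacts: list[Path]) -> dict:
    """Assemble a simplified in-toto-style statement over the artifact digests."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": p.name, "digest": {"sha256": sha256_file(p)}} for p in artifacts
        ],
        "predicateType": "https://example.com/aibom/v0",  # hypothetical predicate type
        "predicate": {"builder": "training-platform"},    # illustrative metadata only
    }


# The platform's signing key acts as the root of trust.
key = Ed25519PrivateKey.generate()  # in practice, loaded from secure storage
statement = build_statement([Path("model.safetensors"), Path("train.parquet")])
payload = json.dumps(statement, sort_keys=True).encode()
signature = key.sign(payload)  # `payload` + `signature` travel with the AIBOM
```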
Results & Findings
| Metric | Observation |
|---|---|
| Tamper detection | All simulated attacks (weight file replacement, dataset substitution, environment downgrade) were flagged by the verification step. |
| Performance overhead | Average added latency = 1.7 % of training time (≈ 1 minute for an hour‑long training job). |
| Signature verification time | Sub‑millisecond on a standard CPU, negligible for CI pipelines. |
| Scalability | Tested on 50 concurrent training jobs across 4 GPU nodes; AIBOM generation remained stable with linear resource usage. |
These results indicate that AIBoMGen can be deployed in production‑grade ML pipelines without sacrificing speed, while providing strong guarantees against artifact tampering.
Practical Implications
- Regulatory Compliance – Companies can produce auditable evidence that their models were trained on approved data and under controlled environments, easing EU AI Act reporting.
- Supply‑Chain Security – Just as SBOMs help secure software supply chains, AIBOMs expose hidden dependencies (e.g., third‑party datasets) that could be a source of bias or malicious data poisoning.
- Model Marketplace Trust – Vendors can attach a signed AIBOM to every model they sell, giving buyers confidence that the model hasn’t been altered post‑delivery.
- CI/CD Integration – The verification API can be plugged into existing MLOps pipelines (GitHub Actions, GitLab CI, Jenkins) to automatically reject builds that fail AIBOM checks; see the verification sketch after this list.
- Incident Response – In the event of a breach, the AIBOM provides a forensic snapshot of exactly what was used to create the compromised model, speeding root‑cause analysis.
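As one way to wire verification into CI, a pipeline step could recompute artifact digests, compare them against the attested values, check the platform's signature, and fail the build on any mismatch. This is a hedged sketch, not the paper's actual Verification API: the file names (`aibom.json`, `aibom.sig`, `platform.pub`) and statement layout are assumptions matching the generation sketch above.

```python
import hashlib
import json
import sys
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_aibom(statement_path: Path, signature_path: Path, pubkey_bytes: bytes) -> None:
    payload = statement_path.read_bytes()

    # 1. Verify the platform signature over the statement; raises InvalidSignature on tampering.
    Ed25519PublicKey.from_public_bytes(pubkey_bytes).verify(
        signature_path.read_bytes(), payload
    )

    # 2. Recompute each artifact digest and compare to the attested value.
    statement = json.loads(payload)
    for subject in statement["subject"]:
        actual = sha256_file(Path(subject["name"]))
        if actual != subject["digest"]["sha256"]:
            raise ValueError(f"digest mismatch for {subject['name']}")


if __name__ == "__main__":
    try:
        verify_aibom(Path("aibom.json"), Path("aibom.sig"), Path("platform.pub").read_bytes())
    except (InvalidSignature, ValueError, FileNotFoundError) as err:
        print(f"AIBOM check failed: {err!r}")
        sys.exit(1)  # non-zero exit rejects the build in CI
    print("AIBOM verified")
```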
Limitations & Future Work
- Scope of Captured Artifacts – The current prototype focuses on static artifacts; dynamic runtime behaviors (e.g., on‑the‑fly data augmentation) are not fully captured.
- Key Management – The system assumes a secure, centrally‑managed signing key; a distributed key‑rotation strategy would be needed for large enterprises.
- Interoperability Standards – While the authors propose a JSON schema, broader industry adoption will require alignment with emerging standards bodies (e.g., SPDX, OpenChain).
- Extending to Inference – Future work could generate an AI Bill of Materials for Inference (AIBOM‑I) that records the model‑serving environment, request‑time preprocessing, and post‑processing steps.
Overall, AIBoMGen offers a concrete, low‑overhead path toward transparent and secure AI model lifecycles—an essential building block as AI moves from research labs into regulated production environments.
Authors
- Wiebe Vandendriessche
- Jordi Thijsman
- Laurens D’hooge
- Bruno Volckaert
- Merlijn Sebrechts
Paper Information
- arXiv ID: 2601.05703v1
- Categories: cs.SE, cs.AI, cs.CR
- Published: January 9, 2026