Mithridatium: An Open-Source Toolkit for Verifying the Integrity of Pretrained Machine Learning Models
Source: Dev.to
Why Mithridatium?
Today’s ML ecosystem assumes that pretrained models are safe. In reality, the model file itself can be a silent attack vector:
- poisoned training data
- hidden triggers that activate under specific inputs
- manipulated weights
- malformed checkpoints that cause unexpected runtime behavior
Mithridatium provides a command‑line workflow to evaluate these risks through model‑centric defenses, inspired by academic research but simplified for real‑world use.
Offline Usage
Once installed, Mithridatium can run entirely offline.
You only need:
- Your .pth model file
- A local dataset directory (optional for STRIP; required for MMBD depending on configuration)
This makes the tool suitable for restricted environments, air‑gapped machines, or secure internal ML pipelines.
Installation
pip install mithridatium
Upgrade to the latest release:
pip install --upgrade mithridatium
Implemented Defenses
MMBD (Maximum Margin Backdoor Detection)
MMBD evaluates synthetic class‑optimized images to detect anomalous activation patterns commonly associated with backdoored models.
Features
- per‑class eigenvalue scores
- normalized anomaly distributions
- classical hypothesis testing (p‑value)
- deterministic verdict
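As a rough illustration of how per‑class scores can feed a p‑value and a deterministic verdict, here is a minimal Python sketch. It is not Mithridatium's implementation: the function names, the median/MAD normalization, the normal null model, and the 0.05 threshold are all assumptions made for this example, and the tool's actual statistic may differ.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anomaly_scores(scores):
    """Normalize per-class scores with the median and MAD (assumed scheme)."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    scale = 1.4826 * mad  # rescales MAD to a std-dev equivalent under normality
    if scale == 0.0:
        return [0.0 for _ in scores]
    return [(s - med) / scale for s in scores]


def max_score_p_value(z_scores):
    """P(largest of k independent standard normals >= observed maximum)."""
    k = len(z_scores)
    return 1.0 - normal_cdf(max(z_scores)) ** k


def verdict(scores, alpha=0.05):
    """Same scores in, same verdict out: flag the model if p < alpha."""
    z = anomaly_scores(scores)
    p = max_score_p_value(z)
    return {"p_value": p, "suspect_class": z.index(max(z)), "backdoored": p < alpha}


# Hypothetical per-class scores for a 10-class model.
clean_scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.03, 0.97]
suspicious_scores = list(clean_scores)
suspicious_scores[3] = 5.0  # one class stands far apart from the rest

clean_result = verdict(clean_scores)
suspicious_result = verdict(suspicious_scores)
```

The median and MAD are used here instead of the mean and standard deviation so that a single outlier class cannot inflate the scale and hide itself.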
Example invocation
mithridatium detect --model model.pth --defense mmbd --arch resnet18 --data cifar10
STRIP (Strong Intentional Perturbation)
STRIP is a black‑box defense that does not rely on internal architectural details. It measures prediction entropy when the model is exposed to heavily perturbed variants of the same input. On a clean input, perturbation makes predictions uncertain, so entropy is high; when a backdoor trigger is present, predictions tend to stay locked on the attacker’s target class, so entropy stays abnormally low.
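The entropy idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the tool's code: `toy_model`, its "bright first value" trigger, the 50/50 blending, and every function name are hypothetical stand‑ins for a real network and real images.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def blend(x, overlay, alpha=0.5):
    """Perturb x by superimposing another sample, element by element."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(x, overlay)]


def strip_entropies(predict, x, overlays, n=20, seed=0):
    """Prediction entropy for n perturbed copies of the same input."""
    rng = random.Random(seed)
    return [entropy(predict(blend(x, rng.choice(overlays)))) for _ in range(n)]


def summarize(values):
    """Mean, min, and max entropy over the perturbed copies."""
    return {"mean": sum(values) / len(values), "min": min(values), "max": max(values)}


def toy_model(x):
    """Hypothetical backdoored classifier with three classes."""
    if x[0] >= 0.45:  # trigger survives blending and forces class 0
        return softmax([12.0, 0.0, 0.0])
    return softmax([x[1], x[2], x[3]])


pool_rng = random.Random(1)
clean_pool = [[0.0, pool_rng.random(), pool_rng.random(), pool_rng.random()]
              for _ in range(10)]

clean_summary = summarize(strip_entropies(toy_model, [0.0, 0.2, 0.7, 0.1], clean_pool))
triggered_summary = summarize(strip_entropies(toy_model, [1.0, 0.2, 0.7, 0.1], clean_pool))
```

In this toy setup the triggered input keeps near‑zero entropy across all perturbed copies, while the clean input stays close to the three‑class maximum of ln 3. That gap is the signal a STRIP‑style check looks for.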
Features
- entropy computation on perturbed samples
- sampling and perturbation utilities
- summary metrics (mean, min, max entropy)
- integration into a unified reporting schema
Example invocation
mithridatium detect --defense strip --model model.pth --data cifar10 --arch resnet18
Recent Advancements
- STRIP Core Utility – modular implementation handling entropy scoring, perturbation generation, and device‑safe execution (CPU/MPS/CUDA).
- CLI Integration – STRIP can now be invoked like MMBD, with unified reporting and JSON output.
- Output Schema Normalization – standardizing all defenses toward a single report format for ecosystem integration.
- End‑to‑End CLI Tests – tests invoke the STRIP defense through the CLI in a subprocess and verify that it runs to completion without crashing.
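An end‑to‑end check of this kind might look roughly like the sketch below. It uses only the flags shown in the example invocations above; the helper names are hypothetical, and treating exit code 0 as success is an assumption about the CLI rather than documented behavior. How the JSON report is delivered (stdout or a file) is not stated here, so the sketch does not try to parse it.

```python
import subprocess


def build_detect_command(model, defense, arch, data):
    """Assemble the argv list from the flags used in the example invocations."""
    return [
        "mithridatium", "detect",
        "--model", model,
        "--defense", defense,
        "--arch", arch,
        "--data", data,
    ]


def run_detect(model, defense, arch, data, timeout=600):
    """Run the CLI in a child process and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        build_detect_command(model, defense, arch, data),
        capture_output=True,  # collect output instead of printing it
        text=True,            # decode bytes to str
        timeout=timeout,      # raise TimeoutExpired if the run hangs
    )
    return result.returncode, result.stdout, result.stderr


strip_command = build_detect_command("model.pth", "strip", "resnet18", "cifar10")
```

A smoke test could then call `run_detect("model.pth", "strip", "resnet18", "cifar10")` on a machine with Mithridatium installed and assert that the returned exit code is 0.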
What’s Next
- Improving documentation
- Adding developer notes
- Refining report summaries
- Strengthening validation and error messaging
No new defenses are planned until next year; the focus is on polishing the tool for maintainability and accessibility.
Try it Yourself
The project is open‑source and available here: mithridatium
Contributions, issues, and feedback are welcome.
If you’re working with pretrained models, whether in research, deployment, or security, you should not assume integrity. Mithridatium helps you verify it. Detailed explanations, defense theory, and usage examples are in the repository’s README.