Mithridatium: An Open-Source Toolkit for Verifying the Integrity of Pretrained Machine Learning Models

Published: December 2, 2025 at 09:53 PM EST

Source: Dev.to

Why Mithridatium?

Today’s ML ecosystem assumes that pretrained models are safe. In reality, the model file itself can be a silent attack vector:

poisoned training data
hidden triggers that activate under specific inputs
manipulated weights
malformed checkpoints that cause unexpected runtime behavior

Mithridatium provides a command‑line workflow to evaluate these risks through model‑centric defenses, inspired by academic research but simplified for real‑world use.

Offline Usage

Once installed, Mithridatium can run entirely offline.

You only need:

Your .pth model file
A local dataset directory (optional for STRIP; may be required for MMBD, depending on configuration)

This makes the tool suitable for restricted environments, air‑gapped machines, or secure internal ML pipelines.

Installation

pip install mithridatium

Upgrade to the latest release:

pip install --upgrade mithridatium

Implemented Defenses

MMBD (Maximum Mean Backdoor Detection)

MMBD evaluates synthetic class‑optimized images to detect anomalous activation patterns commonly associated with backdoored models.

Features

per‑class eigenvalue scores
normalized anomaly distributions
classical hypothesis testing (p‑value)
deterministic verdict
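
The scoring steps listed above (normalize per‑class scores, compute a p‑value, return a verdict) can be illustrated with a minimal sketch. This is not Mithridatium's implementation: it swaps in a generic robust outlier test (median and MAD normalization with a normal approximation), takes the per‑class scores as given, and the names `anomaly_scores` and `detect` are hypothetical.

```python
from statistics import NormalDist, median


def anomaly_scores(class_scores):
    """Normalize per-class scores into robust z-scores (median / MAD)."""
    med = median(class_scores)
    mad = median([abs(s - med) for s in class_scores])
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # when the scores are roughly normal (an assumption of this sketch).
    scale = 1.4826 * mad
    if scale == 0:
        return [0.0 for _ in class_scores]
    return [(s - med) / scale for s in class_scores]


def detect(class_scores, alpha=0.05):
    """Flag the model if the most anomalous class is unlikely under the null."""
    z = anomaly_scores(class_scores)
    z_max = max(z)
    suspect = z.index(z_max)
    # Probability that the largest of len(z) standard-normal draws
    # is at least as large as z_max (order-statistic correction).
    p_value = 1.0 - NormalDist().cdf(z_max) ** len(z)
    return {
        "suspect_class": suspect,
        "p_value": p_value,
        "backdoored": p_value < alpha,
    }


if __name__ == "__main__":
    # Hypothetical scores for a 10-class model; class 8 stands out.
    print(detect([1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 6.0, 1.01]))
```

The function contains no randomness, so the same scores always produce the same verdict.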

Example invocation

mithridatium detect --model model.pth --defense mmbd --arch resnet18 --data cifar10

STRIP (Strong Intentional Perturbation)

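STRIP is a black‑box defense that does not rely on internal architectural details. It measures the entropy of the model's predictions when an input is heavily perturbed, for example by blending it with other samples. Predictions on a clean input become uncertain under such perturbation, whereas a backdoor trigger tends to keep forcing the same confident prediction, so backdoored models typically exhibit abnormally low entropy.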
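
To make the entropy idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not Mithridatium's code: the toy `clean_predict` and `backdoored_predict` functions, the four‑element "inputs", and the blending ratio are all hypothetical stand‑ins for a real model and real images.

```python
import math
import random


def shannon_entropy(probs):
    """Entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend(x, overlay, alpha=0.5):
    """Perturb x by superimposing another sample on it."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(x, overlay)]


def strip_entropy(predict, x, overlays, n_perturb=20, alpha=0.5, seed=0):
    """Entropy statistics of predictions over perturbed copies of x."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    entropies = [
        shannon_entropy(predict(blend(x, rng.choice(overlays), alpha)))
        for _ in range(n_perturb)
    ]
    return {
        "mean": sum(entropies) / len(entropies),
        "min": min(entropies),
        "max": max(entropies),
    }


# Toy stand-ins for a model: three classes, and the fourth input element
# plays the role of a backdoor trigger (hypothetical).
def clean_predict(x):
    return softmax(x[:3])


def backdoored_predict(x):
    if x[3] > 0.4:  # trigger still visible after blending
        return [0.98, 0.01, 0.01]
    return clean_predict(x)


OVERLAYS = [[0.1, 0.2, 0.9, 0.0], [0.8, 0.1, 0.3, 0.0], [0.4, 0.5, 0.2, 0.0]]

if __name__ == "__main__":
    print(strip_entropy(backdoored_predict, [0.2, 0.9, 0.1, 1.0], OVERLAYS))
    print(strip_entropy(backdoored_predict, [0.2, 0.9, 0.1, 0.0], OVERLAYS))
```

In this toy setup the input carrying the trigger yields much lower entropy than the clean input, which is the signal STRIP looks for.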

Features

entropy computation on perturbed samples
sampling and perturbation utilities
summary metrics (mean, min, max entropy)
integration into a unified reporting schema

Example invocation

mithridatium detect --defense strip --model model.pth --data cifar10 --arch resnet18

Recent Advancements

STRIP Core Utility – modular implementation handling entropy scoring, perturbation generation, and device‑safe execution (CPU/MPS/CUDA).
CLI Integration – STRIP can now be invoked like MMBD, with unified reporting and JSON output.
Output Schema Normalization – standardizing all defenses toward a single report format for ecosystem integration.
End‑to‑End CLI Tests – end‑to‑end tests invoke the CLI as a subprocess and check that STRIP runs to completion without crashing.
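
As a rough illustration of what a subprocess‑based end‑to‑end check could look like, the sketch below builds the `detect` command from the flags shown in the example invocations and runs it. The helper names are hypothetical, the project's actual tests are not shown in this post, and treating exit code 0 as "ran cleanly" is an assumption.

```python
import subprocess


def build_detect_command(model, defense, arch, data):
    """Assemble the CLI call using only the flags shown in this post."""
    return [
        "mithridatium", "detect",
        "--model", model,
        "--defense", defense,
        "--arch", arch,
        "--data", data,
    ]


def run_detect(model, defense, arch, data, timeout=600):
    """Run the CLI in a subprocess and capture its output as text."""
    cmd = build_detect_command(model, defense, arch, data)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


def check_strip_runs(model="model.pth"):
    """Smoke check: STRIP finishes without crashing.

    Assumes mithridatium is installed and that a clean run exits with
    code 0; neither is guaranteed by this sketch.
    """
    result = run_detect(model, "strip", "resnet18", "cifar10")
    assert result.returncode == 0, result.stderr
    return result.stdout
```

Calling `check_strip_runs()` requires the tool to be installed and a `model.pth` file to be present; `build_detect_command` can be used on its own to inspect the command before running it.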

What’s Next

Improving documentation
Adding developer notes
Refining report summaries
Strengthening validation and error messaging

No new defenses are planned until next year; the focus is on polishing the tool for maintainability and accessibility.

Try it Yourself

The project is open‑source and available here: mithridatium

Contributions, issues, and feedback are welcome.

If you’re working with pretrained models—research, deployment, or security—you should not assume integrity. Mithridatium helps you verify it. Detailed explanations, defense theory, and usage examples are in the repository’s README.
