[Paper] Malicious ML Model Detection by Learning Dynamic Behaviors
Source: arXiv - 2604.19438v1
Overview
Pre‑trained machine learning models (PTMs) are shared widely through model hubs such as Hugging Face, often as serialized objects (e.g., Python pickles). While this convenience fuels rapid development, it also opens a supply‑chain attack surface: a malicious model can execute arbitrary code the moment it is loaded. Existing scanners (e.g., PickleScan) focus on static signatures or heuristics and ignore what the model actually does at runtime, leading to false negatives and false positives. The paper introduces DynaHug, a detection framework that learns the dynamic execution patterns of benign PTMs and flags deviations as potentially malicious.
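The load-time code-execution risk comes from pickle's reconstruction protocol: an object's `__reduce__` method can return any callable, which the unpickler invokes immediately. A minimal, harmless demonstration (using `print` in place of a real payload):

```python
import pickle

class EvilModel:
    # __reduce__ tells pickle how to rebuild the object; an attacker
    # can make it return an arbitrary callable plus arguments, which
    # the unpickler invokes the moment pickle.loads() runs.
    def __reduce__(self):
        return (print, ("payload executed at load time",))

blob = pickle.dumps(EvilModel())
obj = pickle.loads(blob)  # prints the message; no inference code has run yet
```

This is why static scanners look for suspicious globals in pickle streams, and why DynaHug instead observes what actually happens when the model is loaded and run.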
Key Contributions
- Dynamic‑behavior‑based detection: First system to model runtime characteristics of PTMs rather than relying solely on static inspection.
- One‑Class SVM classifier: Trains an OCSVM on benign execution traces, enabling detection of out‑of‑distribution (i.e., malicious) behaviors without needing labeled malware samples.
- Large‑scale empirical evaluation: Tested on >25,000 models (both benign and malicious) from Hugging Face and the MalHug repository.
- Performance boost: Achieves up to 44 % higher F1‑score compared with state‑of‑the‑art static, dynamic, and LLM‑based detectors.
- Ablation study: Demonstrates that each design choice—dynamic tracing, OCSVM, and clustering of model families—contributes measurably to overall effectiveness.
Methodology
- Data Collection – Gather a corpus of task‑specific PTMs (e.g., text classification, image captioning). For each model, record a dynamic trace during a typical inference run: system calls, file accesses, network activity, CPU/memory usage, and Python‑level API calls.
- Feature Engineering – Convert raw traces into a fixed‑length feature vector (e.g., histogram of syscall frequencies, timing statistics, sandbox‑exit codes).
- Model Training – Use only the benign traces to train a One‑Class Support Vector Machine (OCSVM), which learns the boundary of normal behavior.
- Clustering – Group models by task/domain and train separate OCSVMs per cluster, reducing noise from heterogeneous workloads.
- Detection – When a new PTM is loaded, the same tracing pipeline runs in a lightweight sandbox; the resulting feature vector is fed to the appropriate OCSVM. If the sample lies outside the learned boundary, it is flagged as suspicious.
- Evaluation – Compare DynaHug’s predictions against ground‑truth labels (benign vs. malicious) and against baselines (PickleScan, static code analyzers, LLM‑based classifiers).
Results & Findings
| Metric | DynaHug | Best Baseline |
|---|---|---|
| F1‑score | 0.92 (up to 44 % gain) | 0.64 – 0.71 |
| Precision | 0.90 | 0.58 – 0.68 |
| Recall | 0.94 | 0.61 – 0.73 |
| False‑Positive Rate | 3 % | 12 % – 18 % |
- Robustness across tasks: Separate OCSVMs per cluster kept detection accuracy high even when models differed dramatically (e.g., NLP vs. CV).
- Low overhead: Dynamic tracing added ~150 ms per inference on average, acceptable for a pre‑deployment security check.
- Ablation insights: Removing clustering dropped F1 by ~7 %; swapping OCSVM for a binary classifier (trained on both benign and malicious data) reduced recall by ~10 %, confirming the value of one‑class learning.
Practical Implications
- Supply‑chain hardening: Developers can integrate DynaHug into CI/CD pipelines for model ingestion, automatically vetting third‑party PTMs before they touch production environments.
- Sandbox‑as‑a‑service: Cloud providers could expose DynaHug as a managed API, offering “model safety scores” alongside model metadata on hubs.
- Compliance & Auditing: Organizations subject to security standards (e.g., ISO 27001, NIST 800‑53) can use dynamic behavior reports as evidence of due‑diligence in model procurement.
- Developer ergonomics: Because DynaHug works on generic runtime traces, it requires no changes to model code or format—just a short execution in a controlled environment.
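An ingestion gate of the kind described above might look like the following. Everything here is hypothetical: `scan_model` is a stand-in for a DynaHug-style scorer (its heuristic body is purely illustrative), and the threshold is an assumed tuning parameter, not a value from the paper.

```python
ANOMALY_THRESHOLD = 0.5  # hypothetical cutoff, tuned against benign traces

def scan_model(path: str) -> float:
    """Hypothetical stand-in for a DynaHug-style scorer: run the model
    once in a sandbox, extract its trace features, and return an
    anomaly score in [0, 1] from the matching per-cluster classifier."""
    # Illustrative heuristic only: safetensors files cannot carry
    # executable payloads, so score them low; everything else high.
    return 0.1 if path.endswith(".safetensors") else 0.9

def gate(path: str) -> bool:
    """CI/CD check: allow the model into the pipeline only if its
    sandboxed behavior stays within the learned benign boundary."""
    return scan_model(path) < ANOMALY_THRESHOLD
```

In practice the gate would run as a pre-merge or pre-deployment CI step, blocking third-party models whose runtime behavior deviates from the benign profile.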
Limitations & Future Work
- Coverage of exotic environments: The current tracing setup targets typical Python‑based inference; models running in other runtimes (e.g., TensorFlow C++, ONNX) need separate instrumentation.
- Evasion potential: An attacker could craft a model that mimics benign traces during the short sandbox run but activates its malicious payload later; future work will explore longer‑duration or multi‑stage monitoring.
- Label scarcity for malicious samples: While OCSVM mitigates the need for many malware examples, a richer malicious dataset could enable hybrid detectors that combine one‑class and supervised signals.
- Scalability of clustering: As the number of task domains grows, maintaining per‑cluster classifiers may become cumbersome; automated clustering and model‑type inference are slated for follow‑up research.
Authors
- Sarang Nambiar
- Dhruv Pradhan
- Ezekiel Soremekun
Paper Information
- arXiv ID: 2604.19438v1
- Categories: cs.CR, cs.SE
- Published: April 21, 2026