[Paper] Towards Online Malware Detection using Process Resource Utilization Metrics
Source: arXiv - 2601.10164v1
Overview
The paper presents an online learning framework for detecting malware in real time by monitoring a process’s resource‑usage metrics (CPU, memory, I/O, etc.). Instead of training a static model on a massive, pre‑labeled dataset, the authors continuously update the classifier as new execution data arrives, enabling it to spot zero‑day threats and adapt to the ever‑changing malware landscape.
Key Contributions
- Online learning pipeline that ingests live process‑level resource utilization data and updates the detection model incrementally.
- Feature set based solely on OS‑level metrics, avoiding heavyweight instrumentation or sandboxing.
- Empirical comparison with traditional batch‑trained classifiers, showing superior detection of unseen (zero‑day) malware and robustness when training data is scarce.
- Demonstration of low‑overhead deployment suitable for cloud VMs, containers, and edge/IoT devices.
Methodology
- Data Collection – While a program runs, the system records lightweight metrics (CPU % per core, resident memory, disk reads/writes, network I/O, context switches, etc.) at regular intervals (e.g., every second).
- Feature Engineering – Raw time‑series are summarized into statistical descriptors (mean, variance, min/max, entropy) and short‑term trends, producing a fixed‑length vector per process.
- Online Learning Algorithm – The authors use adaptive algorithms such as Hoeffding Adaptive Trees and Online Gradient Boosting, which can ingest one sample at a time and adjust their decision boundaries without retraining from scratch.
- Label Propagation – When a process is later confirmed (by a security analyst or an offline scanner) as benign or malicious, its feature vector is fed back to the model as a labeled instance, triggering an incremental update.
- Evaluation Setup – Two experimental scenarios are considered:
  - Zero‑day detection: the model is trained on older malware families and tested on brand‑new samples.
  - Limited data: only a few labeled instances are available for training, mimicking early‑stage outbreak conditions.
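The data-collection step can be sketched as a periodic sampling loop. This is a minimal illustration, not the paper's implementation: `read_metrics` is a hypothetical stub returning randomized values, where a real monitor would query the OS (e.g., parse `/proc/<pid>/stat` and `/proc/<pid>/io` on Linux).

```python
# Sketch of the per-process metric sampler (Data Collection step).
import random

def read_metrics(pid):
    # Hypothetical stub: a real monitor would read these from the OS.
    return {
        "cpu_percent": random.uniform(0, 100),
        "rss_bytes": random.randint(10_000_000, 200_000_000),
        "disk_reads": random.randint(0, 500),
        "disk_writes": random.randint(0, 500),
        "net_bytes": random.randint(0, 10_000),
        "ctx_switches": random.randint(0, 1_000),
    }

def collect(pid, samples, interval_s=1.0, sleep=None):
    """Record one snapshot per interval, building a per-metric time series."""
    series = {}
    for _ in range(samples):
        snapshot = read_metrics(pid)
        for name, value in snapshot.items():
            series.setdefault(name, []).append(value)
        if sleep:
            sleep(interval_s)  # e.g., time.sleep in a live monitor

    return series

trace = collect(pid=1234, samples=10)  # ten one-second snapshots
```

Keeping the sampler this thin is what lets the approach run without heavyweight instrumentation or a sandbox: every metric above is already tracked by the kernel.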
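The feature-engineering and online-learning steps can also be sketched in a few lines. The descriptors follow the paper's list (mean, variance, min/max, entropy); the learner, however, is a plain online perceptron standing in for the Hoeffding Adaptive Tree, which would need a streaming-ML library, and the CPU traces are invented illustrative values.

```python
import math
import statistics

def summarize(series):
    """Collapse a raw metric time series into fixed statistical descriptors."""
    counts = {}
    for v in series:
        counts[v] = counts.get(v, 0) + 1
    n = len(series)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [
        statistics.mean(series),
        statistics.pvariance(series),
        min(series),
        max(series),
        entropy,
    ]

class OnlinePerceptron:
    """One-sample-at-a-time learner standing in for the adaptive trees."""
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else 0  # 1 = malicious, 0 = benign

    def learn_one(self, x, y):
        """Incremental update, triggered when an analyst confirms a label."""
        error = y - self.predict(x)
        if error:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error

# Invented CPU% traces: an idle-ish process vs. a cryptominer-like load.
benign_cpu = [2, 3, 2, 4, 3, 2]
malicious_cpu = [90, 95, 97, 92, 96, 94]

model = OnlinePerceptron(n_features=5)
for _ in range(50):  # labels stream in one at a time; no batch retraining
    model.learn_one(summarize(benign_cpu), 0)
    model.learn_one(summarize(malicious_cpu), 1)
```

The label-propagation loop at the bottom mirrors the paper's feedback mechanism: each analyst-confirmed verdict becomes one `learn_one` call, so the decision boundary shifts without ever retraining from scratch.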
Results & Findings
| Scenario | Batch Model (e.g., Random Forest) | Online Model (Hoeffding Tree) |
|---|---|---|
| Zero‑day detection (F1‑score) | 0.62 | 0.78 |
| Limited data, 10 labeled samples (F1‑score) | 0.48 | 0.71 |
| Average CPU overhead per process | ~3% | ~1.5% |
| Memory footprint | 150 MB | ~45 MB |
- The online approach outperforms static batch models on unseen malware, lifting the zero‑day F1‑score from 0.62 to 0.78.
- It remains effective with as few as 10 labeled examples (F1 of 0.71 vs 0.48), whereas batch models degrade sharply.
- Resource consumption stays well within the limits for production servers and edge devices, confirming the practicality of the metric‑only feature set.
Practical Implications
- Real‑time protection for cloud tenants – Cloud providers can embed the detector in hypervisors or container runtimes to flag suspicious processes before they compromise other workloads.
- Edge/IoT security – Low‑power devices can run the lightweight monitor without needing full sandbox environments, enabling early detection of malicious firmware updates or compromised services.
- Continuous security posture – Security operations teams can feed analyst‑verified labels back into the system, turning every investigation into a model‑improvement step—essentially a “self‑learning” IDS.
- Reduced reliance on signature updates – Since the model learns from behavior, it can catch novel ransomware, cryptominers, or file‑less attacks that evade traditional AV signatures.
Limitations & Future Work
- Feature scope – Relying only on resource metrics may miss stealthy malware that mimics benign usage patterns; combining with system call or network flow data could improve coverage.
- Label latency – The online model updates only after a process is labeled, which may introduce a delay in adapting to fast‑spreading threats. Exploring semi‑supervised or unsupervised drift detection is a next step.
- Evaluation breadth – Experiments were conducted on a curated dataset of Windows executables; extending to Linux, Android, and heterogeneous IoT firmware would validate cross‑platform robustness.
Bottom line: By shifting from static, batch‑trained classifiers to an incremental, behavior‑driven detection engine, this research offers a scalable path for developers and security teams to stay ahead of rapidly evolving malware—especially in cloud and edge environments where speed and resource efficiency are paramount.
Authors
- Themistoklis Diamantopoulos
- Dimosthenis Natsos
- Andreas L. Symeonidis
Paper Information
- arXiv ID: 2601.10164v1
- Categories: cs.SE
- Published: January 15, 2026