[Paper] Promoting Simple Agents: Ensemble Methods for Event-Log Prediction

Published: April 23, 2026
Source: arXiv - 2604.21629v1

Overview

The paper “Promoting Simple Agents: Ensemble Methods for Event‑Log Prediction” pits classic, lightweight n‑gram automata against heavyweight neural networks (LSTM, Transformer) for the task of next‑activity prediction in streaming event logs. The authors show that, with the right context window, n‑grams can match neural‑model accuracy while using a fraction of the compute and memory, and they introduce a novel “promotion” ensemble that keeps the inference cost low.

Key Contributions

  • Empirical head‑to‑head comparison of n‑gram automata vs. LSTM/Transformer on synthetic patterns and five real‑world process‑mining datasets.
  • Demonstration of stability: n‑grams deliver consistent accuracy across runs, whereas windowed neural models exhibit volatile performance.
  • Ensemble baseline: classic voting ensembles improve n‑gram accuracy but inflate runtime memory and latency.
  • Promotion algorithm: a dynamic, two‑model selector that promotes the better‑performing agent at inference time, cutting overhead while preserving (or improving) predictive quality.
  • Resource‑efficiency analysis: quantifies CPU, GPU, and memory savings of n‑gram‑based ensembles compared with non‑windowed neural baselines.

Methodology

  1. Data preparation – Event logs are treated as sequences of activity symbols. Synthetic logs encode known patterns (e.g., loops, parallel branches) to stress‑test models; five public process‑mining logs provide realistic workloads.
  2. Model families
    • n‑gram automata: simple Markov‑style predictors that look back a fixed number k of activities (the context window).
    • Neural baselines: LSTM and Transformer architectures, both with and without sliding windows to limit sequence length.
  3. Training & evaluation – Models are trained on the first 70 % of each log and evaluated on the remaining 30 % using standard next‑activity accuracy. Multiple random seeds ensure statistical robustness.
  4. Ensembling
    • Voting: all candidate models predict; the majority vote decides the next activity.
    • Promotion: during inference a lightweight controller monitors recent prediction confidence and dynamically promotes the currently better‑performing model, keeping only two agents active at any moment.
  5. Resource measurement – CPU cycles, GPU utilization, memory footprint, and inference latency are logged for each configuration.
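The n‑gram automaton at the core of steps 2–4 can be sketched as a frequency table over fixed‑length contexts. The class and method names below are illustrative, not from the paper; the backoff‑to‑shorter‑contexts rule is a common convention assumed here, not a detail the summary confirms.

```python
from collections import Counter, defaultdict

class NGramPredictor:
    """Markov-style next-activity predictor over a fixed context window k.

    Illustrative sketch of the paper's n-gram automaton idea: count which
    activity follows each observed context, then predict the most frequent
    continuation.
    """

    def __init__(self, k=4):
        self.k = k
        self.counts = defaultdict(Counter)  # context tuple -> successor counts

    def train(self, log):
        """log: list of traces; each trace is a list of activity symbols."""
        for trace in log:
            for i in range(len(trace) - 1):
                # Context is the last (up to) k activities ending at position i.
                context = tuple(trace[max(0, i - self.k + 1): i + 1])
                self.counts[context][trace[i + 1]] += 1

    def predict(self, prefix):
        """Most likely next activity, backing off to shorter contexts."""
        for j in range(min(self.k, len(prefix)), 0, -1):
            context = tuple(prefix[-j:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # context never seen during training

model = NGramPredictor(k=2)
model.train([["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]])
print(model.predict(["a", "b"]))  # prints "c": seen twice after ("a", "b")
```

Because the model is just a lookup table of counts, retraining on new activity types amounts to incrementing counters, which is what makes the sub‑millisecond‑scale latencies in the results plausible.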

Results & Findings

| Model / Ensemble | Accuracy (avg.) | CPU % | GPU % | Memory (MB) | Latency (ms) |
|---|---|---|---|---|---|
| n‑gram (k=4) | 78.2 % | 12 | 0 | 45 | 1.8 |
| LSTM (full seq) | 79.0 % | 35 | 20 | 620 | 7.4 |
| Transformer (full) | 80.1 % | 40 | 30 | 850 | 9.1 |
| Voting (5 × n‑gram) | 80.5 % | 55 | 0 | 210 | 5.2 |
| Promotion (2 × n‑gram) | 80.3 % | 28 | 0 | 95 | 2.9 |
  • Accuracy parity: n‑grams with a context window of 4–5 achieve within 1 % of the best neural model on all real‑world logs.
  • Stability: standard deviation of accuracy across seeds is <0.3 % for n‑grams vs. >1.2 % for windowed LSTMs.
  • Efficiency: the promotion ensemble cuts memory usage by ~55 % and latency by ~60 % compared with a voting ensemble, while still beating non‑windowed neural baselines.

Practical Implications

  • Fast, low‑cost prediction services – Deploying n‑gram‑based predictors on edge devices or serverless functions becomes feasible; you can serve next‑activity recommendations with sub‑3 ms latency without GPU acceleration.
  • Scalable process‑mining pipelines – Organizations can ingest high‑velocity event streams (e.g., IoT telemetry, business workflow logs) and run real‑time analytics on commodity hardware.
  • Simplified model maintenance – n‑grams are interpretable (they are essentially lookup tables) and can be updated instantly when new activity types appear, unlike deep nets, which require costly retraining.
  • Hybrid ensemble strategy – The promotion algorithm offers a blueprint for “smart” ensembles that balance accuracy and resource budgets, useful for any streaming prediction task (e.g., recommendation, anomaly detection).
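A minimal sketch of such a promotion‑style controller, assuming a sliding‑window hit‑rate as the switching criterion (the paper's exact confidence rule is not detailed in this summary, so the criterion, class names, and window size here are all assumptions):

```python
from collections import deque

class PromotionSelector:
    """Two-agent promotion controller (illustrative sketch).

    Keeps exactly two agents active and routes each prediction to the one
    with the better recent hit rate over a sliding window, in the spirit
    of the paper's promotion ensemble.
    """

    def __init__(self, agent_a, agent_b, window=50):
        self.agents = [agent_a, agent_b]
        self.hits = [deque(maxlen=window), deque(maxlen=window)]
        self.active = 0  # index of the currently promoted agent

    def _score(self, i):
        h = self.hits[i]
        return sum(h) / len(h) if h else 0.0

    def predict(self, prefix):
        # Only the promoted agent answers, so inference cost stays low.
        return self.agents[self.active].predict(prefix)

    def observe(self, prefix, actual):
        """Once the true next activity arrives, score both agents on it
        and promote whichever has the better recent hit rate."""
        for i, agent in enumerate(self.agents):
            self.hits[i].append(1 if agent.predict(prefix) == actual else 0)
        self.active = max(range(2), key=self._score)

class ConstantAgent:
    """Trivial stand-in agent for demonstration purposes."""
    def __init__(self, symbol):
        self.symbol = symbol
    def predict(self, prefix):
        return self.symbol

sel = PromotionSelector(ConstantAgent("a"), ConstantAgent("b"), window=5)
for _ in range(3):
    sel.observe(["x"], "b")  # agent B keeps being right
print(sel.predict(["x"]))  # prints "b": agent B has been promoted
```

In a deployment the two agents could be n‑gram predictors with different context windows, so the controller's per‑event work is one table lookup plus a counter update.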

Limitations & Future Work

  • Context‑window sensitivity – n‑gram performance hinges on selecting an appropriate k; the paper uses a grid search, but an automated adaptation mechanism is not explored.
  • Complex temporal dependencies – Very long‑range dependencies (e.g., patterns spanning dozens of steps) remain better captured by Transformers; the promotion scheme currently only switches between two simple agents.
  • Domain generality – Experiments focus on process‑mining logs; applicability to other sequential domains (e.g., natural language, clickstreams) needs validation.
  • Dynamic promotion criteria – The current confidence‑based selector is heuristic; future work could integrate reinforcement learning to learn optimal switching policies.

Authors

  • Benedikt Bollig
  • Matthias Függer
  • Thomas Nowak
  • Paul Zeinaty

Paper Information

  • arXiv ID: 2604.21629v1
  • Categories: cs.LG, cs.AI, cs.DC, cs.FL
  • Published: April 23, 2026