[Paper] Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models
Source: arXiv - 2512.13618v1
Overview
The paper Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models investigates how best to feed time information into LLMs that are fine‑tuned on event‑stream data (e.g., logs, sensor readings, user actions). By systematically comparing five different ways of turning timestamps into tokens, the authors show that the “right” representation depends on the statistical shape of the underlying time gaps, rather than there being a one‑size‑fits‑all solution.
Key Contributions
- First large‑scale empirical comparison of temporal tokenization methods for LLM‑based sequence prediction.
- Five distinct encodings evaluated (a minimal illustrative sketch follows this list):
  - Naïve numeric strings (e.g., "1623456789").
  - High‑precision byte‑level representations (binary‑packed scalars).
  - Human‑semantic calendar tokens (e.g., "Mon 09:45").
  - Uniform binning (fixed‑width time buckets).
  - Adaptive residual scalar quantization (dynamic bins + residual bits).
- Dataset suite covering diverse temporal distributions: smooth log‑normal inter‑arrival times, heavy‑tailed spikes, periodic calendar‑driven patterns, and mixed‑modality streams.
- Guidelines for matching tokenization to data characteristics, highlighting when log‑based encodings or human‑readable tokens outperform others.
- Open‑source benchmark code and tokenizers to enable reproducibility and rapid experimentation.
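To make the encodings concrete, the sketch below shows how a single Unix timestamp could be rendered under the first four schemes (adaptive residual quantization is sketched separately after the Methodology section). This is a minimal illustration under assumed conventions: the function names, the `<...>` token formats, the UTC conversion, and the 5‑minute bin width are our own choices, not the paper's released tokenizers.

```python
import struct
from datetime import datetime, timezone

def numeric_string_tokens(ts: float) -> str:
    # Naive numeric encoding: the raw Unix timestamp as decimal text.
    return str(int(ts))

def byte_level_tokens(ts: float) -> list[int]:
    # Byte-level encoding: pack the timestamp as a little-endian
    # 64-bit IEEE-754 float and emit one token per byte (0-255).
    return list(struct.pack("<d", ts))

def calendar_tokens(ts: float) -> list[str]:
    # Human-semantic encoding: discrete weekday / hour / meridiem tokens.
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    weekday = dt.strftime("%a").upper()           # e.g. "MON"
    meridiem = "AM" if dt.hour < 12 else "PM"
    return [f"<{weekday}>", f"<{dt.hour:02d}:00>", f"<{meridiem}>"]

def uniform_bin_token(ts: float, origin: float, bin_seconds: float = 300.0) -> str:
    # Uniform binning: index of a fixed-width (here 5-minute) bucket.
    return f"<BIN_{int((ts - origin) // bin_seconds)}>"

if __name__ == "__main__":
    t = 1623456789.0
    print(numeric_string_tokens(t))
    print(byte_level_tokens(t))
    print(calendar_tokens(t))
    print(uniform_bin_token(t, origin=1623456000.0))
```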
Methodology
- Data preparation – The authors curated four real‑world event streams (e‑commerce click logs, IoT sensor alerts, system audit trails, and calendar‑driven meeting records). Each dataset was annotated with precise timestamps and split into training/validation/test folds.
- Tokenization pipelines – For each of the five strategies, timestamps were transformed into token sequences compatible with the base LLM (a GPT‑NeoX‑style model with a 30k‑token vocabulary):
  - Numeric strings were simply cast to decimal text.
  - Byte‑level used little‑endian 64‑bit IEEE‑754 floats, then split into individual bytes.
  - Calendar tokens mapped timestamps to discrete tokens like "<MON>", "<09:00>", "<PM>".
  - Uniform binning divided the timeline into equal‑width intervals (e.g., 5‑minute bins) and replaced each timestamp with its bin index.
  - Adaptive residual quantization first selected a coarse bin via k‑means on inter‑arrival times, then encoded the residual with a small fixed‑point suffix (a hedged sketch of this step follows the Methodology list).
- Fine‑tuning – All tokenized streams were used to fine‑tune the same LLM architecture (12‑layer decoder, 768‑dim hidden size) for next‑event prediction. Training hyper‑parameters were held constant across experiments to isolate the effect of tokenization.
- Evaluation metrics – Predictive accuracy (top‑1/5), negative log‑likelihood, and calibration error were reported. Additionally, token‑level efficiency (average tokens per event) and inference latency were measured.
- Statistical analysis – Paired bootstrap tests assessed significance, while correlation analyses linked distribution skewness/kurtosis to the relative performance of each encoding.
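Below is a minimal sketch of how such an adaptive residual quantizer could look, assuming k‑means coarse bins fitted over inter‑arrival gaps and a 4‑bit fixed‑point residual code. The helper names (`fit_coarse_bins`, `encode_gap`), the 32‑bin count, and the residual clipping range are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_coarse_bins(gaps: np.ndarray, n_bins: int = 32, seed: int = 0) -> np.ndarray:
    """Fit coarse bin centers with k-means over observed inter-arrival gaps (seconds)."""
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed)
    km.fit(gaps.reshape(-1, 1))
    return np.sort(km.cluster_centers_.ravel())

def encode_gap(gap: float, centers: np.ndarray,
               residual_bits: int = 4, residual_range: float = 60.0):
    """Encode one inter-arrival gap as (coarse bin token, residual token)."""
    bin_idx = int(np.argmin(np.abs(centers - gap)))   # nearest coarse bin
    residual = gap - centers[bin_idx]                 # signed leftover
    # Quantize the residual to a small fixed-point code within [-range, +range].
    levels = 2 ** residual_bits
    clipped = np.clip(residual, -residual_range, residual_range)
    code = int(round((clipped + residual_range) / (2 * residual_range) * (levels - 1)))
    return f"<T{bin_idx}>", f"<R{code}>"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaps = rng.lognormal(mean=2.0, sigma=1.5, size=10_000)   # heavy-ish tailed gaps
    centers = fit_coarse_bins(gaps)
    print(encode_gap(137.2, centers))
```

The two-token output keeps sequences short on bursty data while still preserving fine-grained timing through the residual suffix, which is the trade-off the paper attributes to this encoding.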
Results & Findings
| Encoding | Best‑performing dataset | Accuracy Δ vs. baseline* | Tokens per event | Inference overhead |
|---|---|---|---|---|
| Numeric strings | Uniform‑bin dataset | +1.2 % | 12 | negligible |
| Byte‑level | High‑frequency IoT spikes | +3.8 % | 9 | +12 ms |
| Calendar tokens | Mixed‑modality calendar logs | +2.5 % | 8 | negligible |
| Uniform binning | Smooth log‑normal logs | +0.9 % | 6 | fastest |
| Adaptive residual quantization | Heavy‑tailed spiky data | +5.4 % | 7 | +5 ms |
*Baseline = naive numeric strings on the same dataset.
- No universal winner – Adaptive residual quantization shines on highly skewed, bursty streams, while human‑semantic calendar tokens are robust when the data contains periodic, human‑oriented patterns.
- Token efficiency matters – Strategies that compress timestamps into fewer tokens (uniform binning, calendar tokens) reduce latency without sacrificing accuracy on well‑behaved distributions.
- Alignment with distribution – A simple statistical check (e.g., skewness > 2) can predict when adaptive quantization will outperform simpler schemes.
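This guideline lends itself to a quick pre‑flight check. The sketch below is a toy decision rule built around the paper's "skewness > 2" heuristic; the function name `recommend_encoding` and the use of excess kurtosis as an extra diagnostic are our own illustrative additions, not the authors' code.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def recommend_encoding(timestamps: np.ndarray, skew_threshold: float = 2.0) -> str:
    """Toy decision rule: pick an encoding from the shape of the inter-arrival gaps."""
    gaps = np.diff(np.sort(timestamps))
    g_skew = skew(gaps)
    g_kurt = kurtosis(gaps)                       # excess kurtosis, extra diagnostic
    print(f"skewness={g_skew:.2f}, excess kurtosis={g_kurt:.2f}")
    if g_skew > skew_threshold:
        return "adaptive residual quantization"   # bursty / heavy-tailed streams
    return "uniform binning or calendar tokens"   # well-behaved / periodic streams

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    bursty = np.cumsum(rng.pareto(a=1.5, size=5_000))   # heavy-tailed synthetic gaps
    print(recommend_encoding(bursty))
```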
Practical Implications
- LLM‑powered log analytics – Engineers can swap in a calendar‑tokenizer for system logs that contain business‑hour patterns, gaining a modest accuracy bump without extra compute.
- Edge‑device forecasting – For IoT deployments with bursty sensor spikes, using byte‑level or adaptive residual encodings can improve prediction quality while keeping model size unchanged.
- Rapid prototyping – The open‑source tokenizers let developers experiment with a “plug‑and‑play” approach: run a quick distribution analysis on a new event stream, then select the matching encoding per the paper’s guidelines.
- Cost‑aware inference – Fewer tokens per event translate directly into lower API usage fees on hosted LLM services; uniform binning or calendar tokens are attractive when latency or cost is a primary concern.
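As a back‑of‑the‑envelope illustration, halving tokens per event roughly halves input‑token spend. The per‑token price and event volume below are made‑up assumptions; only the 12 vs. 6 tokens‑per‑event figures come from the results table above.

```python
# Illustrative cost comparison; price and volume are hypothetical, not from the paper.
events_per_month = 50_000_000
price_per_1k_tokens = 0.0005          # assumed hosted-LLM input price, USD

for name, tokens_per_event in [("numeric strings", 12), ("uniform binning", 6)]:
    monthly_cost = events_per_month * tokens_per_event / 1_000 * price_per_1k_tokens
    print(f"{name:>16}: {tokens_per_event} tok/event -> ${monthly_cost:,.0f}/month")
```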
Limitations & Future Work
- Model scale – Experiments were limited to a single 12‑layer decoder with a 768‑dimensional hidden size; results may shift with larger, instruction‑tuned LLMs.
- Single‑modal focus – The study only examined timestamp + categorical event payloads; multimodal streams (e.g., text + time) were not explored.
- Static tokenizers – All encodings were fixed after preprocessing; dynamic, context‑aware tokenization (e.g., learned embeddings for time) remains an open avenue.
- Real‑time adaptation – Future work could investigate online adjustment of quantization bins as the temporal distribution drifts in production environments.
Bottom line: Choosing the right temporal tokenization is as important as model architecture when building LLM‑driven event predictors. By matching the encoding to the data’s time‑distribution, developers can squeeze out measurable gains in accuracy, efficiency, and cost.
Authors
- Zefang Liu
- Nam Nguyen
- Yinzhu Quan
- Austin Zhang
Paper Information
- arXiv ID: 2512.13618v1
- Categories: cs.CL, cs.LG
- Published: December 15, 2025