Parsing 2 GiB/s of AI token logs with Rust + simd-json
Source: Dev.to
The Problem
I use Claude Code, Codex CLI, and Gemini CLI daily. One day I checked my API bill — it was way higher than expected, but I had no idea where the tokens were going.
Existing tracking tools were too slow. Scanning my 3 GB of session files (9,000+ files across three CLIs) took over 40 seconds. I wanted something instant.
So I built toktrack — a terminal‑native token usage tracker that parses everything locally at 2 GiB/s.
The Data
Each AI CLI stores session data differently:
| CLI | Location | Format |
|---|---|---|
| Claude Code | ~/.claude/projects/**/*.jsonl | JSONL, per‑message usage |
| Codex CLI | ~/.codex/sessions/**/*.jsonl | JSONL, cumulative counters |
| Gemini CLI | ~/.gemini/tmp/*/chats/*.json | JSON, includes thinking_tokens |
A single Claude Code session file can look like this:
{
  "timestamp": "2026-01-15T10:00:00Z",
  "message": {
    "model": "claude-sonnet-4-20250514",
    "usage": {
      "input_tokens": 12000,
      "output_tokens": 3500,
      "cache_read_input_tokens": 8000,
      "cache_creation_input_tokens": 2000
    }
  },
  "costUSD": 0.042
}
Multiply this by thousands of sessions over months, and you’re looking at gigabytes of JSONL to parse.
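To make the arithmetic concrete, here's a minimal sketch of summing one usage record into a total. The field names follow the sample above; the `Usage` struct and `total` method are illustrative stand-ins, not toktrack's actual types.

```rust
// Token counts from one Claude Code usage record (sample above).
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_input_tokens: u64,
    cache_creation_input_tokens: u64,
}

impl Usage {
    // Total tokens for this API call, across all four buckets.
    fn total(&self) -> u64 {
        self.input_tokens
            + self.output_tokens
            + self.cache_read_input_tokens
            + self.cache_creation_input_tokens
    }
}

fn main() {
    let u = Usage {
        input_tokens: 12_000,
        output_tokens: 3_500,
        cache_read_input_tokens: 8_000,
        cache_creation_input_tokens: 2_000,
    };
    println!("total tokens: {}", u.total()); // 25500
}
```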
Why simd-json
Standard serde_json is good, but when parsing 3 GB of line‑delimited JSON, every microsecond per line adds up.
simd-json is a Rust port of simdjson that uses SIMD instructions (AVX2, SSE4.2, NEON) to parse JSON significantly faster. The key trick: in‑place parsing with mutable buffers.
use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    timestamp: &'a str, // borrowed, zero-copy
    #[serde(rename = "requestId", borrow)]
    request_id: Option<&'a str>, // borrowed, zero-copy
    #[serde(borrow)]
    message: Option<ClaudeMessage<'a>>, // nested object: model + usage
    #[serde(rename = "costUSD")]
    cost_usd: Option<f64>,
}

#[derive(Deserialize)]
struct ClaudeMessage<'a> {
    model: &'a str,
    usage: Usage, // the four token-count fields shown above
}
By using &'a str instead of String, we avoid heap allocations for every field. simd-json parses the JSON in‑place on a mutable byte buffer, and our structs just borrow slices from that buffer.
The one gotcha: simd-json’s from_slice requires &mut [u8], so you need to own a mutable copy of each line:
let reader = BufReader::new(File::open(path)?);
for line in reader.lines() {
    let line = line?;
    let mut bytes = line.into_bytes(); // owned, mutable buffer
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
        // `parsed` borrows slices out of `bytes`, so use it before
        // `bytes` goes out of scope at the end of this iteration
    }
}
This gave a 17–25% throughput improvement over standard serde_json on my dataset.
Adding Parallelism with rayon
The single-threaded simd-json parser topped out around 1 GiB/s. With 9,000+ files, parallelizing at the file level is trivial with rayon:
use rayon::prelude::*;

let entries: Vec<_> = files
    .par_iter()
    .flat_map(|f| parser.parse_file(f).unwrap_or_default())
    .collect();
Rayon’s par_iter() distributes files across worker threads automatically. Combined with simd-json, this pushed throughput to ~2 GiB/s — about 2× the single-threaded simd-json path, and roughly 2.5× the serde_json baseline.
| Stage | Throughput |
|---|---|
| serde_json (baseline) | ~800 MiB/s |
| simd-json (zero-copy) | ~1.0 GiB/s |
| simd-json + rayon | ~2.0 GiB/s |
The Hard Part: Each CLI is Different
The real complexity wasn’t parsing speed — it was handling three completely different data formats behind a single trait:
pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<TokenRecord>, Box<dyn Error>>;
}
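As a sketch of how one backend plugs into this trait, here's a simplified Claude-style implementation. To stay self-contained it scans each JSONL line for an `output_tokens` field with plain string search instead of simd-json, and `TokenRecord` is a stripped-down stand-in for the real record type.

```rust
use std::error::Error;
use std::fs;
use std::path::{Path, PathBuf};

// Stripped-down stand-in for the real record type.
#[derive(Debug, Default)]
pub struct TokenRecord {
    pub output_tokens: u64,
}

pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<TokenRecord>, Box<dyn Error>>;
}

pub struct ClaudeParser;

impl CLIParser for ClaudeParser {
    fn name(&self) -> &str { "claude" }
    fn data_dir(&self) -> PathBuf { PathBuf::from("~/.claude/projects") }
    fn file_pattern(&self) -> &str { "**/*.jsonl" }

    // Naive string-search stand-in for the simd-json fast path.
    fn parse_file(&self, path: &Path) -> Result<Vec<TokenRecord>, Box<dyn Error>> {
        let text = fs::read_to_string(path)?;
        let mut records = Vec::new();
        for line in text.lines() {
            if let Some(n) = extract_u64(line, "\"output_tokens\":") {
                records.push(TokenRecord { output_tokens: n });
            }
        }
        Ok(records)
    }
}

// Find `key` in `line` and parse the digits that follow it.
fn extract_u64(line: &str, key: &str) -> Option<u64> {
    let start = line.find(key)? + key.len();
    let digits: String = line[start..]
        .chars()
        .skip_while(|c| c.is_whitespace())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

fn main() -> Result<(), Box<dyn Error>> {
    let tmp = std::env::temp_dir().join("toktrack_demo.jsonl");
    fs::write(&tmp, "{\"message\":{\"usage\":{\"output_tokens\":3500}}}\n")?;
    let records = ClaudeParser.parse_file(&tmp)?;
    println!("{} record(s), {} output tokens", records.len(), records[0].output_tokens);
    Ok(())
}
```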
Claude Code
Straightforward — each JSONL line with a message.usage field is one API call.
Codex CLI
Tricky. Token counts are cumulative — each token_count event reports the running total, not a delta. The model name is in a separate turn_context line, so parsing is stateful:
line 1: session_meta → extract session_id
line 2: turn_context → extract model name
line 3: event_msg → token_count (cumulative total)
line 4: event_msg → token_count (larger cumulative total)
You need to keep only the last token_count per session.
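That stateful pass can be sketched as follows, assuming events have already been reduced to `(session_id, cumulative_total)` pairs (the function name is mine, not toktrack's): a later event for the same session simply overwrites the earlier one, so only the final total survives.

```rust
use std::collections::HashMap;

// Each Codex token_count event reports a running total, not a delta,
// so for every session we keep only the last value seen.
fn final_totals(events: &[(&str, u64)]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for (session_id, cumulative) in events {
        // Later events overwrite earlier ones for the same session.
        totals.insert(session_id.to_string(), *cumulative);
    }
    totals
}

fn main() {
    let events = [
        ("sess-a", 1_200), // first cumulative snapshot
        ("sess-a", 4_800), // later snapshot supersedes it
        ("sess-b", 700),
    ];
    println!("{:?}", final_totals(&events));
}
```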
Gemini CLI
Uses standard JSON (not JSONL) with a unique thinking_tokens field that no other CLI tracks.
TUI with ratatui
For the dashboard I used ratatui to build four views:
- Overview — Total tokens/cost with a GitHub‑style 52‑week heatmap
- Models — Per‑model breakdown with percentage bars
- Daily — Scrollable table with sparkline charts
- Stats — Key metrics in a card grid
The heatmap uses 2×2 Unicode block characters to fit 52 weeks of data in a compact space, with percentile‑based color intensity.
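A sketch of the percentile-based intensity mapping, assuming daily totals are bucketed into five levels the way GitHub's contribution graph does. The shade characters and quartile thresholds here are illustrative, not the exact ones toktrack uses.

```rust
// Map a day's token count to an intensity level 0..=4 based on its
// percentile rank among all non-zero days, GitHub-heatmap style.
fn intensity(value: u64, sorted_nonzero: &[u64]) -> usize {
    if value == 0 || sorted_nonzero.is_empty() {
        return 0;
    }
    // Fraction of non-zero days at or below this value.
    let rank = sorted_nonzero.partition_point(|&v| v <= value);
    let pct = rank as f64 / sorted_nonzero.len() as f64;
    match pct {
        p if p <= 0.25 => 1,
        p if p <= 0.50 => 2,
        p if p <= 0.75 => 3,
        _ => 4,
    }
}

fn main() {
    // Illustrative shade ramp: darkest block for the busiest days.
    let shades = [' ', '░', '▒', '▓', '█'];
    let days = [0, 100, 2_000, 50_000, 90_000];
    let mut sorted: Vec<u64> = days.iter().copied().filter(|&d| d > 0).collect();
    sorted.sort_unstable();
    let row: String = days.iter().map(|&d| shades[intensity(d, &sorted)]).collect();
    println!("{row}");
}
```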
Results
On my machine (Apple Silicon, 9,000+ files, 3.4 GB total):
| Metric | Time |
|---|---|
| Cold start (no cache) | ~1.2 s |
| Warm start (cached) | ~0.05 s |
The caching layer stores daily summaries in ~/.toktrack/cache/. Past dates are immutable — only today is recomputed. This means even when Claude Code deletes session files after 30 days, your cost history survives.
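The invalidation rule boils down to a single comparison: a cached daily summary is reusable iff its date is strictly before today. ISO `YYYY-MM-DD` strings compare correctly as plain strings, so no date library is needed (the function name is mine, not toktrack's).

```rust
// A cached daily summary is immutable once its date has passed;
// only today's summary can still change and must be recomputed.
// ISO `YYYY-MM-DD` strings sort lexicographically in date order.
fn cache_is_fresh(summary_date: &str, today: &str) -> bool {
    summary_date < today
}

fn main() {
    let today = "2026-01-15";
    for date in ["2026-01-14", "2026-01-15"] {
        let action = if cache_is_fresh(date, today) { "use cache" } else { "recompute" };
        println!("{date}: {action}");
    }
}
```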
Try It
npx toktrack
# or
cargo install toktrack
GitHub:
If you use Claude Code, Codex CLI, or Gemini CLI and want to know where your tokens are going — give it a try.