Parsing 2 GiB/s of AI token logs with Rust + simd-json
Source: Dev.to
The Problem
I use Claude Code, Codex CLI, and Gemini CLI daily. One day I checked my API bill — it was way higher than expected, but I had no idea where the tokens were going.
Existing tracking tools were too slow. Scanning my 3 GB of session files (9,000+ files across three CLIs) took over 40 seconds. I wanted something instant.
So I built toktrack — a terminal‑native token usage tracker that parses everything locally at 2 GiB/s.
The Data
Each AI CLI stores session data differently:
| CLI | Location | Format |
|---|---|---|
| Claude Code | ~/.claude/projects/**/*.jsonl | JSONL, per‑message usage |
| Codex CLI | ~/.codex/sessions/**/*.jsonl | JSONL, cumulative counters |
| Gemini CLI | ~/.gemini/tmp/*/chats/*.json | JSON, includes thinking_tokens |
A single Claude Code session file can look like this:
{
  "timestamp": "2026-01-15T10:00:00Z",
  "message": {
    "model": "claude-sonnet-4-20250514",
    "usage": {
      "input_tokens": 12000,
      "output_tokens": 3500,
      "cache_read_input_tokens": 8000,
      "cache_creation_input_tokens": 2000
    }
  },
  "costUSD": 0.042
}
Multiply this by thousands of sessions over months, and you’re looking at gigabytes of JSONL to parse.
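To make the arithmetic concrete, here's a minimal sketch of summing one usage record into a total. The field names follow the sample above; the `Usage` struct and `total` method are illustrative stand-ins, not toktrack's actual types.

```rust
// Token counts from one Claude Code usage record (sample above).
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_input_tokens: u64,
    cache_creation_input_tokens: u64,
}

impl Usage {
    // Total tokens for this API call, across all four buckets.
    fn total(&self) -> u64 {
        self.input_tokens
            + self.output_tokens
            + self.cache_read_input_tokens
            + self.cache_creation_input_tokens
    }
}

fn main() {
    let u = Usage {
        input_tokens: 12_000,
        output_tokens: 3_500,
        cache_read_input_tokens: 8_000,
        cache_creation_input_tokens: 2_000,
    };
    println!("total tokens: {}", u.total()); // 25500
}
```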
Why simd-json
Standard serde_json is good, but when parsing 3 GB of line‑delimited JSON, every microsecond per line adds up.
simd-json is a Rust port of simdjson that uses SIMD instructions (AVX2, SSE4.2, NEON) to parse JSON significantly faster. The key trick: in‑place parsing with mutable buffers.
use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    timestamp: &'a str, // borrowed, zero-copy
    #[serde(rename = "requestId", borrow)]
    request_id: Option<&'a str>, // borrowed, zero-copy
    #[serde(borrow)]
    message: Option<ClaudeMessage<'a>>, // nested object: model + usage
    #[serde(rename = "costUSD")]
    cost_usd: Option<f64>,
}

#[derive(Deserialize)]
struct ClaudeMessage<'a> {
    model: &'a str,
    usage: Usage, // the four token-count fields shown above
}
By using &'a str instead of String, we avoid heap allocations for every field. simd-json parses the JSON in‑place on a mutable byte buffer, and our structs just borrow slices from that buffer.
The one gotcha: simd-json’s from_slice requires &mut [u8], so you need to own a mutable copy of each line:
let reader = BufReader::new(File::open(path)?);
for line in reader.lines() {
    let line = line?;
    let mut bytes = line.into_bytes(); // owned, mutable buffer
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
        // `parsed` borrows slices out of `bytes`, so use it before
        // `bytes` goes out of scope at the end of this iteration
    }
}
This gave a 17–25% throughput improvement over standard serde_json on my dataset.
Adding Parallelism with rayon
The single-threaded simd-json parser topped out around 1 GiB/s. With 9,000+ files, parallelizing at the file level is trivial with rayon:
use rayon::prelude::*;

let entries: Vec<_> = files
    .par_iter()
    .flat_map(|f| parser.parse_file(f).unwrap_or_default())
    .collect();
Rayon’s par_iter() distributes files across worker threads automatically. Combined with simd-json, this pushed throughput to ~2 GiB/s — about 2× the single-threaded simd-json path, and roughly 2.5× the serde_json baseline.
| Stage | Throughput |
|---|---|
| serde_json (baseline) | ~800 MiB/s |
| simd-json (zero-copy) | ~1.0 GiB/s |
| simd-json + rayon | ~2.0 GiB/s |
The Hard Part: Each CLI is Different
The real complexity wasn’t parsing speed — it was handling three completely different data formats behind a single trait:
pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<TokenRecord>, Box<dyn Error>>;
}
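As a sketch of how one backend plugs into this trait, here's a simplified Claude-style implementation. To stay self-contained it scans each JSONL line for an `output_tokens` field with plain string search instead of simd-json, and `TokenRecord` is a stripped-down stand-in for the real record type.

```rust
use std::error::Error;
use std::fs;
use std::path::{Path, PathBuf};

// Stripped-down stand-in for the real record type.
#[derive(Debug, Default)]
pub struct TokenRecord {
    pub output_tokens: u64,
}

pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<TokenRecord>, Box<dyn Error>>;
}

pub struct ClaudeParser;

impl CLIParser for ClaudeParser {
    fn name(&self) -> &str { "claude" }
    fn data_dir(&self) -> PathBuf { PathBuf::from("~/.claude/projects") }
    fn file_pattern(&self) -> &str { "**/*.jsonl" }

    // Naive string-search stand-in for the simd-json fast path.
    fn parse_file(&self, path: &Path) -> Result<Vec<TokenRecord>, Box<dyn Error>> {
        let text = fs::read_to_string(path)?;
        let mut records = Vec::new();
        for line in text.lines() {
            if let Some(n) = extract_u64(line, "\"output_tokens\":") {
                records.push(TokenRecord { output_tokens: n });
            }
        }
        Ok(records)
    }
}

// Find `key` in `line` and parse the digits that follow it.
fn extract_u64(line: &str, key: &str) -> Option<u64> {
    let start = line.find(key)? + key.len();
    let digits: String = line[start..]
        .chars()
        .skip_while(|c| c.is_whitespace())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

fn main() -> Result<(), Box<dyn Error>> {
    let tmp = std::env::temp_dir().join("toktrack_demo.jsonl");
    fs::write(&tmp, "{\"message\":{\"usage\":{\"output_tokens\":3500}}}\n")?;
    let records = ClaudeParser.parse_file(&tmp)?;
    println!("{} record(s), {} output tokens", records.len(), records[0].output_tokens);
    Ok(())
}
```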
Claude Code
Straightforward — each JSONL line with a message.usage field is one API call.
Codex CLI
Tricky. Token counts are cumulative — each token_count event reports the running total, not a delta. The model name is in a separate turn_context line, so parsing is stateful:
line 1: session_meta → extract session_id
line 2: turn_context → extract model name
line 3: event_msg → token_count (cumulative total)
line 4: event_msg → token_count (larger cumulative total)
You need to keep only the last token_count per session.
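That stateful pass can be sketched as follows, assuming events have already been reduced to `(session_id, cumulative_total)` pairs (the function name is mine, not toktrack's): a later event for the same session simply overwrites the earlier one, so only the final total survives.

```rust
use std::collections::HashMap;

// Each Codex token_count event reports a running total, not a delta,
// so for every session we keep only the last value seen.
fn final_totals(events: &[(&str, u64)]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for (session_id, cumulative) in events {
        // Later events overwrite earlier ones for the same session.
        totals.insert(session_id.to_string(), *cumulative);
    }
    totals
}

fn main() {
    let events = [
        ("sess-a", 1_200), // first cumulative snapshot
        ("sess-a", 4_800), // later snapshot supersedes it
        ("sess-b", 700),
    ];
    println!("{:?}", final_totals(&events));
}
```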
Gemini CLI
Uses standard JSON (not JSONL) with a unique thinking_tokens field that no other CLI tracks.
TUI with ratatui
For the dashboard I used ratatui to build four views:
- Overview — Total tokens/cost with a GitHub‑style 52‑week heatmap
- Models — Per‑model breakdown with percentage bars
- Daily — Scrollable table with sparkline charts
- Stats — Key metrics in a card grid
The heatmap uses 2×2 Unicode block characters to fit 52 weeks of data in a compact space, with percentile‑based color intensity.
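A sketch of the percentile-based intensity mapping, assuming daily totals are bucketed into five levels the way GitHub's contribution graph does. The shade characters and quartile thresholds here are illustrative, not the exact ones toktrack uses.

```rust
// Map a day's token count to an intensity level 0..=4 based on its
// percentile rank among all non-zero days, GitHub-heatmap style.
fn intensity(value: u64, sorted_nonzero: &[u64]) -> usize {
    if value == 0 || sorted_nonzero.is_empty() {
        return 0;
    }
    // Fraction of non-zero days at or below this value.
    let rank = sorted_nonzero.partition_point(|&v| v <= value);
    let pct = rank as f64 / sorted_nonzero.len() as f64;
    match pct {
        p if p <= 0.25 => 1,
        p if p <= 0.50 => 2,
        p if p <= 0.75 => 3,
        _ => 4,
    }
}

fn main() {
    // Illustrative shade ramp: darkest block for the busiest days.
    let shades = [' ', '░', '▒', '▓', '█'];
    let days = [0, 100, 2_000, 50_000, 90_000];
    let mut sorted: Vec<u64> = days.iter().copied().filter(|&d| d > 0).collect();
    sorted.sort_unstable();
    let row: String = days.iter().map(|&d| shades[intensity(d, &sorted)]).collect();
    println!("{row}");
}
```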
Results
On my machine (Apple Silicon, 9,000+ files, 3.4 GB total):
| Metric | Time |
|---|---|
| Cold start (no cache) | ~1.2 s |
| Warm start (cached) | ~0.05 s |
The caching layer stores daily summaries in ~/.toktrack/cache/. Past dates are immutable — only today is recomputed. This means even when Claude Code deletes session files after 30 days, your cost history survives.
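The invalidation rule boils down to a single comparison: a cached daily summary is reusable iff its date is strictly before today. ISO `YYYY-MM-DD` strings compare correctly as plain strings, so no date library is needed (the function name is mine, not toktrack's).

```rust
// A cached daily summary is immutable once its date has passed;
// only today's summary can still change and must be recomputed.
// ISO `YYYY-MM-DD` strings sort lexicographically in date order.
fn cache_is_fresh(summary_date: &str, today: &str) -> bool {
    summary_date < today
}

fn main() {
    let today = "2026-01-15";
    for date in ["2026-01-14", "2026-01-15"] {
        let action = if cache_is_fresh(date, today) { "use cache" } else { "recompute" };
        println!("{date}: {action}");
    }
}
```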
Try It
npx toktrack
# or
cargo install toktrack
GitHub:
If you use Claude Code, Codex CLI, or Gemini CLI and want to know where your tokens are going — give it a try.