Claude Code and Codex Log Your Token Usage Locally. Here's How to Check It

Published: 14 hours ago (June 18, 2026, 11:40 AM GMT+9)

5 min read

Source: Dev.to

Your AI coding agent’s token data is already on your machine. You just haven’t looked at it yet.

Claude Code and Codex both write local logs after every session. Those logs include detailed token breakdowns: uncached input, cache hits, cache writes, output. No API call needed, no provider dashboard, no guessing. The number that matters most, your prompt cache hit rate, has been sitting on your disk every time you wondered why you were burning through your weekly limit so fast.

The logs you already have

Claude Code writes a JSONL transcript for every session under ~/.claude/projects/. Each assistant message carries a usage block:

{
   "type": "assistant",
   "uuid": "f0c8...",
   "message": {
     "model": "claude-opus-4-...",
     "usage": {
       "input_tokens": 137,
       "cache_read_input_tokens": 815193,
       "cache_creation_input_tokens": 5521,
       "output_tokens": 4260
     }
   }
}

That split is the whole game. input_tokens is uncached input. cache_read_input_tokens is context served from the prompt cache. cache_creation_input_tokens is context written to cache. output_tokens is the response. On a long agent session, cache_read_input_tokens should dwarf input_tokens. If it does not, you are re‑paying for the same context on every single turn.
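
To make the split concrete, here is a minimal sketch that takes one usage block and reports what share of the input side landed in each bucket. The helper name inputBreakdown is hypothetical; the field names are the ones in the transcript excerpt above, and the sample values are copied from it.

```javascript
// Hypothetical helper: splits the input side of one usage block into shares.
// Field names follow the Claude Code usage block shown above.
function inputBreakdown(usage) {
  const uncached = usage.input_tokens ?? 0
  const cacheRead = usage.cache_read_input_tokens ?? 0
  const cacheWrite = usage.cache_creation_input_tokens ?? 0
  const total = uncached + cacheRead + cacheWrite
  // Guard against an empty block so we never divide by zero.
  const share = (n) => (total === 0 ? 0 : n / total)
  return {
    total,
    uncachedShare: share(uncached),
    cacheReadShare: share(cacheRead),
    cacheWriteShare: share(cacheWrite),
  }
}

// Values copied from the excerpt above.
const sample = {
  input_tokens: 137,
  cache_read_input_tokens: 815193,
  cache_creation_input_tokens: 5521,
  output_tokens: 4260,
}
const b = inputBreakdown(sample)
console.log(b.total)                     // 820851
console.log(b.cacheReadShare.toFixed(3)) // 0.993
```

In that sample turn, roughly 99% of the input context came out of the cache, which is the shape a healthy long session should have.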

Codex writes rollouts under ~/.codex/sessions/. It emits token_count events with a cumulative running total per session:

{ "type": "token_count", "info": { "total_token_usage": {
    "input_tokens": 0, "cached_input_tokens": 0,
    "output_tokens": 0, "reasoning_output_tokens": 0
}}}

Because Codex counts are cumulative, you take the delta between events rather than summing them.
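
A minimal sketch of that delta step, assuming the events come from a single session, arrive in file order, and have the shape shown above. The helper name codexDeltas and the sample numbers are made up for illustration.

```javascript
// Hypothetical helper: turns cumulative Codex totals into per-event deltas.
// Assumes `events` belong to ONE session and are in file order.
const FIELDS = [
  'input_tokens',
  'cached_input_tokens',
  'output_tokens',
  'reasoning_output_tokens',
]

function codexDeltas(events) {
  let prev = Object.fromEntries(FIELDS.map((f) => [f, 0]))
  const deltas = []
  for (const e of events) {
    // Ignore anything that is not a token_count event with usage attached.
    const cur = e.type === 'token_count' ? e.info?.total_token_usage : undefined
    if (!cur) continue
    const d = {}
    for (const f of FIELDS) d[f] = (cur[f] ?? 0) - prev[f]
    deltas.push(d)
    prev = Object.fromEntries(FIELDS.map((f) => [f, cur[f] ?? 0]))
  }
  return deltas
}

// Made-up events for illustration only.
const demoEvents = [
  { type: 'token_count', info: { total_token_usage: { input_tokens: 1000, cached_input_tokens: 0, output_tokens: 200, reasoning_output_tokens: 50 } } },
  { type: 'other' },
  { type: 'token_count', info: { total_token_usage: { input_tokens: 2500, cached_input_tokens: 900, output_tokens: 450, reasoning_output_tokens: 80 } } },
]
const deltas = codexDeltas(demoEvents)
console.log(deltas[1]) // second event minus first: 1500 input, 900 cached, 250 output, 30 reasoning
```

Summing the deltas gives back the last cumulative total, so for a whole-session figure the final token_count event is enough. The deltas matter when you want to bucket usage by time within a session.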

Reading the logs without reading your prompts

A few lines of Node walk the JSONL, sum usage per model per day, and dedupe by message uuid for Claude and by session delta for Codex, so you never double‑count:

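import { readFileSync } from 'node:fs'

const file = process.argv[2] // path to one session transcript (.jsonl)
const seen = new Set()       // message uuids already counted
const totals = {}            // running token totals, keyed by model name

for (const line of readFileSync(file, 'utf8').split('\n')) {
  if (!line.trim()) continue
  const o = JSON.parse(line)
  const u = o.message?.usage
  if (!u || seen.has(o.uuid)) continue // skip lines without usage, and repeats
  seen.add(o.uuid)
  const t = (totals[o.message.model] ??= { input: 0, cacheRead: 0, cacheWrite: 0, output: 0 })
  t.input += u.input_tokens ?? 0
  t.cacheRead += u.cache_read_input_tokens ?? 0
  t.cacheWrite += u.cache_creation_input_tokens ?? 0
  t.output += u.output_tokens ?? 0
}

console.log(totals)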

Notice what you do not need: the prompt text, the response text, or any API key. Model names and token counts are enough to compute everything useful. A usage tool should never have to read what you typed, and this one does not.
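
The per-day grouping can be sketched in the same style. This version assumes each transcript line carries a top-level ISO‑8601 timestamp string, which the excerpt in this article does not show, so treat that field as an assumption and check your own logs. The function name usageByDayAndModel and the model name model-a are hypothetical.

```javascript
// Sketch: group Claude Code usage by calendar day and model.
// ASSUMPTION: each line has a top-level ISO-8601 `timestamp` string.
// Lines without one are skipped rather than guessed at.
function usageByDayAndModel(lines) {
  const seen = new Set()   // message uuids already counted
  const totals = new Map() // "YYYY-MM-DD model" -> token totals
  for (const line of lines) {
    if (!line.trim()) continue
    const o = JSON.parse(line)
    const u = o.message?.usage
    if (!u || typeof o.timestamp !== 'string' || seen.has(o.uuid)) continue
    seen.add(o.uuid)
    // First 10 characters of an ISO-8601 string are the date as written.
    const key = `${o.timestamp.slice(0, 10)} ${o.message.model}`
    const t = totals.get(key) ?? { input: 0, cacheRead: 0, cacheWrite: 0, output: 0 }
    t.input += u.input_tokens ?? 0
    t.cacheRead += u.cache_read_input_tokens ?? 0
    t.cacheWrite += u.cache_creation_input_tokens ?? 0
    t.output += u.output_tokens ?? 0
    totals.set(key, t)
  }
  return totals
}

// Made-up lines for illustration; the second repeats uuid "a" and is ignored.
const first = { uuid: 'a', timestamp: '2026-06-17T23:10:00Z', message: { model: 'model-a', usage: { input_tokens: 10, cache_read_input_tokens: 500, cache_creation_input_tokens: 20, output_tokens: 30 } } }
const later = { uuid: 'b', timestamp: '2026-06-18T01:00:00Z', message: { model: 'model-a', usage: { input_tokens: 5, cache_read_input_tokens: 700, cache_creation_input_tokens: 0, output_tokens: 40 } } }
const demoLines = [first, first, later].map((o) => JSON.stringify(o))
const byDay = usageByDayAndModel(demoLines)
console.log(Object.fromEntries(byDay))
```

Slicing the string keeps the date exactly as logged, so a timestamp ending in Z buckets by UTC day. Convert to local time first if you want days that match your own calendar.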

The number that actually matters

Once you aggregate, one metric matters more than the rest: your prompt cache hit rate.

hit_rate = cache_read / (cache_read + cache_creation + uncached_input)

On a flat plan, this is your real efficiency lever. A high hit rate means you are reusing context instead of resending it. A low one means you are burning tokens, and your usage limit, on the same context over and over. The fix is usually structural: stabilize the front of your prompt so the cache prefix stays intact, keep tool definitions lean, and stop reshuffling system context between turns.
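
As a rough sketch, the formula maps directly onto per-model totals like the ones accumulated earlier. The names hitRate, input, cacheRead, and cacheWrite are illustrative, and the two turns below are made-up numbers. One design choice worth stating: sum the tokens first and divide once, so large turns weigh more than small ones.

```javascript
// Hypothetical helper implementing:
// hit_rate = cache_read / (cache_read + cache_creation + uncached_input)
function hitRate(t) {
  const denom = t.cacheRead + t.cacheWrite + t.input
  // No input-side tokens at all means the rate is undefined, not zero.
  return denom === 0 ? null : t.cacheRead / denom
}

// Made-up turns: a small cold start, then a large warm turn.
const turns = [
  { input: 50, cacheRead: 0, cacheWrite: 950 },
  { input: 100, cacheRead: 98900, cacheWrite: 1000 },
]

// Sum first, divide once.
const sum = turns.reduce(
  (acc, t) => ({
    input: acc.input + t.input,
    cacheRead: acc.cacheRead + t.cacheRead,
    cacheWrite: acc.cacheWrite + t.cacheWrite,
  }),
  { input: 0, cacheRead: 0, cacheWrite: 0 },
)
const overall = hitRate(sum)
console.log(overall.toFixed(3)) // 0.979
```

Averaging the two per-turn rates instead would give about 0.49, which understates how much of the total context was actually reused.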

One honest caveat: on a subscription you do not pay per token, so any dollar figure is an API list‑price equivalent, not your actual cost. It is a useful sense of scale, nothing more. The signals that genuinely matter are token volume and cache hit rate. Any tool that flashes a “you spent $X this month” number at a flat‑plan user is being a little loose with what that number means.
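
For that sense of scale, a list‑price equivalent is just a weighted sum of the four buckets. The sketch below takes the per‑million‑token rates as a parameter. The rates in the example are made-up round numbers, not any provider's pricing, so substitute the published list prices for your model. The function name listPriceEquivalent is hypothetical.

```javascript
// Hypothetical helper: API list-price equivalent for a set of token totals.
// `ratesPerMTok` is price per one million tokens for each bucket,
// supplied by the caller. This is NOT what a subscription user pays.
function listPriceEquivalent(t, ratesPerMTok) {
  const weighted =
    t.input * ratesPerMTok.input +
    t.cacheRead * ratesPerMTok.cacheRead +
    t.cacheWrite * ratesPerMTok.cacheWrite +
    t.output * ratesPerMTok.output
  return weighted / 1_000_000
}

// PLACEHOLDER rates and totals for illustration only.
const placeholderRates = { input: 10, cacheRead: 1, cacheWrite: 12, output: 50 }
const monthTotals = { input: 1_000_000, cacheRead: 10_000_000, cacheWrite: 500_000, output: 200_000 }
const equivalent = listPriceEquivalent(monthTotals, placeholderRates)
console.log(equivalent) // 36
```

With those placeholder rates, ten million cache reads contribute as much as one million uncached input tokens, which is why the hit rate moves this figure so much.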

Turning it into a live dashboard

I wrapped all of this into ModelMeter. A one‑line collector reads those local logs and sends the token counts, and only the token counts, to a dashboard that shows your cache hit rate, ranks where your tokens are going, and labels every figure by how it was derived: computed from real tokens, a gated estimate, or “coming” when it needs request‑level data the logs do not contain.

npx modelmeter-collect init 
npx modelmeter-collect

Add a Claude Code Stop hook or a 60‑second cron job and it stays live, updating after each prompt. It works for Claude Code, Codex, or both. It also accepts usage from a metered API key via a copy‑paste snippet, or from a CSV export if you would rather not run the collector at all.

Free to try at modelmeter.dev.

The point

Whether or not you use ModelMeter: you are not flying blind. Your subscription coding tool has been writing detailed usage data to your local disk after every session. Go read it. You will almost certainly find that your biggest efficiency lever is a single number you have never once looked at.
