I built react-native-llm-meter, LLM cost tracking for Expo apps
Source: Dev.to
Overview
If you ship Claude, GPT, or Gemini calls from a React Native app, you don't have a good way to see what's happening on the device. Server‑side observability tools (Langfuse, Helicone, LangSmith, Stripe's token‑meter) work great for Node back‑ends, but they assume a server environment, rely on Node‑only APIs, and don't ship AsyncStorage adapters; streaming can even break under Hermes.
Enter react-native-llm-meter.
The library tracks LLM usage directly in Expo apps, offering provider‑agnostic metrics, budgeting, and a developer overlay.
Installation
```bash
npm install react-native-llm-meter @react-native-async-storage/async-storage
```
Basic Usage
```tsx
import { Meter, MeterProvider } from "react-native-llm-meter";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.EXPO_PUBLIC_ANTHROPIC_API_KEY,
});

const meter = new Meter();
const client = meter.wrap(anthropic);

export default function App() {
  return (
    <MeterProvider meter={meter}>
      {/* Your app UI goes here */}
    </MeterProvider>
  );
}
```
Every call made through the wrapped client is recorded with:
- provider & model
- input / output token counts
- latency
- time‑to‑first‑token (TTFT) for streams
- computed cost (USD)
The wrapper keeps the original SDK’s interface; you only change how the client is constructed.
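As an illustration of how an interface-preserving wrapper can work (a hypothetical sketch, not the library's actual internals; `wrapClient` and the event shape are invented), a `Proxy` can forward every property access unchanged while timing method calls:

```typescript
type UsageEvent = { method: string; latencyMs: number };

// Wrap an arbitrary client so method calls are timed and recorded,
// while the object's interface stays unchanged.
function wrapClient<T extends object>(client: T, events: UsageEvent[]): T {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      return (...args: unknown[]) => {
        const start = Date.now();
        const result = value.apply(target, args);
        events.push({ method: String(prop), latencyMs: Date.now() - start });
        return result;
      };
    },
  });
}

// Usage with a stand-in "SDK":
const events: UsageEvent[] = [];
const sdk = { complete: (prompt: string) => `echo: ${prompt}` };
const wrapped = wrapClient(sdk, events);
const out = wrapped.complete("hi"); // same call signature as the original
```

A real wrapper would also need to handle async methods and nested namespaces such as `anthropic.messages.create`, but the principle is the same: the caller never sees the instrumentation.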
Getting a Summary
```ts
meter.summary();
// {
//   count: 47,
//   totalCostUsd: 0.0894,
//   inputTokens: 24103,
//   outputTokens: 7379,
//   latencyP50: 612,
//   latencyP95: 1840,
//   ttftP50: 287,
//   ttftP95: 612,
//   byModel: { … }
// }
```
You can also retrieve live metrics with useMetrics() (React hook) or fetch raw events via meter.getEvents({ from, to }).
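As a rough sketch of how percentile fields like `latencyP50` can be derived from recorded latencies (nearest-rank method; `percentile` is an invented helper, not the library's code):

```typescript
// Nearest-rank percentile on a pre-sorted array of latencies (ms).
function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

const latencies = [200, 400, 612, 800, 1840].sort((a, b) => a - b);
const p50 = percentile(latencies, 50); // 612
const p95 = percentile(latencies, 95); // 1840
```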
Streaming TTFT
TTFT (time‑to‑first‑token) is captured separately from total latency because each provider streams differently. It reflects perceived responsiveness—how long the user waited before any output appeared.
Detection rules
| Provider | First‑token signal |
|---|---|
| Anthropic | First content_block_delta chunk |
| OpenAI | First chunk where choices[0].delta.content is non‑empty |
| Gemini | First chunk where candidates[0].content.parts[0].text is non‑empty |
For OpenAI streaming you must include stream_options: { include_usage: true } to receive usage data. The library warns when usage is missing.
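The detection rules above can be sketched as a single predicate over simplified mock chunk shapes (real SDK chunk types carry many more fields; this is illustrative, not the library's implementation):

```typescript
// Simplified mock chunk shapes for the three providers.
interface StreamChunk {
  type?: string;                                               // Anthropic
  choices?: { delta: { content?: string } }[];                 // OpenAI
  candidates?: { content?: { parts?: { text?: string }[] } }[]; // Gemini
}

// True when a chunk carries the first visible output token.
function isFirstToken(chunk: StreamChunk): boolean {
  if (chunk.type) return chunk.type === "content_block_delta";
  if (chunk.choices) return !!chunk.choices[0]?.delta?.content;
  if (chunk.candidates) return !!chunk.candidates[0]?.content?.parts?.[0]?.text;
  return false;
}
```

In a real stream loop, TTFT is simply the elapsed time from request start until `isFirstToken` first returns true.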
Storage Adapters
- AsyncStorageAdapter – works everywhere, retains data in day‑bucketed chunks.
- SqliteAdapter – for higher‑volume use cases, built on expo-sqlite.
A migration helper lets you move between adapters. If you skip both, events are kept only in memory.
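The exact key scheme is internal to the library, but "day‑bucketed chunks" typically means one storage key per calendar day so each AsyncStorage value stays small. A hypothetical sketch of such a key (the `llm-meter` prefix is invented for illustration):

```typescript
// Hypothetical key scheme: one bucket per UTC day.
function dayBucketKey(prefix: string, timestampMs: number): string {
  const day = new Date(timestampMs).toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `${prefix}:${day}`;
}

const key = dayBucketKey("llm-meter", Date.UTC(2024, 4, 1, 12, 0, 0));
// key === "llm-meter:2024-05-01"
```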
Budgets
```ts
import { Alert } from "react-native";

meter.setBudget({
  daily: 5,   // USD
  weekly: 25, // USD
  onCross: ({ period, threshold, spend }) => {
    Alert.alert(`${period} limit hit`, `$${spend.toFixed(2)} / $${threshold}`);
  },
});
```
Soft alerts invoke the callback without blocking the request. Hard circuit‑breakers (which would abort calls) are on the roadmap.
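Conceptually, a soft alert is a threshold-crossing check on cumulative spend: the callback fires once when spend moves from below a limit to at or above it, and the request proceeds either way. A minimal sketch under those assumptions (`checkBudget` is an invented helper, not the library's API):

```typescript
type CrossInfo = { period: string; threshold: number; spend: number };

// Fire onCross once per period whose limit is crossed by this spend update.
function checkBudget(
  prevSpend: number,
  newSpend: number,
  limits: Record<string, number>,
  onCross: (info: CrossInfo) => void
): void {
  for (const [period, threshold] of Object.entries(limits)) {
    if (prevSpend < threshold && newSpend >= threshold) {
      onCross({ period, threshold, spend: newSpend });
    }
  }
}

const fired: CrossInfo[] = [];
checkBudget(4.9, 5.1, { daily: 5, weekly: 25 }, (i) => fired.push(i));
// One "daily" crossing is reported; the request itself is never blocked.
```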
Developer Overlay
```ts
import { MeterOverlay } from "react-native-llm-meter/overlay";
```
A floating, draggable overlay appears only in __DEV__ builds, so it never ships to production. Importing from the overlay sub‑path keeps react-native out of non‑RN bundles.
What It Deliberately Doesn’t Do
| Feature | Reason |
|---|---|
| Prompt content | Never logs raw prompts; only token counts, latency, model name, cost, and optional metadata. |
| Server‑side observability | Use Langfuse, Helicone, etc., for Node‑based calls. |
| Web support | Core is platform‑agnostic, but the build isn’t ready for browsers yet. |
| Hosted dashboard | It’s a library; you can POST events to any endpoint (Sentry, Datadog, etc.) via the optional remote sink. |
Model Token Costs
Pricing tables are hard‑coded in src/pricing/table.ts and reflect published rates. A PR template makes updates quick (≈2 minutes). Unknown models emit a one‑time warning per provider/model pair during development.
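Cost computation from such a table is straightforward: token counts times a per‑million‑token rate. The rates below are placeholders for illustration, not the library's actual pricing data:

```typescript
// Illustrative per-million-token rates (NOT real pricing).
const PRICING: Record<string, { inPerM: number; outPerM: number }> = {
  "example-model": { inPerM: 3, outPerM: 15 },
};

function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) return 0; // unknown model: the library warns once instead
  return (inputTokens / 1_000_000) * p.inPerM + (outputTokens / 1_000_000) * p.outPerM;
}

const c = costUsd("example-model", 1000, 500);
// (1000/1e6)*3 + (500/1e6)*15 = 0.003 + 0.0075 = 0.0105
```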
Try It
```bash
npm install react-native-llm-meter @react-native-async-storage/async-storage
```
Repository:
Bugs, PRs, and pricing updates are welcome. If you’ve shipped Claude or GPT in an Expo app and hit an edge case, let the maintainer know.
Built by Ankit Virdi.