I built two high-performance Python libraries for production AI: LLM log analytics and vector similarity search

Published: December 3, 2025 at 05:45 AM EST
4 min read
Source: Dev.to

What My Projects Do

llmlog_engine: Columnar Analytics for LLM Logs

A specialized embedded database for analyzing LLM application logs stored as JSONL.

Core capabilities

  • Fast JSONL ingestion into columnar storage format
  • Efficient filtering on numeric and string columns
  • Group‑by aggregations (COUNT, SUM, AVG, MIN, MAX)
  • Dictionary encoding for low‑cardinality strings (model names, routes)
  • SIMD‑friendly memory layout for performance
  • pandas DataFrame integration

Performance

  • 6.8× faster than pure Python on 100 k rows
    • Benchmark: filter by model + latency, group by route, compute 6 metrics
    • Pure Python: 0.82 s
    • C++ Engine: 0.12 s
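
For reference, a pure-Python baseline for this query shape looks roughly like the sketch below (simplified; field names follow the JSONL format shown later, and analyze_slow is an illustrative helper, not part of the library):

import json
from collections import defaultdict

# Illustrative pure-Python baseline: filter by model + latency,
# group by route, then compute per-group metrics.
def analyze_slow(path, model="gpt-4.1", min_latency_ms=500):
    groups = defaultdict(list)
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            if row["model"] == model and row["latency_ms"] >= min_latency_ms:
                groups[row["route"]].append(row["latency_ms"])
    return {
        route: {
            "count": len(vals),
            "sum": sum(vals),
            "avg": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for route, vals in groups.items()
    }

Every row pays for JSON parsing, dict lookups, and Python-level branching here, which is exactly the overhead a columnar engine avoids.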

mini_faiss: Vector Similarity Search for Dense Embeddings

A focused, high‑performance library for similarity search over dense embeddings.

Core capabilities

  • SIMD‑accelerated distance computation (L2 and inner product)
  • NumPy‑friendly API with clean type signatures
  • ~1500 lines of readable C++ code
  • Support for both Euclidean and cosine similarity
  • Heap‑based top‑k selection

Performance

  • ≈ 7× faster than pure NumPy on typical workloads
    • Benchmark: 100 k vectors, 768 dimensions
    • mini_faiss: 0.067 s
    • NumPy: 0.48 s
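
For reference, a typical brute-force NumPy baseline for this workload expands the squared-distance identity and uses argpartition for top-k selection; the exact baseline used in the benchmark may differ, so treat this as an illustrative sketch:

import numpy as np

def numpy_l2_search(xb, xq, k):
    # Brute-force L2 k-NN using ||q - b||^2 = ||q||^2 - 2*q.b + ||b||^2
    sq_b = (xb ** 2).sum(axis=1)                  # (N,) database norms
    sq_q = (xq ** 2).sum(axis=1, keepdims=True)   # (M, 1) query norms
    d2 = sq_q - 2.0 * (xq @ xb.T) + sq_b          # (M, N) squared distances
    idx = np.argpartition(d2, k, axis=1)[:, :k]   # k smallest, unordered
    order = np.take_along_axis(d2, idx, axis=1).argsort(axis=1)
    idx = np.take_along_axis(idx, order, axis=1)  # sort the k hits
    return np.take_along_axis(d2, idx, axis=1), idx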

Architecture Philosophy

Both libraries follow the same design pattern:

  • Core logic in C++17 – performance‑critical operations using modern C++
  • Python bindings via pybind11 – zero‑copy data transfer with NumPy
  • Minimal dependencies – no heavy frameworks or complex build chains
  • Columnar / SIMD‑friendly layouts – data structures optimized for CPU cache
  • Type safety – strict validation at the Python/C++ boundary

This approach delivers near‑native performance while preserving Python’s developer experience.
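
As a concrete illustration of the last two points, a Python-side wrapper can enforce the invariants that zero-copy transfer relies on before calling into C++ (an illustrative helper, not the actual library code):

import numpy as np

def _as_f32_matrix(x, d):
    # Zero-copy transfer to C++ requires float32 dtype, 2-D shape (n, d),
    # and C-contiguous (row-major) memory; anything else would force a copy.
    x = np.asarray(x)
    if x.dtype != np.float32:
        raise TypeError(f"expected float32, got {x.dtype}")
    if x.ndim != 2 or x.shape[1] != d:
        raise ValueError(f"expected shape (n, {d}), got {x.shape}")
    if not x.flags["C_CONTIGUOUS"]:
        raise ValueError("array must be C-contiguous for zero-copy transfer")
    return x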

Syntax Examples

llmlog_engine

Load and analyze logs

from llmlog_engine import LogStore

# Load JSONL logs
store = LogStore.from_jsonl("production_logs.jsonl")

# Analyze slow responses by model
slow_by_model = (
    store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "max_latency": "max(latency_ms)",
        },
    )
)

print(slow_by_model)  # slow_by_model is a pandas DataFrame

Error analysis

# Analyze error rates by model and route
errors = (
    store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"},
    )
)

Combined filters

# Filter by multiple conditions (AND logic)
result = (
    store.query()
    .filter(
        model="gpt-4.1",
        min_latency_ms=1000,
        route="chat",
    )
    .aggregate(
        by=["model"],
        metrics={"avg_tokens": "avg(tokens_output)"},
    )
)

Expected JSONL format

{"ts": "2024-01-01T12:00:00Z", "model": "gpt-4.1", "latency_ms": 423, "tokens_input": 100, "tokens_output": 921, "route": "chat", "status": "ok"}
{"ts": "2024-01-01T12:00:15Z", "model": "gpt-4.1-mini", "latency_ms": 152, "tokens_input": 50, "tokens_output": 214, "route": "rag", "status": "ok"}

mini_faiss

Build and search an L2 index

import numpy as np
from mini_faiss import IndexFlatL2

# Create index for 768‑dimensional vectors
d = 768
index = IndexFlatL2(d)

# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)

# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)

print(distances.shape)  # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape)    # (5, 10)

Inner product and cosine similarity

from mini_faiss import IndexFlatIP

# Create inner product index
index = IndexFlatIP(d=768)

# Normalize vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)

index.add(xb)

# Queries must be normalized the same way
xq = np.random.randn(5, 768).astype("float32")
xq /= np.linalg.norm(xq, axis=1, keepdims=True)

distances, indices = index.search(xq, k=10)
# Higher scores = more similar (inner product, not a distance)

Implementation Highlights

llmlog_engine

Columnar storage with dictionary encoding

  • String columns (model, route, status) mapped to int32 IDs
  • Numeric columns stored as contiguous arrays
  • Filtering operates on compact integer representations

Query execution

  • Build boolean mask from filter predicates (AND logic)
  • Group matching rows by specified columns
  • Compute aggregations only on filtered rows
  • Return a pandas DataFrame (the full pipeline is sketched in NumPy terms below)

Example internal representation

Column: model       [0, 1, 0, 2, 0, ...] (int32 IDs)
Column: latency_ms  [423, 1203, 512, ...] (int32)
Dictionary: model   {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
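
Putting dictionary encoding and mask-based execution together, the query path can be sketched in NumPy terms (a simplification of the C++ engine; the column values are illustrative):

import numpy as np

# Dictionary-encoded columns, mirroring the representation above
model_dict = {"gpt-4.1-mini": 0, "gpt-4.1": 1, "gpt-4-turbo": 2}
model = np.array([0, 1, 0, 2, 0], dtype=np.int32)
route = np.array([0, 0, 1, 1, 0], dtype=np.int32)   # e.g. {0: "chat", 1: "rag"}
latency_ms = np.array([423, 1203, 512, 98, 731], dtype=np.int32)

# 1. Build a boolean mask from the filter predicates (AND logic);
#    string comparisons become cheap integer comparisons.
mask = (model == model_dict["gpt-4.1-mini"]) & (latency_ms >= 500)

# 2.-3. Group matching rows by route, aggregating only the filtered rows
for route_id in np.unique(route[mask]):
    vals = latency_ms[mask & (route == route_id)]
    print(route_id, vals.size, vals.mean(), vals.min(), vals.max())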

mini_faiss

Distance computation (L2)

||q - db||² = ||q||² - 2·q·db + ||db||²

  • Precomputes database norms for efficiency
  • Vectorizable loops enable SIMD auto‑vectorization
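
The expansion is easy to check numerically, and the point of it is that the ||db||² terms can be computed once when vectors are added instead of once per query:

import numpy as np

q = np.random.randn(768).astype("float32")
db = np.random.randn(768).astype("float32")

lhs = ((q - db) ** 2).sum()                              # direct squared L2
rhs = (q ** 2).sum() - 2.0 * (q @ db) + (db ** 2).sum()  # expanded form
assert np.allclose(lhs, rhs, rtol=1e-3)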

Top‑k selection

  • Heap‑based algorithm: O(N log k) per query
  • Efficient when k << N
  • Separate implementations for min (L2) and max (inner product)
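
In Python terms the selection looks like the sketch below (topk_smallest is an illustrative stand-in; the C++ version differs in detail but has the same O(N log k) structure):

import heapq

def topk_smallest(distances, k):
    # Keep a max-heap of the k smallest distances seen so far.
    # heapq is a min-heap, so negate distances to simulate a max-heap.
    heap = []  # entries are (-distance, index)
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif -heap[0][0] > d:  # d beats the current worst of the best k
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap)

For example, topk_smallest([5.0, 1.0, 3.0, 0.5], k=2) returns [(0.5, 3), (1.0, 1)]. The max variant for inner product flips the comparison.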

Row‑major storage

data = [v_0[0], v_0[1], ..., v_0[d-1],
        v_1[0], v_1[1], ..., v_1[d-1],
        ...]

Cache‑friendly for batch distance computation.

Installation

Both libraries use standard Python packaging:

# llmlog_engine
git clone https://github.com/yuuichieguchi/llmlog_engine.git
cd llmlog_engine
pip install -e .

# mini_faiss
git clone https://github.com/yuuichieguchi/mini_faiss.git
cd mini_faiss
pip install .

Requirements

  • Python 3.8+
  • C++17 compiler (GCC, Clang, MSVC)
  • CMake 3.15+
  • pybind11 (installed via pip)

Use Cases

llmlog_engine

  • Monitor LLM application health in production
  • Analyze latency patterns by model and endpoint
  • Track error rates and failure modes
  • Debug performance regressions
  • Generate usage reports for cost analysis

mini_faiss

  • Dense retrieval for RAG systems
  • Document similarity search
  • Image search using vision model embeddings
  • Recommendation systems (nearest‑neighbor recommendations)
  • Prototyping before scaling to full FAISS

Known Limitations

llmlog_engine

  • In‑memory only (no persistence yet)
  • Single‑threaded query execution
  • No complex expressions or advanced query features at this time

mini_faiss

  • Limited to flat indexes (no IVF, HNSW, etc.)
  • No built‑in persistence; index must be rebuilt or serialized manually
  • Primarily optimized for CPU; GPU acceleration not provided