I built two high-performance Python libraries for production AI: LLM log analytics and vector similarity search
What My Projects Do
llmlog_engine: Columnar Analytics for LLM Logs
A specialized embedded database for analyzing LLM application logs stored as JSONL.
Core capabilities
- Fast JSONL ingestion into columnar storage format
- Efficient filtering on numeric and string columns
- Group‑by aggregations (COUNT, SUM, AVG, MIN, MAX)
- Dictionary encoding for low‑cardinality strings (model names, routes)
- SIMD‑friendly memory layout for performance
- pandas DataFrame integration
Performance
- 6.8× faster than pure Python on 100 k rows
- Benchmark: filter by model + latency, group by route, compute 6 metrics
- Pure Python: 0.82 s
- C++ Engine: 0.12 s
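For reference, the pure-Python side of that comparison amounts to a loop like this (a sketch of the shape of the baseline, not the exact benchmark script; row fields follow the JSONL schema shown below):
from collections import defaultdict

def slow_by_route(rows, model="gpt-4.1", min_latency_ms=500):
    # rows: list of dicts parsed from JSONL, one per log line
    groups = defaultdict(list)
    for row in rows:
        if row["model"] == model and row["latency_ms"] >= min_latency_ms:
            groups[row["route"]].append(row["latency_ms"])
    return {
        route: {
            "count": len(vals),
            "sum": sum(vals),
            "avg": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for route, vals in groups.items()
    }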
mini_faiss: Lightweight Vector Similarity Search
A focused, high‑performance library for similarity search in dense embeddings.
Core capabilities
- SIMD‑accelerated distance computation (L2 and inner product)
- NumPy‑friendly API with clean type signatures
- ~1500 lines of readable C++ code
- Support for both Euclidean and cosine similarity
- Heap‑based top‑k selection
Performance
- ≈ 7× faster than pure NumPy on typical workloads
- Benchmark: 100 k vectors, 768 dimensions
- mini_faiss: 0.067 s
- NumPy: 0.48 s
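The NumPy baseline here is ordinary brute-force search; a sketch of what that comparison point typically looks like (the actual benchmark script may differ):
import numpy as np

def numpy_topk_l2(xb, xq, k):
    # Brute-force baseline: full distance scan per query, then top-k
    D, I = [], []
    for q in xq:
        d2 = ((xb - q) ** 2).sum(axis=1)   # squared L2 to every database vector
        idx = np.argpartition(d2, k)[:k]   # k smallest, in arbitrary order
        idx = idx[np.argsort(d2[idx])]     # sort those k by distance
        D.append(d2[idx])
        I.append(idx)
    return np.array(D), np.array(I)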
Architecture Philosophy
Both libraries follow the same design pattern:
- Core logic in C++17 – performance‑critical operations using modern C++
- Python bindings via pybind11 – zero‑copy data transfer with NumPy
- Minimal dependencies – no heavy frameworks or complex build chains
- Columnar / SIMD‑friendly layouts – data structures optimized for CPU cache
- Type safety – strict validation at the Python/C++ boundary
This approach delivers near‑native performance while preserving Python’s developer experience.
Syntax Examples
llmlog_engine
Load and analyze logs
from llmlog_engine import LogStore
# Load JSONL logs
store = LogStore.from_jsonl("production_logs.jsonl")
# Analyze slow responses by model
slow_by_model = (
    store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "max_latency": "max(latency_ms)",
        },
    )
)
print(slow_by_model) # Returns pandas DataFrame
Error analysis
# Analyze error rates by model and route
errors = (
    store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"},
    )
)
Combined filters
# Filter by multiple conditions (AND logic)
result = (
    store.query()
    .filter(
        model="gpt-4.1",
        min_latency_ms=1000,
        route="chat",
    )
    .aggregate(
        by=["model"],
        metrics={"avg_tokens": "avg(tokens_output)"},
    )
)
Expected JSONL format
{"ts": "2024-01-01T12:00:00Z", "model": "gpt-4.1", "latency_ms": 423, "tokens_input": 100, "tokens_output": 921, "route": "chat", "status": "ok"}
{"ts": "2024-01-01T12:00:15Z", "model": "gpt-4.1-mini", "latency_ms": 152, "tokens_input": 50, "tokens_output": 214, "route": "rag", "status": "ok"}
mini_faiss
Basic similarity search
import numpy as np
from mini_faiss import IndexFlatL2
# Create index for 768‑dimensional vectors
d = 768
index = IndexFlatL2(d)
# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)
# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)
print(distances.shape) # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape) # (5, 10)
Cosine similarity search
from mini_faiss import IndexFlatIP
# Create inner product index
index = IndexFlatIP(d=768)
# Normalize vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)
index.add(xb)
# Normalize query vectors the same way before searching
xq = np.random.randn(5, 768).astype("float32")
xq /= np.linalg.norm(xq, axis=1, keepdims=True)
scores, indices = index.search(xq, k=10)
# For inner product search, higher scores mean more similar
Implementation Highlights
llmlog_engine
Columnar storage with dictionary encoding
- String columns (model, route, status) mapped to int32 IDs
- Numeric columns stored as contiguous arrays
- Filtering operates on compact integer representations
Query execution
- Build boolean mask from filter predicates (AND logic)
- Group matching rows by specified columns
- Compute aggregations only on filtered rows
- Return a pandas DataFrame
Example internal representation
Column: model [0, 1, 0, 2, 0, ...] (int32 IDs)
Column: latency_ms [423, 1203, 512, ...] (int32)
Dictionary: model {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
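Conceptually, a query then reduces to masked operations over these arrays; a toy NumPy sketch of the idea (an illustration of the approach, not the engine's actual C++ code):
import numpy as np
import pandas as pd

# Dictionary-encoded string column plus a contiguous numeric column
model_dict = {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
model_ids = np.array([0, 1, 0, 2, 0], dtype=np.int32)
latency_ms = np.array([423, 1203, 512, 98, 731], dtype=np.int32)

# Filter: model == "gpt-4.1" is compared on int32 IDs, not strings
target_id = 1  # model_dict lookup for "gpt-4.1"
mask = (model_ids == target_id) & (latency_ms >= 500)

# Aggregate only the rows that survive the mask, decode IDs for output
hits = latency_ms[mask]
result = pd.DataFrame({
    "model": [model_dict[target_id]],
    "count": [hits.size],
    "avg_latency": [hits.mean() if hits.size else 0.0],
})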
mini_faiss
Distance computation (L2)
||q - db||² = ||q||² - 2·q·db + ||db||²
- Precomputes database norms for efficiency
- Vectorizable loops enable SIMD auto‑vectorization
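In NumPy terms the trick looks like this (an illustration of the math; the library's C++ loops compute the same quantity):
import numpy as np

def l2_sq_batch(xq, xb, xb_norms_sq):
    # xb_norms_sq = (xb ** 2).sum(axis=1), computed once when vectors are added
    # ||q - x||^2 = ||q||^2 - 2*q.x + ||x||^2, so only q @ xb.T touches all pairs
    q_norms_sq = (xq ** 2).sum(axis=1, keepdims=True)
    return q_norms_sq - 2.0 * (xq @ xb.T) + xb_norms_sq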
Top‑k selection
- Heap‑based algorithm: O(N log k) per query
- Efficient when k << N
- Separate implementations for min (L2) and max (inner product)
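A Python rendering of the min-distance case (the library implements this in C++):
import heapq

def top_k_smallest(distances, k):
    # Maintain a max-heap of the k smallest distances seen so far.
    # heapq is a min-heap, so distances are stored negated.
    heap = []  # entries: (-distance, index)
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:  # closer than the worst of the current k
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap)  # ascending by distance
For inner product the comparison flips, keeping the k largest scores instead.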
Row‑major storage
data = [v_0[0], v_0[1], ..., v_0[d-1],
        v_1[0], v_1[1], ..., v_1[d-1],
        ...]
Cache‑friendly for batch distance computation.
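This matches NumPy's default C-contiguous layout, which is what allows index.add(xb) to read the buffer without copying:
import numpy as np

xb = np.random.randn(4, 3).astype("float32")
print(xb.flags["C_CONTIGUOUS"])            # True: rows are contiguous in memory
print(np.shares_memory(xb, xb.ravel()))    # ravel is a view, no copy needed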
Installation
Both libraries use standard Python packaging:
# llmlog_engine
git clone https://github.com/yuuichieguchi/llmlog_engine.git
cd llmlog_engine
pip install -e .
# mini_faiss
git clone https://github.com/yuuichieguchi/mini_faiss.git
cd mini_faiss
pip install .
Requirements
- Python 3.8+
- C++17 compiler (GCC, Clang, MSVC)
- CMake 3.15+
- pybind11 (installed via pip)
Use Cases
llmlog_engine
- Monitor LLM application health in production
- Analyze latency patterns by model and endpoint
- Track error rates and failure modes
- Debug performance regressions
- Generate usage reports for cost analysis
mini_faiss
- Dense retrieval for RAG systems
- Document similarity search
- Image search using vision model embeddings
- Recommendation systems (nearest‑neighbor recommendations)
- Prototyping before scaling to full FAISS
Known Limitations
llmlog_engine
- In‑memory only (no persistence yet)
- Single‑threaded query execution
- No complex expressions or advanced query features at this time
mini_faiss
- Limited to flat indexes (no IVF, HNSW, etc.)
- No built‑in persistence; index must be rebuilt or serialized manually
- Primarily optimized for CPU; GPU acceleration not provided
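Because a flat index is fully determined by its raw vectors, one simple workaround for persistence is saving the vectors with NumPy and rebuilding on load (a sketch using the API shown above; xb is the array from the earlier examples):
import numpy as np
from mini_faiss import IndexFlatL2

np.save("vectors.npy", xb)              # persist the raw database vectors

xb_loaded = np.load("vectors.npy")      # later: reload and rebuild the index
index = IndexFlatL2(xb_loaded.shape[1])
index.add(xb_loaded)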