Building a Sovereign AI Stack: From Zero to POC
Source: Dev.to

In an era where data privacy is paramount, relying on cloud‑based AI providers isn’t always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack—a completely local, self‑controlled AI infrastructure—is the ultimate goal for many organizations.
Today, we built a Proof of Concept (POC) for such a stack, leveraging open‑source tools to create a private, observable, and searchable AI environment. Here is our journey.
The Architecture
Our stack consists of three core components, orchestrated by a Node.js application:
- AI Server – a local LLM running on llama.cpp (serving an OpenAI‑compatible API). This provides the intelligence without data leaving the network.
- Search Engine – Manticore Search (running in Docker). Chosen for its lightweight footprint and powerful full‑text search capabilities, essential for RAG (Retrieval‑Augmented Generation).
- Observability – AI Observer (running in Docker). Captures traces and metrics of our AI interactions.
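To make the wiring concrete, here is a small TypeScript sketch of how the orchestrator could address the three components. The hostnames, ports, and the `buildChatRequest` helper are illustrative assumptions (llama.cpp's server defaults to port 8080 and is usually lenient about the `model` field), not the POC's actual code:

```typescript
// Typed endpoints for the three components of the stack.
// Hosts/ports mirror the POC layout; adjust to your network.
interface StackEndpoints {
  manticore: string; // Manticore Search HTTP API
  llm: string;       // llama.cpp server (OpenAI-compatible)
  observer: string;  // AI Observer telemetry ingest
}

const endpoints: StackEndpoints = {
  manticore: "http://localhost:9308",
  llm: "http://192.168.0.2:8080/v1", // llama.cpp default port is 8080
  observer: "http://localhost:3001",
};

// llama.cpp's OpenAI-compatible server accepts the standard
// /v1/chat/completions payload; `model` is often ignored locally.
function buildChatRequest(prompt: string, model = "local") {
  return {
    model,
    messages: [{ role: "user" as const, content: prompt }],
    temperature: 0.2,
  };
}
```

Keeping the endpoints in one typed object makes it easy to swap the LLM host or Manticore port without hunting through the pipeline code.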
Architecture Visualized
┌─────────────────┐ ┌──────────────────┐
│ │──(1)──▶│ Manticore Search │
│ Orchestrator │ │ (Docker) │
│ (Node.js) │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(2)──▶│ AI Server LLM │
│ │ │ (192.168.0.2) │
│ │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(3)──▶│ AI Observer │
└─────────────────┘ │ (Docker) │
└──────────────────┘
│
(4)
▼
(Monitors AI Server)
Component State Flow
[*] ──▶ Init ──▶ Indexing: Create Table (RT)
│
▼
Searching: Documents Added
/ \
/ \
Error: No Hits (Retry) RAG_Construction: Hits Found
│ │
[*] ▼
Inference: Context + Prompt
/ \
/ \
Timeout: Model Slow Success: Answer Generated
│ │
[*] [*]
The Implementation
1. Setting the Foundation (Docker)
We containerized Manticore and AI Observer using docker‑compose. A key challenge was networking: ensuring the orchestrator (client) could talk to the containers and the external AI server. Mapping ports (9308, 9312, 3001) was crucial.
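A compose file for this setup could look roughly like the sketch below. The Manticore image is the official one; the AI Observer image name and tag here are placeholders, since the article doesn't specify them:

```yaml
# Sketch of the compose file used in the POC (image names partly assumed).
services:
  manticore:
    image: manticoresearch/manticore
    ports:
      - "9308:9308"   # HTTP (JSON and SQL-over-HTTP)
      - "9312:9312"   # binary protocol
  observer:
    image: ai-observer:latest   # placeholder image name
    ports:
      - "3001:3001"   # UI / telemetry ingest
```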
Lesson: Manticore’s SQL interface over HTTP (/sql) is powerful but slightly different from the JSON‑only /search endpoint many clients expect. We had to adapt our client to parse the SQL response structure properly.
2. The Orchestrator
A simple TypeScript orchestrator mimics a real‑world application flow:
- Ingest – index sovereign data into Manticore.
- Retrieve – search Manticore for relevant context (MATCH('Ensures data privacy')).
- Augment – combine the retrieved context with a user prompt.
- Generate – send the augmented prompt to the local LLM.
- Observe – log every step to AI Observer.
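The Augment step is the one piece of this flow that is pure logic, so it is worth showing on its own. The prompt template below is an illustrative choice, not the article's actual template:

```typescript
// Augment: fold retrieved documents and the user's question into a
// single grounded prompt for the local LLM.
function buildAugmentedPrompt(contextDocs: string[], question: string): string {
  const context = contextDocs
    .map((doc, i) => `[${i + 1}] ${doc}`) // number docs so the model can cite them
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The numbered-context convention makes it easy to ask the model to cite which retrieved document supported each claim, which pairs nicely with the Observe step.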
3. Verification & Testing
- Integration Tests – using vitest, we verified that documents are indexed correctly and retrievable (fixing a zero‑hit issue by understanding RT index flushing).
- End‑to‑End – the full pipeline generated a coherent explanation of “Sovereign AI” using our local setup.
- Visual Validation – AI Observer UI was checked via browser automation to ensure telemetry was landing.
Real‑World Experience
The most striking realization was the latency trade‑off. Our local LLM took ~18–80 seconds for a comprehensive answer. While slower than cloud APIs, the trade‑off buys total privacy—no token costs, no data leaks.
Manticore proved incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.
Conclusion & What’s Next
This POC demonstrates that a Sovereign AI Stack is not only possible but also accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.
What’s Next
- Implement a persistent vector store for semantic search.
- Optimize LLM inference speed (quantization, GPU offloading).
- Build a chat UI on top of the orchestrator.