Building a Sovereign AI Stack: From Zero to POC
Source: Dev.to

In an era where data privacy is paramount, relying on cloud‑based AI providers isn’t always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack—a completely local, self‑controlled AI infrastructure—is the ultimate goal for many organizations.
Today, we built a Proof of Concept (POC) for such a stack, leveraging open‑source tools to create a private, observable, and searchable AI environment. Here is our journey.
The Architecture
Our stack consists of three core components, orchestrated by a Node.js application:
- AI Server – a local LLM running on llama.cpp (serving an OpenAI‑compatible API). This provides the intelligence without data leaving the network.
- Search Engine – Manticore Search (running in Docker). Chosen for its lightweight footprint and powerful full‑text search capabilities, essential for RAG (Retrieval‑Augmented Generation).
- Observability – AI Observer (running in Docker). Captures traces and metrics of our AI interactions.
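To make the wiring concrete, here is a small TypeScript sketch of how the orchestrator could address the three components. The hostnames, ports, and the `buildChatRequest` helper are illustrative assumptions (llama.cpp's server defaults to port 8080 and is usually lenient about the `model` field), not the POC's actual code:

```typescript
// Typed endpoints for the three components of the stack.
// Hosts/ports mirror the POC layout; adjust to your network.
interface StackEndpoints {
  manticore: string; // Manticore Search HTTP API
  llm: string;       // llama.cpp server (OpenAI-compatible)
  observer: string;  // AI Observer telemetry ingest
}

const endpoints: StackEndpoints = {
  manticore: "http://localhost:9308",
  llm: "http://192.168.0.2:8080/v1", // llama.cpp default port is 8080
  observer: "http://localhost:3001",
};

// llama.cpp's OpenAI-compatible server accepts the standard
// /v1/chat/completions payload; `model` is often ignored locally.
function buildChatRequest(prompt: string, model = "local") {
  return {
    model,
    messages: [{ role: "user" as const, content: prompt }],
    temperature: 0.2,
  };
}
```

Keeping the endpoints in one typed object makes it easy to swap the LLM host or Manticore port without hunting through the pipeline code.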
Architecture Visualized
┌─────────────────┐ ┌──────────────────┐
│ │──(1)──▶│ Manticore Search │
│ Orchestrator │ │ (Docker) │
│ (Node.js) │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(2)──▶│ AI Server LLM │
│ │ │ (192.168.0.2) │
│ │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(3)──▶│ AI Observer │
└─────────────────┘ │ (Docker) │
└──────────────────┘
│
(4)
▼
(Monitors AI Server)
Component State Flow
[*] ──▶ Init ──▶ Indexing: Create Table (RT)
│
▼
Searching: Documents Added
/ \
/ \
Error: No Hits (Retry) RAG_Construction: Hits Found
│ │
[*] ▼
Inference: Context + Prompt
/ \
/ \
Timeout: Model Slow Success: Answer Generated
│ │
[*] [*]
The Implementation
1. Setting the Foundation (Docker)
We containerized Manticore and AI Observer using docker‑compose. A key challenge was networking: ensuring the orchestrator (client) could talk to the containers and the external AI server. Mapping ports (9308, 9312, 3001) was crucial.
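A compose file for this setup could look roughly like the sketch below. The Manticore image is the official one; the AI Observer image name and tag here are placeholders, since the article doesn't specify them:

```yaml
# Sketch of the compose file used in the POC (image names partly assumed).
services:
  manticore:
    image: manticoresearch/manticore
    ports:
      - "9308:9308"   # HTTP (JSON and SQL-over-HTTP)
      - "9312:9312"   # binary protocol
  observer:
    image: ai-observer:latest   # placeholder image name
    ports:
      - "3001:3001"   # UI / telemetry ingest
```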
Lesson: Manticore’s SQL interface over HTTP (/sql) is powerful but slightly different from the JSON‑only /search endpoint many clients expect. We had to adapt our client to parse the SQL response structure properly.
2. The Orchestrator
A simple TypeScript orchestrator mimics a real‑world application flow:
- Ingest – index sovereign data into Manticore.
- Retrieve – search Manticore for relevant context (MATCH('Ensures data privacy')).
- Augment – combine the retrieved context with a user prompt.
- Generate – send the augmented prompt to the local LLM.
- Observe – log every step to AI Observer.
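The Augment step is the one piece of this flow that is pure logic, so it is worth showing on its own. The prompt template below is an illustrative choice, not the article's actual template:

```typescript
// Augment: fold retrieved documents and the user's question into a
// single grounded prompt for the local LLM.
function buildAugmentedPrompt(contextDocs: string[], question: string): string {
  const context = contextDocs
    .map((doc, i) => `[${i + 1}] ${doc}`) // number docs so the model can cite them
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The numbered-context convention makes it easy to ask the model to cite which retrieved document supported each claim, which pairs nicely with the Observe step.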
3. Verification & Testing
- Integration Tests – using vitest, we verified that documents are indexed correctly and retrievable (fixing a zero‑hit issue by understanding RT index flushing).
- End‑to‑End – the full pipeline generated a coherent explanation of “Sovereign AI” using our local setup.
- Visual Validation – AI Observer UI was checked via browser automation to ensure telemetry was landing.
Real‑World Experience
The most striking realization was the latency trade‑off. Our local LLM took ~18–80 seconds for a comprehensive answer. While slower than cloud APIs, the trade‑off buys total privacy—no token costs, no data leaks.
Manticore proved incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.
Conclusion & What’s Next
This POC demonstrates that a Sovereign AI Stack is not only possible but also accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.
What’s Next
- Implement a persistent vector store for semantic search.
- Optimize LLM inference speed (quantization, GPU offloading).
- Build a chat UI on top of the orchestrator.