Building a Sovereign AI Stack: From Zero to POC

Published: March 7, 2026 at 10:12 AM EST
3 min read
Source: Dev.to


In an era where data privacy is paramount, relying on cloud‑based AI providers isn’t always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack—a completely local, self‑controlled AI infrastructure—is the ultimate goal for many organizations.

Today, we built a Proof of Concept (POC) for such a stack, leveraging open‑source tools to create a private, observable, and searchable AI environment. Here is our journey.

The Architecture

Our stack consists of three core components, orchestrated by a Node.js application:

  • AI Server – a local LLM running on llama.cpp (serving an OpenAI‑compatible API). This provides the intelligence without data leaving the network.
  • Search Engine – Manticore Search (running in Docker). Chosen for its lightweight footprint and powerful full‑text search capabilities, essential for RAG (Retrieval‑Augmented Generation).
  • Observability – AI Observer (running in Docker). Captures traces and metrics of our AI interactions.

Architecture Visualized

┌─────────────────┐        ┌──────────────────┐
│                 │──(1)──▶│ Manticore Search │
│  Orchestrator   │        │     (Docker)     │
│    (Node.js)    │        └──────────────────┘
│                 │        ┌──────────────────┐
│                 │──(2)──▶│  AI Server LLM   │
│                 │        │  (192.168.0.2)   │
│                 │        └──────────────────┘
│                 │        ┌──────────────────┐
│                 │──(3)──▶│   AI Observer    │
└─────────────────┘        │     (Docker)     │
                           └──────────────────┘

                           (4) AI Observer monitors the AI Server

Component State Flow

[*] ──▶ Init ──▶ Indexing: Create Table (RT)
                        │
                        ▼
              Searching: Documents Added
              /                     \
             /                       \
   Error: No Hits (Retry)      RAG_Construction: Hits Found
           │                              │
          [*]                             ▼
                              Inference: Context + Prompt
                              /                     \
                             /                       \
             Timeout: Model Slow            Success: Answer Generated
                     │                               │
                    [*]                             [*]

The Implementation

1. Setting the Foundation (Docker)

We containerized Manticore and AI Observer using docker‑compose. A key challenge was networking: ensuring the orchestrator (client) could talk to the containers and the external AI server. Mapping ports (9308, 9312, 3001) was crucial.
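A compose file along these lines captures the setup. The host ports (9308, 9312, 3001) come from our configuration; the image names are placeholders, since only the services themselves are described above:

```yaml
# Sketch only — image names are assumptions; substitute your actual images.
services:
  manticore:
    image: manticoresearch/manticore   # Manticore Search
    ports:
      - "9308:9308"   # HTTP / JSON API (including /sql)
      - "9312:9312"   # binary protocol
  ai-observer:
    image: ai-observer:latest          # placeholder for the AI Observer image
    ports:
      - "3001:3001"   # Observer UI / trace ingestion
```

The orchestrator runs on the host and reaches both containers via these published ports, while the AI Server sits on a separate machine on the LAN.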

Lesson: Manticore’s SQL interface over HTTP (/sql) is powerful but slightly different from the JSON‑only /search endpoint many clients expect. We had to adapt our client to parse the SQL response structure properly.
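The adaptation boiled down to a small parsing helper. The interface below reflects the result-set shape we observed from `/sql` (an array of result sets with `columns`, `data`, and an `error` field) — treat it as an assumption about our setup, not the official schema:

```typescript
// Shape we observed from Manticore's /sql endpoint — an assumption, not a spec.
interface SqlResultSet {
  columns: Array<Record<string, { type: string }>>;
  data: Array<Record<string, unknown>>;
  total: number;
  error: string;
  warning: string;
}

// Extract the rows of the first result set, surfacing server-side errors.
function parseSqlResponse(body: SqlResultSet[]): Array<Record<string, unknown>> {
  if (!Array.isArray(body) || body.length === 0) {
    throw new Error("Unexpected /sql response: expected an array of result sets");
  }
  const first = body[0];
  if (first.error) {
    throw new Error(`Manticore error: ${first.error}`);
  }
  return first.data;
}

// Example response for a SELECT against an RT table (illustrative values).
const sample: SqlResultSet[] = [{
  columns: [{ id: { type: "long long" } }, { content: { type: "string" } }],
  data: [{ id: 1, content: "Ensures data privacy" }],
  total: 1,
  error: "",
  warning: "",
}];

const rows = parseSqlResponse(sample);
```

Keeping this in one helper meant the rest of the client could stay agnostic about which Manticore endpoint produced the rows.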

2. The Orchestrator

A simple TypeScript orchestrator mimics a real‑world application flow:

  • Ingest – index sovereign data into Manticore.
  • Retrieve – search Manticore for relevant context (MATCH('Ensures data privacy')).
  • Augment – combine the retrieved context with a user prompt.
  • Generate – send the augmented prompt to the local LLM.
  • Observe – log every step to AI Observer.
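Retrieve, Generate, and Observe are plain HTTP calls; the Augment step is the one worth showing concretely. A minimal sketch — the template wording is our own choice, not something the stack prescribes:

```typescript
// Augment step: merge retrieved passages with the user prompt.
// The template below is illustrative; tune it for your model.
function augmentPrompt(userPrompt: string, passages: string[]): string {
  const context = passages.map((p, i) => `[${i + 1}] ${p}`).join("\n");
  return [
    "Answer using only the context below.",
    "Context:",
    context,
    `Question: ${userPrompt}`,
  ].join("\n\n");
}

const prompt = augmentPrompt("What is Sovereign AI?", [
  "Ensures data privacy",
  "Runs entirely on local infrastructure",
]);
```

The resulting string is then POSTed to the llama.cpp server's OpenAI‑compatible chat endpoint, with the same request logged to AI Observer.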

3. Verification & Testing

  • Integration Tests – using vitest, we verified that documents are indexed correctly and retrievable (fixing a zero‑hit issue by understanding RT index flushing).
  • End‑to‑End – the full pipeline generated a coherent explanation of “Sovereign AI” using our local setup.
  • Visual Validation – AI Observer UI was checked via browser automation to ensure telemetry was landing.

Real‑World Experience

The most striking realization was the latency trade‑off. Our local LLM took ~18–80 seconds for a comprehensive answer. While slower than cloud APIs, the trade‑off buys total privacy—no token costs, no data leaks.

Manticore proved incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.

Conclusion

This POC demonstrates that a Sovereign AI Stack is not only possible but also accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.

What’s Next

  • Implement a persistent vector store for semantic search.
  • Optimize LLM inference speed (quantization, GPU offloading).
  • Build a chat UI on top of the orchestrator.