A-Modular-Kingdom - The Infrastructure Layer AI Agents Deserve

Published: December 2, 2025 at 11:46 PM EST
3 min read
Source: Dev.to

Introduction

Every AI agent I built had the same problem: I kept rebuilding the same infrastructure from scratch.
RAG system? Build it again.
Long‑term memory? Implement it again.
Web search, code execution, vision? Wire them up again.

After the third project I extracted everything into a single, production‑ready foundation that any agent can plug into via the Model Context Protocol (MCP).
A‑Modular‑Kingdom is that foundation.

Getting Started

Start the MCP server:

python src/agent/host.py

Now any AI agent—Claude Desktop, Gemini, custom chatbots—instantly gets:

  • Document retrieval (RAG) with Qdrant + BM25 + Cross‑Encoder reranking
  • Hierarchical memory that persists across sessions and projects
  • 10+ tools: web search, browser automation, code execution, vision, TTS/STT

One server. Unlimited applications.

Tools

| Tool | What It Does |
| --- | --- |
| query_knowledge_base | Search documents with hybrid retrieval (vector + keyword + reranking) |
| save_memory | Store memories with automatic scope inference |
| search_memories | Retrieve with priority: global rules → preferences → project context |
| save_fact | Structured fact storage with metadata |
| set_global_rule | Persistent instructions across all sessions |
| list_all_memories | View everything stored |
| delete_memory | Remove by ID |
| web_search | DuckDuckGo integration |
| browser_automation | Playwright scraping (text + screenshots) |
| code_execute | Safe Python sandbox |
| analyze_media | Ollama vision for images/videos |
| text_to_speech | Multiple engines (pyttsx3, gtts, kokoro) |
| speech_to_text | Whisper transcription |

Retrieval Architecture

Most RAG implementations are naive: embed documents, find nearest neighbors, return results. This works for demos but fails in production.

A‑Modular‑Kingdom uses a three‑stage pipeline:

  1. Vector search (Qdrant Cloud) finds semantically similar chunks.
  2. BM25 keyword search catches exact term matches vectors miss.
  3. Results from both methods are combined with configurable weights (RRF fusion).

A cross‑encoder model (ms-marco-MiniLM-L-6-v2) scores each result against the query; the top 5 most relevant results are returned.
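The fusion step in stage 3 can be sketched in a few lines. This is an illustrative, pure-Python version of Reciprocal Rank Fusion (RRF); the function name, the default `k=60`, and the equal weights are assumptions, not the project's actual API:

```python
# Sketch of Reciprocal Rank Fusion (RRF): combine two ranked lists of
# document IDs into one fused ranking. Documents appearing near the top
# of either list (or in both lists) accumulate higher fused scores.
def rrf_fuse(vector_hits, bm25_hits, k=60, weights=(1.0, 1.0)):
    scores = {}
    for weight, hits in zip(weights, (vector_hits, bm25_hits)):
        for rank, doc_id in enumerate(hits):
            # 1/(k + rank) dampens the influence of lower-ranked hits
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
# "a" and "b" appear in both lists, so they outrank "c" and "d"
```

The `weights` tuple corresponds to the configurable weighting mentioned above: raising the BM25 weight favors exact keyword matches over semantic similarity.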

Figure: V3 RAG architecture (hybrid retrieval with RRF fusion and Cross‑Encoder reranking).

Accuracy

| Dataset | Score |
| --- | --- |
| Focused FAQ | 100% |
| Real documents | 83–86% |
| LLM-as-Judge | 84–98% |

Performance

| Version | Cold start | Warm query |
| --- | --- | --- |
| V2 | 26.8 s | 0.31 s |
| V3 | 13.9 s | 0.02 s |

Supported document types: Python, Markdown, PDF, Jupyter notebooks, JavaScript, TypeScript.

Memory Architecture

Inspired by Mem0, the memory system is hierarchical, scoped, and persistent.

Flat memory systems don’t scale—hundreds of memories become noisy. A‑Modular‑Kingdom organizes memory into scopes:

| Scope | Persistence | Example |
| --- | --- | --- |
| global_rules | Forever, all projects | “Always use type hints” |
| global_preferences | Forever, all projects | “Prefer concise responses” |
| global_personas | Forever, all projects | Reusable agent personalities |
| project_context | Current project only | “Uses FastAPI backend” |

You don’t need to specify scopes manually; the system infers them:

save_memory("User prefers dark mode")      # → global_preferences
save_memory("Always validate input")      # → global_rules
save_memory("Uses PostgreSQL")            # → project_context

When searching, results are returned in priority order:

  1. Global rules (highest)
  2. Global preferences
  3. Global personas
  4. Project context

Thus persistent instructions always surface first.
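That priority ordering amounts to sorting matches by scope first and relevance second. A minimal sketch, with illustrative names and tuple layout:

```python
# Sketch of priority-ordered memory search: global rules surface before
# preferences, personas, and project context, regardless of raw
# relevance score.
SCOPE_PRIORITY = {
    "global_rules": 0,
    "global_preferences": 1,
    "global_personas": 2,
    "project_context": 3,
}

def rank_results(matches):
    """matches: list of (scope, text, relevance) tuples."""
    return sorted(matches, key=lambda m: (SCOPE_PRIORITY[m[0]], -m[2]))

ranked = rank_results([
    ("project_context", "Uses FastAPI backend", 0.9),
    ("global_rules", "Always use type hints", 0.7),
])
# the global rule surfaces first despite its lower relevance score
```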

Configuration Example (Claude Desktop)

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "a-modular-kingdom": {
      "command": "python",
      "args": ["/path/to/src/agent/host.py"]
    }
  }
}

Python usage:

from smolagents import InferenceClientModel, ToolCallingAgent, ToolCollection
from mcp import StdioServerParameters

params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"]
)

with ToolCollection.from_mcp(params, trust_remote_code=True) as tools:
    # ToolCallingAgent requires a model; any smolagents model works here
    agent = ToolCallingAgent(tools=[*tools.tools], model=InferenceClientModel())
    result = agent.run("Search the codebase for auth logic")

Using the Library Without the Full Server

Install only the RAG and memory components:

pip install rag-mem

Then, in Python:

from memory_mcp import RAGPipeline, MemoryStore

# RAG
pipeline = RAGPipeline(document_paths=["./docs"])
pipeline.index()
results = pipeline.search("How does auth work?")

# Memory
memory = MemoryStore()
memory.add("Important fact")
mem_results = memory.search("facts")

CLI shortcuts:

memory-mcp init
memory-mcp serve --docs ./documents
memory-mcp index ./path/to/files

Component Overview

| Component | Provider / Implementation |
| --- | --- |
| Embeddings | Ollama, sentence-transformers, OpenAI |
| Vector DB | Qdrant (local or cloud) |
| Keyword search | BM25 (rank-bm25) |
| Reranking | Cross-Encoder (ms-marco-MiniLM-L-6-v2) |
| Memory store | Qdrant with hierarchical scoping |
| Protocol | Model Context Protocol (MCP) |

Example Use Case: Multi‑Agent Emotional AI (Kaggle Hackathon)

A‑Modular‑Kingdom powered a multi‑agent system built on Gemma 3n:

  • Vocal Emotion Detection – analyzes speech.
  • Vision – Gemma 3n assesses facial expressions.
  • Combined emotion tag + transcribed query → Router Agent → specialist sub‑agents.

Each sub‑agent uses A‑Modular‑Kingdom’s RAG and Memory modules for personalized, context‑aware responses. Additional tools (Playwright MCP) enable web interactions.

Stop rebuilding. Start building.
