A-Modular-Kingdom - The Infrastructure Layer AI Agents Deserve
Introduction
Every AI agent I built had the same problem: I kept rebuilding the same infrastructure from scratch.
RAG system? Build it again.
Long‑term memory? Implement it again.
Web search, code execution, vision? Wire them up again.
After the third project, I extracted everything into a single, production‑ready foundation that any agent can plug into via the Model Context Protocol (MCP).
A‑Modular‑Kingdom is that foundation.
Getting Started
Start the MCP server:
python src/agent/host.py
Now any AI agent—Claude Desktop, Gemini, custom chatbots—instantly gets:
- Document retrieval (RAG) with Qdrant + BM25 + Cross‑Encoder reranking
- Hierarchical memory that persists across sessions and projects
- 10+ tools: web search, browser automation, code execution, vision, TTS/STT
One server. Unlimited applications.
Tools
| Tool | What It Does |
|---|---|
| query_knowledge_base | Search documents with hybrid retrieval (vector + keyword + reranking) |
| save_memory | Store memories with automatic scope inference |
| search_memories | Retrieve with priority: global rules → preferences → project context |
| save_fact | Structured fact storage with metadata |
| set_global_rule | Persistent instructions across all sessions |
| list_all_memories | View everything stored |
| delete_memory | Remove a memory by ID |
| web_search | DuckDuckGo integration |
| browser_automation | Playwright scraping (text + screenshots) |
| code_execute | Safe Python sandbox |
| analyze_media | Ollama vision for images/videos |
| text_to_speech | Multiple engines (pyttsx3, gtts, kokoro) |
| speech_to_text | Whisper transcription |
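All of these are ordinary MCP tools, so any MCP client can call them. As a minimal sketch using the official MCP Python SDK (the web_search argument name is an assumption about the tool's schema):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(command="python", args=["src/agent/host.py"])

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()          # MCP handshake
            tools = await session.list_tools()  # enumerates the table above
            print([t.name for t in tools.tools])
            result = await session.call_tool("web_search", {"query": "Model Context Protocol"})
            print(result.content)

asyncio.run(main())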
Retrieval Architecture
Most RAG implementations are naive: embed documents, find nearest neighbors, return results. This works for demos but fails in production.
A‑Modular‑Kingdom uses a three‑stage pipeline:
- Vector search (Qdrant Cloud) finds semantically similar chunks.
- BM25 keyword search catches exact term matches vectors miss.
- Results from both methods are combined with configurable weights (RRF fusion).
A cross‑encoder model (ms-marco-MiniLM-L-6-v2) scores each result against the query; the top 5 most relevant results are returned.
Figure: V3 RAG architecture (hybrid retrieval with RRF fusion and Cross‑Encoder reranking).
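As a rough sketch of how stages two and three fit together (function names here are illustrative, not the project's actual API; the Qdrant stage is assumed to have already produced a ranked list of chunk indices):

from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank(d)) over rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def hybrid_search(query, chunks, vector_ranking, top_k=5):
    # Stage 2: BM25 keyword ranking over the same chunks.
    bm25 = BM25Okapi([c.split() for c in chunks])
    kw_scores = bm25.get_scores(query.split())
    kw_ranking = sorted(range(len(chunks)), key=lambda i: kw_scores[i], reverse=True)

    # Fuse both rankings and keep a candidate pool for the reranker.
    candidates = rrf_fuse([vector_ranking, kw_ranking])[: top_k * 4]

    # Stage 3: the cross-encoder scores each (query, chunk) pair; return the top 5.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scored = reranker.predict([(query, chunks[i]) for i in candidates])
    best = sorted(zip(candidates, scored), key=lambda p: p[1], reverse=True)[:top_k]
    return [chunks[i] for i, _ in best]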
Accuracy
| Dataset | Score |
|---|---|
| Focused FAQ | 100% |
| Real documents | 83–86% |
| LLM‑as‑Judge | 84–98% |
Performance
| Version | Cold start | Warm query |
|---|---|---|
| V2 | 26.8 s | 0.31 s |
| V3 | 13.9 s | 0.02 s |
Supported document types: Python, Markdown, PDF, Jupyter notebooks, JavaScript, TypeScript.
Memory Architecture
Inspired by Mem0, the memory system is hierarchical, scoped, and persistent.
Flat memory systems don’t scale—hundreds of memories become noisy. A‑Modular‑Kingdom organizes memory into scopes:
| Scope | Persistence | Example |
|---|---|---|
| global_rules | Forever, all projects | “Always use type hints” |
| global_preferences | Forever, all projects | “Prefer concise responses” |
| global_personas | Forever, all projects | Reusable agent personalities |
| project_context | Current project only | “Uses FastAPI backend” |
You don’t need to specify scopes manually; the system infers them:
save_memory("User prefers dark mode") # → global_preferences
save_memory("Always validate input") # → global_rules
save_memory("Uses PostgreSQL") # → project_context
When searching, results are returned in priority order:
- Global rules (highest)
- Global preferences
- Global personas
- Project context
Thus persistent instructions always surface first.
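A minimal sketch of that ordering (the shape of a memory record here is an assumption):

# Scopes listed highest-priority first, mirroring the search order above.
SCOPE_PRIORITY = ["global_rules", "global_preferences", "global_personas", "project_context"]

def rank_memories(matches):
    """Order matches by scope priority first, then by similarity score."""
    return sorted(matches, key=lambda m: (SCOPE_PRIORITY.index(m["scope"]), -m["score"]))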
Configuration Example (Claude Desktop)
Add to claude_desktop_config.json:
{
"mcpServers": {
"a-modular-kingdom": {
"command": "python",
"args": ["/path/to/src/agent/host.py"]
}
}
}
Python usage:
from mcp import StdioServerParameters
from smolagents import InferenceClientModel, ToolCallingAgent, ToolCollection

params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"]
)

# Recent smolagents releases require trust_remote_code=True when loading MCP tools.
with ToolCollection.from_mcp(params, trust_remote_code=True) as tools:
    # ToolCallingAgent needs a model; InferenceClientModel is one option.
    agent = ToolCallingAgent(tools=[*tools.tools], model=InferenceClientModel())
    result = agent.run("Search the codebase for auth logic")
Using the Library Without the Full Server
Install only the RAG and memory components:
pip install rag-mem
from memory_mcp import RAGPipeline, MemoryStore
# RAG
pipeline = RAGPipeline(document_paths=["./docs"])
pipeline.index()
results = pipeline.search("How does auth work?")
# Memory
memory = MemoryStore()
memory.add("Important fact")
mem_results = memory.search("facts")
CLI shortcuts:
memory-mcp init
memory-mcp serve --docs ./documents
memory-mcp index ./path/to/files
Component Overview
| Component | Provider / Implementation |
|---|---|
| Embeddings | Ollama, sentence-transformers, OpenAI |
| Vector DB | Qdrant (local or cloud) |
| Keyword Search | BM25 (rank-bm25) |
| Reranking | Cross‑Encoder (ms-marco-MiniLM-L-6-v2) |
| Memory Store | Qdrant with hierarchical scoping |
| Protocol | Model Context Protocol (MCP) |
Example Use Case: Multi‑Agent Emotional AI (Kaggle Hackathon)
A‑Modular‑Kingdom powered a multi‑agent system built on Gemma 3n:
- Vocal Emotion Detection – detects emotion from the user's speech.
- Vision – Gemma 3n assesses facial expressions.
- The combined emotion tag and transcribed query go to a Router Agent, which dispatches to specialist sub‑agents.
Each sub‑agent uses A‑Modular‑Kingdom’s RAG and Memory modules for personalized, context‑aware responses. Additional tools (Playwright MCP) enable web interactions.
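The routing step itself reduces to a dispatch on the detected emotion tag; a toy sketch with hypothetical agent names:

# Hypothetical routing table: emotion tag -> specialist sub-agent.
SPECIALISTS = {"distressed": "support_agent", "curious": "research_agent"}

def route(emotion_tag: str, query: str) -> str:
    # Fall back to a generalist when no specialist matches.
    return f"{SPECIALISTS.get(emotion_tag, 'general_agent')} <- {query}"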
Links
- Website – https://masihmoafi.com/blog/a-modular-kingdom
- GitHub – https://github.com/masihmoafi/a-modular-kingdom
- PyPI – https://pypi.org/project/a-modular-kingdom
- Medium article – https://medium.com/@masihmoafi/a-modular-kingdom
- YouTube demo – https://www.youtube.com/watch?v=…
Stop rebuilding. Start building.