A-Modular-Kingdom - The Infrastructure Layer AI Agents Deserve
Introduction
Every AI agent I built had the same problem: I kept rebuilding the same infrastructure from scratch.
RAG system? Build it again.
Long‑term memory? Implement it again.
Web search, code execution, vision? Wire them up again.
After the third project, I extracted everything into a single, production‑ready foundation that any agent can plug into via the Model Context Protocol (MCP).
A‑Modular‑Kingdom is that foundation.
Getting Started
Start the MCP server:
python src/agent/host.py
Now any AI agent—Claude Desktop, Gemini, custom chatbots—instantly gets:
- Document retrieval (RAG) with Qdrant + BM25 + Cross‑Encoder reranking
- Hierarchical memory that persists across sessions and projects
- 10+ tools: web search, browser automation, code execution, vision, TTS/STT
One server. Unlimited applications.
Tools
| Tool | What It Does |
|---|---|
| query_knowledge_base | Search documents with hybrid retrieval (vector + keyword + reranking) |
| save_memory | Store memories with automatic scope inference |
| search_memories | Retrieve with priority: global rules → preferences → project context |
| save_fact | Structured fact storage with metadata |
| set_global_rule | Persistent instructions across all sessions |
| list_all_memories | View everything stored |
| delete_memory | Remove a memory by ID |
| web_search | DuckDuckGo integration |
| browser_automation | Playwright scraping (text + screenshots) |
| code_execute | Safe Python sandbox |
| analyze_media | Ollama vision for images/videos |
| text_to_speech | Multiple engines (pyttsx3, gtts, kokoro) |
| speech_to_text | Whisper transcription |
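All of these are ordinary MCP tools, so any MCP client can call them. As a minimal sketch using the official MCP Python SDK (the web_search argument name is an assumption about the tool's schema):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(command="python", args=["src/agent/host.py"])

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()          # MCP handshake
            tools = await session.list_tools()  # enumerates the table above
            print([t.name for t in tools.tools])
            result = await session.call_tool("web_search", {"query": "Model Context Protocol"})
            print(result.content)

asyncio.run(main())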
Retrieval Architecture
Most RAG implementations are naive: embed documents, find nearest neighbors, return results. This works for demos but fails in production.
A‑Modular‑Kingdom uses a three‑stage pipeline:
- Vector search (Qdrant Cloud) finds semantically similar chunks.
- BM25 keyword search catches exact term matches vectors miss.
- Results from both methods are combined with configurable weights (RRF fusion).
A cross‑encoder model (ms-marco-MiniLM-L-6-v2) scores each result against the query; the top 5 most relevant results are returned.
Figure: V3 RAG architecture (hybrid retrieval with RRF fusion and Cross‑Encoder reranking).
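As a rough sketch of how stages two and three fit together (function names here are illustrative, not the project's actual API; the Qdrant stage is assumed to have already produced a ranked list of chunk indices):

from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank(d)) over rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def hybrid_search(query, chunks, vector_ranking, top_k=5):
    # Stage 2: BM25 keyword ranking over the same chunks.
    bm25 = BM25Okapi([c.split() for c in chunks])
    kw_scores = bm25.get_scores(query.split())
    kw_ranking = sorted(range(len(chunks)), key=lambda i: kw_scores[i], reverse=True)

    # Fuse both rankings and keep a candidate pool for the reranker.
    candidates = rrf_fuse([vector_ranking, kw_ranking])[: top_k * 4]

    # Stage 3: the cross-encoder scores each (query, chunk) pair; return the top 5.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scored = reranker.predict([(query, chunks[i]) for i in candidates])
    best = sorted(zip(candidates, scored), key=lambda p: p[1], reverse=True)[:top_k]
    return [chunks[i] for i, _ in best]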
Accuracy
| Dataset | Score |
|---|---|
| Focused FAQ | 100% |
| Real documents | 83–86% |
| LLM‑as‑Judge | 84–98% |
Performance
| Version | Cold start | Warm query |
|---|---|---|
| V2 | 26.8 s | 0.31 s |
| V3 | 13.9 s | 0.02 s |
Supported document types: Python, Markdown, PDF, Jupyter notebooks, JavaScript, TypeScript.
Memory Architecture
Inspired by Mem0, the memory system is hierarchical, scoped, and persistent.
Flat memory systems don’t scale—hundreds of memories become noisy. A‑Modular‑Kingdom organizes memory into scopes:
| Scope | Persistence | Example |
|---|---|---|
| global_rules | Forever, all projects | “Always use type hints” |
| global_preferences | Forever, all projects | “Prefer concise responses” |
| global_personas | Forever, all projects | Reusable agent personalities |
| project_context | Current project only | “Uses FastAPI backend” |
You don’t need to specify scopes manually; the system infers them:
save_memory("User prefers dark mode") # → global_preferences
save_memory("Always validate input") # → global_rules
save_memory("Uses PostgreSQL") # → project_context
When searching, results are returned in priority order:
- Global rules (highest)
- Global preferences
- Global personas
- Project context
Thus persistent instructions always surface first.
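A minimal sketch of that ordering (the shape of a memory record here is an assumption):

# Scopes listed highest-priority first, mirroring the search order above.
SCOPE_PRIORITY = ["global_rules", "global_preferences", "global_personas", "project_context"]

def rank_memories(matches):
    """Order matches by scope priority first, then by similarity score."""
    return sorted(matches, key=lambda m: (SCOPE_PRIORITY.index(m["scope"]), -m["score"]))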
Configuration Example (Claude Desktop)
Add to claude_desktop_config.json:
{
"mcpServers": {
"a-modular-kingdom": {
"command": "python",
"args": ["/path/to/src/agent/host.py"]
}
}
}
Python usage:
from mcp import StdioServerParameters
from smolagents import InferenceClientModel, ToolCallingAgent, ToolCollection

params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"]
)

# Recent smolagents releases require trust_remote_code=True when loading MCP tools.
with ToolCollection.from_mcp(params, trust_remote_code=True) as tools:
    # ToolCallingAgent needs a model; InferenceClientModel is one option.
    agent = ToolCallingAgent(tools=[*tools.tools], model=InferenceClientModel())
    result = agent.run("Search the codebase for auth logic")
Using the Library Without the Full Server
Install only the RAG and memory components:
pip install rag-mem
from memory_mcp import RAGPipeline, MemoryStore
# RAG
pipeline = RAGPipeline(document_paths=["./docs"])
pipeline.index()
results = pipeline.search("How does auth work?")
# Memory
memory = MemoryStore()
memory.add("Important fact")
mem_results = memory.search("facts")
CLI shortcuts:
memory-mcp init
memory-mcp serve --docs ./documents
memory-mcp index ./path/to/files
Component Overview
| Component | Provider / Implementation |
|---|---|
| Embeddings | Ollama, sentence-transformers, OpenAI |
| Vector DB | Qdrant (local or cloud) |
| Keyword Search | BM25 (rank-bm25) |
| Reranking | Cross‑Encoder (ms-marco-MiniLM-L-6-v2) |
| Memory Store | Qdrant with hierarchical scoping |
| Protocol | Model Context Protocol (MCP) |
Example Use Case: Multi‑Agent Emotional AI (Kaggle Hackathon)
A‑Modular‑Kingdom powered a multi‑agent system built on Gemma 3n:
- Vocal Emotion Detection – detects emotion from the user's speech.
- Vision – Gemma 3n assesses facial expressions.
- The combined emotion tag and transcribed query go to a Router Agent, which dispatches to specialist sub‑agents.
Each sub‑agent uses A‑Modular‑Kingdom’s RAG and Memory modules for personalized, context‑aware responses. Additional tools (Playwright MCP) enable web interactions.
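The routing step itself reduces to a dispatch on the detected emotion tag; a toy sketch with hypothetical agent names:

# Hypothetical routing table: emotion tag -> specialist sub-agent.
SPECIALISTS = {"distressed": "support_agent", "curious": "research_agent"}

def route(emotion_tag: str, query: str) -> str:
    # Fall back to a generalist when no specialist matches.
    return f"{SPECIALISTS.get(emotion_tag, 'general_agent')} <- {query}"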
Links
- Website – https://masihmoafi.com/blog/a-modular-kingdom
- GitHub – https://github.com/masihmoafi/a-modular-kingdom
- PyPI – https://pypi.org/project/a-modular-kingdom
- Medium article – https://medium.com/@masihmoafi/a-modular-kingdom
- YouTube demo – https://www.youtube.com/watch?v=…
Stop rebuilding. Start building.