# Why I Wrote an AI Workflow Engine
Source: Dev.to
## 📖 A Recurring Pattern

> “I kept building the same things over and over.”
Over the past couple of years I’ve been working on a handful of applications that all share a common thread:
| Application | Core Need |
|---|---|
| Memoir Writing Assistant | Long‑running, conversational workflows that guide users through deeply personal storytelling. |
| Small‑Business Finance App | Business‑process workflows (approvals, reconciliations, recurring tasks) layered with AI for categorisation and insights. |
| Customer Chatbot (current client) | RAG‑driven knowledge‑base navigation, support handling, parts selection & ordering. |
| Academic Research Assistant | LLM‑powered analysis and synthesis of research projects. |
All of them required three things I kept having to rebuild from scratch:
- Durable workflow orchestration
- Integration with AI agents
- Control over AI costs
Each new project boiled down to the same steps:
- Queue the work
- Call the LLM
- Track progress & retries
- Persist state
- Notify the user
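The repeated plumbing above can be sketched in a few lines of Python. Everything here (`run_task`, `llm_call`, `state_store`) is illustrative, not any real API — it just shows the queue/call/retry/persist/notify loop each project kept reimplementing:

```python
import time

def run_task(task, llm_call, max_retries=3, state_store=None, notify=print):
    """Run one queued task: call the LLM, retry on failure,
    persist state after each attempt, and notify the user."""
    state_store = state_store if state_store is not None else {}
    for attempt in range(1, max_retries + 1):
        try:
            result = llm_call(task["prompt"])            # call the LLM
            state_store[task["id"]] = {                  # persist state
                "status": "done", "result": result, "attempts": attempt,
            }
            notify(f"task {task['id']} finished")        # notify the user
            return result
        except Exception:
            state_store[task["id"]] = {"status": "retrying", "attempts": attempt}
            time.sleep(0)                                # backoff elided in this sketch
    state_store[task["id"]] = {"status": "failed", "attempts": max_retries}
    notify(f"task {task['id']} failed")
    return None
```

Every project needed some variant of this loop, plus the surrounding queueing and scheduling — exactly the kind of undifferentiated work an orchestrator should absorb.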
I knew there had to be a better way, so I started looking.
## 🔎 Existing Tools – What’s Missing?
| Tool | Strengths | Weaknesses (for AI‑centric workloads) |
|---|---|---|
| Temporal | Gold‑standard durable execution | Heavy maintenance overhead, no awareness of token usage or LLM pricing. |
| Airflow | Mature batch scheduler | Designed for cron‑based DAGs, not event‑driven interactive flows; massive Docker image & RAM footprint. |
| LangChain / LangGraph | AI‑focused, rich LLM tooling | Python‑only, no native durability; production requires LangSmith (proprietary, ~$1k per M executions). |
**Bottom line:** Existing solutions are either operationally heavy & AI‑ignorant or AI‑aware but lack durability & independence.
## 💡 Core Realisation
In AI applications, LLM cost is an application‑infrastructure concern.
When a workflow calls an LLM, the cost of that call should influence the next step:
- Can I afford Claude Sonnet, or should I fall back to Haiku?
- Has this workflow already burned through its budget?
- Can I reuse a cached result instead of making another API call?
These decisions must be made during execution, not after the fact — which is why cost tracking belongs in the orchestrator itself, not in a retrospective dashboard.
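A minimal sketch of that in‑flight decision, assuming illustrative per‑million‑token prices and a hypothetical `pick_model` helper (this is not Kruxia Flow’s actual API): the orchestrator estimates the cost of the next call and selects the most capable model the remaining budget can afford, before the call is made.

```python
# Assumed $/M input tokens, ordered most to least capable. Illustrative only.
PRICES = {"sonnet": 3.00, "haiku": 0.25}

def pick_model(remaining_budget_usd, est_tokens):
    """Return (model, est_cost) for the most capable model whose
    estimated cost fits the remaining budget, or (None, 0.0) if none fit."""
    for model in ("sonnet", "haiku"):
        est_cost = PRICES[model] * est_tokens / 1_000_000
        if est_cost <= remaining_budget_usd:
            return model, est_cost
    return None, 0.0  # over budget: abort, alert, or fall back to cache
```

The key point is the ordering: the budget check happens *before* the API call, so an over‑budget workflow degrades gracefully instead of discovering the overrun on next month’s invoice.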
## 🚀 Introducing Kruxia Flow
A durable workflow engine purpose‑built for AI applications.
| Characteristic | Details |
|---|---|
| Binary | Single Rust binary (~7.5 MiB) |
| Persistence | PostgreSQL (no Kafka, Elasticsearch, Cassandra) |
| Deployment | One binary + one DB → tiny Docker image (≈ 63 MiB) |
| Portability | Runs on a Raspberry Pi Zero, cloud VMs, or any container host |
### AI‑Native Core Features
- **Automatic Cost Tracking**
  - Tokens counted and priced in real time.
  - Supports Anthropic, OpenAI, Google Gemini, and self‑hosted Ollama models.
- **Budget Enforcement**
  - Set a budget per workflow or per activity.
  - The engine checks the budget before each LLM call and can abort or alert if it would be exceeded.
- **Cost‑Aware Model Fallback**
  - Define a fallback chain (e.g., Claude Sonnet → Claude Haiku → Ollama).
  - The orchestrator picks the most capable model that fits the remaining budget.
- **Semantic Caching**
  - Similar queries hit a cache instead of the API.
  - Reduces redundant calls by 50–80% for common patterns (FAQ, RAG).
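A toy illustration of the semantic‑caching idea, assuming queries are already embedded as plain float vectors and cosine similarity is the match criterion (a production system would use a vector index rather than a linear scan; all names here are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, embedding):
        """Return the cached response for the closest entry above
        the similarity threshold, else None (cache miss)."""
        best = max(self.entries,
                   key=lambda e: cosine(e[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]  # cache hit: skip the API call entirely
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

Queries that are near‑duplicates of an earlier one (rephrased FAQ questions, repeated RAG lookups) resolve from the cache, which is where the claimed reduction in redundant calls comes from.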
- **Python SDK**
  - Define workflows and custom workers in Python.
  - Built‑in support for pandas, DuckDB, and scikit‑learn.
## 📊 Benchmarks (Kruxia Flow vs. Temporal & Airflow)
| Metric | Kruxia Flow | Temporal | Airflow |
|---|---|---|---|
| Throughput | 93 workflows / s | 66 workflows / s | 8 workflows / s |
| Peak Memory | 328 MiB | ~1 GiB (varies) | 7.2 GiB |
| Docker Image Size | 63 MiB | > 500 MiB | > 1 GiB |
| Hardware Tested | 2 vCPU, 4 GiB RAM (Linux) | Same | Same |
| Raspberry Pi Zero | ✅ Runs | ❌ Too heavy | ❌ Too heavy |
## 👥 Who Is Kruxia Flow For?
- AI startups shipping agents to production who need real‑time cost visibility.
- Small businesses wanting workflow automation without a five‑figure infrastructure bill.
- Data teams that combine batch pipelines, ML training, NLP processing, and LLM agents while staying operationally lightweight.
If you fit any of the above, Kruxia Flow was built for you.
## 📜 Licensing & Availability
- Core engine, LLM cost tracking, budget enforcement, multi‑provider support, token streaming – AGPL‑3.0 (open source).
- Python SDK – MIT‑licensed.
Repository: https://github.com/kruxia/kruxia-flow (public, actively maintained)
## 🛠️ Getting Started (Quick‑Start)
```bash
# 1️⃣ Pull the Docker image (63 MiB)
docker pull ghcr.io/kruxia/kruxia-flow:latest

# 2️⃣ Run PostgreSQL (if you don’t have one already)
docker run -d \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_USER=kruxia \
  -e POSTGRES_DB=kruxia \
  -p 5432:5432 \
  postgres:15

# 3️⃣ Start Kruxia Flow
docker run -d \
  -e DATABASE_URL=postgresql://kruxia:secret@host.docker.internal:5432/kruxia \
  -p 8080:8080 \
  ghcr.io/kruxia/kruxia-flow:latest

# 4️⃣ Install the Python SDK
pip install kruxia-flow-sdk
```
Now you can define a workflow in Python:
```python
from kruxia import workflow, activity, ModelSelector

@workflow
async def memoir_assistant(user_id: str):
    # Step 1 – fetch user profile (cached)
    profile = await activity(fetch_profile, user_id)

    # Step 2 – generate outline (cost-aware)
    outline = await activity(
        generate_outline,
        profile,
        model=ModelSelector(preferred="claude-sonnet", fallback="claude-haiku"),
    )

    # Step 3 – store result
    await activity(save_outline, user_id, outline)
```
## 🎉 Closing Thoughts
Existing workflow engines either burden you with operational complexity or ignore the economics of LLM usage.
Kruxia Flow bridges that gap by delivering a tiny, Rust‑powered, AI‑aware orchestrator that:
- Tracks costs in real time
- Enforces budgets before they’re exceeded
- Falls back to cheaper models automatically
- Caches semantically similar queries
All while staying lightweight, open source, and easy to run for solo developers or small teams.
Happy building! 🚀
## Roadmap
- Semantic caching, a web dashboard, and a TypeScript SDK.
I’m still building, and there’s a lot of road ahead. But the foundation is solid, and I’m already using it in my own projects — which, after all, is why I built it in the first place.
If any of this resonates, I’d be glad to have you take a look. The code is on GitHub, and there’s a Discord where I’m happy to talk about the design, the roadmap, or working with AI.