Homelab AI stack 2026 — what to run and in what order
Source: Dev.to
TL;DR
Stop running your AI brain on someone else’s servers.
Here’s the exact stack I run on my homelab — in the order that actually makes sense to deploy it.
The models have crossed a threshold: qwen2.5:32b running locally on a decent machine beats GPT‑3.5 on most developer tasks. It’s free, private, offline, and you own every token.
Self‑hosting your AI stack isn’t a nerd flex anymore; it’s good engineering hygiene. You wouldn’t run production on someone else’s laptop—why run your reasoning on their servers?
1. Reverse proxy & TLS (Traefik)
Before anything else gets internet‑exposed, set up Traefik. It provides automatic TLS, reverse proxy, and a single entry point.
```shell
# Enable the Docker provider; add an ACME certificates resolver for automatic TLS
docker run -d --name traefik \
  -p 80:80 -p 443:443 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  traefik:v3.0 \
  --providers.docker=true \
  --entrypoints.web.address=:80 --entrypoints.websecure.address=:443
```
Don’t skip this step. Everything else sits behind Traefik.
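Services register themselves with Traefik through container labels. A minimal compose-file sketch (the hostname is a placeholder, and it assumes the Docker provider is enabled as above):

```yaml
# docker-compose fragment: route a hostname to a container's internal port
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.webui.rule=Host(`ai.example.lan`)"
      - "traefik.http.services.webui.loadbalancer.server.port=8080"
```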
2. Install Ollama
```shell
curl -fsSL https://ollama.ai/install.sh | sh
```
Run your first model:
```shell
ollama run qwen2.5:32b
```
You can swap model names freely (e.g., gemma3, mistral, phi4, llama3.2). All are free and require no API key.
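Beyond the CLI, Ollama exposes an HTTP API on port 11434, which is what the rest of this stack talks to. A minimal Python sketch of a one-shot completion (assumes Ollama is running locally and the model is already pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port


def generate(prompt: str, model: str = "qwen2.5:32b") -> str:
    """One-shot completion via Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swap the `model` argument the same way you would swap names on the command line.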
Minimum viable hardware
| Model size | Recommended RAM |
|---|---|
| 7 B | 16 GB |
| 32 B | 32 GB+ |
Apple Silicon M‑series handles this well.
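The table tracks a simple rule of thumb: at b-bit quantization, the weights alone take roughly params × b / 8 bytes, and you want headroom on top for the KV cache and the OS. A rough sketch (my approximation, not an official formula):

```python
def weights_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight memory in GB for a quantized model: params * bits / 8."""
    return params_billion * bits / 8

# 7B at 4-bit:  ~3.5 GB of weights -> fits comfortably in 16 GB RAM
# 32B at 4-bit: ~16 GB of weights  -> hence the 32 GB+ recommendation
```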
3. Chat‑style UI (Open WebUI)
A ChatGPT‑style interface that connects directly to Ollama, supports multiple models, conversation history, and document upload.
```shell
docker run -d --name open-webui --restart always \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
This is where local AI stops being a toy and becomes a workflow tool.
4. Automation (n8n)
Connect your LLM to everything: email, webhooks, APIs, databases, smart home, etc.
```shell
docker run -d --name n8n --restart always \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  n8nio/n8n
```
Example workflow
- Email arrives → n8n sends it to Ollama.
- Ollama categorizes and drafts a reply.
- You review the draft.
Zero cloud, full privacy.
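The categorize step boils down to prompting the model with the email and parsing a constrained answer. A sketch of that glue logic (the prompt wording and category labels are mine, not from n8n):

```python
CATEGORIES = ["urgent", "newsletter", "personal", "spam"]  # illustrative labels


def build_email_prompt(subject: str, body: str) -> str:
    """Prompt asking the model to pick exactly one category."""
    return (
        f"Classify this email as one of: {', '.join(CATEGORIES)}.\n"
        f"Subject: {subject}\n\n{body}\n\n"
        "Answer with the category name only."
    )


def parse_category(response_text: str) -> str:
    """Map the model's free-text reply onto a known category; default to 'personal'."""
    answer = response_text.strip().lower()
    for cat in CATEGORIES:
        if cat in answer:
            return cat
    return "personal"
```

In n8n this pair sits in a Code node between the email trigger and the Ollama HTTP request.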
5. Unified OpenAI‑compatible endpoint (LiteLLM)
Once you have multiple models, LiteLLM gives you a single OpenAI‑compatible endpoint so your apps stop caring which backend they hit.
```yaml
model_list:
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-heavy
    litellm_params:
      model: ollama/qwen2.5:32b
      api_base: http://localhost:11434
```
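With the LiteLLM proxy serving that config (port 4000 by default), any OpenAI-style client works unchanged; you just pick `local-fast` or `local-heavy` as the model name. A minimal stdlib-only sketch (assumes the proxy is running):

```python
import json
import urllib.request

LITELLM_URL = "http://localhost:4000"  # LiteLLM proxy's default port


def chat(content: str, model: str = "local-fast") -> str:
    """Call LiteLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    body = json.dumps({
        "model": model,  # "local-fast" or "local-heavy" from the config above
        "messages": [{"role": "user", "content": content}],
    }).encode()
    req = urllib.request.Request(
        f"{LITELLM_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Routing between a fast 7B and a heavy 32B becomes a one-word change in the caller.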
6. What the stack enables
- Anyone can run `ollama run llama3.2` and ask it questions.
- The real power appears when your homelab starts acting autonomously—reading your emails, monitoring services, briefing you every morning—with no data leaving your network.
That’s the stack that gets you there.
Signal covers AI tools, automation, and homelab setups—what actually works, tested on real hardware. No hype.