Homelab AI stack 2026 — what to run and in what order
Source: Dev.to
TL;DR
Stop running your AI brain on someone else’s servers.
Here’s the exact stack I run on my homelab — in the order that actually makes sense to deploy it.
The models have crossed a threshold: qwen2.5:32b running locally on a decent machine beats GPT‑3.5 on most developer tasks. It’s free, private, offline, and you own every token.
Self‑hosting your AI stack isn’t a nerd flex anymore; it’s good engineering hygiene. You wouldn’t run production on someone else’s laptop—why run your reasoning on their servers?
1. Reverse proxy & TLS (Traefik)
Before anything else gets internet‑exposed, set up Traefik. It provides automatic TLS, reverse proxy, and a single entry point.
```shell
# Enable the Docker provider; add an ACME certificates resolver for automatic TLS
docker run -d --name traefik \
  -p 80:80 -p 443:443 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  traefik:v3.0 \
  --providers.docker=true \
  --entrypoints.web.address=:80 --entrypoints.websecure.address=:443
```
Don’t skip this step. Everything else sits behind Traefik.
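Services register themselves with Traefik through container labels. A minimal compose-file sketch (the hostname is a placeholder, and it assumes the Docker provider is enabled as above):

```yaml
# docker-compose fragment: route a hostname to a container's internal port
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.webui.rule=Host(`ai.example.lan`)"
      - "traefik.http.services.webui.loadbalancer.server.port=8080"
```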
2. Install Ollama
```shell
curl -fsSL https://ollama.ai/install.sh | sh
```
Run your first model:
```shell
ollama run qwen2.5:32b
```
You can swap model names freely (e.g., gemma3, mistral, phi4, llama3.2). All are free and require no API key.
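Beyond the CLI, Ollama exposes an HTTP API on port 11434, which is what the rest of this stack talks to. A minimal Python sketch of a one-shot completion (assumes Ollama is running locally and the model is already pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port


def generate(prompt: str, model: str = "qwen2.5:32b") -> str:
    """One-shot completion via Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swap the `model` argument the same way you would swap names on the command line.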
Minimum viable hardware
| Model size | Recommended RAM |
|---|---|
| 7 B | 16 GB |
| 32 B | 32 GB+ |
Apple Silicon M‑series handles this well.
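The table tracks a simple rule of thumb: at b-bit quantization, the weights alone take roughly params × b / 8 bytes, and you want headroom on top for the KV cache and the OS. A rough sketch (my approximation, not an official formula):

```python
def weights_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight memory in GB for a quantized model: params * bits / 8."""
    return params_billion * bits / 8

# 7B at 4-bit:  ~3.5 GB of weights -> fits comfortably in 16 GB RAM
# 32B at 4-bit: ~16 GB of weights  -> hence the 32 GB+ recommendation
```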
3. Chat‑style UI (Open WebUI)
A ChatGPT‑style interface that connects directly to Ollama, supports multiple models, conversation history, and document upload.
```shell
docker run -d --name open-webui --restart always \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
This is where local AI stops being a toy and becomes a workflow tool.
4. Automation (n8n)
Connect your LLM to everything: email, webhooks, APIs, databases, smart home, etc.
```shell
docker run -d --name n8n --restart always \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  n8nio/n8n
```
Example workflow
- Email arrives → n8n sends it to Ollama.
- Ollama categorizes and drafts a reply.
- You review the draft.
Zero cloud, full privacy.
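The categorize step boils down to prompting the model with the email and parsing a constrained answer. A sketch of that glue logic (the prompt wording and category labels are mine, not from n8n):

```python
CATEGORIES = ["urgent", "newsletter", "personal", "spam"]  # illustrative labels


def build_email_prompt(subject: str, body: str) -> str:
    """Prompt asking the model to pick exactly one category."""
    return (
        f"Classify this email as one of: {', '.join(CATEGORIES)}.\n"
        f"Subject: {subject}\n\n{body}\n\n"
        "Answer with the category name only."
    )


def parse_category(response_text: str) -> str:
    """Map the model's free-text reply onto a known category; default to 'personal'."""
    answer = response_text.strip().lower()
    for cat in CATEGORIES:
        if cat in answer:
            return cat
    return "personal"
```

In n8n this pair sits in a Code node between the email trigger and the Ollama HTTP request.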
5. Unified OpenAI‑compatible endpoint (LiteLLM)
Once you have multiple models, LiteLLM gives you a single OpenAI‑compatible endpoint so your apps stop caring which backend they hit.
```yaml
model_list:
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-heavy
    litellm_params:
      model: ollama/qwen2.5:32b
      api_base: http://localhost:11434
```
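With the LiteLLM proxy serving that config (port 4000 by default), any OpenAI-style client works unchanged; you just pick `local-fast` or `local-heavy` as the model name. A minimal stdlib-only sketch (assumes the proxy is running):

```python
import json
import urllib.request

LITELLM_URL = "http://localhost:4000"  # LiteLLM proxy's default port


def chat(content: str, model: str = "local-fast") -> str:
    """Call LiteLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    body = json.dumps({
        "model": model,  # "local-fast" or "local-heavy" from the config above
        "messages": [{"role": "user", "content": content}],
    }).encode()
    req = urllib.request.Request(
        f"{LITELLM_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Routing between a fast 7B and a heavy 32B becomes a one-word change in the caller.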
6. What the stack enables
- Anyone can run `ollama run llama3.2` and ask it questions.
- The real power appears when your homelab starts acting autonomously—reading your emails, monitoring services, briefing you every morning—with no data leaving your network.
That’s the stack that gets you there.
Signal covers AI tools, automation, and homelab setups—what actually works, tested on real hardware. No hype.