The Homelab AI Stack in 2026: What Self-Hosters Are Actually Running
Source: Dev.to
Spend five minutes on r/selfhosted and you’ll notice: the conversations have changed.
Two years ago everyone asked “what should I run?” Now they’re sharing sophisticated stacks that rival small‑business infrastructure. The self‑hosting AI movement has matured. Here’s what’s actually worth deploying in 2026.
The Core Stack (What Stayed)
Ollama — Local LLM Runtime
Ollama won. It beat LocalAI on simplicity, beat llama.cpp on UX, and the model library makes pulling new models trivial.
```bash
# Install
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the best-value model for 16 GB RAM
ollama pull qwen2.5:14b

# Or for 24 GB+ (M4 Mac mini, high-RAM PC)
ollama pull qwen2.5:32b

# Test immediately
ollama run qwen2.5:14b "Explain what makes a good Docker Compose file"
```
Hardware reality check
| RAM | Practical model size | Typical use |
|---|---|---|
| 8 GB | 7 B | Basic tasks |
| 16 GB | 14 B | Solid capability |
| 24 GB (M4 Mac mini sweet spot) | 32 B | Near GPT‑4 quality |
| 32 GB+ | 70 B | Excellent for everything |
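The RAM pairings in the table follow from simple quantization arithmetic: a 4-bit-quantized model needs roughly 0.5 GB per billion parameters for weights, plus a couple of gigabytes for the KV cache and runtime. A minimal sketch (the 0.5 GB/B figure and the 2 GB overhead are rough assumptions, not exact numbers):

```python
def q4_model_gb(params_billion: float, overhead_gb: float = 2.0) -> float:
    """Rough RAM needed for a 4-bit-quantized model: ~0.5 GB per billion
    parameters for weights, plus KV cache / runtime overhead (assumed 2 GB)."""
    return params_billion * 0.5 + overhead_gb

# 7B  -> ~5.5 GB  (fits in 8 GB)
# 14B -> ~9 GB    (comfortable on 16 GB)
# 32B -> ~18 GB   (fits in 24 GB unified memory)
# 70B -> ~37 GB   (needs 32 GB+ with swap headroom, or 48 GB to be safe)
```

Longer contexts inflate the KV-cache term, which is why the table leaves headroom rather than packing models to the byte.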
Open WebUI — The Interface
Deploys in ~2 minutes and gives you a ChatGPT‑equivalent UI locally.
```yaml
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui:
```
n8n — Automation Brain
For connecting AI to everything else. Self‑hosted, no per‑workflow limits, full control.
Killer use case in 2026: n8n + Ollama = private AI automations that cost $0/month to run.
My actual running workflows:
- Gmail → Ollama triage → priority flag → Telegram alert
- RSS feeds → Ollama summary → daily digest at 7 am
- Server logs → Ollama anomaly check → alert if weird
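Under the hood, each of these workflows reduces to one HTTP call against Ollama's `/api/generate` endpoint plus some string handling. A minimal sketch of the email-triage step in plain Python, assuming a local Ollama on its default port; the helper names and the one-word-priority prompt are my own, not a fixed n8n convention:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_triage_prompt(subject: str, body: str) -> str:
    # Ask for exactly one word so parsing stays trivial.
    return (
        "Classify this email's priority as exactly one word: "
        "HIGH, NORMAL, or LOW.\n\n"
        f"Subject: {subject}\n\n{body}"
    )

def parse_priority(reply: str) -> str:
    # Be forgiving: take the first recognized keyword, default to NORMAL.
    upper = reply.upper()
    for level in ("HIGH", "LOW", "NORMAL"):
        if level in upper:
            return level
    return "NORMAL"

def triage(subject: str, body: str, model: str = "qwen2.5:14b") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": build_triage_prompt(subject, body),
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return parse_priority(json.loads(resp.read())["response"])
```

In n8n the same shape appears as an HTTP Request node feeding an IF node; the point is that the whole "AI" step is one request and one keyword check.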
What Got Replaced in 2026
| Replaced | Replaced by |
|---|---|
| LocalAI | Ollama |
| Flowise | n8n |
| Custom Python scripts | n8n workflows |
Why? Ollama is more feature‑complete, n8n handles AI and everything else, and n8n workflows are inspectable, editable, and debuggable without touching code.
What Got Added in 2026
Whisper.cpp — Local Audio Transcription
```bash
brew install whisper-cpp  # or build from source for max performance

# Transcribe any audio file
whisper-cpp --model base.en audio.mp3
```
Use cases: meeting transcription, voice‑notes → text, local podcast search.
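For the voice-notes use case, it's easy to batch the CLI over a folder with a few lines of Python. A sketch, assuming the `whisper-cpp` binary from the brew formula above is on your PATH (the helper names are mine, and binary names vary between builds):

```python
import subprocess
from pathlib import Path

def whisper_cmd(audio: Path, model: str = "base.en") -> list[str]:
    # Mirrors the CLI invocation shown above; binary name may differ per install.
    return ["whisper-cpp", "--model", model, str(audio)]

def transcribe_folder(folder: str) -> None:
    # Transcribe every MP3 in a folder, one subprocess per file.
    for audio in sorted(Path(folder).glob("*.mp3")):
        subprocess.run(whisper_cmd(audio), check=True)
```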
LiteLLM — The Unified Proxy
LiteLLM sits in front of all your AI models and presents a single OpenAI‑compatible API endpoint.
```yaml
# docker-compose.yml (excerpt)
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
```
Now every app in your stack — n8n, Open WebUI, your scripts — points to http://litellm:4000 and you switch models by editing a single config file.
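From a client's perspective that endpoint speaks the standard OpenAI chat-completions protocol. A minimal sketch in plain Python, assuming the `litellm` hostname and port from the compose excerpt above (the helper names and the placeholder API key are mine; LiteLLM only enforces a key if you configure one):

```python
import json
import urllib.request

LITELLM_URL = "http://litellm:4000/v1/chat/completions"  # from the compose file above

def chat_payload(model: str, user_message: str) -> dict:
    # Standard OpenAI chat-completions body; LiteLLM routes on the model name.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def ask(model: str, message: str) -> str:
    req = urllib.request.Request(
        LITELLM_URL,
        data=json.dumps(chat_payload(model, message)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-anything",  # only checked if a master key is set
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Swapping "qwen2.5:14b" for a hosted model needs no code change here,
# only an entry in litellm_config.yaml.
```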
ChromaDB + LlamaIndex — Private RAG
Search your own documents with AI. All local, all private.
```python
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Index your documents
docs = SimpleDirectoryReader('/your/docs/folder').load_data()
db = chromadb.PersistentClient(path='./chroma_db')
collection = db.get_or_create_collection('my_docs')
store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=store)

# Query them
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
engine = index.as_query_engine()
response = engine.query('What did we decide about the API architecture?')
print(response)
```
The Hardware Question
GPU server vs. Apple Silicon?
In 2026, for pure AI inference at homelab scale, Apple Silicon wins on value.
| Device | Typical performance | Pros | Cons |
|---|---|---|---|
| M4 Mac mini (24 GB, ~$800) | 32 B models @ 10‑15 tokens / sec | Silent, 30 W idle, no separate GPU, macOS = easy maintenance | Limited to Apple ecosystem |
| NVIDIA RTX 4090 server (24 GB VRAM) | Faster on large batches, better for fine‑tuning | Superior raw throughput, good for training | Loud, 450 W under load, Linux‑only, higher cost |
- Homelab with 1‑5 concurrent users (text tasks): Mac mini M4.
- Serious inference throughput or training: GPU server.
The Monitoring Stack
Don’t run AI services without knowing when they break.
- Uptime Kuma – health checks for Ollama, n8n, Open WebUI, etc.
- Netdata – per‑container resource usage.
- Loki + Grafana – aggregate logs from all containers.
```yaml
# Example snippet for log collection (docker-compose)
labels:
  - logging=promtail
  - logging_jobname=containerlogs
```
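For the Ollama health check specifically, Uptime Kuma can poll the `/api/tags` endpoint, which lists installed models. A minimal sketch of the same check as a script, assuming Ollama on its default port (the helper names are mine):

```python
import json
import urllib.request

def installed_models(tags_json: dict) -> list[str]:
    # /api/tags returns {"models": [{"name": "qwen2.5:14b", ...}, ...]}
    return [m["name"] for m in tags_json.get("models", [])]

def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    # Raises on connection failure or non-200, so an empty-but-healthy
    # server is distinguishable from a dead one.
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return installed_models(json.loads(resp.read()))
```

Wiring the failure path into n8n gives you the "alert if weird" workflow mentioned earlier.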
What I’d Set Up First on a New Server
In order, if starting from scratch:
- Traefik – reverse proxy + automatic HTTPS (everything else goes behind it).
- Ollama – pull qwen2.5:14b first, add others as needed.
- Open WebUI – UI for chatting with the models.
- n8n – automation workflows.
- LiteLLM – unified API endpoint.
- ChromaDB + LlamaIndex – private RAG.
- Whisper.cpp – local transcription.
- Monitoring stack – Uptime Kuma, Netdata, Loki + Grafana.
That’s the practical, battle‑tested stack many self‑hosters are running in 2026. Happy building!
Immediately Useful Additions
- n8n — automation brain
- LiteLLM — unified API proxy
- Uptime Kuma — monitoring
- Vaultwarden — password manager (you’ll need it)
The One Thing Most People Miss
Running models locally is only half the value.
The other half is connecting them to your actual workflow — your email, your calendar, your codebase, your documents. A local LLM that just answers questions in a chat window is merely a slower, private version of ChatGPT.
A local LLM wired into n8n that automatically triages your email, monitors your servers, and summarizes your notes — that’s actual leverage.
SIGNAL publishes weekly. Follow @signal-weekly for more practical builder content.
Next: How I use AI agents to automate the boring parts of running a homelab — specific n8n workflows, working code.