The Homelab AI Stack in 2026: What Self-Hosters Are Actually Running

Published: March 4, 2026 at 05:58 PM EST
5 min read
Source: Dev.to

Spend five minutes on r/selfhosted and you’ll notice: the conversations have changed.

Two years ago everyone asked “what should I run?” Now they’re sharing sophisticated stacks that rival small‑business infrastructure. The self‑hosting AI movement has matured. Here’s what’s actually worth deploying in 2026.

The Core Stack (What Stayed)

Ollama — Local LLM Runtime

Ollama won. It beat LocalAI on simplicity, beat llama.cpp on UX, and the model library makes pulling new models trivial.

# Install
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the best‑value model for 16 GB RAM
ollama pull qwen2.5:14b

# Or for 24 GB+ (M4 Mac mini, high‑RAM PC)
ollama pull qwen2.5:32b

# Test immediately
ollama run qwen2.5:14b "Explain what makes a good Docker Compose file"

Hardware reality check

| RAM | Practical model size | Typical use |
| --- | --- | --- |
| 8 GB | 7B | Basic tasks |
| 16 GB | 14B | Solid capability |
| 24 GB (M4 Mac mini sweet spot) | 32B | Near GPT‑4 quality |
| 32 GB+ | 70B | Excellent for everything |
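The rule of thumb in this table can be sketched as a small helper — a hypothetical function for illustration, not part of Ollama or any tool mentioned here:

```python
def practical_model_size(ram_gb: int) -> str:
    """Map available RAM to the largest practical local model size,
    following the rule-of-thumb table above."""
    if ram_gb >= 32:
        return "70B"
    if ram_gb >= 24:
        return "32B"
    if ram_gb >= 16:
        return "14B"
    if ram_gb >= 8:
        return "7B"
    raise ValueError("Under 8 GB of RAM, local LLMs are impractical")
```

So a 24 GB M4 Mac mini lands squarely in 32B territory, which is why it keeps coming up as the sweet spot.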

Open WebUI — The Interface

Deploys in ~2 minutes and gives you a ChatGPT‑equivalent UI locally.

# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui:

n8n — Automation Brain

For connecting AI to everything else. Self‑hosted, no per‑workflow limits, full control.

Killer use case in 2026: n8n + Ollama = private AI automations that cost $0/month to run.

My actual running workflows:

  • Gmail → Ollama triage → priority flag → Telegram alert
  • RSS feeds → Ollama summary → daily digest at 7 am
  • Server logs → Ollama anomaly check → alert if weird
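Under the hood, the triage step in workflows like these is just an HTTP call: Ollama exposes a REST API on port 11434, and a workflow node POSTs a prompt to `/api/generate`. A minimal sketch of that call — the function names and the priority labels are illustrative, not from n8n or Ollama:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def triage_payload(subject: str, body: str, model: str = "qwen2.5:14b") -> dict:
    """Build a non-streaming Ollama /api/generate request that asks the
    model to classify an email as HIGH, NORMAL, or LOW priority."""
    prompt = (
        "Classify this email's priority as HIGH, NORMAL, or LOW. "
        "Reply with one word only.\n\n"
        f"Subject: {subject}\n\n{body}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def triage(subject: str, body: str) -> str:
    """POST the payload to a local Ollama instance and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(triage_payload(subject, body)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

In n8n the same call is an HTTP Request node; the point is that there's no vendor SDK or API key anywhere in the loop.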

What Got Replaced in 2026

| Replaced | Replaced by |
| --- | --- |
| LocalAI | Ollama |
| Flowise | n8n |
| Custom Python scripts | n8n workflows |

Why? Ollama is more feature‑complete, n8n handles AI and everything else, and n8n workflows are inspectable, editable, and debuggable without touching code.

What Got Added in 2026

Whisper.cpp — Local Audio Transcription

brew install whisper-cpp   # or build from source for max performance

# Download a ggml model file first (e.g. ggml-base.en.bin), then transcribe.
# Input should be 16 kHz WAV; convert other formats with ffmpeg.
whisper-cli -m models/ggml-base.en.bin -f audio.wav

Use cases: meeting transcription, voice‑notes → text, local podcast search.

LiteLLM — The Unified Proxy

LiteLLM sits in front of all your AI models and presents a single OpenAI‑compatible API endpoint.

# docker-compose.yml (excerpt)
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml"]
    ports:
      - "4000:4000"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml

Now every app in your stack — n8n, Open WebUI, your scripts — points to http://litellm:4000 and you switch models by editing a single config file.
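The mounted litellm_config.yaml is where that routing lives. A minimal sketch, assuming one local Ollama model and one hosted Anthropic model — the `model_name` aliases here are illustrative:

```yaml
model_list:
  # Local model served by Ollama
  - model_name: local-qwen
    litellm_params:
      model: ollama/qwen2.5:14b
      api_base: http://host.docker.internal:11434

  # Hosted fallback; key is read from the container environment
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
```

Apps request `local-qwen` or `claude` by name; swapping what's behind an alias is a one-line config change, with no client edits.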

ChromaDB + LlamaIndex — Private RAG

Search your own documents with AI. All local, all private.

import chromadb
from llama_index.core import StorageContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore

# Index your documents
docs = SimpleDirectoryReader('/your/docs/folder').load_data()
db = chromadb.PersistentClient(path='./chroma_db')
collection = db.get_or_create_collection('my_docs')
store = ChromaVectorStore(chroma_collection=collection)

# Wire the Chroma store in via a StorageContext so the index
# actually persists there instead of the in-memory default
storage_context = StorageContext.from_defaults(vector_store=store)

# Query them
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
engine = index.as_query_engine()
response = engine.query('What did we decide about the API architecture?')
print(response)

The Hardware Question

GPU server vs. Apple Silicon?

In 2026, for pure AI inference at homelab scale, Apple Silicon wins on value.

| Device | Typical performance | Pros | Cons |
| --- | --- | --- | --- |
| M4 Mac mini (24 GB, ~$800) | 32 B models @ 10‑15 tokens/sec | Silent, 30 W idle, no separate GPU, macOS = easy maintenance | Limited to Apple ecosystem |
| NVIDIA RTX 4090 server (24 GB VRAM) | Faster on large batches, better for fine‑tuning | Superior raw throughput, good for training | Loud, 450 W under load, Linux‑only, higher cost |
  • Homelab with 1‑5 concurrent users (text tasks): Mac mini M4.
  • Serious inference throughput or training: GPU server.

The Monitoring Stack

Don’t run AI services without knowing when they break.

  • Uptime Kuma – health checks for Ollama, n8n, Open WebUI, etc.
  • Netdata – per‑container resource usage.
  • Loki + Grafana – aggregate logs from all containers.
# Example snippet for log collection (docker‑compose)
labels:
  - logging=promtail
  - logging_jobname=containerlogs
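Uptime Kuma, the first item above, deploys like everything else in this stack. A minimal compose sketch — the volume name is arbitrary:

```yaml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    volumes:
      - uptime-kuma:/app/data
    ports:
      - "3001:3001"
    restart: unless-stopped

volumes:
  uptime-kuma:
```

Once it's up on port 3001, add HTTP monitors for Ollama (11434), Open WebUI (3000), and n8n, and point alerts at Telegram or email.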

What I’d Set Up First on a New Server

In order, if starting from scratch:

  1. Traefik – reverse proxy + automatic HTTPS (everything else goes behind it).
  2. Ollama – pull qwen2.5:14b first, add others as needed.
  3. Open WebUI – UI for chatting with the models.
  4. n8n – automation workflows.
  5. LiteLLM – unified API endpoint.
  6. ChromaDB + LlamaIndex – private RAG.
  7. Whisper.cpp – local transcription.
  8. Monitoring stack – Uptime Kuma, Netdata, Loki + Grafana.

That’s the practical, battle‑tested stack many self‑hosters are running in 2026. Happy building!

Immediately Useful Services

  • n8n — automation brain
  • LiteLLM — unified API proxy
  • Uptime Kuma — monitoring
  • Vaultwarden — password manager (you’ll need it)

The One Thing Most People Miss

Running models locally is only half the value.

The other half is connecting them to your actual workflow — your email, your calendar, your codebase, your documents. A local LLM that just answers questions in a chat window is merely a slower, private version of ChatGPT.

A local LLM wired into n8n that automatically triages your email, monitors your servers, and summarizes your notes — that’s actual leverage.

SIGNAL publishes weekly. Follow @signal-weekly for more practical builder content.

Next: How I use AI agents to automate the boring parts of running a homelab — specific n8n workflows, working code.
