Solved: I built an automated Talos + Proxmox + GitOps homelab starter (ArgoCD + Workflows + DR)
Source: Dev.to
Executive Summary
TL;DR: This post tackles manual, inconsistent, and fragile homelab setups by detailing an automated, resilient system. It integrates Talos Linux, Proxmox, and a GitOps approach using ArgoCD and Argo Workflows for infrastructure provisioning, application management, and disaster recovery.
🎯 Key Takeaways
- Proxmox VE + Talos Linux: a robust, API-driven foundation for automated VM provisioning and a secure, immutable Kubernetes OS.
- ArgoCD: implements a GitOps workflow that continuously syncs Kubernetes cluster configurations and applications from a Git repository, eliminating configuration drift and enabling automated deployments.
- Argo Workflows: orchestrates complex operational tasks such as automated backups (Proxmox VMs via PBS, Kubernetes apps via Velero) and disaster-recovery testing, greatly enhancing homelab resilience and recovery capabilities.
Building a robust, automated homelab or small-scale IT environment presents unique challenges. This post details how integrating Talos Linux, Proxmox, and a GitOps approach with ArgoCD, Argo Workflows, and strategic Disaster Recovery (DR) can transform a manual, fragile setup into a resilient, self-healing system.
Symptoms: The Homelab Headache
Many IT professionals building or maintaining homelabs encounter a recurring set of frustrations that hinder scalability, reliability, and efficient management. These symptoms usually stem from a lack of automation and a reactive approach to infrastructure.
1. Manual VM & Kubernetes provisioning
   - What happens: New virtual machines are created on hypervisors (e.g., Proxmox) → OS is installed manually → networking is configured → Kubernetes cluster is bootstrapped.
   - Impact:
     - Extremely time-consuming.
     - Prone to human error.
     - Each node becomes a "snowflake," making consistency impossible.
2. Configuration drift & inconsistency
   - What happens: Manual tweaks to VMs, Kubernetes manifests, or network settings diverge from the intended state.
   - Impact:
     - Environments quickly lose alignment with the desired configuration.
     - Troubleshooting becomes difficult.
     - Deployments become unreliable because the desired state isn't codified or enforced.
3. Lack of automated deployments & updates
   - What happens: Deploying new apps, updating services, or patching the OS requires manual SSH sessions, ad-hoc scripts, or dashboard clicks.
   - Impact:
     - Slow, inefficient workflow.
     - Increased risk of downtime or unexpected failures.
4. Fragile disaster-recovery (DR) strategy
   - What happens: No clear, automated DR plan; backups are manual, often outdated, and recovery procedures are untested.
   - Impact:
     - A single hardware failure or misconfiguration can cause data loss.
     - Service outages become prolonged and complex to resolve.
5. Operational burden of Kubernetes
   - What happens: Managing the control plane, keeping nodes up to date, and ensuring application resilience require constant attention.
   - Impact:
     - High operational overhead.
     - Complexity can quickly overwhelm a homelab enthusiast without automation.
Solution 1: Proxmox + Talos for a Robust & Minimalist Infrastructure Base
The foundation of a reliable homelab begins with a solid, automated infrastructure layer. This solution combines Proxmox VE for virtualization with Talos Linux for a secure, minimal, and immutable Kubernetes operating system.
Proxmox VE: The Virtualization Workhorse
Proxmox VE provides a powerful, open-source platform for managing virtual machines, containers, and storage. Its API-driven nature makes it an ideal candidate for infrastructure automation, allowing you to provision VMs programmatically instead of relying on manual GUI clicks.
Example: Automating VM Provisioning (Conceptual)
#!/usr/bin/env bash
# Basic VM creation using qm (simplified for illustration)
# In practice, wrap this in Terraform, Ansible, etc.
set -euo pipefail
VMID="101"
VMNAME="talos-node-01"
MEM="4096" # 4 GB RAM
CPUS="2"
DISK_SIZE="32" # GiB; qm takes a bare number when allocating a new volume
STORAGE="local-lvm"
OS_TYPE="l26"
NET_BRIDGE="vmbr0"
# 1. Create the VM
qm create "$VMID" \
  --name "$VMNAME" \
  --memory "$MEM" \
  --cores "$CPUS" \
  --ostype "$OS_TYPE"
# 2. Attach storage (allocates a new, empty disk)
qm set "$VMID" \
  --scsihw virtio-scsi-pci \
  --scsi0 "$STORAGE:$DISK_SIZE"
# 3. Add network
qm set "$VMID" \
  --net0 "virtio,bridge=$NET_BRIDGE"
# 4. Add a Cloud-Init drive
qm set "$VMID" \
  --ide2 "$STORAGE:cloudinit"
# 5. Set boot order (the disk must hold a Talos image before first boot)
qm set "$VMID" \
  --boot "order=scsi0"
# 6. Start the VM
qm start "$VMID"
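Note: Talos does not use Ignition files, and Cloud-Init does not run a Talos installer command. With the Talos nocloud image, the Cloud-Init user data carries the Talos machine configuration itself; otherwise, boot the Talos ISO and push the configuration with talosctl apply-config.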
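To provision more than one node, the same kind of `qm` call can be driven from a small node list. The following is a minimal sketch rather than the author's tooling: the VM IDs, node names, sizes, and the `DRY_RUN` switch are assumptions made for illustration, and by default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: plan `qm create` calls for several Talos VMs.
# VM IDs, names, and sizes below are illustrative assumptions.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# run: print the command, and execute it only when DRY_RUN=0
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@" </dev/null
  fi
}

# plan_vm VMID NAME MEMORY_MB CORES
plan_vm() {
  run qm create "$1" --name "$2" --memory "$3" --cores "$4" \
    --ostype l26 --net0 "virtio,bridge=vmbr0"
}

# One line per node: VMID NAME MEMORY_MB CORES
NODES="101 talos-node-01 4096 2
102 talos-node-02 4096 2
103 talos-node-03 4096 2"

plan_all() {
  echo "$NODES" | while read -r vmid name mem cores; do
    plan_vm "$vmid" "$name" "$mem" "$cores"
  done
}

plan_all
```

Reviewing the printed plan first and only then rerunning with `DRY_RUN=0` on the Proxmox host keeps a typo in the node list from creating stray VMs.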
Talos Linux: Kubernetes-Native OS
Talos Linux is a secure, minimal, and immutable operating system designed specifically for running Kubernetes. It eliminates unnecessary components, reducing the attack surface and operational overhead. Its API-driven management model aligns perfectly with a GitOps approach.
- Minimal Footprint: No shell, no package manager, no unnecessary services.
- Immutability: The OS never drifts; all changes are applied via atomic updates.
- API-Driven: Configuration and operations are performed via a gRPC API, ideal for automation.
- Enhanced Security: Reduced attack surface and cryptographic integrity checks.
Example: Generating Talos Configuration
#!/usr/bin/env bash
# 1. Generate cluster config (writes controlplane.yaml, worker.yaml, and talosconfig)
talosctl gen config my-cluster https://<CONTROL_PLANE_IP>:6443
# 2. Apply to each node (controlplane.yaml for control-plane nodes, worker.yaml for workers)
talosctl apply-config \
  --insecure \
  --nodes <NODE_IP> \
  --file worker.yaml
# 3. Bootstrap etcd once, on a single control-plane node
talosctl bootstrap \
  --nodes <CONTROL_PLANE_IP> \
  --endpoints <CONTROL_PLANE_IP> \
  --talosconfig ./talosconfig
These commands are typically wrapped in CI/CD pipelines so the entire provisioning-to-bootstrap process is fully automated.
What's Next?
The upcoming sections will cover:
- GitOps with ArgoCD: Keeping Kubernetes manifests in sync with a Git repository.
- Argo Workflows for Automation: Orchestrating backups, restores, and DR drills.
- Disaster Recovery Strategy: Using Proxmox Backup Server (PBS) and Velero to protect both VM and Kubernetes workloads.
Solution 1: Talos Configuration (Bootstrap the Cluster)
talosctl gen config my-talos-cluster https://192.168.1.10:6443 \
  --output ./cluster-configs \
  --with-kubespan
The command creates controlplane.yaml, worker.yaml, and talosconfig in ./cluster-configs. talosctl gen config takes no list of node addresses; a node's role is decided by which file you apply to it. In this example, 192.168.1.10-12 are the control-plane nodes and 192.168.1.13-14 are the workers. Apply the files with:
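# Control-plane node (repeat for each control-plane IP)
talosctl apply-config \
  --insecure \
  --nodes 192.168.1.10 \
  --file ./cluster-configs/controlplane.yaml
# Worker node (repeat for each worker IP)
talosctl apply-config \
  --insecure \
  --nodes 192.168.1.13 \
  --file ./cluster-configs/worker.yaml
# Bootstrap etcd once, on a single control-plane node
talosctl bootstrap --nodes 192.168.1.10 --endpoints 192.168.1.10 \
  --talosconfig ./cluster-configs/talosconfig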
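Applying the right file to every node by hand gets repetitive, so the per-role mapping can be scripted. The sketch below is one possible approach under stated assumptions: three control-plane nodes at 192.168.1.10-12, two workers at 192.168.1.13-14, and configs in `./cluster-configs`. It only prints the `talosctl` commands so they can be reviewed before anything touches a node.

```shell
#!/usr/bin/env bash
# Sketch: print the talosctl commands for a whole cluster, grouped by role.
# The IP addresses and config directory are assumptions for this example.
set -eu

CONFIG_DIR="./cluster-configs"
CONTROL_PLANE_IPS="192.168.1.10 192.168.1.11 192.168.1.12"
WORKER_IPS="192.168.1.13 192.168.1.14"

# apply_cmd IP ROLE -> prints the apply-config command for one node.
# --insecure is needed while a node is still in maintenance mode.
apply_cmd() {
  echo "talosctl apply-config --insecure --nodes $1 --file $CONFIG_DIR/$2.yaml"
}

plan_cluster() {
  for ip in $CONTROL_PLANE_IPS; do apply_cmd "$ip" controlplane; done
  for ip in $WORKER_IPS; do apply_cmd "$ip" worker; done
  # Bootstrap etcd exactly once, on the first control-plane node.
  first_cp="${CONTROL_PLANE_IPS%% *}"
  echo "talosctl bootstrap --nodes $first_cp --endpoints $first_cp --talosconfig $CONFIG_DIR/talosconfig"
}

plan_cluster
```

Once the printed plan looks right, it can be piped into `sh` from a machine that has `talosctl` installed and can reach the nodes.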
Solution 2: GitOps with Argo CD (Automated Configuration Management)
GitOps Principles
| Principle | Description |
|---|---|
| Declarative | Desired state is declared in Git (YAML manifests). |
| Version-controlled | All changes are committed, providing audit history and easy rollbacks. |
| Automated | Git changes automatically trigger cluster updates. |
| Reconciled | A controller continuously aligns the actual cluster state with the Git-defined desired state. |
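The reconciliation principle can be illustrated with a toy drift check. This sketch is not how Argo CD is implemented; it only compares two hypothetical lists of resource names (what Git declares versus what the cluster contains) and reports what a controller would have to create or prune to bring them back in line.

```shell
#!/usr/bin/env bash
# Toy reconcile: compare desired vs. actual resource names.
set -eu

# reconcile DESIRED ACTUAL -> prints "create X" / "prune Y" lines.
# Both arguments are space-separated lists of (hypothetical) names.
reconcile() {
  for want in $1; do
    case " $2 " in
      *" $want "*) ;;                 # already present, nothing to do
      *) echo "create $want" ;;
    esac
  done
  for have in $2; do
    case " $1 " in
      *" $have "*) ;;                 # still desired, keep it
      *) echo "prune $have" ;;
    esac
  done
}

reconcile "deployment/nginx service/nginx" "service/nginx configmap/old"
```

A real controller repeats this comparison continuously and also diffs the contents of each resource, not only its name.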
Argo CD: The GitOps Controller
Key features include automated sync, rollback/roll-forward, health monitoring, and multi-cluster support.
Deploying an Application with Argo CD
# applications/argocd/application-nginx.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-hello-world
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/my-homelab-gitops.git
    targetRevision: HEAD
    path: applications/nginx-hello-world
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Repository layout (example)
my-homelab-gitops/
├── infrastructure/
│   └── talos/
│       └── cluster-config-patches/
├── applications/
│   ├── nginx-hello-world/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   └── argocd/
│       └── application-nginx.yaml
└── argocd-apps/
    ├── homelab-infra.yaml
    └── homelab-apps.yaml
Once the Application manifest is committed and applied to the cluster (directly, or through a parent application such as those under argocd-apps/), Argo CD deploys and manages the nginx-hello-world app and keeps it in sync with Git.
Solution 3: Argo Workflows & Integrated DR (Operational Automation & Resilience)
Typical Homelab Use Cases
| Use Case | Description |
|---|---|
| Automated Backups | Trigger Proxmox VM backups and Velero Kubernetes backups. |
| DR Testing | Spin up test environments, restore backups, validate services. |
| Infrastructure Provisioning | Orchestrate creation of new Talos nodes on Proxmox. |
| Application Release Pipelines | Manage complex deployments with pre-/post-hooks. |
Example: Conceptual Backup Workflow
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: backup-and-verify-
spec:
  entrypoint: backup-and-verify
  templates:
    - name: backup-and-verify
      steps:
        - - name: snapshot-vm
            template: vm-snapshot
        - - name: backup-k8s
            template: k8s-backup
        - - name: verify-backup
            template: verify-backup
    - name: vm-snapshot
      container:
        # proxmox-cli is a placeholder for your own Proxmox API client image
        image: your-registry/proxmox-cli:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            echo "Creating snapshot for VM 101..."
            proxmox-cli snapshot create --vm-id 101 --name backup-$(date +%s)
    - name: k8s-backup
      container:
        image: velero/velero:latest
        # The workflow name is unique per run, so backup names never collide
        command: ["/velero", "backup", "create", "{{workflow.name}}", "--wait"]
    - name: verify-backup
      container:
        # Placeholder image: it must bundle both proxmox-cli and the velero CLI
        image: your-registry/backup-tools:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            echo "Verifying Proxmox snapshot..."
            proxmox-cli snapshot list --vm-id 101 | grep backup-
            echo "Verifying Velero backup..."
            velero backup get {{workflow.name}} | grep Completed
Schedule this workflow with a CronWorkflow for nightly execution, add alerting, and extend it with restoration steps for full DR testing.
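As a rough illustration of the scheduling step, the helper below prints a CronWorkflow manifest that could be piped to `kubectl apply -f -`. It is a sketch under assumptions: the names `nightly-backup` and `backup-and-verify` are hypothetical, the backup workflow is assumed to have been saved as a WorkflowTemplate so it can be referenced by name, and the `schedule` field may need to become a `schedules` list depending on your Argo Workflows version.

```shell
#!/usr/bin/env bash
# Sketch: emit a CronWorkflow manifest for a nightly backup run.
set -eu

# cron_workflow NAME SCHEDULE TEMPLATE -> prints YAML on stdout.
# TEMPLATE is the name of an existing WorkflowTemplate (assumed).
cron_workflow() {
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: $1
spec:
  schedule: "$2"              # newer Argo releases also accept a 'schedules' list
  concurrencyPolicy: Forbid   # skip a run if the previous one is still going
  workflowSpec:
    workflowTemplateRef:
      name: $3
EOF
}

# 02:00 every night
cron_workflow nightly-backup "0 2 * * *" backup-and-verify
```

Generating the manifest from a function keeps the schedule and template name in one place; committing the rendered YAML to the GitOps repository instead would let Argo CD manage it like any other resource.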
Integrated Disaster Recovery (DR) Overview
- Infrastructure as Code: Rebuild Proxmox + Talos from Git after a disaster.
- ArgoCD: Sync applications automatically to a fresh cluster.
- Proxmox Backup Server (PBS): VM-level backups for base OS and stateful workloads.
- Velero: Kubernetes-native backups of resources and persistent volumes.
- Argo Workflows: Automate the entire recovery pipeline, from VM provisioning to backup restoration and health verification.
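The recovery pipeline described above is essentially an ordered list of steps that must stop at the first failure. The sketch below shows that shape only: the step names and their order are assumptions, and each step body is a placeholder that prints what a real implementation would run.

```shell
#!/usr/bin/env bash
# Sketch: ordered, fail-fast recovery pipeline (steps only print here).
# Step bodies are placeholders; replace each echo with your real command.
set -eu

provision_vms()  { echo "provision Talos VMs on Proxmox"; }
apply_talos()    { echo "talosctl apply-config, then talosctl bootstrap"; }
install_argocd() { echo "install Argo CD, then apply argocd-apps/"; }
restore_data()   { echo "velero restore create --from-backup <BACKUP_NAME> --wait"; }
verify_health()  { echo "kubectl get nodes; check application health"; }

# run_pipeline STEP... -> runs steps in order, stops at the first failure.
run_pipeline() {
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "FAILED at $step" >&2
      return 1
    fi
  done
  echo "recovery pipeline finished"
}

run_pipeline provision_vms apply_talos install_argocd restore_data verify_health
```

In Argo Workflows the same ordering would be expressed as sequential steps or a DAG; the shell version is just a compact way to rehearse the order and the failure handling.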
Feature Comparison
| Feature | Manual DR Strategy | Automated GitOps DR Strategy |
|---|---|---|
| RTO (recovery time objective) | Hours to days | Minutes to hours |
| RPO (recovery point objective) | Variable, depends on last manual backup | Low, frequent automated backups |
| Consistency | Highly variable, prone to human error | High, enforced by Git and automation |
| Testing | Infrequent, disruptive | Frequent, automated, non-disruptive (sandbox) |
| Infrastructure Recovery | Manual VM recreation, OS install | Automated provisioning via IaC |
| Application Recovery | Manual redeployment, config, data restore | ArgoCD auto-sync, Velero restore |
| Complexity | High for large environments | High initial setup, low ongoing maintenance |
| Operational Cost | High labor, extended downtime | Lower labor, quicker recovery, reduced impact |
Conclusion
By adopting a comprehensive strategy that leverages
- Proxmox for virtualization
- Talos Linux for a minimalist Kubernetes OS
- GitOps driven by Argo CD and Argo Workflows for automation and disaster recovery
you can transform your homelab into a self-healing, consistent, and secure environment. The upfront effort pays off in stability, scalability, and peace of mind, letting you focus on experimentation rather than firefighting.
