Skip to content

JeremiahM37/homelab-blueprint

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Homelab Blueprint

A three-node Proxmox cluster running media automation, gaming (with GPU passthrough + game streaming), AI/ML workloads, and self-hosted productivity tools — all on consumer hardware.

This repo documents the architecture, services, and lessons learned. No credentials or personal info — just the blueprint.


Cluster Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                     Proxmox VE Cluster ("HomeServer")                       │
│                        3 nodes · PVE 9.1.1                                  │
├──────────────────┬──────────────────────┬───────────────────────────────────┤
│    Node: pve     │  Node: MediaServer   │     Node: AIServer                │
│  (Gaming/Dev)    │  (Media Stack Host)  │   (AI/ML Workloads)               │
│                  │                      │                                   │
│  CPU: i7-9700K   │  CPU: Ryzen 7 8845HS │  CPU: Ryzen AI MAX+ 395          │
│  RAM: 32 GB      │  RAM: 28 GB          │  RAM: 128 GB                     │
│  GPU: RTX 2070*  │  iGPU: Radeon 780M   │  iGPU: Radeon 8060S              │
│                  │                      │                                   │
│  ┌────────────┐  │  ┌────────────────┐  │  ┌───────────────────────────┐   │
│  │ VM 103     │  │  │ LXC 200        │  │  │ LXC 101  Dev Workspace   │   │
│  │ Bazzite    │  │  │ Docker Host    │  │  │ LXC 102  Ollama + WebUI  │   │
│  │ Gaming VM  │  │  │ 35+ containers │  │  │ LXC 104  Work Env        │   │
│  │ 4c/24GB    │  │  │ 12c/24GB       │  │  │ LXC 105  ML Research     │   │
│  │ through    │  │  │ + nginx SSO    │  │  │                           │   │
│  │            │  │  │ + SearXNG      │  │  │                           │   │
│  └────────────┘  │  └────────────────┘  │  ├───────────────────────────┤   │
│                  │                      │  │ Homelab API   :9105       │   │
│  * Only GPU in   │  DAS: 8TB btrfs      │  │  └─ AI Agent (Jarvis)    │   │
│    system —      │  (USB TerraMaster)   │  │  └─ Download Guardian    │   │
│    host goes     │                      │  │  └─ Library Verification │   │
│    headless      │                      │  │  └─ Diagnostic Tools     │   │
│    when VM runs  │                      │  │ Doc RAG         :9103    │   │
│                  │                      │  │ Terraform       :9104    │   │
│                  │                      │  └───────────────────────────┘   │
├──────────────────┴──────────────────────┴───────────────────────────────────┤
│                                                                             │
│   ┌── AI Agent Brain ──────────────────────────────────────────────────┐    │
│   │  qwen3.5:35b-a3b on Ollama (native tool calling, 64+ tools)      │    │
│   │                                                                    │    │
│   │  Interfaces:                                                       │    │
│   │    Discord bot (*ai) ──┐                                           │    │
│   │    Homepage chat ──────┼── /api/ai/jarvis ── tool loop ── execute  │    │
│   │    Open WebUI (MCP) ──┘                                           │    │
│   │                                                                    │    │
│   │  Subsystems:                                                       │    │
│   │    Librarr (Go, 13 sources, Torznab/Newznab, OPDS, embedded UI)  │    │
│   │    Sentinel (Go, download guardian, library verification)         │    │
│   │    Diagnostics (file ops, log reading, library rescans)           │    │
│   │    SearXNG (self-hosted web search) ─── Open WebUI + Homepage     │    │
│   │    Homelab Agent (proactive: 7 modules, 3-tier AI repair,         │    │
│   │      every 5min, port 9106)                                      │    │
│   │    Nightly Tests (88 tests at 5 AM, Discord results)             │    │
│   └────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Hardware

Node CPU Cores/Threads RAM GPU Role
pve Intel i7-9700K 8c/8t 32 GB NVIDIA RTX 2070 (passthrough) Gaming / dev
MediaServer AMD Ryzen 7 8845HS 8c/16t 28 GB AMD Radeon 780M (iGPU) Media stack
AIServer AMD Ryzen AI MAX+ 395 16c/32t 128 GB AMD Radeon 8060S (iGPU) AI/ML workloads

Storage

  • Boot drives: Local LVM-thin on each node (~100 GB each)
  • DAS: TerraMaster TDAS enclosure, USB-attached to MediaServer, 8 TB btrfs
    • Mounted at /mnt/storage on the MediaServer host
    • Bind-mounted into LXC 200 at /data/media
    • All media services depend on this mount — they won't start if the DAS is disconnected

Network Topology

Internet
  │
  ├── Cloudflare Tunnel (cloudflared container)
  │     └── Reverse proxy to select services
  │
  ├── Tailscale mesh (node-to-node, stable IPs)
  │
  └── LAN (flat /24 network)
        │
        ├── pve node
        │     └── VM 103 (Bazzite) — bridged LAN + Tailscale
        │
        ├── MediaServer node
        │     └── LXC 200 — bridged LAN
        │           ├── nginx reverse proxy (*.homelab.internal)
        │           │     └── Authelia SSO (3-tier auth)
        │           ├── gluetun VPN (Mullvad WireGuard)
        │           │     ├── qBittorrent
        │           │     ├── Librarr
        │           │     └── Gamarr
        │           ├── dnsmasq (local DNS for *.homelab.internal)
        │           └── SearXNG (self-hosted web search)
        │
        └── AIServer node
              ├── LXC 101-106 — bridged LAN
              ├── Homelab API + AI Agent (port 9105)
              └── MCP server (Proxmox management)

VPN Architecture

Download clients (qBittorrent, Librarr, Gamarr) route through a gluetun container running Mullvad WireGuard. Services that need VPN protection use network_mode: "service:gluetun" in Docker Compose and expose their ports through gluetun.


AI Assistant

The homelab is controlled by a tool-calling AI agent powered by local LLMs (qwen3.5:35b-a3b / gemma4:e4b) running on Ollama with GPU-accelerated inference via the AMD 8060S iGPU's GTT unified memory. The agent has 66+ tools for managing every aspect of the homelab, and hits a 10/10 mean score on the internal eval harness across a canned set of real-world prompts.

On top of the basic tool-calling loop, the stack adds:

  • Semantic tool routing — embedding-similarity hybrid replaces keyword matching (catches "prove it" → verify_in_library)
  • Episodic memory — summaries of past conversations are embedded and retrieved on new messages, so context persists across sessions and interfaces
  • LLM observability — every Ollama call traced to SQLite with latency / token / tool-success metadata, visible in the PWA
  • Code execution sandboxexecute_code tool runs Python in a hardened bubblewrap namespace (no network, 5s CPU, 512MB RAM, fs-isolated)
  • Unified homelab RAG — ChromaDB ingest of Sonarr/Radarr/Jellyfin/git/agent-failures for "ask anything about my homelab"
  • Tier 2 verify step — after the smart fixer declares a fix, syntax/container-health/LLM-judge checks run; any failure reverts file edits from backups
  • Eval harness — canned prompts + LLM judge nightly, with a regression gate in the nightly test suite

A proactive Homelab Agent with 7 modules scans every 5 minutes and uses a 3-tier AI repair system (qwen3:1.7b fast tools → qwen3.5:35b smart fixer with verify → Claude Code backstop) to autonomously detect and fix issues.

How It Works

User (Discord / Homepage / Open WebUI)
  └── /api/ai/jarvis
        └── LLM decides which tools to call
              └── Executes against homelab APIs
                    └── Feeds results back to LLM
                          └── Generates natural language response

Interfaces

All three interfaces share the same agent brain:

Interface How Use Case
Discord bot *ai <anything> command Mobile / quick commands
Homepage widget Floating chat bubble (custom.js) Dashboard integration
Open WebUI MCP tools proxy Full chat UI with history

Key Subsystems

System Purpose
Librarr Go binary (17 MB), 13 search sources, Torznab/Newznab API, OPDS feed, Usenet/SABnzbd, modern Tailwind dark UI, series grouping, wishlist
Sentinel Go binary (11 MB), download guardian with SQLite persistence, definitive library verification
Homelab Agent Proactive monitoring (5min), 7 modules (container doctor, source intelligence, import watchdog, torrent doctor, system monitor, notifications, AI escalation), 3-tier repair system, failure memory
Diagnostic Tools File ops, log reading, permission fixes, library rescans — for AI escalation
SearXNG Self-hosted web search for AI agent, Homepage, Open WebUI
Paperless Tagging AI-driven document tagging and correspondent assignment
Gaming API Game search, ROM download, sync status, Bazzite VM control
Nightly Tests 88 end-to-end tests at 5 AM (~60s), Discord results notification

See AI Stack for full details.


Guests (VMs & Containers)

VMID Name Node Type Resources Purpose
101 project-env AIServer LXC 4c / 4 GB Development workspace
102 openclaw AIServer LXC 16c / 28 GB Local LLM chat (Ollama + Open-WebUI)
103 gaming-bazzite pve VM 7c / 24 GB Gaming VM with GPU passthrough
104 work-env AIServer LXC 4c / 4 GB Claude Code, Docker, dev tools
105 research-env AIServer LXC 16c / 16 GB AI/ML research with GPU passthrough
200 docker-server MediaServer LXC 12c / 24 GB Main Docker host (55+ containers)

Documentation

Doc Description
Docker Services All 55+ containers running on LXC 200
Gaming VM Bazzite setup, GPU passthrough, Sunshine/Moonlight streaming
Game Pipeline Automated game download → install → Steam library pipeline
AI Stack Tool-calling agent, Download Guardian, verification, diagnostics, RAG, SearXNG, Homelab Agent, nightly tests
Automation Download Guardian, Homelab Agent, backups, nightly tests, CrowdSec, Terraform, dual-channel alerts
Monitoring Homelab Agent (7 modules, 3-tier AI repair), n8n watchdog workflows, Homepage dashboard, storage monitoring
Media Stack Jellyfin, *arr apps, download automation
Networking VPN, Cloudflare tunnel, Tailscale mesh, nginx + Authelia SSO
Lessons Learned Gotchas, debugging tips, things that broke
Docker Compose (example) Sanitized compose file

Quick Stats

  • 7 guests across 3 nodes (6 LXC + 1 VM)
  • 55+ Docker containers on a single LXC
  • ~188 GB total RAM across the cluster
  • 8 TB DAS for media storage
  • GPU passthrough on 2 nodes (NVIDIA for gaming, AMD iGPU shared across 3 LXCs for ML)
  • AI tool-calling agent — 66+ tools, local LLMs (qwen3.5:35b-a3b + gemma4:e4b), GPU-accelerated via GTT unified memory, 10/10 stable on internal eval harness
  • Semantic tool routing — embedding-similarity tool retrieval (catches paraphrases the old keyword router missed); hybrid with keyword hits as a baseline floor
  • Episodic memory — past chats are summarized + embedded + retrieved cross-interface, so the assistant remembers context between Discord, PWA, and Open WebUI sessions
  • LLM observability — SQLite trace of every Ollama call (latency/tokens/tool success/errors); real-time stats rendered on the mobile PWA
  • Code execution sandbox — Python/bash tool runs in a bubblewrap-isolated namespace (no network, fs-isolated, resource-capped, timeout-enforced)
  • Unified homelab RAG — ChromaDB ingest of Sonarr/Radarr/Jellyfin/git/agent-failures; natural-language queries against every source with ?source= filter
  • Tier 2 verify step — smart fixer's fixes are independently validated (syntax / container health / LLM judge); file edits auto-revert from backup on failure
  • Eval harness — 10 canned prompts replayed nightly with LLM judge scoring, SQLite history, regression gate in nightly tests
  • 4 agent interfaces — Discord bot, Homepage chat widget, mobile PWA, Open WebUI (same brain, same tools)
  • Librarr (Go) — 18 MB binary, 13 search sources, Torznab/Newznab API, OPDS feed, Usenet/SABnzbd, multi-user with TOTP 2FA + OIDC/SSO, modern dark Tailwind UI with series grouping and wishlist
  • Sentinel (Go) — 11 MB binary, download guardian with SQLite persistence, definitive library verification (Jellyfin/ABS/Kavita/Sonarr/Radarr)
  • Homelab Agent — proactive monitoring every 5min, 7 modules (container doctor, source intelligence, import watchdog, torrent doctor, system monitor, notifications, AI escalation), 3-tier AI repair system, failure memory (SQLite)
  • Service integrations — Mealie recipe import, Changedetection URL watches, Linkwarden bookmarks, AI auto-tagging for Paperless, Docker container control (restart/stop/start)
  • 100+ nightly tests — comprehensive end-to-end tests at 5 AM, covers all services + smart fixer + escalation + AI stack (traces/memory/sandbox/RAG/evals/semantic routing), plus an eval-score regression gate; 128 unit tests across homelab-api/doc-rag/homelab-agent, Discord results notification
  • SearXNG — self-hosted web search for AI agent, Homepage dashboard, Open WebUI
  • Diagnostic toolkit — file ops, log reading, permission fixes, library rescans for AI escalation
  • Unified API — single FastAPI endpoint aggregating all services (Swagger docs included)
  • Document RAG — vector search over 169+ documents via local embeddings + LLM
  • Automated backups — Restic to DAS, 4 nodes, daily, encrypted, deduplicated
  • SSO reverse proxy — nginx + Authelia, 34 subdomains on *.homelab.internal, 3-tier auth (true SSO / gate / passthrough), self-signed wildcard cert, dnsmasq for LAN + Tailscale split DNS for remote
  • CrowdSec IPS — 1400+ malicious IPs blocked at firewall, community threat intel
  • Terraform IaC — entire cluster defined as code, importable state
  • 9 n8n workflows — dual-channel Discord alerts, watchdogs, health checks
  • AI self-healing — consolidated Homelab Agent with 3-tier repair (1.7b fast tools → 35b smart fixer → Claude Code backstop) auto-fixes containers, torrents, VPN, permissions, imports, configs
  • Dual-channel Discord alerts — all watchdogs and bots report to both Discord servers
  • Zero cloud dependencies — everything self-hosted (except Cloudflare tunnel for external access)

Open Source Projects

Custom Go services built for this homelab, available as standalone projects:

Project Language Description
Librarr Go Book/audiobook/manga search + download, 13 sources, Torznab API, OPDS feed
Sentinel Go Download guardian with library verification (Jellyfin/ABS/Kavita/Sonarr/Radarr)
Gamarr Go Game/ROM search + download, 24 platforms, 3 sources, 43 e2e tests
Homelab Blueprint Docs This repo — architecture documentation

License

MIT — use this as inspiration for your own homelab.

About

Three-node Proxmox homelab — media automation, GPU passthrough gaming, AI/ML workloads, 35+ self-hosted services

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors