VDB Manor · An AI-collaborated homelab
Stop searching. Start asking.
Many ‘AI homelab’ writeups answer the wrong question. They tell you what they have built, what hardware they picked, which AI tools they are using – and leave you to figure out how to pull this off and which of those decisions translate to your house, your budget, your skill level, your family.
I’m going to suggest you do something different.
Stop searching for the right guide. Open Claude, give it your context – your network, your goals, your constraints, your skill level, the thing you’re actually trying to do – and ask. The plan you get back is shaped to you, not to the median person on r/homelab. That’s the whole thesis of this page.
Over the last ~5 months, I’ve gone through my own personal ‘AI transformation journey’ and found that building out a homelab was exactly what I needed to fully learn, understand, and apply these new AI tools. Prior to this I was using AI as a chatbot. I knew there were real unlocks with this technology but didn’t quite understand exactly what that meant, or how to get there.
I have built and now operate the VDB Manor homelab with Claude Code as my co-engineer. My family uses it every day without knowing it exists. Every change is in git. There’s no monthly fee for any of it.
What follows is three things:
Part One
The Methodology.
How I work with Claude on infrastructure. The part most worth copying.
01 · Co-engineer, not code generator
Claude as co-engineer.
Two beats matter here: the workflow, and who brings what.
Claude executes. I stay in the loop.
I describe a confirmed change in plain English. Claude executes it via tools: SSH into a Proxmox host, edit a config, query Home Assistant, fetch a secret from 1Password. I review every diff before it is committed. Claude does the typing. The judgment stays with me. There is no autonomous mode, no agent churning in the background. Every commit is something I understood at the moment it went in.
The architectural mindset is the actual unlock.
My day job is product architecture. For fifteen years I have shaped how systems fit together: boundaries, separated concerns, what to specify versus leave open, failure modes anticipated before code exists. Claude is fluent in the syntax of every tool on this page. What it cannot do is decide the right boundaries between my systems, which failure modes are acceptable for my family, or how the pieces stay comprehensible six months from now. I bring the shape. Claude brings execution and a catalog of options I would never find alone.
The shaping never stops.
The active work (asking questions, pushing back, proposing approaches the model did not surface) is what separates "I used Claude to do this" from "Claude built this for me." If you sit back and let the AI run, you get the median outcome. Driving is the work. If you have ever asked "wait, what are the boundaries here?", you already have the most important skill for working with AI on infrastructure.
02 · The project brain
CLAUDE.md is the brain.
CLAUDE.md is the first file Claude reads at the start of every conversation: the difference between a smart AI giving a generic answer and a co-engineer who already knows your network. Mine is around 240 lines. It contains:
- Personas and family roles: who uses what and what tradeoffs matter to them (no names on this page).
- Core principles: local-first, no monthly fees, reliability over cleverness, WAF (the spouse-acceptance factor).
- Critical safety rules: operational footguns the AI must respect unprompted.
- Credential handling: how secrets move between 1Password, the macOS Keychain, and deploy scripts. What is allowed where.
- Access matrix: every host, the SSH alias, the user, how privileged ops work.
- File index: every doc in the repo with a one-line description of what is in it.
- Workflows: change protocol, deployment patterns, when to update docs.
- Spec discipline: how plans get written and reviewed before anything runs.
- Planning conventions: how I override the default Claude workflow for this project.
- Domain routing: "IP question? Check devices.md. Camera question? smarthome.md."
That list looks like a knowledge dump. Most of it is. But the safety rules are not suggestions. They are the things I have already been burned on, written down so Claude does not repeat them. The domain routing is not filler either: when Claude sees a VLAN question, it goes to vlans.md without being asked, because the file says so. Some of it is context, some of it is policy, and the policy parts are where it actually changes how Claude behaves.
A small excerpt, to give you the shape:
## Core Principles (apply to all advice)

1. Local-first / privacy — video and automations stay on the network. No cloud.
2. Reliability over cleverness — wired PoE beats Wi-Fi; proven beats bleeding-edge.
3. No monthly fees — zero-subscription is a hard constraint, not a preference.
4. WAF — clean installs, no visible wires, usable by the family without a manual.

## File Index

| File         | Contents                                        |
| :----------- | :---------------------------------------------- |
| CLAUDE.md    | This file — context, personas, protocols        |
| CHANGELOG.md | Running log of every physical and config change |
| network.md   | Physical topology, switches, ports, QoS         |
| vlans.md     | VLAN IDs, subnets, isolation rules              |
| firewall.md  | Firewall rules, DNS blocks, inter-VLAN policy   |
| devices.md   | All hardware — nodes, APs, NVR, UPS             |
| smarthome.md | Home Assistant, cameras, integrations           |
| access.md    | SSH / sudo / secrets patterns                   |

## Domain knowledge — where to look

| Question                      | File           |
| :---------------------------- | :------------- |
| IP address of a device        | devices.md     |
| Which VLAN a device is on     | vlans.md       |
| Switch port assignment        | network.md     |
| "X seems broken" — first look | diagnostics.md |
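To make the domain-routing idea concrete, here is a minimal sketch of the same lookup as a keyword table. Claude follows the routing by reading the table in CLAUDE.md, not by running code; the `ROUTES` dictionary, the keywords, and the `route` function are illustrative assumptions built from the file names in the excerpt.

```python
# Hypothetical routing table mirroring the "Domain knowledge" excerpt.
# Keywords are an assumption; the file names come from the excerpt.
ROUTES = {
    "ip address": "devices.md",
    "vlan": "vlans.md",
    "switch port": "network.md",
    "camera": "smarthome.md",
}
FALLBACK = "diagnostics.md"  # the "X seems broken" first-look file


def route(question: str) -> str:
    """Return the doc to open first for a question (simple keyword match)."""
    lowered = question.lower()
    for keyword, doc in ROUTES.items():
        if keyword in lowered:
            return doc
    return FALLBACK
```

The point of the sketch is the shape: one question type, one file, and a named fallback when nothing matches.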
The protocol
Surgical edits only. Change only the specific items affected. Update immediately when a value lands (IP, port, VM ID, endpoint) before advancing. Never hold a confirmed value only in conversation context.
Large file handling
Never read the full CHANGELOG.md. It is around 1,800 lines and growing. Grep for the relevant section first, then read only the range you need. Context windows are finite. Treat them like the constrained resource they are.
The file does more than describe the house. It tells Claude how to work in it.
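The "grep first, then read only the range" rule for large files can be sketched in a few lines. This is an illustration under assumptions, not the tooling Claude actually uses: `find_lines` and `read_range` are hypothetical helpers that accept any iterable of lines, including an open file handle.

```python
from itertools import islice
from typing import Iterable


def find_lines(lines: Iterable[str], needle: str) -> list[int]:
    """Return 1-based numbers of the lines containing needle (case-insensitive)."""
    needle = needle.lower()
    return [n for n, line in enumerate(lines, start=1) if needle in line.lower()]


def read_range(lines: Iterable[str], start: int, end: int) -> list[str]:
    """Return lines start..end (1-based, inclusive) and stop reading after end."""
    return [line.rstrip("\n") for line in islice(lines, start - 1, end)]
```

With a real file, you would open it once to locate the hits and a second time to read only the range around them, so the rest of the changelog never enters the context.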
03 · The source of truth
Living documentation.
This is the part I would most want someone to copy. It compounds the value of every other practice on this page. Every domain has one canonical file. No information lives in two places: if a device IP shows up in two files, one is the source of truth and the other is wrong.
- CLAUDE.md: Project context, personas, protocols, domain routing
- CHANGELOG.md: Every change, dated, with affected files (the single intake point)
- network.md: Physical topology, switch ports, QoS, recovery
- vlans.md: VLAN IDs, subnets, SSIDs, isolation rules
- firewall.md: DNS blocks, inter-VLAN policy, firewall rules
- devices.md: Every device: IP, VLAN, group, static reservations
- smarthome.md: Server specs, peripherals, cameras, services
- ha-notes.md: Home Assistant annotations the JSON cannot capture
- automations.md: Index of automations (git-backed YAML lives beside it)
- dashboards.md: Index of Lovelace dashboards (also git-backed YAML)
- access.md: SSH, sudo, 1Password patterns, key rotation
- diagnostics.md: First-look runbook: where to query when something seems wrong
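The "no information lives in two places" rule is easy to state and easy to violate, so a small check can help. The following is a sketch under assumptions: `duplicated_ips` is a hypothetical helper that takes file names mapped to their text and reports every IPv4-looking string that appears in more than one file. The regex is deliberately loose, so subnet bases or dotted version numbers can show up as false positives.

```python
import re
from collections import defaultdict

# Loose IPv4 pattern; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def duplicated_ips(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each address found in more than one doc to the sorted files containing it."""
    seen = defaultdict(set)
    for filename, text in docs.items():
        for ip in IPV4.findall(text):
            seen[ip].add(filename)
    return {ip: sorted(files) for ip, files in seen.items() if len(files) > 1}
```

Any address the function returns is a candidate for cleanup: one file keeps it, the others point to that file instead.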
The protocol: surgical edits, immediate on confirmation.
Change only the specific items affected. Never rewrite sections. Update immediately when a value lands (IP, port, VM ID, endpoint), before advancing. Never hold a confirmed value only in conversation context. And no doc updates during brainstorming: speculation stays in chat, docs change only when a real-world thing has changed.
Three views of the same truth.
Every confirmed change produces a dated CHANGELOG.md entry listing every file touched, landing with the git commit. Any one of these reconstructs what happened: the git history, the changelog, or the doc files themselves. They do not contradict.
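As an illustration of a dated entry that lists every file touched, here is a minimal sketch. The entry layout (a `##` heading with an ISO date, then a "Files touched" list) is an assumption; the text above does not show the real CHANGELOG.md format, and `changelog_entry` is a hypothetical helper.

```python
from datetime import date


def changelog_entry(day: date, summary: str, files: list[str]) -> str:
    """Render one dated entry listing every file touched by a confirmed change."""
    if not files:
        raise ValueError("a confirmed change must touch at least one file")
    lines = [f"## {day.isoformat()} - {summary}", "", "Files touched:"]
    lines += [f"- {name}" for name in sorted(files)]
    return "\n".join(lines) + "\n"
```

Refusing an empty file list keeps the entry honest: if nothing in the docs changed, there is nothing confirmed to log.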
The docs live in a private GitHub repo, which means:
- They travel with me. Work laptop, gaming PC, phone: same docs, no sync ceremony.
- Claude in other projects can read them. Point Claude at the repo URL and pull live docs as reference. No copy-paste, no drift.
- The house could burn down and the rebuild plan would still exist. Hardware is replaceable. The documented state is not.
The loop.
Every change, no exceptions. Tedious for the first week. Compounding from week two onward: every session opens with full context, and the system gets smarter as it grows instead of harder to maintain.
All of this is in a public repo, too: real docs from this house, sanitized, plus blank versions to start your own. More in the Starter Pack below ↓
04 · Spec-driven configuration
Spec it before you run it.
Single-model improvising works fine for small, reversible changes. It falls apart on multi-step infra projects where one mis-ordered command costs you a recovery weekend. The answer is to spec the project before any command runs.
The structure.
Plans live under docs/superpowers/plans/<date-project-name>/: a 30-line README that maps the phases, then one phase file per session, around 300 lines each. One phase, one session, one validation gate at the boundary. Specs live in a sibling directory and end with a Pre-Approval Verification section, where every concrete reference (a version, a file:line, an IP, a port) is verified live against the running infrastructure during the writing pass. Not from memory. Catching drift at write time costs one command. Catching it post-deploy costs hours.
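The plan layout can be sketched as a function that lists the files a plan directory would hold. Only the `docs/superpowers/plans/<date-project-name>/` root and the "one README plus one file per phase" structure come from the text; the `phase-NN-slug.md` naming scheme and the `plan_layout` helper are assumptions for illustration.

```python
from pathlib import PurePosixPath

PLANS_ROOT = PurePosixPath("docs/superpowers/plans")  # path from the text


def plan_layout(date_project_name: str, phases: list[str]) -> list[PurePosixPath]:
    """Return the files for one plan: a README map plus one file per phase."""
    base = PLANS_ROOT / date_project_name
    files = [base / "README.md"]
    for number, phase in enumerate(phases, start=1):
        slug = phase.lower().replace(" ", "-")
        # File naming is an assumption; the source only says "one phase file per session".
        files.append(base / f"phase-{number:02d}-{slug}.md")
    return files
```

Numbering the phase files keeps them sorted in execution order, which matters when each file maps to exactly one session.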
Dual-Opus adversarial review.
Once a spec is drafted, I open a second Opus session and have it read the spec cold, prompted to find failure modes rather than validate the plan. It catches things the first one rationalized past. The merge of both passes becomes the approved spec.
Task tagging.
Inside each phase file, every task gets one tag:
- DIRECT: execute in the main conversation. The default: SSH, config edits, doc updates, commits.
- SUBAGENT: dispatch a subagent. Only when a task produces significant verification chatter or a logic-bearing artifact worth its own review.
- HANDOFF: I do it. Physical actions, Touch ID, browser UI, anything outside Claude's reach.
Pre-deciding who executes each step removes a whole class of mid-execution confusion.
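A sketch of how tagged tasks could be parsed and grouped by executor. The three tag names come from the list above; the `[TAG] description` line format and the `parse_task` and `by_executor` helpers are assumptions, since the phase-file syntax is not shown.

```python
from enum import Enum


class Tag(Enum):
    DIRECT = "DIRECT"      # run in the main conversation
    SUBAGENT = "SUBAGENT"  # dispatch to a subagent
    HANDOFF = "HANDOFF"    # the human does it


def parse_task(line: str) -> tuple[Tag, str]:
    """Parse a hypothetical task line of the form '[TAG] description'."""
    if not line.startswith("[") or "]" not in line:
        raise ValueError(f"task has no tag: {line!r}")
    raw_tag, description = line[1:].split("]", 1)
    # Tag(...) raises ValueError for anything other than the three known tags.
    return Tag(raw_tag.strip()), description.strip()


def by_executor(lines: list[str]) -> dict[Tag, list[str]]:
    """Group task descriptions by who executes them, preserving order."""
    grouped: dict[Tag, list[str]] = {tag: [] for tag in Tag}
    for line in lines:
        tag, description = parse_task(line)
        grouped[tag].append(description)
    return grouped
```

Rejecting untagged or unknown-tag lines is the useful part: a task with no executor is caught at planning time instead of mid-execution.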
Validation gates.
Phases do not close until their gates ratify. Gates are objective: this command returns this value, this service responds on this port, this metric stays under X for 48 hours. If the gate does not pass, I do not move on. The closure is binary.
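One way to picture an objective gate is as a named check that returns true or false, with the phase closing only when every check passes. This is a sketch under assumptions: `Gate`, `phase_closed`, and `port_open` are hypothetical names, and a check that raises is treated as a failed gate.

```python
import socket
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[], bool]  # True only when the objective condition holds


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Example check: does a service accept TCP connections on this port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def phase_closed(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Return (closed, failing gate names); a check that errors counts as failed."""
    failed = []
    for gate in gates:
        try:
            ok = gate.check()
        except Exception:
            ok = False
        if not ok:
            failed.append(gate.name)
    return (not failed, failed)
```

The return value is deliberately binary plus a list of names: either the phase is closed, or you know exactly which gate to go look at.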
A deliberate override of the off-the-shelf workflow.
The default "superpowers" workflow produces one monolithic plan file and expects subagents for most execution. That broke down at homelab scale: a multi-day plan in one file is unreadable and blows past a session's context window, and an LXC rebuild does not want a subagent (the verification overhead is higher than the work). So I rewired it and documented the change in CLAUDE.md: a 30-line README for navigation plus per-phase files of ~300 lines, every task tagged with where it runs, one phase per session. Same discipline, shaped for the actual constraint.
Proof case · Frigate iGPU passthrough
Five phases over six days. Bind the integrated GPU away from the kernel driver to VFIO, pass it through to the Services VM, cut Frigate's detector from a USB Coral to OpenVINO on the iGPU, bump face recognition to the large model, rework per-camera streams so 10 of 12 record main-stream while detecting on the sub-stream. Every gate ratified. A 48-hour soak passed cleanly: p99 inference under 100 ms, 5,000+ events a day, zero detector restarts. The Coral stayed plugged in as a 30-second rollback path.
05 · Capability-bounded AI
Secrets and boundaries.
The fastest way to derail a homelab project is to give the AI too much access or too little. Both have the same symptom: you stop trusting the work. I solve it with hard boundaries the AI cannot widen on its own. I use 1Password with two paths into it:
Default vault · AI cannot read
SSH keys, sudo passwords, anything I paste interactively. The service-account token Claude uses cannot reach this vault. If a script tries to read it, the read silently fails. There is no client-side trick that widens the scope.
AI vault · deliberate grants
Only the secrets Claude must read for automated injection into deploy scripts: HA tokens, Slack webhooks, monitoring passwords. Each entry is a deliberate grant. The token lives in the macOS Keychain and is injected into a single op invocation, never inherited by child processes.
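The "injected into a single op invocation" idea can be sketched with a per-call environment. This is an illustration, not the author's deploy script: `scoped_env` and `read_secret` are hypothetical helpers, fetching the token from the Keychain is left out, and the sketch assumes the 1Password CLI's `op read` command with the `OP_SERVICE_ACCOUNT_TOKEN` variable.

```python
import os
import subprocess


def scoped_env(token: str) -> dict[str, str]:
    """Copy the current environment and add the token to the copy only."""
    env = dict(os.environ)
    env["OP_SERVICE_ACCOUNT_TOKEN"] = token
    return env


def read_secret(reference: str, token: str) -> str:
    """Run one `op read` with the token; os.environ itself is never modified."""
    result = subprocess.run(
        ["op", "read", reference],  # reference looks like op://vault/item/field
        env=scoped_env(token),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Because the token lives only in the copied environment, any later subprocess started without that copy does not inherit it.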
SSH discipline.
Bare SSH aliases only, no raw IPs. The alias picks up the user, the key, and the connection multiplexer in one go. The agent is the 1Password agent: every signing operation prompts for Touch ID, with multiplexing batching commands inside a 10-minute window. Private key text never leaves the agent process.
Tell the AI what it cannot do.
The Home Assistant MCP integration is deliberately non-admin: a dedicated read-only user, twenty-plus deny rules blocking every write tool, service call, and entity update. Claude can read HA and that is the full extent of it. To change something, the change goes through the same workflow as everything else: edit a YAML file in git, run the deploy script, watch the reload. The MCP is not a control plane. It is a sensor. Capability-bounded AI is more useful than fully-capable AI, because I never have to wonder whether a session has gone sideways and started silently rewriting or turning things on or off in my house.
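Deny rules of this kind boil down to "a tool is usable only if no deny pattern matches it." The sketch below shows that logic only; the tool names and patterns are hypothetical, since the real MCP tool list and rule syntax are not shown in the text.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; the actual deny rules are not shown in the text.
DENY = ["*call_service*", "*set_*", "*update_*", "*create_*", "*delete_*"]


def allowed(tool: str) -> bool:
    """Deny wins: a tool is usable only if no deny pattern matches its name."""
    return not any(fnmatchcase(tool, pattern) for pattern in DENY)
```

Pattern-based denial errs on the side of blocking: a new write tool with a familiar verb in its name is refused without anyone having to add a rule for it.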
06 · Continuity
Memory that compounds.
Claude has a persistent memory system. Used well, every future session starts smarter than the last. Used badly, it accumulates noise. Four types:
User
Who I am, how I work, my role, my technical level. My desire to learn how things work along the way instead of Claude just doing it.
Feedback
Corrections and confirmations from past sessions. "Limited MCP access is by design." "Doc commits don't need approval, pushing does."
Project
Facts about the homelab not derivable from code or git. Why a workaround exists, what's ratified, what's in flight.
Reference
Pointers to where information lives. Which dashboard for which thing. Which channel alerts land in.
The triage rule.
Memory persists only what future sessions need that they cannot reconstruct from files. Preferences. Project facts. External references. Not in-progress work, not derivable patterns, not anything already in the docs. The docs are the system. Memory is the meta: the things about how I work with the system that the system itself does not know.
The compounding effect.
Most workflows get harder as projects accumulate. This one gets easier. Every project starts with a more thoroughly documented base, every memory entry refines how Claude collaborates with me, every CHANGELOG entry is one more bit of recall the next session opens with. The more I build, the less I have to re-explain. I have done all of this on the $20/month Claude Pro plan, because the context layer does the work the tokens would otherwise have to.
Part Two
The Proof.
What that methodology has produced. Hardware, network, services.
07 · The real unlock
Context is the product.
The reason this works is not that Claude is smart. Claude is smart in a generic, internet-shaped way. What is specific is the context layer about my situation loaded on every session: my network, my hardware, my constraints, my family, my workflows, my past decisions and why I made them. The methodology in Part One is how that context layer gets built and kept honest. Everything in Part Two is what it makes possible. The mindset shift is small and load-bearing:
Wrong question
"How should I set up and run Home Assistant?"
Right question
"I want to understand what's possible. I've seen folks mention bare metal, or Proxmox. I have a $500 budget but don't know how best to spend it. Ask me questions about my goals and help me shape what's right for me."
The first gets you a Reddit thread and a five-hour rabbit hole. The second gets you a deployment plan that fits your actual constraints. Three worked examples follow: two from the homelab, one from outside the genre to show that this generalizes.
"My internet keeps dropping at 9pm."
Homelab
No context
Generic checklist. Reboot the router. Check for firmware. Run a speedtest. Maybe it's congestion. Maybe it's your ISP.
Context loaded
Run this query against your router logs for the 8:30–9:30 window. Cross-reference the WAN-latency Grafana panel. Three of your IoT plugs phone home at the top of the hour: check whether the timing correlates. If it does, here's the AdGuard rewrite that pins that vendor's flaky DNS to a known-good IP.
"I want to add a smart lock."
Homelab
No context
Compare Z-Wave, Zigbee, Matter, Thread, vendor clouds. Five hours of forum reading. Pick wrong, return it.
Context loaded
You already have a Zigbee coordinator, an IoT VLAN with outbound blocked, and an HA dual-home that reaches in. Zigbee is right for you: same radio, no new infra, no cloud, no firewall exception. Here are three locks that pair cleanly. Add it to the IoT VLAN, expose the entity through HA, done.
"HSA or 401k bump this year?"
Outside the genre
No context
Generic advice. Both are tax-advantaged. Depends on your situation.
Context loaded
Your marginal bracket is X, state tax Y, expected medical spend Z on a high-deductible plan. Your employer matches 4%; you're at 6%, clearing the match. The HSA has a triple-tax advantage you're not getting elsewhere. Bump the HSA first, revisit the 401k in Q4 when your bonus lands.
Same lesson every time. The methodology in Part One is how you build the context layer. The rest of Part Two is what that layer makes possible at home.
08 · Stack at a glance
By the numbers.
A snapshot of the current state.
Version-controlled
308
Git commits in the homelab repo
12
Living docs, one per domain
10
Adversarially-reviewed specs completed
Running today
~2,000
Home Assistant entities
33
Uptime monitoring probes
5
Grafana dashboards in git
$0
Recurring monthly cost. No subscriptions, no cloud bill, self-hosted on hardware already owned.
09 · Hardware
Hardware and physical architecture
Two-rack layout, primary up top, secondary below. The top rack carries the networking backbone: switches, patch panels, and the NVR. The bottom rack holds the two compute nodes, the UPS, and a tool drawer.
I had Claude help me research what to buy, how to install it, and where to place everything logically (and place it again, many times, as I kept adding devices).


From sketch to shipping. The plan held. Nothing on the right doesn't appear on the left.
▸ Compute
Two nodes, two purposes.
HP Elite Mini 800 G9
CPU
Intel Core i5-13500
Memory
48 GB DDR5
Storage
2 TB Samsung 990 Pro · boot
2 TB Samsung 990 Pro · PBS datastore
Uplink
2.5G RJ45 to core switch
Minisforum MS-02 Ultra
CPU
Intel Core Ultra 9 285HX
Memory
64 GB DDR5
Storage
2 TB Samsung 990 Pro · boot
4 TB WD BLACK SN7100 · data
Uplink
10G SFP+ DAC to core switch
▸ Power
Always on.
UPS
Unit
CyberPower CP2000PFCRM2U · 2U rackmount · sine-wave
Shutdown
NUT triggers a graceful shutdown on sustained outage
Feed
Circuit
Dedicated 20A circuit to the rack
Distribution
1U rackmount PDU feeding peripherals
▸ Storage
Three paths.
System + backup
Boot
2 TB Samsung 990 Pro per node · OS and VM disks
Backup datastore
2 TB 990 Pro on G9 · Proxmox Backup Server target
Bulk data
Drive
4 TB WD BLACK SN7100 on Ultra at /mnt/data
Holds
Frigate clips + the Immich photo library
▸ Surveillance
Cameras, kept local.
Cameras + NVR
Cameras
12 Reolink PoE cameras
NVR
Reolink RLN16-410 · 24/7 local recording
Isolation
No internet · only HA and admins can reach in
Frigate
AI layer
Frigate on the Services VM · all 12 cameras at native resolution
Detection
Arc 140T iGPU · ArcFace facial recognition · semantic search
Events
MQTT to Home Assistant · detections drive automations
10 · Network and VLANs
One tree, four VLANs.
The topology is a tree. The ISP feeds the router, the router feeds the core switch, and the core switch feeds every branch in the house.
PoE · Direct
Wi-Fi Access Points
2× GWN7674 · 2× GWN7665
Wi-Fi 7 + Wi-Fi 6E
PoE
Security Switch
Netgear GS316EPP
Cameras + NVR only
PoE
Panel Trunk
Netgear GS308EP
Living room, master bed, dashboard tablet
2.5G RJ45
HP Elite Mini 800 G9
Proxmox node 1 · dual-homed
Home Assistant, AdGuard primary
10G SFP+ DAC
Minisforum MS-02 Ultra
Proxmox node 2
Services VM, AdGuard secondary
Access
Office Switch
Netgear MS305
Unmanaged 2.5G · single-VLAN
▸ Segmentation
Four VLANs.
X · Management
Infrastructure only.
Switches, APs, router, hypervisors, HA, AdGuard. No client devices. Ever. Dedicated management drop to office available.
Y · Main
The daily segment.
Family devices, work laptops, gaming, phones. Internet allowed. Inter-VLAN access only for admin devices.
Z · Cameras
No outbound internet.
Reolink fleet + NVR, except scoped push-notification endpoints for alerts. Only HA and admin devices can reach in.
A · IoT
Also, no outbound internet.
Smart plugs, TVs, anything IoT. HA reaches in via a dual-home interface; the devices cannot reach out.
VLAN A is the architectural answer to "I want a smart thing but I don't want it phoning home." The device works locally with HA and is cut off from the internet entirely. That is the rule, not the exception.
HA dual-home.
Home Assistant runs as a single VM but sits on two VLANs at once, management plus IoT, via a tagged virtual interface. The router-on-a-stick pattern applied to a hypervisor guest. It lets HA see local broadcast traffic (mDNS, SSDP) on both networks natively without weakening segmentation. No cross-VLAN reflector to maintain. HA is just on both segments.
Inter-VLAN policy at the firewall.
L3 enforcement happens on the router. Cameras cannot initiate outbound except to the few explicitly allowed endpoints. IoT devices cannot initiate outbound at all. The guest VLAN, when used, is fully isolated. The default for new things is closed.
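"The default for new things is closed" is a default-deny allowlist. The sketch below is a simplified reading of the policy, not the router's actual rule set: it treats "admin" and "ha" as sources in their own right, and the `ALLOW` set and `permitted` function are illustrative assumptions.

```python
# Simplified allowlist built from the segment descriptions in this section.
# Anything not listed is denied, including traffic from sources never seen before.
ALLOW = {
    ("main", "internet"),           # family devices may reach the internet
    ("admin", "management"),        # admin devices may cross VLANs
    ("admin", "cameras"),
    ("admin", "iot"),
    ("ha", "cameras"),              # HA reaches in to the cameras
    ("ha", "iot"),                  # and to IoT via its dual-home interface
    ("cameras", "push-endpoints"),  # the scoped push-notification exception
}


def permitted(source: str, destination: str) -> bool:
    """Default closed: a flow is allowed only if it is listed explicitly."""
    return (source, destination) in ALLOW
```

A new segment or device type needs no deny rule at all; it is blocked until someone writes an allow entry for it.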
Remote access via VPN, not exposed services.
Nothing on the homelab is publicly reachable. To hit HA, Frigate, or the reverse proxy from outside, I connect through WireGuard on the router and land inside the home network on my own VPN subnet. Internal hostnames resolve via the home AdGuard instances over the tunnel, and the wildcard TLS cert still validates (issued via DNS-01 against a Cloudflare-hosted apex, no public port exposure). Four devices have WireGuard profiles. Zero services on the public internet. Full remote access whenever I want it.
11 · Software stack
What runs on it.
▸ Virtual machines
Node 1 · G9
Home Assistant OS
Dual-homed on management + IoT. Zigbee (ZBT-2) + Z-Wave (ZWA-2) radios via USB.
Node 1 · G9
Proxmox Backup Server
Dedicated 2 TB datastore. 7×daily / 4×weekly / 6×monthly retention, all four guests daily.
Node 2 · Ultra
Services VM
Debian 13, 28 GB / 8 vCPU. Arc 140T iGPU passed through for decode + OpenVINO + CLIP.
▸ Containers (LXC)
Node 1 · G9
AdGuard Home (primary)
Network-wide DNS filtering. Firewalla WAN primary DNS.
Node 2 · Ultra
AdGuard Home (secondary)
Secondary failover, synced from primary every 15 minutes.
▸ Containers on the Services VM
Surveillance
Frigate + Mosquitto
OpenVINO detector on the iGPU, VA-API decode for all 12 streams, ArcFace large, semantic search.
Media
Immich
Local photo library + Postgres + ML on CPU. Immich Kiosk drives the hallway tablet.
Edge / proxy
Nginx Proxy Manager
16 internal HTTPS hosts behind a Let's Encrypt wildcard via DNS-01.
Observability
Prometheus · Grafana · Loki
cAdvisor, node_exporter on every host, Blackbox, 3 Promtails. 5 Grafana dashboards in git.
Monitoring
Uptime Kuma 2.x
33 monitors across 5 groups, webhook for Slack alerts.
DNS sync · Tools
adguardhome-sync + IT Tools
Primary → secondary every 15 min, plus a small static-site stack.
The pattern: every container here is a git-backed compose stack. The compose file lives in the repo. A deploy script rsyncs the tree, resolves secrets from the AI vault at runtime, and runs docker compose up -d. No hand-edited compose on the host. Lose the VM, git pull on a fresh one and redeploy.
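The deploy pattern can be sketched as a function that builds, but does not run, the commands for one stack. It is an illustration under assumptions: `deploy_commands` is a hypothetical helper, the rsync flags are a guess, and the secret-resolution step is omitted because its mechanism is not described here.

```python
import shlex


def deploy_commands(stack_dir: str, host: str, remote_dir: str) -> list[list[str]]:
    """Build the commands for one stack: sync the tree, then start it remotely."""
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    sync = ["rsync", "-az", f"{stack_dir.rstrip('/')}/", f"{host}:{remote_dir}/"]
    # host is expected to be an SSH alias, in line with the "no raw IPs" rule.
    up = ["ssh", host, f"cd {shlex.quote(remote_dir)} && docker compose up -d"]
    return [sync, up]
```

Returning argument lists instead of running them keeps the sketch reviewable: you can print the commands, read them, and only then hand each one to `subprocess.run`.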
12 · What this unlocks
A foundation, not a finish line.
The hard part is done. VLAN segmentation, a hypervisor on each node, dual-home networking, local AI hardware, and a living documentation system mean every future project plugs into a foundation instead of starting from scratch.
Local AI-powered surveillance
Frigate on an integrated GPU. ArcFace face recognition on the large model. CLIP semantic search. ~5,000 events a day at sub-100 ms p99 inference. Zero cloud.
Debug-anything observability
Loki for logs, Prometheus for metrics, Grafana for dashboards, Uptime Kuma and Blackbox for health. Most investigations resolve in one query.
Ad and tracker blocking
Two AdGuard nodes answer DNS for the whole house, so every phone, TV, and tablet gets the same filtering with no app and no account to set up.
Infinite IoT without risk
Every new device lands on the IoT VLAN with no outbound internet, talks only to HA, and inherits the security posture. Add fifty more; none can see the family network.
Source-of-truth recovery
Rebuild from docs, not memory. The current state of the homelab is reconstructable from my private repo on GitHub.
Complexity without burden
Every future session opens with the full picture. The docs grow with the system. The more there is, the more useful they become.
13 · The journey
Five months of confirmed changes.
Five months ago I had done literally none of this. I didn’t even know what Proxmox or Home Assistant were. I’d heard of Docker, knew people did “homelab” stuff, but had zero real exposure to any of it. I’ve completely replaced my 1–2 hours of gaming every night (a 20-plus-year habit that we can fairly call an addiction) with learning and building in my homelab with Claude. Every item below was the first time I did it.
February - March 2026
Foundation
Migrated all docs from scratchpads and Gemini gems into structured markdown in a single repo. Authored CLAUDE.md. Set up the change-workflow protocol. The first week of typing up what was already running was the most boring part. Everything since has compounded from it.
Bare-metal Home Assistant → Proxmox VM
First virtualization project. Took HA from bare metal to one VM among several on a hypervisor. Zero downtime. Dual-Opus planned. Full restore from snapshot.
Hardening & AI workflow maturity
Full security audit. HA MCP integration (read-only, deny rules on every write tool). Migrated HA automations and dashboards to git-backed YAML with their own deploy scripts.
April 2026
Services emerge
Frigate, Immich, the reverse proxy, Uptime Kuma, all deployed in quick succession on the same VM. The Services VM went from empty to running the whole stack in a few weeks.
The cluster era
Added a second Proxmox node (Minisforum MS-02 Ultra) over a 10G SFP+ DAC. Brought up a secondary AdGuard for DNS failover. Stood up a real Proxmox Backup Server on its own drive. Then physically replaced the original G9 boot drive. Fresh hypervisor install, networking rebuilt, every VM restored, every service revalidated end to end. The kind of “tear down the foundation under a running house” project I’d have paid someone else to do six months earlier.
May 2026
The big cutover: Services from G9 to Ultra
Migrated the entire Services VM and everything on it from G9 to Ultra: Frigate (Coral TPU passthrough following it), Immich (the 4 TB NVMe physically moving sockets between machines), the reverse proxy, Uptime Kuma, MQTT, the works. The static IP and MAC followed the VM, so every external reference updated atomically. Not a single piece of the family-facing stack went dark for more than a planned-restart window.
(I genuinely didn’t think I could pull that one off until I was halfway through it.)
NUT + UPS graceful shutdown
Coordinated power-loss handling across both nodes. Threshold-based forced-shutdown trigger. State-aware Slack and mobile-push alerts. Six UPS sensors live in HA.
Observability stack + log aggregation
Prometheus, Grafana, Loki, three Promtails, blackbox probes, Uptime Kuma scrape, Slack alerting. The “first place I look when something seems wrong” became a single dashboard.
Frigate iGPU passthrough
Arc 140T bound to VFIO on the Ultra host, passed through to the Services VM. OpenVINO detector replaced the Coral TPU. Dual-input architecture so 10 of 12 cameras record at main-stream quality while detecting on the sub-stream.
Access standardization
One SSH key in the default 1Password vault, serving all four homelab hosts plus GitHub. One vault rule. Deploy scripts that self-export their token so they work from any shell. NOPASSWD sudo for automated deploys; interactive sudo still prompts.
Ongoing
None of the projects above ran on a finished workflow. The workflow itself kept getting refined in parallel. Every time a session went sideways I figured out why and wrote the fix into CLAUDE.md. Every time a pattern earned its keep, I encoded it the same way. The version you’re reading about isn’t what I started with in March. It’s what survived three months of real use telling me what worked and what didn’t.
Part Three
A Starter Pack.
A drop-in prompt, a doc skeleton, a repo to fork, and the practices that compound.
14 · The starter pack
Build your own version.
The prompt isn’t the magic. The context you build is.
You can copy the prompt below verbatim and get a head start. But the prompt is the spark, not the engine. The engine is the documentation system you build with Claude on the other side of that first conversation. Treat the prompt as the seed.
Step 0
Decide what you’re documenting.
Pick one scope. Not all of them.
- Home network and smart home.
- A single server (your daily-driver, a homelab box, a Mac mini doing dev work).
- A side project codebase.
Don’t try to document your whole digital life on day one. Pick the system you actually touch most often. Build the muscle there. Expand later.
The drop-in prompt
Paste this into a fresh Claude conversation.
Fill in the bracketed sections honestly. The more specific you are, the more tailored the response.
I want to start working with you the way a product architect collaborates with an engineering team - I describe what I want, you do the execution via tools, I review every change before it gets committed. (If you don't have tool access in this conversation, help me plan it and I'll run things myself.)

Before we do anything, help me set up the foundation. Here's my context:

ROLE: [your job / how technical you are / how you learn best]
SCOPE: [the one system you're going to document and operate together - home network, a single server, a codebase, etc.]
GOALS: [what you want this system to do well over the next 6 months]
CONSTRAINTS: [budget, time, family/WAF, anything you won't compromise on, anything you can't change]
WHAT EXISTS TODAY: [hardware you own, services that run, accounts you use, how it's currently set up - rough is fine]
WHAT I WANT HELP WITH FIRST: [the thing that pushed me to do this today]

Start by asking me at least five questions about my context, goals, or constraints that would change your recommendation. Push on things I haven't told you. Assume I don't know what I don't know - the questions are how I find out what I should be thinking about. In your first reply, only ask the questions. Don't propose anything yet.

Once I've answered, propose three things:

1. A CLAUDE.md skeleton - section headings + one-line descriptions of what goes in each, tailored to my context.
2. A starting file index - which living docs make sense for my scope.
3. Five "critical safety rules" relevant to my context - operational footguns you'd want to be warned about unprompted in future sessions.

Don't write the full files yet. Show me the structure first. We'll iterate before anything gets created.
The single most useful move
Tell Claude to ask you questions.
Most people use AI like a search engine. One shot, one answer. The bigger unlock is treating it like a senior practitioner who’s about to scope a project with you: ask it to ask you questions before recommending anything. This is the move that does the most work whenever I’m in territory where I don’t know what I don’t know.
At the end of any prompt where you’re not sure you’ve framed the problem right, add: “before you answer, ask me five questions about what you’d need to know to give me a better recommendation.” The questions reframe the whole thing more often than you’d expect.
Suggested CLAUDE.md skeleton
Section headings, and what goes in each.
Who uses this system. Your technical level. The family or team members who’ll touch it. How you want explanations framed.
The non-negotiables. Privacy posture, cost ceiling, reliability priorities, family-friendliness rules.
The operational footguns. Specific. Each rule names an artifact (a port, a file, a host) and a consequence. Include the why.
Where secrets live. Which secrets the AI can read and which it can’t. The boundary, in writing.
Every host or service the AI might touch. The user. The connection pattern. The privilege model.
Every doc in the project with a one-line description. The AI uses this to route lookups without re-reading everything.
When docs change, how they change, who decides. The “surgical edits only” and “no updates during brainstorming” rules go here.
The loop. Confirmed change, surgical edit, changelog entry, review, commit, push.
First-week practices
Build the muscle early.
- Open a CHANGELOG.md on day one. Add an entry every time you confirm a change.
- Commit every change. Even the boring ones. Especially the boring ones.
- Never let a fact live only in conversation context. If it’s a real fact, it goes into the docs.
- Surgical edits only. Don’t let Claude rewrite a whole section it didn’t need to touch.
- Keep secrets out of git from day one, even on a private repo. Set up the vault pattern early. Migrating later is annoying.
Common pitfalls
What to avoid.
- Secrets in committed YAML. Even on a private repo. Future-you will share that repo with someone and forget what’s in it. Use the vault pattern from day one.
- Vague safety rules. “Be careful with this” is not a safety rule. “Before changing the VLAN on port N, warn me: losing the tagged VLAN silently breaks IoT” is a safety rule.
- “I’ll document it later.” No. Document it now or it doesn’t exist.
- Letting the AI improvise multi-step changes. Spec it. Then run it. The 10 minutes you spend on a spec saves the 4 hours of recovery from an out-of-order step.
- Reading the full CHANGELOG every session. Grep first. Read the range. Context windows are finite.
- Buy once, cry once. If you’re even remotely interested in homelab tinkering, do your research and go bigger than you think you should. I know a guy who could have saved some money by making smarter first-purchase decisions.
Get a head start
Or fork the whole thing.
You don’t have to start from a blank page. Everything I’ve described is in a public repo: the real docs from this house with the secrets stripped, plus a blank version of each to fill in as you go.
The docs are the system.
Build them as you build it.