VDB Manor · An AI-collaborated homelab

Stop searching. Start asking.

Many ‘AI homelab’ writeups answer the wrong question. They tell you what they have built, what hardware they picked, which AI tools they are using – and leave you to figure out how to pull this off and which of those decisions translate to your house, your budget, your skill level, your family.

I’m going to suggest you do something different.

Stop searching for the right guide. Open Claude, give it your context – your network, your goals, your constraints, your skill level, the thing you’re actually trying to do – and ask. The plan you get back is shaped to you, not to the median person on r/homelab. That’s the whole thesis of this page.

Over the last ~5 months, I’ve gone through my own personal ‘AI transformation journey’ and found that building out a homelab was exactly what I needed to fully learn, understand, and apply these new AI tools. Prior to this I was using AI as a chatbot. I knew there were real unlocks with this technology but didn’t quite understand exactly what that meant, or how to get there.

I have built and now operate the VDB Manor homelab with Claude Code as my co-engineer. My family uses it every day without knowing it exists. Every change is in git. There’s no monthly fee for any of it.

What follows comes in three parts: the methodology, the proof, and a starter pack.

Part One

The Methodology.

How I work with Claude on infrastructure. The part most worth copying.

01 · Co-engineer, not code generator

Claude as co-engineer.

Two beats matter here: the workflow, and who brings what.

Claude executes. I stay in the loop.

I describe a confirmed change in plain English. Claude executes it via tools: SSH into a Proxmox host, edit a config, query Home Assistant, fetch a secret from 1Password. I review every diff before it is committed. Claude does the typing. The judgment stays with me. There is no autonomous mode, no agent churning in the background. Every commit is something I understood at the moment it went in.

The architectural mindset is the actual unlock.

My day job is product architecture. For fifteen years I have shaped how systems fit together: boundaries, separated concerns, what to specify versus leave open, failure modes anticipated before code exists. Claude is fluent in the syntax of every tool on this page. What it cannot do is decide the right boundaries between my systems, which failure modes are acceptable for my family, or how the pieces stay comprehensible six months from now. I bring the shape. Claude brings execution and a catalog of options I would never find alone.

The shaping never stops.

The active work, asking questions, pushing back, proposing approaches the model did not surface, is what separates "I used Claude to do this" from "Claude built this for me." If you sit back and let the AI run, you get the median outcome. Driving is the work. If you have ever asked "wait, what are the boundaries here?", you already have the most important skill for working with AI on infrastructure.

02 · The project brain

CLAUDE.md is the brain.

CLAUDE.md is the first file Claude reads at the start of every conversation: the difference between a smart AI giving a generic answer and a co-engineer who already knows your network. Mine is around 240 lines. It contains:

  • Personas and family roles: who uses what and what tradeoffs matter to them (no names on this page).
  • Core principles: local-first, no monthly fees, reliability over cleverness, WAF (the spouse-acceptance factor).
  • Critical safety rules: operational footguns the AI must respect unprompted.
  • Credential handling: how secrets move between 1Password, the macOS Keychain, and deploy scripts. What is allowed where.
  • Access matrix: every host, the SSH alias, the user, how privileged ops work.
  • File index: every doc in the repo with a one-line description of what is in it.
  • Workflows: change protocol, deployment patterns, when to update docs.
  • Spec discipline: how plans get written and reviewed before anything runs.
  • Planning conventions: how I override the default Claude workflow for this project.
  • Domain routing: "IP question? Check devices.md. Camera question? smarthome.md."

That list looks like a knowledge dump. Most of it is. But the safety rules are not suggestions. They are the things I have already been burned by, written down so Claude does not repeat them. The domain routing is not filler either: when Claude sees a VLAN question, it goes to vlans.md without being asked, because the file says so. Some of the file is context and some is policy, and the policy parts are what actually change how Claude behaves.

A small excerpt, to give you the shape:

## Core Principles (apply to all advice)
1. Local-first / privacy — video and automations stay on the network. No cloud.
2. Reliability over cleverness — wired PoE beats Wi-Fi; proven beats bleeding-edge.
3. No monthly fees — zero-subscription is a hard constraint, not a preference.
4. WAF — clean installs, no visible wires, usable by the family without a manual.

## File Index
| File           | Contents                                          |
| :------------- | :------------------------------------------------ |
| CLAUDE.md      | This file — context, personas, protocols          |
| CHANGELOG.md   | Running log of every physical and config change   |
| network.md     | Physical topology, switches, ports, QoS           |
| vlans.md       | VLAN IDs, subnets, isolation rules                |
| firewall.md    | Firewall rules, DNS blocks, inter-VLAN policy     |
| devices.md     | All hardware — nodes, APs, NVR, UPS               |
| smarthome.md   | Home Assistant, cameras, integrations             |
| access.md      | SSH / sudo / secrets patterns                     |

## Domain knowledge — where to look
| Question                            | File           |
| :---------------------------------- | :------------- |
| IP address of a device              | devices.md     |
| Which VLAN a device is on           | vlans.md       |
| Switch port assignment              | network.md     |
| "X seems broken" (first look)       | diagnostics.md |

The protocol

Surgical edits only. Change only the specific items affected. Update immediately when a value lands (IP, port, VM ID, endpoint) before advancing. Never hold a confirmed value only in conversation context.

Large file handling

Never read the full CHANGELOG.md. It is around 1,800 lines and growing. Grep for the relevant section first, then read only the range you need. Context windows are finite. Treat them like the constrained resource they are.
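The grep-first rule above can be sketched in a few lines. This is a minimal illustration, not the tooling Claude actually uses: `find_ranges` and `read_range` are hypothetical helper names, and the 5-line context window is an arbitrary default. Loading the file into local memory is cheap; the point is that only the selected ranges would ever be pasted into a conversation.

```python
import re
from typing import Iterable


def find_ranges(lines: Iterable[str], pattern: str, context: int = 5) -> list[tuple[int, int]]:
    """Return merged 1-indexed (start, end) line ranges around each match."""
    lines = list(lines)
    regex = re.compile(pattern, re.IGNORECASE)
    ranges: list[tuple[int, int]] = []
    for number, line in enumerate(lines, start=1):
        if not regex.search(line):
            continue
        start = max(1, number - context)
        end = min(len(lines), number + context)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous window: extend it instead.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


def read_range(lines: list[str], start: int, end: int) -> list[str]:
    """Return only the requested 1-indexed, inclusive slice."""
    return lines[start - 1:end]
```

Typical use would be to call `find_ranges` with a keyword such as a service name, then pass each returned pair to `read_range` and read only those slices.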

The file does more than describe the house. It tells Claude how to work in it.

03 · The source of truth

Living documentation.

This is the part I would most want someone to copy. It compounds the value of every other practice on this page. Every domain has one canonical file. No information lives in two places: if a device IP shows up in two files, one is the source of truth and the other is wrong.

CLAUDE.md

Project context, personas, protocols, domain routing

CHANGELOG.md

Every change, dated, with affected files (the single intake point)

network.md

Physical topology, switch ports, QoS, recovery

vlans.md

VLAN IDs, subnets, SSIDs, isolation rules

firewall.md

DNS blocks, inter-VLAN policy, firewall rules

devices.md

Every device: IP, VLAN, group, static reservations

smarthome.md

Server specs, peripherals, cameras, services

ha-notes.md

Home Assistant annotations the JSON cannot capture

automations.md

Index of automations (git-backed YAML lives beside it)

dashboards.md

Index of Lovelace dashboards (also git-backed YAML)

access.md

SSH, sudo, 1Password patterns, key rotation

diagnostics.md

First-look runbook: where to query when something seems wrong
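The "no information lives in two places" rule lends itself to a mechanical check. The sketch below is one possible approach, not something the repo is known to contain: it scans doc contents for IPv4-looking strings and reports any that appear in more than one file. The function name is hypothetical, and the pattern is deliberately loose, so four-part version numbers would be flagged too.

```python
import re
from collections import defaultdict

# Loose IPv4 shape; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_duplicated_ips(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each IPv4-looking string to the sorted files it appears in, if more than one."""
    seen: dict[str, set[str]] = defaultdict(set)
    for filename, text in docs.items():
        for ip in IPV4.findall(text):
            seen[ip].add(filename)
    return {ip: sorted(files) for ip, files in seen.items() if len(files) > 1}
```

An empty result means every address has exactly one home. Anything else is a prompt to decide which file is canonical and remove the copy from the other.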

The protocol: surgical edits, immediate on confirmation.

Change only the specific items affected. Never rewrite sections. Update immediately when a value lands (IP, port, VM ID, endpoint), before advancing. Never hold a confirmed value only in conversation context. And no doc updates during brainstorming: speculation stays in chat, docs change only when a real-world thing has changed.
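A surgical edit can be made strict in code. The sketch below assumes the value being changed is a plain token such as an IP or a port, and refuses to act unless it finds exactly one whole-token occurrence. `surgical_replace` is a hypothetical helper written for illustration, not part of any existing tool.

```python
import re


def surgical_replace(text: str, old: str, new: str) -> str:
    """Replace exactly one whole-token occurrence of `old`; refuse anything ambiguous."""
    # The lookarounds stop "192.0.2.1" from matching inside "192.0.2.10".
    pattern = re.compile(rf"(?<![\w.]){re.escape(old)}(?!\w|\.\w)")
    count = len(pattern.findall(text))
    if count != 1:
        raise ValueError(f"expected exactly 1 occurrence of {old!r}, found {count}")
    # A function replacement keeps backslashes in `new` literal.
    return pattern.sub(lambda _: new, text, count=1)
```

Zero matches usually means the doc has already drifted; more than one means the value lives in two places. Either way the edit stops instead of guessing.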

Three views of the same truth.

Every confirmed change produces a dated CHANGELOG.md entry listing every file touched, landing with the git commit. Any one of these reconstructs what happened: the git history, the changelog, or the doc files themselves. They do not contradict.
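One way to keep the changelog and the commit from contradicting each other is to compare them before committing. The entry layout below is an assumption made for illustration (the real CHANGELOG.md format is not shown on this page), and both function names are hypothetical.

```python
from datetime import date


def format_entry(day: date, summary: str, files: list[str]) -> str:
    """Render one dated changelog entry that lists every file touched."""
    if not files:
        raise ValueError("a confirmed change must touch at least one file")
    listed = "\n".join(f"- {name}" for name in sorted(set(files)))
    return f"## {day.isoformat()}: {summary}\n\nFiles touched:\n{listed}\n"


def missing_from_entry(entry: str, committed_files: list[str]) -> list[str]:
    """Return committed files the entry fails to list (CHANGELOG.md itself is exempt)."""
    listed = {line[2:].strip() for line in entry.splitlines() if line.startswith("- ")}
    return sorted(
        name for name in set(committed_files)
        if name != "CHANGELOG.md" and name not in listed
    )
```

The committed file list could come from `git diff --cached --name-only`. A non-empty result from `missing_from_entry` means the three views are about to diverge.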

The docs live in a private GitHub repo, which means:

  • They travel with me. Work laptop, gaming PC, phone: same docs, no sync ceremony.
  • Claude in other projects can read them. Point Claude at the repo URL and pull live docs as reference. No copy-paste, no drift.
  • The house could burn down and the rebuild plan would still exist. Hardware is replaceable. The documented state is not.

The loop.

Confirmed change → Doc edits → CHANGELOG → Review → Commit → Push

Every change, no exceptions. Tedious for the first week. Compounding from week two onward: every session opens with full context, and the system gets smarter as it grows instead of harder to maintain.

A sanitized version of all of this is in a public repo, too: real docs from this house, plus blank versions to start your own. More in the Starter Pack below ↓

04 · Spec-driven configuration

Spec it before you run it.

Single-model improvising works fine for small, reversible changes. It falls apart on multi-step infra projects where one mis-ordered command costs you a recovery weekend. The answer is to spec the project before any command runs.

The structure.

Plans live under docs/superpowers/plans/<date-project-name>/: a 30-line README that maps the phases, then one phase file per session, around 300 lines each. One phase, one session, one validation gate at the boundary. Specs live in a sibling directory and end with a Pre-Approval Verification section, where every concrete reference (a version, a file:line, an IP, a port) is verified live against the running infrastructure during the writing pass. Not from memory. Catching drift at write time costs one command. Catching it post-deploy costs hours.

Dual-Opus adversarial review.

Once a spec is drafted, I open a second Opus session and have it read the spec cold, prompted to find failure modes rather than validate the plan. It catches things the first one rationalized past. The merge of both passes becomes the approved spec.

Task tagging.

Inside each phase file, every task gets one tag:

  • DIRECT: execute in the main conversation. The default: SSH, config edits, doc updates, commits.
  • SUBAGENT: dispatch a subagent. Only when a task generates significant verification chatter or produces a logic-bearing artifact worth its own review.
  • HANDOFF: I do it. Physical actions, Touch ID, browser UI, anything outside Claude's reach.

Pre-deciding who executes each step removes a whole class of mid-execution confusion.
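The three tags are easy to validate mechanically. The sketch below assumes a phase file writes each task as a markdown bullet with the tag in square brackets, for example `- [DIRECT] Edit the config`. That layout is a guess made for illustration; the real phase-file format is not shown here.

```python
import re
from enum import Enum


class Tag(Enum):
    DIRECT = "DIRECT"      # run in the main conversation
    SUBAGENT = "SUBAGENT"  # dispatch to a subagent
    HANDOFF = "HANDOFF"    # a human does it


# Hypothetical task-line layout: "- [TAG] description"
TASK_LINE = re.compile(r"^- \[(DIRECT|SUBAGENT|HANDOFF)\]\s+(.+)$")


def parse_tasks(phase_text: str) -> list[tuple[Tag, str]]:
    """Extract (tag, description) pairs; reject any bullet without a valid leading tag."""
    tasks = []
    for line in phase_text.splitlines():
        if not line.startswith("- "):
            continue  # headings and prose are not tasks
        match = TASK_LINE.match(line)
        if match is None:
            raise ValueError(f"task is missing a valid tag: {line!r}")
        tasks.append((Tag(match.group(1)), match.group(2).strip()))
    return tasks
```

Failing loudly on an untagged task keeps the "who executes this?" question from surfacing mid-session.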

Validation gates.

Phases do not close until their gates ratify. Gates are objective: this command returns this value, this service responds on this port, this metric stays under X for 48 hours. If the gate does not pass, I do not move on. The closure is binary.
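A binary gate is simple to express in code. The sketch below is an illustration under assumptions: the `Gate` shape and function names are invented here, and `p99` uses the nearest-rank definition of a percentile, which may differ from whatever a given dashboard computes.

```python
from dataclasses import dataclass
from typing import Callable


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[], bool]  # objective: returns True or False, nothing in between


def phase_closes(gates: list[Gate]) -> tuple[bool, list[str]]:
    """A phase closes only if every gate passes; failures are returned by name."""
    failed = [gate.name for gate in gates if not gate.check()]
    return (not failed, failed)
```

A soak-test gate might wrap `p99(latencies_ms) < 100` in a `Gate`, with the latency list pulled from whatever metrics source is in use. If `phase_closes` returns any failures, the phase stays open.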

A deliberate override of the off-the-shelf workflow.

The default "superpowers" workflow produces one monolithic plan file and expects subagents for most execution. That broke down at homelab scale: a multi-day plan in one file is unreadable and blows past a session's context window, and an LXC rebuild does not want a subagent (the verification overhead is higher than the work). So I rewired it, documented in CLAUDE.md: a 30-line README nav plus per-phase files of ~300 lines, every task tagged with where it runs, one phase as one session. Same discipline, shaped for the actual constraint.

Proof case · Frigate iGPU passthrough

Five phases over six days. Bind the integrated GPU away from the kernel driver to VFIO, pass it through to the Services VM, cut Frigate's detector from a USB Coral to OpenVINO on the iGPU, bump face recognition to the large model, rework per-camera streams so 10 of 12 record main-stream while detecting on the sub-stream. Every gate ratified. A 48-hour soak passed cleanly: p99 inference under 100 ms, 5,000+ events a day, zero detector restarts. The Coral stayed plugged in as a 30-second rollback path.

05 · Capability-bounded AI

Secrets and boundaries.

The fastest way to derail a homelab project is to give the AI too much access or too little. Both have the same symptom: you stop trusting the work. I solve it with hard boundaries the AI cannot widen on its own. I use 1Password with two paths into it:

Default vault · AI cannot read

SSH keys, sudo passwords, anything I paste interactively. The service-account token Claude uses cannot reach this vault. If a script tries to read it, the read silently fails. There is no client-side trick that widens the scope.

AI vault · deliberate grants

Only the secrets Claude must read for automated injection into deploy scripts: HA tokens, Slack webhooks, monitoring passwords. Each entry is a deliberate grant. The token lives in the macOS Keychain and is injected into a single op invocation, never inherited by child processes.
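The single-invocation idea can be sketched as follows. This is an approximation, not the actual deploy script: it assumes the 1Password CLI's `op read` command and its `OP_SERVICE_ACCOUNT_TOKEN` environment variable, and the secret reference in the usage note is hypothetical. The token goes into a copied environment handed to one subprocess, so the parent's own environment never carries it.

```python
import os
import subprocess
from typing import Callable, Mapping


def scoped_env(base: Mapping[str, str], token: str) -> dict[str, str]:
    """Copy `base` and add the token to the copy only; `base` is left untouched."""
    env = dict(base)
    env["OP_SERVICE_ACCOUNT_TOKEN"] = token
    return env


def read_secret(reference: str, token: str, run: Callable = subprocess.run) -> str:
    """Resolve one secret reference with the token visible only to this one call."""
    result = run(
        ["op", "read", reference],
        env=scoped_env(os.environ, token),  # os.environ itself is never modified
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

A call might look like `read_secret("op://AI/ha-token/credential", token)`, with `token` fetched from the Keychain just before use. The `run` parameter exists so the function can be exercised without the CLI installed.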

SSH discipline.

Bare SSH aliases only, no raw IPs. The alias picks up the user, the key, and the connection multiplexer in one go. The agent is the 1Password agent: every signing operation prompts for Touch ID, with multiplexing batching commands inside a 10-minute window. Private key text never leaves the agent process.

Tell the AI what it cannot do.

The Home Assistant MCP integration is deliberately non-admin: a dedicated read-only user, twenty-plus deny rules blocking every write tool, service call, and entity update. Claude can read HA and that is the full extent of it. To change something, the change goes through the same workflow as everything else: edit a YAML file in git, run the deploy script, watch the reload. The MCP is not a control plane. It is a sensor. Capability-bounded AI is more useful than fully-capable AI, because I never have to wonder whether a session has gone sideways and started silently rewriting or turning things on or off in my house.
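The "sensor, not control plane" stance boils down to deny-first matching with a closed default. The sketch below shows that logic only. The tool names and patterns are hypothetical and do not reflect the real MCP server's tool list or the actual deny rules.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration; a real tool list will differ.
DENY = ["*call_service*", "*set_*", "*update_*", "*create_*", "*delete_*", "*restart*"]
ALLOW = ["get_*", "list_*", "search_*"]


def is_permitted(tool: str) -> bool:
    """Deny rules win over allow rules, and anything unmatched is denied."""
    if any(fnmatchcase(tool, pattern) for pattern in DENY):
        return False
    return any(fnmatchcase(tool, pattern) for pattern in ALLOW)
```

Because unmatched names are denied, a new write tool introduced by an upgrade stays blocked until someone deliberately allows it.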

06 · Continuity

Memory that compounds.

Claude has a persistent memory system. Used well, every future session starts smarter than the last. Used badly, it accumulates noise. Four types:

User

Who I am, how I work, my role, my technical level. My desire to learn how things work along the way instead of Claude just doing it.

Feedback

Corrections and confirmations from past sessions. "Limited MCP access is by design." "Doc commits don't need approval, pushing does."

Project

Facts about the homelab not derivable from code or git. Why a workaround exists, what's ratified, what's in flight.

Reference

Pointers to where information lives. Which dashboard for which thing. Which channel alerts land in.

The triage rule.

Memory persists only what future sessions need that they cannot reconstruct from files. Preferences. Project facts. External references. Not in-progress work, not derivable patterns, not anything already in the docs. The docs are the system. Memory is the meta: the things about how I work with the system that the system itself does not know.

The compounding effect.

Most workflows get harder as projects accumulate. This one gets easier. Every project starts with a more thoroughly documented base, every memory entry refines how Claude collaborates with me, every CHANGELOG entry is one more bit of recall the next session opens with. The more I build, the less I have to re-explain. I have done all of this on the $20/month Claude Pro plan, because the context layer does the work the tokens would otherwise have to.

Part Two

The Proof.

What that methodology has produced. Hardware, network, services.

07 · The real unlock

Context is the product.

The reason this works is not that Claude is smart. Claude is smart in a generic, internet-shaped way. What is specific is the context layer about my situation loaded on every session: my network, my hardware, my constraints, my family, my workflows, my past decisions and why I made them. The methodology in Part One is how that context layer gets built and kept honest. Everything in Part Two is what it makes possible. The mindset shift is small and load-bearing:

Wrong question

"How should I set up and run Home Assistant?"

Right question

"I want to understand what's possible. I've seen folks mention bare metal, or Proxmox. I have a $500 budget but don't know how best to spend it. Ask me questions about my goals and help me shape what's right for me."

The first gets you a Reddit thread and a five-hour rabbit hole. The second gets you a deployment plan that fits your actual constraints. Three worked examples, two homelab, one from outside the genre to show this generalizes.

"My internet keeps dropping at 9pm."

Homelab

No context

Generic checklist. Reboot the router. Check for firmware. Run a speedtest. Maybe it's congestion. Maybe it's your ISP.

Context loaded

Run this query against your router logs for the 8:30–9:30 window. Cross-reference the WAN-latency Grafana panel. Three of your IoT plugs phone home at the top of the hour: check whether the timing correlates. If it does, here's the AdGuard rewrite that pins that vendor's flaky DNS to a known-good IP.

"I want to add a smart lock."

Homelab

No context

Compare Z-Wave, Zigbee, Matter, Thread, vendor clouds. Five hours of forum reading. Pick wrong, return it.

Context loaded

You already have a Zigbee coordinator and an IoT VLAN with outbound blocked and an HA dual-home that reaches in. Zigbee is right for you: same radio, no new infra, no cloud, no firewall exception. Here are three locks that pair cleanly. Add it to the IoT VLAN, expose the entity through HA, done.

"HSA or 401k bump this year?"

Outside the genre

No context

Generic advice. Both are tax-advantaged. Depends on your situation.

Context loaded

Your marginal bracket is X, state tax Y, expected medical spend Z on a high-deductible plan. Your employer matches 4%; you're at 6%, clearing the match. The HSA has a triple-tax advantage you're not getting elsewhere. Bump the HSA first, revisit the 401k in Q4 when your bonus lands.

Same lesson every time. The methodology in Part One is how you build the context layer. The rest of Part Two is what that layer makes possible at home.

08 · Stack at a glance

By the numbers.

A snapshot of the current state.

Version-controlled

308

Git commits in the homelab repo

12

Living docs, one per domain

10

Adversarially-reviewed specs completed

Running today

~2,000

Home Assistant entities

33

Uptime monitoring probes

5

Grafana dashboards in git

$0

Recurring monthly cost. No subscriptions, no cloud bill, self-hosted on hardware already owned.

09 · Hardware

Hardware and physical architecture

Two-rack layout, primary up top, secondary below. The top rack carries the networking backbone: switches, patch panels, and the NVR. The bottom rack holds the two compute nodes, the UPS, and a tool drawer.

I had Claude help me research what to buy, how to install it, and where to place everything (and place it again, many times, as I kept adding devices).

Fig. 01 · The Plan
Rack elevation plan: two-rack layout with networking backbone, NVR, compute nodes, and UPS
Rack layout, planning artifact. Drawn before anything was bolted in. Every U-position assigned, every device placed by function and access need.
Fig. 02 · The Build
Photograph of the built two-rack homelab
Same rack, installed. Top rack stacks the networking backbone; bottom holds the compute nodes, UPS, and a tool drawer.

From sketch to shipping. The plan held. Everything in the build photo appears in the plan.

▸ Compute

Two nodes, two purposes.

HP Elite Mini 800 G9

CPU

Intel Core i5-13500

Memory

48 GB DDR5

Storage

2 TB Samsung 990 Pro · boot
2 TB Samsung 990 Pro · PBS datastore

Uplink

2.5G RJ45 to core switch

Role

Home Assistant VM · AdGuard primary (LXC) · Proxmox Backup Server VM

Minisforum MS-02 Ultra

CPU

Intel Core Ultra 9 285HX

Memory

64 GB DDR5

Storage

2 TB Samsung 990 Pro · boot
4 TB WD BLACK SN7100 · data

Uplink

10G SFP+ DAC to core switch

Role

Services VM (Docker host, iGPU passthrough) · AdGuard secondary (LXC)

Both nodes run Proxmox VE 9.x. Everything below groups the rest of the stack by function: power, storage, and surveillance.

▸ Power

Always on.

UPS

Unit

CyberPower CP2000PFCRM2U · 2U rackmount · sine-wave

Shutdown

NUT triggers a graceful shutdown on sustained outage

Role

Rides out blips, lands the rack softly on long outages

Feed

Circuit

Dedicated 20A circuit to the rack

Distribution

1U rackmount PDU feeding peripherals

Role

One clean source, no shared breakers with the rest of the house

▸ Storage

Three paths.

System + backup

Boot

2 TB Samsung 990 Pro per node · OS and VM disks

Backup datastore

2 TB 990 Pro on G9 · Proxmox Backup Server target

Role

Fast local boot, deduplicated backups next door

Bulk data

Drive

4 TB WD BLACK SN7100 on Ultra at /mnt/data

Holds

Frigate clips + the Immich photo library

Role

The big, write-heavy tier kept off the boot drives

▸ Surveillance

Cameras, kept local.

Cameras + NVR

Cameras

12 Reolink PoE cameras

NVR

Reolink RLN16-410 · 24/7 local recording

Isolation

No internet · only HA and admins can reach in

Role

Eyes on the property with nothing leaving the house

Frigate

AI layer

Frigate on the Services VM · all 12 cameras at native resolution

Detection

Arc 140T iGPU · ArcFace facial recognition · semantic search

Events

MQTT to Home Assistant · detections drive automations

Role

The smart layer that turns footage into triggers

10 · Network and VLANs

One tree, four VLANs.

The topology is a tree. The ISP feeds the router, the router feeds the core switch, and the core switch feeds every branch in the house.

WAN
Fiber ISP · Fiber uplink
↓ Fiber
Router
Firewalla Gold Plus · Firewall · DHCP · WireGuard
↓ 2.5G uplink
Core Switch
Zyxel XMG1915-18EP · 2.5G + 10G · all VLAN trunking
↓ Switched distribution

PoE · Direct

Wi-Fi Access Points

2× GWN7674 · 2× GWN7665

Wi-Fi 7 + Wi-Fi 6E

All VLAN SSIDs

PoE

Security Switch

Netgear GS316EPP

Cameras + NVR only

VLAN Z · No internet

PoE

Panel Trunk

Netgear GS308EP

Living room, master bed, dashboard tablet

VLAN A · VLAN Y

2.5G RJ45

HP Elite Mini 800 G9

Proxmox node 1 · dual-homed

Home Assistant, AdGuard primary

VLAN A · VLAN X

10G SFP+ DAC

Minisforum MS-02 Ultra

Proxmox node 2

Services VM, AdGuard secondary

VLAN X

Access

Office Switch

Netgear MS305

Unmanaged 2.5G · single-VLAN

VLAN Y

▸ Segmentation

Four VLANs.

X · Management

Infrastructure only.

Switches, APs, router, hypervisors, HA, AdGuard. No client devices. Ever. Dedicated management drop to office available.

Y · Main

The daily segment.

Family devices, work laptops, gaming, phones. Internet allowed. Inter-VLAN access only for admin devices.

Z · Cameras

No outbound internet.

Reolink fleet + NVR. The only outbound exception is a scoped set of push-notification endpoints for alerts. Only HA and admin devices can reach in.

A · IoT

Also, no outbound internet.

Smart plugs, TVs, anything IoT. HA reaches in via a dual-home interface; the devices cannot reach out.

VLAN A is the architectural answer to "I want a smart thing but I don't want it phoning home." The device works locally with HA and is cut off from the internet entirely. That is the rule, not the exception.

HA dual-home.

Home Assistant runs as a single VM but sits on two VLANs at once, management plus IoT, via a tagged virtual interface. The router-on-a-stick pattern applied to a hypervisor guest. It lets HA see local broadcast traffic (mDNS, SSDP) on both networks natively without weakening segmentation. No cross-VLAN reflector to maintain. HA is just on both segments.

Inter-VLAN policy at the firewall.

L3 enforcement happens on the router. Cameras cannot initiate outbound except to the few explicitly allowed endpoints. IoT devices cannot initiate outbound at all. The guest VLAN, when used, is fully isolated. The default for new things is closed.
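"The default for new things is closed" is a first-match rule table with a deny fallthrough. The sketch below models that idea only. The zone labels and the rule list are illustrative stand-ins, not the router's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: str
    dst: str
    allow: bool


# Illustrative rules loosely based on the segmentation described above.
RULES = [
    Rule("main", "internet", True),
    Rule("cameras", "push-endpoints", True),  # the scoped alert exception
    Rule("admin", "cameras", True),
]


def allowed(src: str, dst: str, rules: list[Rule] = RULES) -> bool:
    """First matching rule decides; no match means the flow is denied."""
    for rule in rules:
        if rule.src == src and rule.dst == dst:
            return rule.allow
    return False
```

A VLAN that has no rules yet gets nothing, which is the behavior a new segment should have until someone decides otherwise.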

Remote access via VPN, not exposed services.

Nothing on the homelab is publicly reachable. To hit HA, Frigate, or the reverse proxy from outside, I connect through WireGuard on the router and land inside the home network on my own VPN subnet. Internal hostnames resolve via the home AdGuard instances over the tunnel, and the wildcard TLS cert still validates (issued via DNS-01 against a Cloudflare-hosted apex, no public port exposure). Four devices have WireGuard profiles. Zero services on the public internet. Full remote access whenever I want it.

11 · Software stack

What runs on it.

▸ Virtual machines

Node 1 · G9

Home Assistant OS

Dual-homed on management + IoT. Zigbee (ZBT-2) + Z-Wave (ZWA-2) radios via USB.

Node 1 · G9

Proxmox Backup Server

Dedicated 2 TB datastore. 7×daily / 4×weekly / 6×monthly retention, all four guests daily.

Node 2 · Ultra

Services VM

Debian 13, 28 GB / 8 vCPU. Arc 140T iGPU passed through for decode + OpenVINO + CLIP.
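The 7 daily / 4 weekly / 6 monthly retention listed for Proxmox Backup Server above can be reasoned about with a simplified model. The sketch below is not PBS's prune implementation, and real prune results can differ: here each rule independently keeps the newest backup in each of its most recent N periods, and the union survives. The function name and the one-backup-per-date simplification are assumptions.

```python
from datetime import date


def select_kept(backups: list[date], daily: int = 7, weekly: int = 4, monthly: int = 6) -> set[date]:
    """Keep the newest backup in each of the most recent N days, ISO weeks, and months."""
    kept: set[date] = set()
    buckets = [
        (daily, lambda d: d.toordinal()),
        (weekly, lambda d: d.isocalendar()[:2]),  # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),
    ]
    for limit, key in buckets:
        seen = []
        for day in sorted(set(backups), reverse=True):  # newest first
            period = key(day)
            if period in seen:
                continue  # this period already has its newest backup
            if len(seen) == limit:
                break  # enough periods covered for this rule
            seen.append(period)
            kept.add(day)
    return kept
```

Running it over a list of backup dates shows roughly how far back each tier reaches. For the authoritative answer, use the prune simulator or a dry run on the server itself.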

▸ Containers (LXC)

Node 1 · G9

AdGuard Home (primary)

Network-wide DNS filtering. Firewalla WAN primary DNS.

Node 2 · Ultra

AdGuard Home (secondary)

Secondary failover, synced from primary every 15 minutes.

▸ Containers on the Services VM

Surveillance

Frigate + Mosquitto

OpenVINO detector on the iGPU, VA-API decode for all 12 streams, ArcFace large, semantic search.

Media

Immich

Local photo library + Postgres + ML on CPU. Immich Kiosk drives the hallway tablet.

Edge / proxy

Nginx Proxy Manager

16 internal HTTPS hosts behind a Let's Encrypt wildcard via DNS-01.

Observability

Prometheus · Grafana · Loki

cAdvisor, node_exporter on every host, Blackbox, 3 Promtails. 5 Grafana dashboards in git.

Monitoring

Uptime Kuma 2.x

33 monitors across 5 groups, webhook for Slack alerts.

DNS sync · Tools

adguardhome-sync + IT Tools

Primary → secondary every 15 min, plus a small static-site stack.

The pattern: every container here is a git-backed compose stack. The compose file lives in the repo. A deploy script rsyncs the tree, resolves secrets from the AI vault at runtime, and runs docker compose up -d. No hand-edited compose on the host. Lose the VM, git pull on a fresh one and redeploy.
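The deploy pattern above can be sketched as a function that builds the commands without running them. Everything specific here is an assumption: the `/opt/stacks` remote path, the one-directory-per-stack repo layout, and the function name. The secret-resolution step is left out because this page does not describe how it is wired.

```python
import shlex


def deploy_commands(stack: str, host: str, repo_root: str = ".", remote_root: str = "/opt/stacks") -> list[list[str]]:
    """Build (not run) the rsync and compose commands for one stack."""
    local = f"{repo_root}/{stack}/"
    remote = f"{remote_root}/{stack}"
    return [
        # Copy the stack directory from the repo to the host (host is an SSH alias).
        ["rsync", "-a", local, f"{host}:{remote}/"],
        # Start or update the stack from the copied compose file.
        ["ssh", host, f"cd {shlex.quote(remote)} && docker compose up -d"],
    ]
```

Returning the commands as data makes the plan reviewable before anything runs. Each list could then be handed to `subprocess.run(cmd, check=True)` once approved.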

12 · What this unlocks

A foundation, not a finish line.

The hard part is done. VLAN segmentation, a hypervisor on each node, dual-home networking, local AI hardware, and a living documentation system mean every future project plugs into a foundation instead of starting from scratch.

Local AI-powered surveillance

Frigate on an integrated GPU. ArcFace face recognition on the large model. CLIP semantic search. ~5,000 events a day at sub-100 ms p99 inference. Zero cloud.

Debug-anything observability

Loki for logs, Prometheus for metrics, Grafana for dashboards, Uptime Kuma and Blackbox for health. Most investigations resolve in one query.

Ad and tracker blocking

Two AdGuard nodes answer DNS for the whole house, so every phone, TV, and tablet gets the same filtering with no app and no account to set up.

Infinite IoT without risk

Every new device lands on the IoT VLAN with no outbound internet, talks only to HA, and inherits the security posture. Add fifty more; none can see the family network.

Source-of-truth recovery

Rebuild from docs, not memory. The current state of the homelab is reconstructable from my private repo on GitHub.

Complexity without burden

Every future session opens with the full picture. The docs grow with the system. The more there is, the more useful they become.

13 · The journey

Five months of confirmed changes.

Five months ago I hadn’t done literally any of this. I didn’t even know what Proxmox or Home Assistant were. I’d heard of Docker, knew people did “homelab” stuff, but had zero real exposure to any of it. I’ve completely replaced my 1–2 hours of gaming every night (a 20-plus-year habit, we can fairly call it an addiction) with learning and building in my homelab with Claude. Every item below was the first time I did it.

February–March 2026

Foundation

Migrated all docs from scratchpads and Gemini gems into structured markdown in a single repo. Authored CLAUDE.md. Set up the change-workflow protocol. The first week of typing up what was already running was the most boring part. Everything since has compounded from it.

Bare-metal Home Assistant → Proxmox VM

First virtualization project. Took HA from bare metal to one VM among several on a hypervisor. Zero downtime. Dual-Opus planned. Full restore from snapshot.

Hardening & AI workflow maturity

Full security audit. HA MCP integration (read-only, deny rules on every write tool). Migrated HA automations and dashboards to git-backed YAML with their own deploy scripts.

April 2026

Services emerge

Frigate, Immich, the reverse proxy, Uptime Kuma, all deployed in quick succession on the same VM. The Services VM went from empty to running the whole stack in a few weeks.

The cluster era

Added a second Proxmox node (Minisforum MS-02 Ultra) over a 10G SFP+ DAC. Brought up a secondary AdGuard for DNS failover. Stood up a real Proxmox Backup Server on its own drive. Then physically replaced the original G9 boot drive. Fresh hypervisor install, networking rebuilt, every VM restored, every service revalidated end to end. The kind of “tear down the foundation under a running house” project I’d have paid someone else to do six months earlier.

May 2026

The big cutover: Services from G9 to Ultra

Migrated the entire Services VM and everything on it from G9 to Ultra: Frigate (Coral TPU passthrough following it), Immich (the 4 TB NVMe physically moving sockets between machines), the reverse proxy, Uptime Kuma, MQTT, the works. The static IP and MAC followed the VM, so every external reference updated atomically. Not a single piece of the family-facing stack went dark for more than a planned-restart window.

(I genuinely didn’t think I could pull that one off until I was halfway through it.)

NUT + UPS graceful shutdown

Coordinated power-loss handling across both nodes. Threshold-based forced-shutdown trigger. State-aware Slack and mobile-push alerts. Six UPS sensors live in HA.

Observability stack + log aggregation

Prometheus, Grafana, Loki, three Promtails, blackbox probes, Uptime Kuma scrape, Slack alerting. The “first place I look when something seems wrong” became a single dashboard.

Frigate iGPU passthrough

Arc 140T bound to VFIO on the Ultra host, passed through to the Services VM. OpenVINO detector replaced the Coral TPU. Dual-input architecture so 10 of 12 cameras record at main-stream quality while detecting on the sub-stream.

Access standardization

One SSH key in the default 1Password vault, serving all four homelab hosts plus GitHub. One vault rule. Deploy scripts that self-export their token so they work from any shell. NOPASSWD sudo for automated deploys; interactive sudo still prompts.

Ongoing

None of the projects above ran on a finished workflow. The workflow itself kept getting refined in parallel. Every time a session went sideways I figured out why and wrote the fix into CLAUDE.md. Every time a pattern earned its keep, I encoded it the same way. The version you’re reading about isn’t what I started with in March. It’s what survived three months of real use telling me what worked and what didn’t.

Part Three

A Starter Pack.

A drop-in prompt, a doc skeleton, a repo to fork, and the practices that compound.

14 · The starter pack

Build your own version.

The prompt isn’t the magic. The context you build is.

You can copy the prompt below verbatim and get a head start. But the prompt is the spark, not the engine. The engine is the documentation system you build with Claude on the other side of that first conversation. Treat the prompt as the seed.

Step 0

Decide what you’re documenting.

Pick one scope. Not all of them.

  • Home network and smart home.
  • A single server (your daily-driver, a homelab box, a Mac mini doing dev work).
  • A side project codebase.

Don’t try to document your whole digital life on day one. Pick the system you actually touch most often. Build the muscle there. Expand later.

The drop-in prompt

Paste this into a fresh Claude conversation.

Fill in the bracketed sections honestly. The more specific you are, the more tailored the response.

Drop-in prompt
I want to start working with you the way a product architect collaborates
with an engineering team - I describe what I want, you do the execution
via tools, I review every change before it gets committed. (If you don't
have tool access in this conversation, help me plan it and I'll run things
myself.)

Before we do anything, help me set up the foundation. Here's my context:

ROLE: [your job / how technical you are / how you learn best]

SCOPE: [the one system you're going to document and operate together -
        home network, a single server, a codebase, etc.]

GOALS: [what you want this system to do well over the next 6 months]

CONSTRAINTS: [budget, time, family/WAF, anything you won't compromise on,
              anything you can't change]

WHAT EXISTS TODAY: [hardware you own, services that run, accounts you use,
                    how it's currently set up - rough is fine]

WHAT I WANT HELP WITH FIRST: [the thing that pushed me to do this today]

Start by asking me at least five questions about my context, goals, or
constraints that would change your recommendation. Push on things I
haven't told you. Assume I don't know what I don't know - the questions
are how I find out what I should be thinking about. In your first reply,
only ask the questions. Don't propose anything yet.

Once I've answered, propose three things:
  1. A CLAUDE.md skeleton - section headings + one-line descriptions of
     what goes in each, tailored to my context.
  2. A starting file index - which living docs make sense for my scope.
  3. Five "critical safety rules" relevant to my context - operational
     footguns you'd want to be warned about unprompted in future sessions.

Don't write the full files yet. Show me the structure first. We'll iterate
before anything gets created.

The single most useful move

Tell Claude to ask you questions.

Most people use AI like a search engine. One shot, one answer. The bigger unlock is treating it like a senior practitioner who’s about to scope a project with you: ask it to ask you questions before recommending anything. This is the move that does the most work whenever I’m in territory where I don’t know what I don’t know.

At the end of any prompt where you’re not sure you’ve framed the problem right, add: “before you answer, ask me five questions about what you’d need to know to give me a better recommendation.” The questions reframe the whole thing more often than you’d expect.

Suggested CLAUDE.md skeleton

Section headings, and what goes in each.

Personas

Who uses this system. Your technical level. The family or team members who’ll touch it. How you want explanations framed.

Core Principles

The non-negotiables. Privacy posture, cost ceiling, reliability priorities, family-friendliness rules.

Critical Safety Rules

The operational footguns. Specific. Each rule names an artifact (a port, a file, a host) and a consequence. Include the why.
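To show the shape of a specific rule, here is a small sketch that refuses to render a rule unless it names an artifact, a trigger, and a consequence. The `SafetyRule` class and its field names are illustrative assumptions, not part of any real tool. In practice the rules simply live as plain text in CLAUDE.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyRule:
    artifact: str      # the port, file, or host the rule protects
    trigger: str       # the action that should prompt a warning
    consequence: str   # what breaks, and why

    def render(self) -> str:
        """Return the rule as one CLAUDE.md bullet, rejecting vague rules."""
        for field_name in ("artifact", "trigger", "consequence"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"safety rule is missing its {field_name}")
        return (
            f"- Before {self.trigger} on {self.artifact}, "
            f"warn me: {self.consequence}."
        )
```

The point of the structure is the failure case: “Be careful with this” has no artifact and no consequence, so it cannot be written down in this form at all.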

Credential Handling

Where secrets live. Which secrets the AI can read and which it can’t. The boundary, in writing.

Access Matrix

Every host or service the AI might touch. The user. The connection pattern. The privilege model.

File Index

Every doc in the project with a one-line description. The AI uses this to route lookups without re-reading everything.

Document Update Protocol

When docs change, how they change, who decides. The “surgical edits only” and “no updates during brainstorming” rules go here.

Change Workflow

The loop. Confirmed change, surgical edit, changelog entry, review, commit, push.

First-week practices

Build the muscle early.

  • Open a CHANGELOG.md on day one. Add an entry every time you confirm a change.
  • Commit every change. Even the boring ones. Especially the boring ones.
  • Never let a fact live only in conversation context. If it’s a real fact, it goes into the docs.
  • Surgical edits only. Don’t let Claude rewrite a whole section it didn’t need to touch.
  • Keep secrets out of git from day one, even on a private repo. Set up the vault pattern early. Migrating later is annoying.
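The day-one CHANGELOG habit from the list above can be made nearly frictionless with a few lines of code. This is a sketch under stated assumptions: the entry format (a `## YYYY-MM-DD - summary` heading followed by bullets, newest first) and the function name are hypothetical choices, not the format used in this house.

```python
from datetime import date
from pathlib import Path
from typing import Optional, Sequence


def add_changelog_entry(
    path: Path,
    summary: str,
    details: Sequence[str],
    today: Optional[date] = None,
) -> str:
    """Insert a dated entry just below the title line and return the entry."""
    stamp = (today or date.today()).isoformat()
    entry = f"## {stamp} - {summary}\n" + "".join(f"- {d}\n" for d in details)

    # Start a new file with a title line if the changelog does not exist yet.
    existing = path.read_text(encoding="utf-8") if path.exists() else "# Changelog\n"
    title, _, rest = existing.partition("\n")
    older = rest.lstrip("\n")

    # Newest entry goes first so recent history is at the top of the file.
    path.write_text(f"{title}\n\n{entry}\n{older}", encoding="utf-8")
    return entry
```

Whether a script or Claude writes the entry matters less than the rule behind it: the entry is created at the moment the change is confirmed, not reconstructed afterwards.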

Common pitfalls

What to avoid.

  • Secrets in committed YAML. Even on a private repo. Future-you will share that repo with someone and forget what’s in it. Use the vault pattern from day one.
  • Vague safety rules. “Be careful with this” is not a safety rule. “Before changing the VLAN on port N, warn me: losing the tagged VLAN silently breaks IoT” is a safety rule.
  • “I’ll document it later.” No. Document it now or it doesn’t exist.
  • Letting the AI improvise multi-step changes. Spec it. Then run it. The 10 minutes you spend on a spec save you the 4 hours of recovery from an out-of-order step.
  • Reading the full CHANGELOG every session. Grep first. Read the range. Context windows are finite.
  • Underbuying hardware. Buy once, cry once. If you’re even remotely interested in homelab tinkering, do your research and go bigger than you think you should. I know a guy who could have saved some money by making smarter first-purchase decisions.
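The “grep first, read the range” pitfall from the list above can be sketched as a small filter that returns only the changelog entries matching a search term, so only those land in the context window. It assumes each entry starts with a `## ` heading. That format and the function name are hypothetical; adjust both to match your own file.

```python
import re
from typing import List


def changelog_entries_matching(text: str, pattern: str) -> List[str]:
    """Return only the changelog entries whose text matches the pattern."""
    # Split at the start of every line that begins an entry ("## ...").
    entries = re.split(r"(?m)^(?=## )", text)
    needle = re.compile(pattern, re.IGNORECASE)
    return [
        entry.strip()
        for entry in entries
        if entry.startswith("## ") and needle.search(entry)
    ]
```

A call such as `changelog_entries_matching(Path("CHANGELOG.md").read_text(), "frigate")` hands back the handful of relevant entries instead of months of history.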

Get a head start

Or fork the whole thing.

You don’t have to start from a blank page. Everything I’ve described is in a public repo: the real docs from this house with the secrets stripped, plus a blank version of each to fill in as you go.

The docs are the system.
Build them as you build it.