refactor: simplify homelab bootstrap around static IPs and fresh runs

Make Terraform the source of truth for node IPs, remove guest-agent/SSH discovery from the normal workflow path, simplify the bootstrap controller to a fresh-run flow, and swap the initial CNI to Flannel so cluster readiness is easier to prove before reintroducing more complex reconcile behavior.
2026-03-07 00:52:35 +00:00
parent e06b2c692e
commit a0b07816b9
9 changed files with 78 additions and 177 deletions
--- a/nixos/kubeadm/README.md
+++ b/nixos/kubeadm/README.md
@@ -50,7 +50,7 @@ sudo nixos-rebuild switch --flake .#cp-1
 For remote target-host workflows, use your preferred deploy wrapper later
 (`nixos-rebuild --target-host ...` or deploy-rs/colmena).

-## Bootstrap runbook (kubeadm + kube-vip + Cilium)
+## Bootstrap runbook (kubeadm + kube-vip + Flannel)

 1. Apply Nix config on all nodes (`cp-*`, then `wk-*`).
 2. On `cp-1`, run:
@@ -62,14 +62,10 @@ sudo th-kubeadm-init
 This infers the control-plane VIP as `<node-subnet>.250` on `eth0`, creates the
 kube-vip static pod manifest, and runs `kubeadm init`.

-3. Install Cilium from `cp-1`:
+3. Install Flannel from `cp-1`:

 ```bash
-helm repo add cilium https://helm.cilium.io
-helm repo update
-helm upgrade --install cilium cilium/cilium \
-  --namespace kube-system \
-  --set kubeProxyReplacement=true
+kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.25.5/Documentation/kube-flannel.yml
 ```

 4. Generate join commands on `cp-1`:
@@ -98,7 +94,7 @@ kubectl get nodes -o wide
 kubectl -n kube-system get pods -o wide
 ```

-## Repeatable rebuild flow (recommended)
+## Fresh bootstrap flow (recommended)

 1. Copy and edit inventory:

@@ -107,7 +103,7 @@ cp ./scripts/inventory.example.env ./scripts/inventory.env
 $EDITOR ./scripts/inventory.env
 ```

-2. Rebuild all nodes and bootstrap/reconcile cluster:
+2. Rebuild all nodes and bootstrap a fresh cluster:

 ```bash
 ./scripts/rebuild-and-bootstrap.sh
@@ -141,15 +137,15 @@ For a full nuke/recreate lifecycle:
 - run Terraform destroy/apply for VMs first,
 - then run `./scripts/rebuild-and-bootstrap.sh` again.

-Node lists are discovered from Terraform outputs, so adding new workers/control
-planes in Terraform is picked up automatically by the bootstrap/reconcile flow.
+Node lists now come directly from static Terraform outputs, so bootstrap no longer
+depends on Proxmox guest-agent IP discovery or SSH subnet scanning.

 ## Optional Gitea workflow automation

 Primary flow:

 - Push to `master` triggers `.gitea/workflows/terraform-apply.yml`
-- That workflow now does Terraform apply and then runs kubeadm rebuild/bootstrap reconciliation automatically
+- That workflow now does Terraform apply and then runs a fresh kubeadm bootstrap automatically

 Manual dispatch workflows are available:

@@ -164,9 +160,7 @@ Required repository secrets:
 Optional secrets:

 - `KUBEADM_SSH_USER` (defaults to `micqdf`)
-- `KUBEADM_SUBNET_PREFIX` (optional, e.g. `10.27.27`; used for SSH-based IP discovery fallback)
-
-Node IPs are auto-discovered from Terraform state outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets.
+Node IPs are rendered directly from static Terraform outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets or SSH discovery fallbacks.

 ## Notes