Set Cilium kubeProxyReplacement from env (default false for homelab stability) and collect cilium daemonset/pod/log diagnostics when rollout times out during verification.
Update verification stage to block on cilium daemonset rollout and all nodes reaching Ready. This prevents workflows from reporting success while the cluster is still NotReady immediately after join.
Append --ignore-preflight-errors=NumCPU,HTTPProxyCIDR to control-plane join commands and HTTPProxyCIDR to worker joins so kubeadm join does not fail on known single-CPU/proxy CIDR checks in this environment.
Always re-run primary init when reconcile performs node rebuilds to avoid stale/partial cluster state causing join preflight failures. Also add wantedBy for kubelet so systemctl enable works as expected during join/init flows.
Clear completed bootstrap stage checkpoints whenever nodes are rebuilt so reconcile does not skip required init/cni/join work on fresh hosts. Also pass explicit --node-name for control-plane and worker joins, and ensure kubelet is enabled before join commands run.
Enable kubelet before kubeadm init and stop forcing kubelet out of wantedBy so kubeadm can reliably register the node during upload-config/kubelet. Also clear stale kubelet config files during remote prep to avoid restart-loop leftovers.
Do not skip node rebuilds unless SKIP_REBUILD=1 is explicitly set. This prevents stale remote helper scripts from being reused across retries after bootstrap logic changes.
Introduce a clean orchestration layer in nixos/kubeadm/bootstrap/controller.py and slim rebuild-and-bootstrap.sh into a thin wrapper. The controller now owns preflight, rebuild, init, CNI install, join, and verify stages with persisted checkpoints on cp-1 plus a local state copy for CI debugging.