66 Commits

Author SHA1 Message Date
cbb8358ce6 fix: ensure kubelet is enabled for kubeadm init node registration
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Enable kubelet before kubeadm init and stop clearing its wantedBy, so kubeadm can reliably register the node during the upload-config/kubelet phase. Also clear stale kubelet config files during remote prep so leftovers from earlier restart loops do not persist.
2026-03-03 01:04:50 +00:00
a16112a87a fix: rebuild nodes by default on reconcile
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Do not skip node rebuilds unless SKIP_REBUILD=1 is explicitly set. This prevents stale remote helper scripts from being reused across retries after bootstrap logic changes.
2026-03-03 00:34:55 +00:00
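
The opt-out rule described in a16112a87a can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `should_rebuild` is hypothetical, and the only detail taken from the commit message is that rebuilds are skipped solely when SKIP_REBUILD=1 is set explicitly.

```python
import os
from typing import Mapping


def should_rebuild(env: Mapping[str, str] = os.environ) -> bool:
    """Return True unless SKIP_REBUILD is explicitly set to "1".

    Hypothetical helper: any other value (unset, empty, "0", "true")
    leaves the default in place, so nodes are rebuilt.
    """
    return env.get("SKIP_REBUILD") != "1"
```

With this shape, a retry after a change to the bootstrap logic rebuilds by default, and skipping requires a deliberate `SKIP_REBUILD=1` in the environment.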
51b56e562e fix: use valid kube-vip log flag value
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
kube-vip expects an unsigned integer for --log. Replace --log -4 with --log 4 so manifest generation no longer fails during bootstrap.
2026-03-03 00:25:25 +00:00
6fecfb3ee6 refactor: add Python bootstrap controller with resumable state
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Introduce a clean orchestration layer in nixos/kubeadm/bootstrap/controller.py and slim rebuild-and-bootstrap.sh into a thin wrapper. The controller now owns preflight, rebuild, init, CNI install, join, and verify stages with persisted checkpoints on cp-1 plus a local state copy for CI debugging.
2026-03-03 00:09:10 +00:00
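
The resumable stage model described in 6fecfb3ee6 might look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of controller.py: the stage names follow the commit message, but `STAGES`, `load_state`, `run_stages`, and the JSON state file are hypothetical, and the sketch persists checkpoints to one local file only (the commit describes checkpoints on cp-1 plus a local copy).

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Stage order as listed in the commit message; the identifiers are assumptions.
STAGES = ["preflight", "rebuild", "init", "cni", "join", "verify"]


def load_state(path: Path) -> Dict[str, bool]:
    """Read completed-stage checkpoints, or start empty if none exist."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def run_stages(state_path: Path, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each stage not yet checkpointed; persist after every success.

    If a handler raises, earlier checkpoints are already on disk, so the
    next call resumes at the failed stage instead of starting over.
    """
    state = load_state(state_path)
    executed: List[str] = []
    for stage in STAGES:
        if state.get(stage):
            continue  # completed in a previous run
        handlers[stage]()
        state[stage] = True
        state_path.write_text(json.dumps(state, indent=2))
        executed.append(stage)
    return executed
```

Writing the checkpoint after each stage, rather than once at the end, is what makes a failed run resumable.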
355273add5 fix: preserve kube-vip mount path and only swap hostPath to super-admin
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
The previous replacement changed both mountPath and hostPath, causing kube-vip to lose its expected in-container kubeconfig path and exit. Keep mountPath at /etc/kubernetes/admin.conf, swap only hostPath during bootstrap, and enable kube-vip debug log level.
2026-03-02 23:59:41 +00:00
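
The fix in 355273add5 comes down to rewriting only the hostPath entry of the kube-vip manifest. The sketch below shows one way to do that on the manifest text; it is not the repository's implementation. The function name `swap_host_path` is hypothetical, and the sketch assumes the hostPath appears as a line of the form `path: /etc/kubernetes/admin.conf` and that super-admin.conf lives at kubeadm's default /etc/kubernetes/super-admin.conf.

```python
import re

ADMIN = "/etc/kubernetes/admin.conf"
SUPER_ADMIN = "/etc/kubernetes/super-admin.conf"  # assumed default location

# Match only lines whose key is exactly "path:" (the hostPath entry).
# Lines such as "- mountPath: ..." do not match, so the in-container
# path that kube-vip expects is left alone.
_HOST_PATH_LINE = re.compile(
    r"^([ \t]*path:[ \t]*)" + re.escape(ADMIN) + r"[ \t]*$",
    re.MULTILINE,
)


def swap_host_path(manifest: str) -> str:
    """Point the hostPath at super-admin.conf; keep mountPath unchanged."""
    return _HOST_PATH_LINE.sub(lambda m: m.group(1) + SUPER_ADMIN, manifest)
```

A blanket string replacement of admin.conf would also rewrite mountPath, which is the failure the commit message describes.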
262e9eb4d7 fix: bootstrap kube-vip without leader election
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Run the first-control-plane kube-vip manifest without --leaderElection so the VIP can bind before the API server and RBAC are fully available. Also print kube-vip container exit details on failure.
2026-03-02 23:28:44 +00:00
c445638d4a fix: run kube-vip in control-plane-only mode during bootstrap
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Remove --services from kube-vip static pod manifests for init/join. Service LB mode can crash-loop during kubeadm bootstrap before cluster RBAC is ready, which prevented VIP binding.
2026-03-02 22:52:44 +00:00
190dc2e095 fix: restore compatibility with older nixos-rebuild sudo flag
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Use --use-remote-sudo in rebuild script since the runner's nixos-rebuild does not support --sudo yet.
2026-03-02 22:30:38 +00:00
a81799a2b5 fix: stabilize kubeadm bootstrap and reduce Proxmox plan latency
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Move kubeadm reset ahead of kube-vip manifest generation, use super-admin.conf during bootstrap for kube-vip, and restore admin.conf after init. Also switch nixos-rebuild to --sudo and make QEMU guest agent optional so Terraform plan can skip slow guest-agent refreshes when it is not installed.
2026-03-02 22:09:10 +00:00
46c0786e57 fix: run kube-vip daemon before kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
- Start kube-vip as a detached container to claim VIP before kubeadm init
- Wait for VIP to be bound before proceeding
- Generate static pod manifest for kube-vip
- Stop bootstrap kube-vip after API server is healthy (static pod takes over)
- Add kube-vip logs output if VIP fails to bind
2026-03-02 20:39:28 +00:00
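
The "wait for VIP to be bound" step in 46c0786e57 can be sketched as a bounded polling loop. Everything here is illustrative: `vip_in_addr_output` and `wait_until` are hypothetical names, and the sketch assumes the check is done by looking for the VIP in the text output of `ip -o -4 addr show` on the node, which the commit message does not specify.

```python
import time
from typing import Callable


def vip_in_addr_output(output: str, vip: str) -> bool:
    """True if `ip -o -4 addr show` output lists the VIP on any interface.

    The trailing "/" keeps 10.0.0.4 from matching 10.0.0.40.
    """
    return any(f"inet {vip}/" in line for line in output.splitlines())


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # caller can dump kube-vip logs at this point
        sleep(interval_s)
```

A caller would pass a `check` that runs the `ip` command over SSH and feeds its output to `vip_in_addr_output`. On a `False` result it would print the kube-vip logs, matching the last bullet of the commit.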
1af45ca51e fix: skip kubeadm wait-control-plane phase, wait for VIP manually
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use --skip-phases=wait-control-plane to avoid 4-minute timeout
- Wait for kube-vip to bind VIP before checking API server health
- Add kube-vip logs and VIP status to debug output
2026-03-02 19:37:06 +00:00
533f5a91e0 fix: add image pre-pull and debug output for kubeadm init
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Pre-pull k8s control plane images before init to speed up startup
- Add crictl pods and crictl ps -a output on failure for debugging
2026-03-02 18:35:41 +00:00
c061dda31d fix: disable webhook authz and clean stale kubelet configs
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Add authorization.mode: AlwaysAllow to KubeletConfiguration
- Remove stale kubelet config.yaml before unmasking in all kubeadm scripts
- This prevents 'no client provided, cannot use webhook authorization' error
2026-03-02 17:59:31 +00:00
fb21fbef4f fix: disable kubelet webhook auth in kubeadm init config
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use explicit kubeadm config file with KubeletConfiguration
- Disable webhook authentication which was causing 'no client provided' error
- Add ConditionPathExists to kubelet systemd unit
2026-03-02 16:49:21 +00:00
1b76e07326 fix: kubelet directories and containerd readiness
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Create /var/lib/kubelet and /var/lib/kubelet/pki directories via tmpfiles
- Ensure containerd is running before kubeadm init
- Add kubelet logs output on kubeadm init failure for debugging
2026-03-02 14:44:47 +00:00
db72dcab75 fix: remove kubelet ConditionPathExists, add daemon-reload
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Remove ConditionPathExists from kubelet service definition as it
  prevents kubelet from starting when managed by kubeadm
- Add systemctl daemon-reload after unmasking in all kubeadm scripts
- Add reset-failed for consistent state cleanup
2026-03-02 13:58:49 +00:00
d42e83358c fix: mask kubelet before rebuild, unmask in kubeadm helpers
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Mask kubelet service entirely before nixos-rebuild to prevent systemd
  from restarting it during switch
- Unmask kubelet in th-kubeadm-init/join scripts before starting
2026-03-02 12:44:40 +00:00
93e43a546f fix: prevent kubelet auto-start during rebuild
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Add wantedBy = [] to prevent kubelet from being started by multi-user.target
during nixos-rebuild switch. This allows rebuilds to succeed even when the
cluster is in a transitional state. Kubelet will be started by kubeadm
init/join commands instead.
2026-03-02 12:13:05 +00:00
ab5cc8b01d fix: disable lingering kubelet service before node rebuild
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 10:08:27 +00:00
f65a414959 fix: stop auto-enabling kubelet during base node rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 09:13:53 +00:00
7c849ed019 fix: gate kubelet startup until kubeadm config exists
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 08:39:22 +00:00
388b0c4f5d fix: align kubelet systemd unit with kubeadm flags
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:44:35 +00:00
d810547675 fix: ignore kubeadm HTTPProxyCIDR preflight in homelab workflow
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:06:29 +00:00
9426968cd4 fix: run kubeadm init/reset with clean environment
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:36:57 +00:00
02a6bca60b fix: harden kubeadm scripts for proxy and preflight issues
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:02:38 +00:00
a098c0aa29 fix: avoid sudo env loss for kube-vip image reference
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 01:27:44 +00:00
9b03cec23e fix: correctly propagate remote command exit status
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-02 00:52:24 +00:00
fd7be1a428 fix: require admin kubeconfig before skipping cp init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-01 23:42:56 +00:00
f9e7356f94 fix: make cp-1 init detection and join token generation robust
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 9m44s
2026-03-01 21:56:59 +00:00
a5f0f0a420 fix: recover when admin kubeconfig is missing on primary control plane
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 20:58:44 +00:00
661fbc2ff4 fix: use admin kubeconfig for final cluster node check
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 20:31:57 +00:00
3fa227d7c9 feat: add SSH-based fallback for kubeadm IP inventory
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 19:28:15 +00:00
718a9930e8 fix: fail fast when terraform node IP outputs are empty
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 18:01:09 +00:00
7ec1ce92cf fix: auto-detect kube-vip interface and tighten SSH fallback
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 17:34:09 +00:00
88db11292d fix: fallback SSH user per host during bootstrap steps
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m6s
2026-03-01 13:34:15 +00:00
8bd064c828 fix: keep micqdf user during kubeadm node rebuilds
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 13:31:46 +00:00
760d0e8b5b perf: speed up first bootstrap with fast-mode defaults
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 1m59s
2026-03-01 03:33:42 +00:00
3bdf3f8d84 feat: convert template-base into k8s-ready VM template
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 16s
2026-03-01 01:24:45 +00:00
dad409a5b7 fix: restore use-remote-sudo for nixos-rebuild compatibility
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 20s
2026-02-28 23:20:12 +00:00
45e818b113 fix: enable nix-command for remote gc and use --sudo
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 21s
2026-02-28 22:55:15 +00:00
f5d9eba9d0 feat: parallelize worker rebuilds with retry and timeout
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-02-28 22:15:48 +00:00
327c07314c fix: reclaim remote nix store space before rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 13s
2026-02-28 21:24:26 +00:00
3b5d04dda2 fix: force bash for remote kubeadm commands
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 21:06:35 +00:00
ba912810d1 fix: preconfigure remote nix trusted-users before rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 12s
2026-02-28 20:25:50 +00:00
5c037d9a99 fix: prefer root SSH for deploy and trust micqdf in nix
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 20:03:26 +00:00
244887e9c2 fix: auto-detect SSH login user for node operations
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 19:25:48 +00:00
c94c1f61d8 fix: force explicit SSH identity for kubeadm remote operations
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 17:16:31 +00:00
046de9b3d4 fix: preseed known_hosts for kubeadm SSH operations
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
2026-02-28 17:07:43 +00:00
5669305e59 feat: make kubeadm workflows auto-scale with terraform outputs
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
2026-02-28 16:43:22 +00:00
f341816112 feat: run kubeadm reconcile after terraform apply on master
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 16:39:04 +00:00