Commit Graph

272 Commits

Author SHA1 Message Date
6fecfb3ee6 refactor: add Python bootstrap controller with resumable state
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Introduce a clean orchestration layer in nixos/kubeadm/bootstrap/controller.py and slim rebuild-and-bootstrap.sh into a thin wrapper. The controller now owns preflight, rebuild, init, CNI install, join, and verify stages with persisted checkpoints on cp-1 plus a local state copy for CI debugging.
2026-03-03 00:09:10 +00:00
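The resumable-checkpoint idea described in this commit might look roughly like the sketch below. It is not the contents of controller.py: the stage names follow the commit message, while the JSON layout, the `load_state`, `save_state`, and `run_pipeline` helpers, and the state file location are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

# Stage order taken from the commit message; the identifiers are assumptions.
STAGES = ["preflight", "rebuild", "init", "cni", "join", "verify"]


def load_state(path: Path) -> dict:
    """Return the persisted checkpoint state, or a fresh one if none exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file and rename, so an interrupted run
    never leaves a half-written state file behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def run_pipeline(path: Path, handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run every stage that is not checkpointed yet.

    Returns the stages executed during this invocation.
    """
    state = load_state(path)
    executed = []
    for stage in STAGES:
        if stage in state["completed"]:
            continue  # finished in an earlier run, skip it
        handlers[stage]()  # an exception aborts before the checkpoint is written
        state["completed"].append(stage)
        save_state(path, state)
        executed.append(stage)
    return executed
```

With this shape, a run that fails in `init` leaves `preflight` and `rebuild` checkpointed, and the next invocation resumes at `init` instead of starting over.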
355273add5 fix: preserve kube-vip mount path and only swap hostPath to super-admin
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
The previous replacement changed both mountPath and hostPath, causing kube-vip to lose its expected in-container kubeconfig path and exit. Keep mountPath at /etc/kubernetes/admin.conf, swap only hostPath during bootstrap, and enable kube-vip debug log level.
2026-03-02 23:59:41 +00:00
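To illustrate the distinction this fix relies on, here is a minimal sketch that rewrites only the volume's hostPath in a kube-vip pod manifest and leaves every container mountPath alone. It assumes the manifest has already been parsed into a dict; the function name, the volume name used in the usage note, and the super-admin.conf location are assumptions, and the repository's actual replacement may be done differently (for example with a text substitution in the shell script).

```python
import copy

ADMIN_CONF = "/etc/kubernetes/admin.conf"
# Assumed host location of the bootstrap credentials.
SUPER_ADMIN_CONF = "/etc/kubernetes/super-admin.conf"


def use_super_admin_host_path(manifest: dict) -> dict:
    """Return a copy of the pod manifest whose kubeconfig volume reads
    super-admin.conf on the host.

    Only spec.volumes[*].hostPath.path is rewritten. The containers'
    volumeMounts are untouched, so the process inside the container
    still finds its kubeconfig at the original mountPath.
    """
    patched = copy.deepcopy(manifest)
    for volume in patched["spec"].get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path and host_path.get("path") == ADMIN_CONF:
            host_path["path"] = SUPER_ADMIN_CONF
    return patched
```

Applied to a manifest with one volume (say, `kubeconfig`) backed by admin.conf, the result has that volume's hostPath pointing at super-admin.conf while `mountPath` stays `/etc/kubernetes/admin.conf`.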
262e9eb4d7 fix: bootstrap kube-vip without leader election
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Run first-control-plane kube-vip manifest without --leaderElection so VIP can bind before API/RBAC are fully available. Also print kube-vip container exit details on failure.
2026-03-02 23:28:44 +00:00
c445638d4a fix: run kube-vip in control-plane-only mode during bootstrap
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Remove --services from kube-vip static pod manifests for init/join. Service LB mode can crash-loop during kubeadm bootstrap before cluster RBAC is ready, which prevented VIP binding.
2026-03-02 22:52:44 +00:00
880bbcceca ci: speed up Terraform plan by skipping refresh in pipelines
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 16s
Use terraform plan -refresh=false in plan/apply workflows to avoid slow Proxmox state refresh on every push. This keeps CI fast while preserving apply behavior from the generated plan.
2026-03-02 22:32:10 +00:00
190dc2e095 fix: restore compatibility with older nixos-rebuild sudo flag
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Use --use-remote-sudo in rebuild script since the runner's nixos-rebuild does not support --sudo yet.
2026-03-02 22:30:38 +00:00
a81799a2b5 fix: stabilize kubeadm bootstrap and reduce Proxmox plan latency
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Move kubeadm reset ahead of kube-vip manifest generation, use super-admin.conf during bootstrap for kube-vip, and restore admin.conf after init. Also switch nixos-rebuild to --sudo and make QEMU guest agent optional so Terraform plan can skip slow guest-agent refreshes when it is not installed.
2026-03-02 22:09:10 +00:00
46c0786e57 fix: run kube-vip daemon before kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
- Start kube-vip as a detached container to claim VIP before kubeadm init
- Wait for VIP to be bound before proceeding
- Generate static pod manifest for kube-vip
- Stop bootstrap kube-vip after API server is healthy (static pod takes over)
- Add kube-vip logs output if VIP fails to bind
2026-03-02 20:39:28 +00:00
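The "wait for VIP to be bound" step above can be sketched as a bounded polling loop. This is an illustration only: the helper names, the timeout values, and the choice to parse `ip -o -4 addr show` output are assumptions, and the actual script may check the VIP in another way.

```python
import time
from typing import Callable


def vip_is_bound(vip: str, ip_output: str) -> bool:
    """Return True if the text output of `ip -o -4 addr show`
    lists the VIP on any interface."""
    for line in ip_output.splitlines():
        fields = line.split()
        if "inet" in fields:
            # The field after "inet" looks like "192.168.1.10/24".
            addr = fields[fields.index("inet") + 1].split("/")[0]
            if addr == vip:
                return True
    return False


def wait_until(
    probe: Callable[[], bool],
    timeout: float,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns True or the timeout elapses.

    Returns True on success and False on timeout, so the caller can
    dump kube-vip logs before failing the bootstrap.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would wrap the two together, for instance by running `ip -o -4 addr show` through `subprocess.run(..., capture_output=True, text=True)` inside the probe and passing its stdout to `vip_is_bound`.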
1af45ca51e fix: skip kubeadm wait-control-plane phase, wait for VIP manually
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use --skip-phases=wait-control-plane to avoid 4-minute timeout
- Wait for kube-vip to bind VIP before checking API server health
- Add kube-vip logs and VIP status to debug output
2026-03-02 19:37:06 +00:00
533f5a91e0 fix: add image pre-pull and debug output for kubeadm init
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Pre-pull k8s control plane images before init to speed up startup
- Add crictl pods and crictl ps -a output on failure for debugging
2026-03-02 18:35:41 +00:00
c061dda31d fix: disable webhook authz and clean stale kubelet configs
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Add authorization.mode: AlwaysAllow to KubeletConfiguration
- Remove stale kubelet config.yaml before unmasking in all kubeadm scripts
- This prevents 'no client provided, cannot use webhook authorization' error
2026-03-02 17:59:31 +00:00
fb21fbef4f fix: disable kubelet webhook auth in kubeadm init config
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use explicit kubeadm config file with KubeletConfiguration
- Disable webhook authentication which was causing 'no client provided' error
- Add ConditionPathExists to kubelet systemd unit
2026-03-02 16:49:21 +00:00
1b76e07326 fix: kubelet directories and containerd readiness
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Create /var/lib/kubelet and /var/lib/kubelet/pki directories via tmpfiles
- Ensure containerd is running before kubeadm init
- Add kubelet logs output on kubeadm init failure for debugging
2026-03-02 14:44:47 +00:00
db72dcab75 fix: remove kubelet ConditionPathExists, add daemon-reload
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Remove ConditionPathExists from kubelet service definition as it
  prevents kubelet from starting when managed by kubeadm
- Add systemctl daemon-reload after unmasking in all kubeadm scripts
- Add reset-failed for consistent state cleanup
2026-03-02 13:58:49 +00:00
d42e83358c fix: mask kubelet before rebuild, unmask in kubeadm helpers
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Mask kubelet service entirely before nixos-rebuild to prevent systemd
  from restarting it during switch
- Unmask kubelet in th-kubeadm-init/join scripts before starting
2026-03-02 12:44:40 +00:00
93e43a546f fix: prevent kubelet auto-start during rebuild
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Add wantedBy = [] to prevent kubelet from being started by multi-user.target
during nixos-rebuild switch. This allows rebuilds to succeed even when the
cluster is in a transitional state. Kubelet will be started by kubeadm
init/join commands instead.
2026-03-02 12:13:05 +00:00
ab5cc8b01d fix: disable lingering kubelet service before node rebuild
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 10:08:27 +00:00
f65a414959 fix: stop auto-enabling kubelet during base node rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 09:13:53 +00:00
7c849ed019 fix: gate kubelet startup until kubeadm config exists
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 08:39:22 +00:00
388b0c4f5d fix: align kubelet systemd unit with kubeadm flags
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:44:35 +00:00
d810547675 fix: ignore kubeadm HTTPProxyCIDR preflight in homelab workflow
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:06:29 +00:00
9426968cd4 fix: run kubeadm init/reset with clean environment
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:36:57 +00:00
02a6bca60b fix: harden kubeadm scripts for proxy and preflight issues
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:02:38 +00:00
a098c0aa29 fix: avoid sudo env loss for kube-vip image reference
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 01:27:44 +00:00
9b03cec23e fix: correctly propagate remote command exit status
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-02 00:52:24 +00:00
c794e07ab2 chore: trigger workflows
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-02 00:18:10 +00:00
fd7be1a428 fix: require admin kubeconfig before skipping cp init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-01 23:42:56 +00:00
c0b820c92a Merge branch 'master' into stage
Some checks are pending
Terraform Plan / Terraform Plan (push) Waiting to run
2026-03-01 22:40:05 +00:00
f9e7356f94 fix: make cp-1 init detection and join token generation robust
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 9m44s
2026-03-01 21:56:59 +00:00
27185ed17a Merge pull request 'fix: recover when admin kubeconfig is missing on primary control plane' (#72) from stage into master
All checks were successful
Terraform Apply / Terraform Apply (push) Successful in 19m30s
Reviewed-on: #72
2026-03-01 21:30:33 +00:00
9baf35d886 Merge branch 'master' into stage
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 21:30:28 +00:00
a5f0f0a420 fix: recover when admin kubeconfig is missing on primary control plane
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 20:58:44 +00:00
310d273378 Merge pull request 'fix: use admin kubeconfig for final cluster node check' (#71) from stage into master
All checks were successful
Terraform Apply / Terraform Apply (push) Successful in 19m16s
Reviewed-on: #71
2026-03-01 20:38:17 +00:00
661fbc2ff4 fix: use admin kubeconfig for final cluster node check
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 20:31:57 +00:00
3b0219f211 Merge pull request 'feat: add SSH-based fallback for kubeadm IP inventory' (#70) from stage into master
All checks were successful
Terraform Apply / Terraform Apply (push) Successful in 20m6s
Reviewed-on: #70
2026-03-01 20:07:55 +00:00
3fa227d7c9 feat: add SSH-based fallback for kubeadm IP inventory
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m7s
2026-03-01 19:28:15 +00:00
61db9a26d9 Merge pull request 'fix: retry kubeadm inventory generation until VM IPs appear' (#69) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 12m43s
Reviewed-on: #69
2026-03-01 19:04:05 +00:00
8f915201e3 Merge branch 'master' into stage
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m6s
2026-03-01 18:46:59 +00:00
a933341c28 fix: retry kubeadm inventory generation until VM IPs appear
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 18:42:18 +00:00
f90e971fab Merge pull request 'fix: fail fast when terraform node IP outputs are empty' (#68) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 10m9s
Reviewed-on: #68
2026-03-01 18:07:20 +00:00
920c0c10b8 Merge branch 'master' into stage
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m6s
2026-03-01 18:07:02 +00:00
718a9930e8 fix: fail fast when terraform node IP outputs are empty
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 18:01:09 +00:00
a9f6153623 Merge pull request 'fix: auto-detect kube-vip interface and tighten SSH fallback' (#67) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 11m28s
Reviewed-on: #67
2026-03-01 17:35:34 +00:00
9edb8f807d Merge branch 'master' into stage
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m5s
2026-03-01 17:34:57 +00:00
7ec1ce92cf fix: auto-detect kube-vip interface and tighten SSH fallback
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 17:34:09 +00:00
198f0e2910 Merge pull request 'stage' (#66) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 50m2s
Reviewed-on: #66
2026-03-01 13:55:31 +00:00
88db11292d fix: fallback SSH user per host during bootstrap steps
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m6s
2026-03-01 13:34:15 +00:00
8bd064c828 fix: keep micqdf user during kubeadm node rebuilds
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 13:31:46 +00:00
364d407fb7 Merge pull request 'fix: avoid in-place VM updates on unreliable provider' (#65) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 55m11s
Reviewed-on: #65
2026-03-01 03:58:10 +00:00
c8771b897c Merge branch 'master' into stage
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 15s
2026-03-01 03:57:40 +00:00