Commit Graph

37 Commits

Author SHA1 Message Date
3cd0c70727 fix: stop overriding kubelet config in kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Remove custom KubeletConfiguration from init config so kubeadm uses default kubelet authn/authz settings and bootstrap registration path. This avoids the standalone-style kubelet behavior where the node never appears in the API.
2026-03-04 18:35:34 +00:00
d2dd6105a6 fix: recover from kubeadm CRISocket node-registration race
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Handle kubeadm init failures where upload-config/kubelet runs before the node object exists. When that specific error occurs, wait for cp-1 registration and run upload-config kubelet phase explicitly instead of aborting immediately.
2026-03-04 03:00:34 +00:00
b3c975bd73 fix: use kubeadm v1beta4 list format for kubeletExtraArgs
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
kubeadm v1beta4 expects nodeRegistration.kubeletExtraArgs as a list of name/value args, not a map. Switch hostname-override to the correct structure so init config unmarshals successfully.
2026-03-04 02:00:07 +00:00
308a2fd4b7 fix: hard reset kubelet identity before kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Clear kubelet cert/bootstrap artifacts after reset and force hostname override in kubeadm nodeRegistration so the node consistently registers as cp-1 instead of inheriting stale template identity.
2026-03-04 01:35:41 +00:00
0cc0de2aea fix: pin kubeadm init node identity to flake hostname
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Set hostname before init and inject nodeRegistration.name into kubeadm InitConfiguration so cp-1 registers as the expected node (cp-1) instead of inheriting the template hostname. This fixes upload-config/kubelet failures caused by node lookup for k8s-base-template.
2026-03-04 01:17:44 +00:00
422b7d7f23 fix: force fresh kubeadm init after rebuild and make kubelet enable-able
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Always re-run primary init when reconcile performs node rebuilds to avoid stale/partial cluster state causing join preflight failures. Also add wantedBy for kubelet so systemctl enable works as expected during join/init flows.
2026-03-04 00:55:20 +00:00
3ebeb121b4 fix: force fresh bootstrap stages after rebuild and stabilize join node identity
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Clear completed bootstrap stage checkpoints whenever nodes are rebuilt so reconcile does not skip required init/cni/join work on fresh hosts. Also pass explicit --node-name for control-plane and worker joins, and ensure kubelet is enabled before join commands run.
2026-03-04 00:26:37 +00:00
cbb8358ce6 fix: ensure kubelet is enabled for kubeadm init node registration
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Enable kubelet before kubeadm init and stop forcing kubelet out of wantedBy so kubeadm can reliably register the node during upload-config/kubelet. Also clear stale kubelet config files during remote prep to avoid restart-loop leftovers.
2026-03-03 01:04:50 +00:00
51b56e562e fix: use valid kube-vip log flag value
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
kube-vip expects an unsigned integer for --log. Replace --log -4 with --log 4 so manifest generation no longer fails during bootstrap.
2026-03-03 00:25:25 +00:00
355273add5 fix: preserve kube-vip mount path and only swap hostPath to super-admin
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
The previous replacement changed both mountPath and hostPath, causing kube-vip to lose its expected in-container kubeconfig path and exit. Keep mountPath at /etc/kubernetes/admin.conf, swap only hostPath during bootstrap, and enable kube-vip debug log level.
2026-03-02 23:59:41 +00:00
262e9eb4d7 fix: bootstrap kube-vip without leader election
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Run first-control-plane kube-vip manifest without --leaderElection so VIP can bind before API/RBAC are fully available. Also print kube-vip container exit details on failure.
2026-03-02 23:28:44 +00:00
c445638d4a fix: run kube-vip in control-plane-only mode during bootstrap
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Remove --services from kube-vip static pod manifests for init/join. Service LB mode can crash-loop during kubeadm bootstrap before cluster RBAC is ready, which prevented VIP binding.
2026-03-02 22:52:44 +00:00
a81799a2b5 fix: stabilize kubeadm bootstrap and reduce Proxmox plan latency
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Move kubeadm reset ahead of kube-vip manifest generation, use super-admin.conf during bootstrap for kube-vip, and restore admin.conf after init. Also switch nixos-rebuild to --sudo and make QEMU guest agent optional so Terraform plan can skip slow guest-agent refreshes when it is not installed.
2026-03-02 22:09:10 +00:00
46c0786e57 fix: run kube-vip daemon before kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
- Start kube-vip as a detached container to claim VIP before kubeadm init
- Wait for VIP to be bound before proceeding
- Generate static pod manifest for kube-vip
- Stop bootstrap kube-vip after API server is healthy (static pod takes over)
- Add kube-vip logs output if VIP fails to bind
2026-03-02 20:39:28 +00:00
1af45ca51e fix: skip kubeadm wait-control-plane phase, wait for VIP manually
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use --skip-phases=wait-control-plane to avoid 4-minute timeout
- Wait for kube-vip to bind VIP before checking API server health
- Add kube-vip logs and VIP status to debug output
2026-03-02 19:37:06 +00:00
533f5a91e0 fix: add image pre-pull and debug output for kubeadm init
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Pre-pull k8s control plane images before init to speed up startup
- Add crictl pods and crictl ps -a output on failure for debugging
2026-03-02 18:35:41 +00:00
c061dda31d fix: disable webhook authz and clean stale kubelet configs
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Add authorization.mode: AlwaysAllow to KubeletConfiguration
- Remove stale kubelet config.yaml before unmasking in all kubeadm scripts
- This prevents 'no client provided, cannot use webhook authorization' error
2026-03-02 17:59:31 +00:00
fb21fbef4f fix: disable kubelet webhook auth in kubeadm init config
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use explicit kubeadm config file with KubeletConfiguration
- Disable webhook authentication which was causing 'no client provided' error
- Add ConditionPathExists to kubelet systemd unit
2026-03-02 16:49:21 +00:00
1b76e07326 fix: kubelet directories and containerd readiness
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Create /var/lib/kubelet and /var/lib/kubelet/pki directories via tmpfiles
- Ensure containerd is running before kubeadm init
- Add kubelet logs output on kubeadm init failure for debugging
2026-03-02 14:44:47 +00:00
db72dcab75 fix: remove kubelet ConditionPathExists, add daemon-reload
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Remove ConditionPathExists from kubelet service definition as it
  prevents kubelet from starting when managed by kubeadm
- Add systemctl daemon-reload after unmasking in all kubeadm scripts
- Add reset-failed for consistent state cleanup
2026-03-02 13:58:49 +00:00
d42e83358c fix: mask kubelet before rebuild, unmask in kubeadm helpers
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Mask kubelet service entirely before nixos-rebuild to prevent systemd
  from restarting it during switch
- Unmask kubelet in th-kubeadm-init/join scripts before starting
2026-03-02 12:44:40 +00:00
93e43a546f fix: prevent kubelet auto-start during rebuild
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Add wantedBy = [] to prevent kubelet from being started by multi-user.target
during nixos-rebuild switch. This allows rebuilds to succeed even when the
cluster is in a transitional state. Kubelet will be started by kubeadm
init/join commands instead.
2026-03-02 12:13:05 +00:00
f65a414959 fix: stop auto-enabling kubelet during base node rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 09:13:53 +00:00
7c849ed019 fix: gate kubelet startup until kubeadm config exists
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 08:39:22 +00:00
388b0c4f5d fix: align kubelet systemd unit with kubeadm flags
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:44:35 +00:00
d810547675 fix: ignore kubeadm HTTPProxyCIDR preflight in homelab workflow
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:06:29 +00:00
9426968cd4 fix: run kubeadm init/reset with clean environment
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:36:57 +00:00
02a6bca60b fix: harden kubeadm scripts for proxy and preflight issues
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:02:38 +00:00
a098c0aa29 fix: avoid sudo env loss for kube-vip image reference
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 01:27:44 +00:00
7ec1ce92cf fix: auto-detect kube-vip interface and tighten SSH fallback
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 17:34:09 +00:00
8bd064c828 fix: keep micqdf user during kubeadm node rebuilds
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 13:31:46 +00:00
327c07314c fix: reclaim remote nix store space before rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 13s
2026-02-28 21:24:26 +00:00
5c037d9a99 fix: prefer root SSH for deploy and trust micqdf in nix
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 20:03:26 +00:00
885a92f494 chore: add lightweight flake checks for kubeadm configs
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
2026-02-28 16:19:37 +00:00
91dd20e60e fix: escape shell expansion in kubeadm helper scripts
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
2026-02-28 16:12:25 +00:00
7206d8cd41 feat: implement kubeadm bootstrap scaffolding for Nix nodes
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 16:04:14 +00:00
21be01346b feat: refactor infra to cp/wk kubeadm topology
Some checks failed
Terraform Plan / Terraform Plan (push) Failing after 9s
Provision 3 thin control planes and 3 workers with role-specific sizing and VMID ranges (701/711), generate per-node cloud-init snippets with SSH key injection, and add NixOS kubeadm host/module scaffolding for cp-1..3 and wk-1..3.
2026-02-28 14:16:55 +00:00