Commit Graph

40 Commits

Author SHA1 Message Date
034869347a fix: require kubelet kubeconfig before starting service
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Inline kubelet bootstrap/kubeconfig flags in ExecStart and gate startup on /etc/kubernetes/*kubelet.conf in addition to config.yaml. This prevents kubelet entering standalone mode with webhook auth enabled when no client config is present.
2026-03-04 20:45:47 +00:00
6b6ca021c9 fix: add kubelet bootstrap kubeconfig args to systemd unit
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Include KUBELET_KUBECONFIG_ARGS in kubelet ExecStart so kubelet can authenticate with bootstrap-kubelet.conf/kubelet.conf and register node objects during kubeadm init.
2026-03-04 19:26:07 +00:00
ba6cf42c04 fix: restart kubelet during CRISocket recovery and add registration diagnostics
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 16s
When kubeadm init fails at upload-config/kubelet due missing node object, explicitly restart kubelet to ensure bootstrap flags are loaded before waiting for node registration. Add kubelet flag dump and focused registration log output to surface auth/cert errors.
2026-03-04 18:37:50 +00:00
3cd0c70727 fix: stop overriding kubelet config in kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Remove custom KubeletConfiguration from init config so kubeadm uses default kubelet authn/authz settings and bootstrap registration path. This avoids the standalone-style kubelet behavior where the node never appears in the API.
2026-03-04 18:35:34 +00:00
d2dd6105a6 fix: recover from kubeadm CRISocket node-registration race
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Handle kubeadm init failures where upload-config/kubelet runs before the node object exists. When that specific error occurs, wait for cp-1 registration and run upload-config kubelet phase explicitly instead of aborting immediately.
2026-03-04 03:00:34 +00:00
b3c975bd73 fix: use kubeadm v1beta4 list format for kubeletExtraArgs
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
kubeadm v1beta4 expects nodeRegistration.kubeletExtraArgs as a list of name/value args, not a map. Switch hostname-override to the correct structure so init config unmarshals successfully.
2026-03-04 02:00:07 +00:00
308a2fd4b7 fix: hard reset kubelet identity before kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Clear kubelet cert/bootstrap artifacts after reset and force hostname override in kubeadm nodeRegistration so the node consistently registers as cp-1 instead of inheriting stale template identity.
2026-03-04 01:35:41 +00:00
0cc0de2aea fix: pin kubeadm init node identity to flake hostname
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Set hostname before init and inject nodeRegistration.name into kubeadm InitConfiguration so cp-1 registers as the expected node (cp-1) instead of inheriting the template hostname. This fixes upload-config/kubelet failures caused by node lookup for k8s-base-template.
2026-03-04 01:17:44 +00:00
422b7d7f23 fix: force fresh kubeadm init after rebuild and make kubelet enable-able
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Always re-run primary init when reconcile performs node rebuilds to avoid stale/partial cluster state causing join preflight failures. Also add wantedBy for kubelet so systemctl enable works as expected during join/init flows.
2026-03-04 00:55:20 +00:00
3ebeb121b4 fix: force fresh bootstrap stages after rebuild and stabilize join node identity
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Clear completed bootstrap stage checkpoints whenever nodes are rebuilt so reconcile does not skip required init/cni/join work on fresh hosts. Also pass explicit --node-name for control-plane and worker joins, and ensure kubelet is enabled before join commands run.
2026-03-04 00:26:37 +00:00
cbb8358ce6 fix: ensure kubelet is enabled for kubeadm init node registration
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Enable kubelet before kubeadm init and stop forcing kubelet out of wantedBy so kubeadm can reliably register the node during upload-config/kubelet. Also clear stale kubelet config files during remote prep to avoid restart-loop leftovers.
2026-03-03 01:04:50 +00:00
51b56e562e fix: use valid kube-vip log flag value
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
kube-vip expects an unsigned integer for --log. Replace --log -4 with --log 4 so manifest generation no longer fails during bootstrap.
2026-03-03 00:25:25 +00:00
355273add5 fix: preserve kube-vip mount path and only swap hostPath to super-admin
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
The previous replacement changed both mountPath and hostPath, causing kube-vip to lose its expected in-container kubeconfig path and exit. Keep mountPath at /etc/kubernetes/admin.conf, swap only hostPath during bootstrap, and enable kube-vip debug log level.
2026-03-02 23:59:41 +00:00
262e9eb4d7 fix: bootstrap kube-vip without leader election
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Run first-control-plane kube-vip manifest without --leaderElection so VIP can bind before API/RBAC are fully available. Also print kube-vip container exit details on failure.
2026-03-02 23:28:44 +00:00
c445638d4a fix: run kube-vip in control-plane-only mode during bootstrap
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Remove --services from kube-vip static pod manifests for init/join. Service LB mode can crash-loop during kubeadm bootstrap before cluster RBAC is ready, which prevented VIP binding.
2026-03-02 22:52:44 +00:00
a81799a2b5 fix: stabilize kubeadm bootstrap and reduce Proxmox plan latency
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Move kubeadm reset ahead of kube-vip manifest generation, use super-admin.conf during bootstrap for kube-vip, and restore admin.conf after init. Also switch nixos-rebuild to --sudo and make QEMU guest agent optional so Terraform plan can skip slow guest-agent refreshes when it is not installed.
2026-03-02 22:09:10 +00:00
46c0786e57 fix: run kube-vip daemon before kubeadm init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
- Start kube-vip as a detached container to claim VIP before kubeadm init
- Wait for VIP to be bound before proceeding
- Generate static pod manifest for kube-vip
- Stop bootstrap kube-vip after API server is healthy (static pod takes over)
- Add kube-vip logs output if VIP fails to bind
2026-03-02 20:39:28 +00:00
1af45ca51e fix: skip kubeadm wait-control-plane phase, wait for VIP manually
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use --skip-phases=wait-control-plane to avoid 4-minute timeout
- Wait for kube-vip to bind VIP before checking API server health
- Add kube-vip logs and VIP status to debug output
2026-03-02 19:37:06 +00:00
533f5a91e0 fix: add image pre-pull and debug output for kubeadm init
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Pre-pull k8s control plane images before init to speed up startup
- Add crictl pods and crictl ps -a output on failure for debugging
2026-03-02 18:35:41 +00:00
c061dda31d fix: disable webhook authz and clean stale kubelet configs
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Add authorization.mode: AlwaysAllow to KubeletConfiguration
- Remove stale kubelet config.yaml before unmasking in all kubeadm scripts
- This prevents 'no client provided, cannot use webhook authorization' error
2026-03-02 17:59:31 +00:00
fb21fbef4f fix: disable kubelet webhook auth in kubeadm init config
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Use explicit kubeadm config file with KubeletConfiguration
- Disable webhook authentication which was causing 'no client provided' error
- Add ConditionPathExists to kubelet systemd unit
2026-03-02 16:49:21 +00:00
1b76e07326 fix: kubelet directories and containerd readiness
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Create /var/lib/kubelet and /var/lib/kubelet/pki directories via tmpfiles
- Ensure containerd is running before kubeadm init
- Add kubelet logs output on kubeadm init failure for debugging
2026-03-02 14:44:47 +00:00
db72dcab75 fix: remove kubelet ConditionPathExists, add daemon-reload
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Remove ConditionPathExists from kubelet service definition as it
  prevents kubelet from starting when managed by kubeadm
- Add systemctl daemon-reload after unmasking in all kubeadm scripts
- Add reset-failed for consistent state cleanup
2026-03-02 13:58:49 +00:00
d42e83358c fix: mask kubelet before rebuild, unmask in kubeadm helpers
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
- Mask kubelet service entirely before nixos-rebuild to prevent systemd
  from restarting it during switch
- Unmask kubelet in th-kubeadm-init/join scripts before starting
2026-03-02 12:44:40 +00:00
93e43a546f fix: prevent kubelet auto-start during rebuild
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Add wantedBy = [] to prevent kubelet from being started by multi-user.target
during nixos-rebuild switch. This allows rebuilds to succeed even when the
cluster is in a transitional state. Kubelet will be started by kubeadm
init/join commands instead.
2026-03-02 12:13:05 +00:00
f65a414959 fix: stop auto-enabling kubelet during base node rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 09:13:53 +00:00
7c849ed019 fix: gate kubelet startup until kubeadm config exists
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 08:39:22 +00:00
388b0c4f5d fix: align kubelet systemd unit with kubeadm flags
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:44:35 +00:00
d810547675 fix: ignore kubeadm HTTPProxyCIDR preflight in homelab workflow
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 03:06:29 +00:00
9426968cd4 fix: run kubeadm init/reset with clean environment
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:36:57 +00:00
02a6bca60b fix: harden kubeadm scripts for proxy and preflight issues
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-02 02:02:38 +00:00
a098c0aa29 fix: avoid sudo env loss for kube-vip image reference
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 10m8s
2026-03-02 01:27:44 +00:00
7ec1ce92cf fix: auto-detect kube-vip interface and tighten SSH fallback
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 17:34:09 +00:00
8bd064c828 fix: keep micqdf user during kubeadm node rebuilds
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
2026-03-01 13:31:46 +00:00
327c07314c fix: reclaim remote nix store space before rebuild
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 13s
2026-02-28 21:24:26 +00:00
5c037d9a99 fix: prefer root SSH for deploy and trust micqdf in nix
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 20:03:26 +00:00
885a92f494 chore: add lightweight flake checks for kubeadm configs
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
2026-02-28 16:19:37 +00:00
91dd20e60e fix: escape shell expansion in kubeadm helper scripts
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
2026-02-28 16:12:25 +00:00
7206d8cd41 feat: implement kubeadm bootstrap scaffolding for Nix nodes
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 18s
2026-02-28 16:04:14 +00:00
21be01346b feat: refactor infra to cp/wk kubeadm topology
Some checks failed
Terraform Plan / Terraform Plan (push) Failing after 9s
Provision 3 thin control planes and 3 workers with role-specific sizing and VMID ranges (701/711), generate per-node cloud-init snippets with SSH key injection, and add NixOS kubeadm host/module scaffolding for cp-1..3 and wk-1..3.
2026-02-28 14:16:55 +00:00