TerraHome

Author	SHA1	Message	Date
MichaelFisher1997	33bb0ffb17	fix: keep DHCP enabled by default on template VM [CI: Terraform Plan (push) successful in 14s] The template machine can lose connectivity when rebuilt directly because it has no cloud-init network data during template maintenance. Restore DHCP as the default for the template itself while keeping cloud-init + networkd enabled so cloned VMs can still consume injected network settings.	2026-03-08 20:12:03 +00:00
MichaelFisher1997	cd8e538c51	ci: switch checkout action source away from gitea.com mirror [CI: Terraform Plan (push) successful in 16s] The gitea.com checkout action mirror is timing out during workflow startup. Use actions/checkout@v4 directly so jobs do not fail before any repository logic runs.	2026-03-08 13:36:21 +00:00
MichaelFisher1997	808c290c71	chore: clarify stale template cloud-init failure message [CI: Terraform Plan (push) failing after 31s] Make SSH bootstrap failures explain the real root cause when fresh clones never accept the injected user/key: the Proxmox source template itself still needs the updated cloud-init-capable NixOS configuration.	2026-03-08 13:16:37 +00:00
MichaelFisher1997	79a4c941e5	fix: enable cloud-init networking in NixOS template [CI: Terraform Plan (push) successful in 16s] Freshly recreated VMs were reachable but did not accept the injected SSH key, which indicates Proxmox cloud-init settings were not being applied. Enable cloud-init and cloud-init network handling in the base template so static IPs, hostname, ciuser, and SSH keys take effect on first boot.	2026-03-08 05:16:19 +00:00
MichaelFisher1997	4c167f618a	fix: wait for SSH readiness after VM provisioning [CI: Terraform Plan (push) successful in 17s] Freshly recreated VMs can take a few minutes before cloud-init users and SSH are available. Retry SSH authentication in the bootstrap controller before failing so rebuild/bootstrap does not abort immediately on new hosts.	2026-03-08 05:00:39 +00:00
MichaelFisher1997	7bc861b3e8	ci: speed up Terraform destroy plan by skipping refresh [CI: Terraform Plan (push) successful in 16s] Use terraform plan -refresh=false for destroy workflows so manual NUKE runs do not spend minutes refreshing Proxmox VM state before building the destroy plan.	2026-03-08 04:37:52 +00:00
MichaelFisher1997	b7b364a112	fix: vendor Flannel manifest and harden CNI bootstrap timing [CI: Terraform Plan (push) successful in 17s] Stop depending on GitHub during cluster bring-up by shipping the Flannel manifest in-repo, ensure required host paths exist on NixOS nodes, and wait/retry against a stable API before applying the CNI. This removes the TLS handshake timeout failure mode and makes early network bootstrap deterministic.	2026-03-08 03:24:16 +00:00
MichaelFisher1997	bd866f7dac	fix: add mount utility to kubelet service PATH [CI: Terraform Plan (push) successful in 16s] Flannel pods were stuck because kubelet could not execute mount for projected service account volumes on NixOS. Add util-linux to the kubelet systemd PATH so mount is available during volume setup.	2026-03-07 14:18:20 +00:00
micqdf	0cce4bcf72	Merge branch 'master' into stage [CI: Terraform Plan (push) successful in 16s]	2026-03-07 12:22:01 +00:00
MichaelFisher1997	065567210e	debug: print detailed Flannel pod diagnostics on rollout timeout [CI: Terraform Plan (push) successful in 18s] When kube-flannel daemonset rollout stalls, print pod descriptions and per-container logs for the init containers and main flannel container so the next failure shows the actual cause instead of only Init:0/2.	2026-03-07 12:19:21 +00:00
micqdf	c5f0b1ac37	Merge pull request 'stage' (#121) from stage into master [CI: Terraform Apply (push) failing after 30m28s] Reviewed-on: #121	2026-03-07 01:01:38 +00:00
micqdf	e740d47011	Merge branch 'master' into stage [CI: Terraform Plan (push) successful in 16s]	2026-03-07 00:57:47 +00:00
MichaelFisher1997	d9d3976c4c	fix: use self-contained Terraform variable validations [CI: Terraform Plan (push) successful in 17s] Terraform variable validation blocks can only reference the variable under validation. Replace count-based checks with fixed-length validations for the current 3 control planes and 3 workers.	2026-03-07 00:54:51 +00:00
MichaelFisher1997	a0b07816b9	refactor: simplify homelab bootstrap around static IPs and fresh runs [CI: Terraform Plan (push) failing after 10s] Make Terraform the source of truth for node IPs, remove guest-agent/SSH discovery from the normal workflow path, simplify the bootstrap controller to a fresh-run flow, and swap the initial CNI to Flannel so cluster readiness is easier to prove before reintroducing more complex reconcile behavior.	2026-03-07 00:52:35 +00:00
micqdf	d964ff8b50	Merge pull request 'fix: point Cilium directly at API server and print rollout diagnostics' (#120) from stage into master [CI: Terraform Apply (push) failing after 26m43s] Reviewed-on: #120	2026-03-05 01:25:52 +00:00
MichaelFisher1997	e06b2c692e	fix: point Cilium directly at API server and print rollout diagnostics [CI: Terraform Plan (push) successful in 18s] Set Cilium k8sServiceHost/k8sServicePort to the primary control-plane API endpoint to avoid in-cluster service routing dependency during early bootstrap. Also print cilium daemonset/pod/log diagnostics when rollout times out.	2026-03-05 01:21:21 +00:00
micqdf	c48bbddef3	Merge pull request 'fix: stabilize Cilium install defaults and add rollout diagnostics' (#119) from stage into master [CI: Terraform Apply (push) failing after 26m43s] Reviewed-on: #119	2026-03-05 00:52:04 +00:00
MichaelFisher1997	ca54c44fa4	fix: stabilize Cilium install defaults and add rollout diagnostics [CI: Terraform Plan (push) successful in 17s] Set Cilium kubeProxyReplacement from env (default false for homelab stability) and collect cilium daemonset/pod/log diagnostics when rollout times out during verification.	2026-03-05 00:48:41 +00:00
micqdf	8bda08be07	Merge pull request 'fix: hard-reset nodes before kubeadm join retries' (#118) from stage into master [CI: Terraform Apply (push) failing after 29m30s] Reviewed-on: #118	2026-03-05 00:16:31 +00:00
MichaelFisher1997	0778de9719	fix: hard-reset nodes before kubeadm join retries [CI: Terraform Plan (push) successful in 17s] Before control-plane and worker joins, remove stale kubelet/kubernetes identity files and run kubeadm reset -f. This prevents preflight failures like FileAvailable--etc-kubernetes-kubelet.conf during repeated reconcile attempts.	2026-03-04 23:38:15 +00:00
micqdf	92f0658995	Merge pull request 'fix: add heuristic SSH inventory fallback for generic hostnames' (#117) from stage into master [CI: Terraform Apply (push) failing after 19m52s] Reviewed-on: #117	2026-03-04 23:13:08 +00:00
MichaelFisher1997	fc4eb1bc6e	fix: add heuristic SSH inventory fallback for generic hostnames [CI: Terraform Plan (push) successful in 16s] When Proxmox guest-agent IPs are empty and SSH discovery returns duplicate generic hostnames (e.g. flex), assign remaining missing nodes from unmatched SSH-reachable IPs in deterministic order. Also emit SSH-reachable IP diagnostics on failure.	2026-03-04 23:07:45 +00:00
micqdf	4b017364c8	Merge pull request 'fix: wait for Cilium and node readiness before marking bootstrap success' (#116) from stage into master [CI: Terraform Apply (push) failing after 8m47s] Reviewed-on: #116	2026-03-04 22:57:39 +00:00
MichaelFisher1997	a70de061b0	fix: wait for Cilium and node readiness before marking bootstrap success [CI: Terraform Plan (push) successful in 18s] Update verification stage to block on cilium daemonset rollout and all nodes reaching Ready. This prevents workflows from reporting success while the cluster is still NotReady immediately after join.	2026-03-04 22:26:43 +00:00
micqdf	9d98f56725	Merge pull request 'fix: add join preflight ignores for homelab control planes' (#115) from stage into master [CI: Terraform Apply (push) successful in 44m43s] Reviewed-on: #115	2026-03-04 21:13:02 +00:00
MichaelFisher1997	5ddd00f711	fix: add join preflight ignores for homelab control planes [CI: Terraform Plan (push) successful in 16s] Append --ignore-preflight-errors=NumCPU,HTTPProxyCIDR to control-plane join commands and HTTPProxyCIDR to worker joins so kubeadm join does not fail on known single-CPU/proxy CIDR checks in this environment.	2026-03-04 21:09:27 +00:00
micqdf	5af4021228	Merge pull request 'fix: require kubelet kubeconfig before starting service' (#114) from stage into master [CI: Terraform Apply (push) failing after 16m56s] Reviewed-on: #114	2026-03-04 20:46:48 +00:00
MichaelFisher1997	034869347a	fix: require kubelet kubeconfig before starting service [CI: Terraform Plan (push) successful in 17s] Inline kubelet bootstrap/kubeconfig flags in ExecStart and gate startup on /etc/kubernetes/*kubelet.conf in addition to config.yaml. This prevents kubelet entering standalone mode with webhook auth enabled when no client config is present.	2026-03-04 20:45:47 +00:00
micqdf	50d0d99332	Merge pull request 'stage' (#113) from stage into master [CI: Terraform Apply (push) failing after 18m7s] Reviewed-on: #113	2026-03-04 19:32:40 +00:00
MichaelFisher1997	f0093deedc	fix: avoid assigning control-plane VIP as node SSH address [CI: Terraform Plan (push) successful in 15s] Exclude the configured VIP suffix from subnet scans and prefer non-VIP IPs when multiple SSH endpoints resolve to the same node. This prevents cp-1 being discovered as .250 and later failing SSH commands against the floating VIP.	2026-03-04 19:26:37 +00:00
MichaelFisher1997	6b6ca021c9	fix: add kubelet bootstrap kubeconfig args to systemd unit [CI: Terraform Plan (push) successful in 17s] Include KUBELET_KUBECONFIG_ARGS in kubelet ExecStart so kubelet can authenticate with bootstrap-kubelet.conf/kubelet.conf and register node objects during kubeadm init.	2026-03-04 19:26:07 +00:00
micqdf	c034f7975c	Merge pull request 'stage' (#112) from stage into master [CI: Terraform Apply (push) failing after 28m53s] Reviewed-on: #112	2026-03-04 18:51:53 +00:00
micqdf	90ef0ec33f	Merge branch 'master' into stage [CI: Terraform Plan (push) successful in 17s]	2026-03-04 18:42:22 +00:00
MichaelFisher1997	ba6cf42c04	fix: restart kubelet during CRISocket recovery and add registration diagnostics [CI: Terraform Plan (push) successful in 16s] When kubeadm init fails at upload-config/kubelet due to a missing node object, explicitly restart kubelet to ensure bootstrap flags are loaded before waiting for node registration. Add kubelet flag dump and focused registration log output to surface auth/cert errors.	2026-03-04 18:37:50 +00:00
MichaelFisher1997	3cd0c70727	fix: stop overriding kubelet config in kubeadm init [CI: Terraform Plan (push) successful in 17s] Remove custom KubeletConfiguration from init config so kubeadm uses default kubelet authn/authz settings and bootstrap registration path. This avoids the standalone-style kubelet behavior where the node never appears in the API.	2026-03-04 18:35:34 +00:00
micqdf	3281ebd216	Merge pull request 'fix: recover from kubeadm CRISocket node-registration race' (#111) from stage into master [CI: Terraform Apply (push) failing after 18m6s] Reviewed-on: #111	2026-03-04 03:03:17 +00:00
MichaelFisher1997	d2dd6105a6	fix: recover from kubeadm CRISocket node-registration race [CI: Terraform Plan (push) successful in 17s] Handle kubeadm init failures where upload-config/kubelet runs before the node object exists. When that specific error occurs, wait for cp-1 registration and run upload-config kubelet phase explicitly instead of aborting immediately.	2026-03-04 03:00:34 +00:00
micqdf	981afc509a	Merge pull request 'fix: use kubeadm v1beta4 list format for kubeletExtraArgs' (#110) from stage into master [CI: Terraform Apply (push) failing after 19m48s] Reviewed-on: #110	2026-03-04 02:32:22 +00:00
MichaelFisher1997	b3c975bd73	fix: use kubeadm v1beta4 list format for kubeletExtraArgs [CI: Terraform Plan (push) successful in 17s] kubeadm v1beta4 expects nodeRegistration.kubeletExtraArgs as a list of name/value args, not a map. Switch hostname-override to the correct structure so init config unmarshals successfully.	2026-03-04 02:00:07 +00:00
micqdf	8aab666fad	Merge pull request 'fix: hard reset kubelet identity before kubeadm init' (#109) from stage into master [CI: Terraform Apply (push) failing after 12m25s] Reviewed-on: #109	2026-03-04 01:42:55 +00:00
MichaelFisher1997	308a2fd4b7	fix: hard reset kubelet identity before kubeadm init [CI: Terraform Plan (push) successful in 17s] Clear kubelet cert/bootstrap artifacts after reset and force hostname override in kubeadm nodeRegistration so the node consistently registers as cp-1 instead of inheriting stale template identity.	2026-03-04 01:35:41 +00:00
micqdf	3fd7ed48b1	Merge pull request 'fix: pin kubeadm init node identity to flake hostname' (#108) from stage into master [CI: Terraform Apply (push) failing after 15m22s] Reviewed-on: #108	2026-03-04 01:18:51 +00:00
MichaelFisher1997	0cc0de2aea	fix: pin kubeadm init node identity to flake hostname [CI: Terraform Plan (push) successful in 17s] Set hostname before init and inject nodeRegistration.name into kubeadm InitConfiguration so cp-1 registers as the expected node (cp-1) instead of inheriting the template hostname. This fixes upload-config/kubelet failures caused by node lookup for k8s-base-template.	2026-03-04 01:17:44 +00:00
micqdf	99458ca829	Merge pull request 'fix: force fresh kubeadm init after rebuild and make kubelet enable-able' (#107) from stage into master [CI: Terraform Apply (push) failing after 17m1s] Reviewed-on: #107	2026-03-04 00:56:30 +00:00
MichaelFisher1997	422b7d7f23	fix: force fresh kubeadm init after rebuild and make kubelet enable-able [CI: Terraform Plan (push) successful in 17s] Always re-run primary init when reconcile performs node rebuilds to avoid stale/partial cluster state causing join preflight failures. Also add wantedBy for kubelet so systemctl enable works as expected during join/init flows.	2026-03-04 00:55:20 +00:00
micqdf	adc8a620f4	Merge pull request 'fix: force fresh bootstrap stages after rebuild and stabilize join node identity' (#106) from stage into master [CI: Terraform Apply (push) failing after 20m28s] Reviewed-on: #106	2026-03-04 00:32:06 +00:00
MichaelFisher1997	3ebeb121b4	fix: force fresh bootstrap stages after rebuild and stabilize join node identity [CI: Terraform Plan (push) successful in 17s] Clear completed bootstrap stage checkpoints whenever nodes are rebuilt so reconcile does not skip required init/cni/join work on fresh hosts. Also pass explicit --node-name for control-plane and worker joins, and ensure kubelet is enabled before join commands run.	2026-03-04 00:26:37 +00:00
micqdf	f11aadf79c	Merge pull request 'fix: map SSH-discovered nodes by VMID when hostnames are generic' (#105) from stage into master [CI: Terraform Apply (push) failing after 27m43s] Reviewed-on: #105	2026-03-03 23:37:45 +00:00
MichaelFisher1997	b4265a649e	fix: map SSH-discovered nodes by VMID when hostnames are generic [CI: Terraform Plan (push) successful in 16s] Some freshly cloned VMs still report template/generic hostnames during discovery. Probe DMI product serial over SSH and map it to Terraform VMIDs so cp-2/cp-3/wk-2 can be resolved even before hostname reconciliation.	2026-03-03 22:16:35 +00:00
micqdf	09d2f56967	Merge pull request 'fix: make SSH inventory discovery more reliable on CI' (#104) from stage into master [CI: Terraform Apply (push) failing after 8m46s] Reviewed-on: #104	2026-03-03 21:45:57 +00:00


358 Commits