HetznerTerra

Author	SHA1	Message	CI status (push)	Date
micqdf	50d97209e6	fix: ignore Rancher Turtles cleanup hook pod	Deploy Cluster / Terraform: Successful in 30s; Deploy Cluster / Ansible: Successful in 14m41s	2026-04-26 02:33:21 +00:00
micqdf	46b2ff7d19	fix: harden final health checks	Deploy Cluster / Terraform: Successful in 31s; Deploy Cluster / Ansible: Failing after 17m50s	2026-04-26 02:14:02 +00:00
micqdf	a4f1d179e9	fix: use Rancher registry for webhook image	Deploy Cluster / Terraform: Successful in 32s; Deploy Cluster / Ansible: Failing after 26m36s	2026-04-26 01:35:16 +00:00
micqdf	9879de5a86	fix: stop pre-pulling Rancher child images	Deploy Cluster / Terraform: Successful in 35s; Deploy Cluster / Ansible: Failing after 11m1s	2026-04-26 00:57:49 +00:00
micqdf	195e9bce25	fix: parallelize Rancher child image warmup	Deploy Cluster / Terraform: Successful in 35s; Deploy Cluster / Ansible: Failing after 23m46s	2026-04-26 00:02:12 +00:00
micqdf	4796606432	fix: warm Rancher child images on all nodes	Deploy Cluster / Terraform: Successful in 32s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 23:30:20 +00:00
micqdf	f3c96b65d2	fix: shorten Rancher chart retry windows	Deploy Cluster / Terraform: Successful in 34s; Deploy Cluster / Ansible: Failing after 25m40s	2026-04-25 22:30:07 +00:00
micqdf	c7a375758f	fix: retry Rancher chart pulls during waits	Deploy Cluster / Terraform: Successful in 31s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 22:03:09 +00:00
micqdf	40647318b4	fix: tolerate cached Helm repository artifacts	Deploy Cluster / Terraform: Successful in 32s; Deploy Cluster / Ansible: Failing after 29m36s	2026-04-25 20:44:03 +00:00
micqdf	cdb26904d2	fix: retry Tailscale chart pulls during bootstrap	Deploy Cluster / Terraform: Successful in 32s; Deploy Cluster / Ansible: Failing after 27m40s	2026-04-25 20:11:43 +00:00
micqdf	3c06e046c2	fix: warm External Secrets image before install	Deploy Cluster / Terraform: Successful in 32s; Deploy Cluster / Ansible: Failing after 21m10s	2026-04-25 19:46:21 +00:00
micqdf	17f1815e7f	fix: use CRI pulls for Flux image warmup	Deploy Cluster / Terraform: Successful in 30s; Deploy Cluster / Ansible: Failing after 15m3s	2026-04-25 19:28:29 +00:00
micqdf	66e86e55ea	fix: require Flux image warmup before bootstrap	Deploy Cluster / Terraform: Successful in 31s; Deploy Cluster / Ansible: Failing after 23m13s	2026-04-25 19:02:32 +00:00
micqdf	43df412243	fix: handle missing Proxmox VM config during cleanup	Deploy Cluster / Terraform: Successful in 1m41s; Deploy Cluster / Ansible: Failing after 44m51s	2026-04-25 17:40:51 +00:00
micqdf	383ef9e9ac	fix: clean orphan Proxmox cloud-init volumes	Deploy Cluster / Terraform: Failing after 19s; Deploy Cluster / Ansible: Has been skipped	2026-04-25 17:38:57 +00:00
micqdf	18abc5073b	fix: keep concurrent Terraform apply	Deploy Cluster / Terraform: Failing after 1m28s; Deploy Cluster / Ansible: Has been skipped	2026-04-25 17:30:59 +00:00
micqdf	f8da2594ca	fix: serialize Proxmox VM apply	Deploy Cluster / Ansible: Has been cancelled; Deploy Cluster / Terraform: Has been cancelled	2026-04-25 17:27:59 +00:00
micqdf	003333a061	fix: make health checks observe Flux readiness	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Successful in 11m14s	2026-04-25 03:52:43 +00:00
micqdf	a6071c504b	fix: point Promtail at Loki service	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 03:43:23 +00:00
micqdf	08123457f1	fix: ignore stale install hook pods in health check	Deploy Cluster / Terraform: Successful in 29s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 03:41:00 +00:00
micqdf	15defc686f	fix: allow slow Promtail image pulls	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 03:10:47 +00:00
micqdf	abb7578328	fix: run post-deploy checks with bash	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Failing after 12m17s	2026-04-25 02:42:54 +00:00
micqdf	045880bdd6	fix: ignore stale Rancher helm operation pods	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 02:23:30 +00:00
micqdf	bfcf57bcc5	fix: enforce post-deploy health checks	Deploy Cluster / Terraform: Successful in 29s; Deploy Cluster / Ansible: Has been cancelled	2026-04-25 02:22:16 +00:00
micqdf	7e3ebec95b	fix: wait for Rancher resources before rollout checks	Deploy Cluster / Terraform: Successful in 29s; Deploy Cluster / Ansible: Successful in 17m31s	2026-04-25 01:54:21 +00:00
micqdf	0c31c3b1d5	fix: fail fast on stalled Flux Helm releases	Deploy Cluster / Terraform: Successful in 30s; Deploy Cluster / Ansible: Failing after 10m33s	2026-04-25 01:40:42 +00:00
micqdf	5523feb563	fix: wait for Rancher Flux resources before rollout	Deploy Cluster / Terraform: Successful in 27s; Deploy Cluster / Ansible: Failing after 39m43s	2026-04-25 00:59:16 +00:00
micqdf	cafa2fa0b3	fix: reset stalled bootstrap Helm releases	Deploy Cluster / Terraform: Successful in 27s; Deploy Cluster / Ansible: Failing after 9m5s	2026-04-25 00:48:33 +00:00
micqdf	a7fd4c0b97	fix: wait on actual ESO deployment names	Deploy Cluster / Terraform: Successful in 30s; Deploy Cluster / Ansible: Failing after 38m19s	2026-04-25 00:07:48 +00:00
micqdf	e56a3a6c38	fix: wait for ESO webhook before ClusterSecretStore	Deploy Cluster / Terraform: Successful in 29s; Deploy Cluster / Ansible: Failing after 10m13s	2026-04-24 23:13:03 +00:00
micqdf	7b2eca07ab	fix: pull external-secrets chart from OCI	Deploy Cluster / Terraform: Successful in 30s; Deploy Cluster / Ansible: Failing after 9m41s	2026-04-24 15:24:58 +00:00
micqdf	347ca041ba	fix: reduce rerun bootstrap pre-pull delays	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Failing after 39m26s	2026-04-24 12:09:34 +00:00
micqdf	68b293efe4	fix: qualify Flux HelmChart bootstrap resources	Deploy Cluster / Terraform: Successful in 27s; Deploy Cluster / Ansible: Has been cancelled	2026-04-24 10:47:13 +00:00
micqdf	1f465cc0c1	fix: force reconcile bootstrap Helm charts	Deploy Cluster / Terraform: Successful in 30s; Deploy Cluster / Ansible: Failing after 15m37s	2026-04-24 10:17:49 +00:00
micqdf	6e22bd26b3	fix: wait directly on ESO Helm readiness	Deploy Cluster / Terraform: Successful in 27s; Deploy Cluster / Ansible: Failing after 47m9s	2026-04-23 22:09:45 +00:00
micqdf	869880c152	fix: wait for ESO resources before CRD conditions	Deploy Cluster / Terraform: Successful in 31s; Deploy Cluster / Ansible: Failing after 31m14s	2026-04-23 21:17:44 +00:00
micqdf	31e95eb227	fix: pre-pull Flux controllers before bootstrap rollout	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Failing after 16m39s	2026-04-23 20:36:57 +00:00
micqdf	12675417bd	fix: use correct namespace and deployment name for ESO rollout check. The ESO deployment is named external-secrets-external-secrets in the external-secrets namespace, not external-secrets in kube-system.	Deploy Cluster / Terraform: Successful in 1m36s; Deploy Cluster / Ansible: Failing after 40m40s	2026-04-23 19:00:15 +00:00
micqdf	8e081ddfda	fix: wait on ESO deployment directly instead of Flux Kustomization status. The addon-external-secrets Flux Kustomization was timing out during bootstrap because image pulls on fresh Proxmox VMs are slow. The critical dependency is the ESO deployment being available for the Doppler ClusterSecretStore. Replace the Kustomization readiness check with direct checks for ESO CRD establishment and deployment rollout, which are the actual prerequisites for the next step.	Deploy Cluster / Terraform: Successful in 29s; Deploy Cluster / Ansible: Failing after 19m8s	2026-04-23 07:32:19 +00:00
micqdf	a7d540ca65	fix: stop forcing Flux releases during deploy bootstrap. Remove the HelmRelease reset/force annotations from the deploy workflow now that the cluster can converge on its own. The runtime waits remain, but CI no longer re-triggers Rancher and NFS churn on every bootstrap attempt.	Deploy Cluster / Terraform: Successful in 32s; Deploy Cluster / Ansible: Successful in 21m12s	2026-04-23 00:35:31 +00:00
micqdf	098bd98876	fix: wait on Rancher and storage runtime objects during bootstrap. Flux can leave HelmRelease and Kustomization conditions stale after transient chart fetch or image pull failures even when the underlying workloads recover. Switch the deploy workflow to wait on the concrete runtime resources we care about: the NFS provisioner deployment and StorageClass, Rancher deployment, webhook, cert-manager issuer/certificate, and the rancher-backup deployment.	Deploy Cluster / Terraform: Successful in 26s; Deploy Cluster / Ansible: Failing after 25m19s	2026-04-22 18:41:09 +00:00
micqdf	9c0523e880	fix: pre-pull Rancher images and reset Rancher release during bootstrap. Rancher installs were stalling on transient Docker Hub TLS handshake timeouts for rancher shell, webhook, and system-upgrade-controller images. Pre-pull the required images onto all nodes after k3s comes up, extend the Rancher HelmRelease timeout, and reset/force the Rancher HelmRelease before waiting on addon-rancher so bootstrap can recover from stale failed remediation state.	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Failing after 27m30s	2026-04-22 11:00:54 +00:00
micqdf	8372d562ad	fix: reset and force nfs helmrelease during bootstrap. When the NFS storage HelmRelease has already entered a failed remediation state, a plain reconcile request is not enough to clear the stale failure counters. Send requestedAt, resetAt, and forceAt together so helm-controller retries the release cleanly before the workflow waits on addon-nfs-storage.	Deploy Cluster / Terraform: Successful in 29s; Deploy Cluster / Ansible: Failing after 20m22s	2026-04-22 10:35:32 +00:00
micqdf	1bb11dfe3a	fix: force nfs storage reconcile during flux bootstrap. The NFS HelmRelease can remain in a failed state from an earlier bootstrap attempt even after the backing NFS export is corrected and the pod becomes healthy. Request a fresh reconcile of the HelmRelease and addon kustomization before waiting on addon-nfs-storage so the bootstrap step can observe the recovered state.	Deploy Cluster / Terraform: Successful in 27s; Deploy Cluster / Ansible: Failing after 19m0s	2026-04-22 10:08:20 +00:00
micqdf	71bdc6a709	fix: extend Flux bootstrap timeouts on fresh clusters. Fresh Proxmox clusters need longer for the Flux controller rollouts and first GitRepository/Kustomization reconciliations, especially while images are still being pulled onto the control plane. Increase the bootstrap wait windows so CI does not fail while the controllers are still converging.	Deploy Cluster / Terraform: Successful in 26s; Deploy Cluster / Ansible: Failing after 18m44s	2026-04-22 08:36:27 +00:00
micqdf	714f20417b	fix: tolerate control-plane taint when pinning Flux to cp1. Flux bootstrap patches the controllers onto k8s-cluster-cp-1, but the control-plane node is tainted NoSchedule. Add the matching toleration in both the checked-in patch manifest and the bootstrap workflow so the controllers can actually schedule and roll out on cp-1.	Deploy Cluster / Terraform: Successful in 28s; Deploy Cluster / Ansible: Failing after 10m19s	2026-04-22 05:05:15 +00:00
micqdf	5c53b8e06e	fix: normalize Proxmox endpoint and stop dashboards self-trigger. Accept Proxmox API endpoints with or without /api2/json in CI and local tfvars, and avoid running the dashboards workflow just because its own workflow file changed during platform migrations.	Deploy Cluster / Terraform: Failing after 53s; Deploy Cluster / Ansible: Has been skipped	2026-04-22 03:13:22 +00:00
micqdf	b1dae28aa5	feat: migrate cluster baseline from Hetzner to Proxmox. Replace Hetzner infrastructure and cloud-provider assumptions with Proxmox VM clones, kube-vip API HA, and NFS-backed storage. Update bootstrap, Flux addons, CI workflows, and docs to target the new private Proxmox baseline while preserving the existing Tailscale, Doppler, Flux, Rancher, and B2 backup flows.	Deploy Cluster / Terraform: Failing after 52s; Deploy Cluster / Ansible: Has been skipped; Deploy Grafana Content / Grafana Content: Failing after 1m37s	2026-04-22 03:02:13 +00:00
micqdf	7385c2263e	fix: add tailnet smoke checks and move Tailscale operator to stable. Add a post-deploy smoke test that validates Tailscale DNS, proxy readiness, reachability, and service responses for Rancher, Grafana, and Prometheus. Move the operator to the stable Helm repo/version and align the baseline docs with the current HA private-only architecture.	Deploy Cluster / Terraform: Successful in 49s; Deploy Cluster / Ansible: Successful in 5m55s	2026-04-18 19:59:13 +00:00
micqdf	2ba6b6a896	fix: remove unused Flux CLI install from deploy workflow. The deploy pipeline never uses the flux binary after installation, so the GitHub release download only adds a flaky failure point. Remove the step and keep the bootstrap path kubectl-only.	Deploy Cluster / Terraform: Successful in 49s; Deploy Cluster / Ansible: Successful in 5m40s	2026-04-18 17:45:59 +00:00
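
Commit 5c53b8e06e describes accepting Proxmox API endpoints with or without /api2/json. The log does not show how the repository implements this, so the following is only a minimal sketch of that kind of normalization. The function name normalize_proxmox_endpoint is hypothetical, and the choice to normalize towards the form that ends in /api2/json is an assumption; the actual workflow may well strip the suffix instead.

```python
API_SUFFIX = "/api2/json"  # path suffix named in the commit message


def normalize_proxmox_endpoint(endpoint: str) -> str:
    """Return the endpoint with exactly one trailing /api2/json.

    Hypothetical helper: accepts the endpoint with or without the
    suffix, and with or without trailing slashes.
    """
    base = endpoint.strip().rstrip("/")
    # Drop an existing suffix so it is never appended twice.
    if base.endswith(API_SUFFIX):
        base = base[: -len(API_SUFFIX)].rstrip("/")
    return base + API_SUFFIX


if __name__ == "__main__":
    # pve.example is a placeholder host, not taken from the repository.
    for raw in ("https://pve.example:8006", "https://pve.example:8006/api2/json/"):
        print(normalize_proxmox_endpoint(raw))
```

Both inputs in the usage loop normalize to the same string, which is the property the commit message asks for: CI secrets and local tfvars can use either spelling.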


101 Commits