Commit Graph

341 Commits

Author SHA1 Message Date
micqdf 9879de5a86 fix: stop pre-pulling Rancher child images
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Failing after 11m1s
2026-04-26 00:57:49 +00:00
micqdf 195e9bce25 fix: parallelize Rancher child image warmup
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Failing after 23m46s
2026-04-26 00:02:12 +00:00
micqdf 4796606432 fix: warm Rancher child images on all nodes
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 23:30:20 +00:00
micqdf b1eab6a0fa fix: vendor Rancher chart for bootstrap
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 23:08:26 +00:00
micqdf f3c96b65d2 fix: shorten Rancher chart retry windows
Deploy Cluster / Terraform (push) Successful in 34s
Deploy Cluster / Ansible (push) Failing after 25m40s
2026-04-25 22:30:07 +00:00
micqdf c7a375758f fix: retry Rancher chart pulls during waits
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 22:03:09 +00:00
micqdf d0be48b65c fix: gate Tailscale addon on Helm release
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 36m36s
2026-04-25 21:21:34 +00:00
micqdf 40647318b4 fix: tolerate cached Helm repository artifacts
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 29m36s
2026-04-25 20:44:03 +00:00
micqdf cdb26904d2 fix: retry Tailscale chart pulls during bootstrap
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 27m40s
2026-04-25 20:11:43 +00:00
micqdf 3c06e046c2 fix: warm External Secrets image before install
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 21m10s
2026-04-25 19:46:21 +00:00
micqdf 17f1815e7f fix: use CRI pulls for Flux image warmup
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 15m3s
2026-04-25 19:28:29 +00:00
micqdf 66e86e55ea fix: require Flux image warmup before bootstrap
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Failing after 23m13s
2026-04-25 19:02:32 +00:00
micqdf 43df412243 fix: handle missing Proxmox VM config during cleanup
Deploy Cluster / Terraform (push) Successful in 1m41s
Deploy Cluster / Ansible (push) Failing after 44m51s
2026-04-25 17:40:51 +00:00
micqdf 383ef9e9ac fix: clean orphan Proxmox cloud-init volumes
Deploy Cluster / Terraform (push) Failing after 19s
Deploy Cluster / Ansible (push) Has been skipped
2026-04-25 17:38:57 +00:00
micqdf 18abc5073b fix: keep concurrent Terraform apply
Deploy Cluster / Terraform (push) Failing after 1m28s
Deploy Cluster / Ansible (push) Has been skipped
2026-04-25 17:30:59 +00:00
micqdf f8da2594ca fix: serialize Proxmox VM apply
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-04-25 17:27:59 +00:00
micqdf e0359f0097 tes
Deploy Cluster / Terraform (push) Failing after 1m26s
Deploy Cluster / Ansible (push) Has been skipped
2026-04-25 17:22:12 +00:00
micqdf 003333a061 fix: make health checks observe Flux readiness
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Successful in 11m14s
2026-04-25 03:52:43 +00:00
micqdf a6071c504b fix: point Promtail at Loki service
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 03:43:23 +00:00
micqdf 08123457f1 fix: ignore stale install hook pods in health check
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 03:41:00 +00:00
micqdf 757d88ed52 fix: use cached Promtail images when available
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Failing after 13m15s
2026-04-25 03:25:44 +00:00
micqdf 15defc686f fix: allow slow Promtail image pulls
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 03:10:47 +00:00
micqdf abb7578328 fix: run post-deploy checks with bash
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 12m17s
2026-04-25 02:42:54 +00:00
micqdf bc87a7ca43 fix: avoid immutable observability PVC changes
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 10m47s
2026-04-25 02:25:40 +00:00
micqdf 045880bdd6 fix: ignore stale Rancher helm operation pods
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 02:23:30 +00:00
micqdf bfcf57bcc5 fix: enforce post-deploy health checks
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 02:22:16 +00:00
micqdf 7e3ebec95b fix: wait for Rancher resources before rollout checks
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Successful in 17m31s
2026-04-25 01:54:21 +00:00
micqdf 0c31c3b1d5 fix: fail fast on stalled Flux Helm releases
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 10m33s
2026-04-25 01:40:42 +00:00
micqdf 5523feb563 fix: wait for Rancher Flux resources before rollout
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 39m43s
2026-04-25 00:59:16 +00:00
micqdf cafa2fa0b3 fix: reset stalled bootstrap Helm releases
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 9m5s
2026-04-25 00:48:33 +00:00
micqdf a7fd4c0b97 fix: wait on actual ESO deployment names
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 38m19s
2026-04-25 00:07:48 +00:00
micqdf e56a3a6c38 fix: wait for ESO webhook before ClusterSecretStore
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Failing after 10m13s
2026-04-24 23:13:03 +00:00
micqdf 7b2eca07ab fix: pull external-secrets chart from OCI
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 9m41s
2026-04-24 15:24:58 +00:00
micqdf 347ca041ba fix: reduce rerun bootstrap pre-pull delays
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 39m26s
2026-04-24 12:09:34 +00:00
micqdf 3f52bad854 fix: make Ansible reruns faster and idempotent
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-24 11:44:11 +00:00
micqdf c89c31adea fix: clean up Ansible bootstrap warnings
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-24 11:07:13 +00:00
micqdf 68b293efe4 fix: qualify Flux HelmChart bootstrap resources
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-24 10:47:13 +00:00
micqdf 1f465cc0c1 fix: force reconcile bootstrap Helm charts
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 15m37s
2026-04-24 10:17:49 +00:00
micqdf 6e22bd26b3 fix: wait directly on ESO Helm readiness
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 47m9s
2026-04-23 22:09:45 +00:00
micqdf 869880c152 fix: wait for ESO resources before CRD conditions
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Failing after 31m14s
2026-04-23 21:17:44 +00:00
micqdf 31e95eb227 fix: pre-pull Flux controllers before bootstrap rollout
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 16m39s
2026-04-23 20:36:57 +00:00
micqdf 12675417bd fix: use correct namespace and deployment name for ESO rollout check
Deploy Cluster / Terraform (push) Successful in 1m36s
Deploy Cluster / Ansible (push) Failing after 40m40s
The ESO deployment is named external-secrets-external-secrets in the
external-secrets namespace, not external-secrets in kube-system.
2026-04-23 19:00:15 +00:00
micqdf 8e081ddfda fix: wait on ESO deployment directly instead of Flux Kustomization status
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Failing after 19m8s
The addon-external-secrets Flux Kustomization was timing out during bootstrap
because image pulls on fresh Proxmox VMs are slow. The critical dependency is
the ESO deployment being available for the Doppler ClusterSecretStore. Replace
the Kustomization readiness check with direct checks for ESO CRD establishment
and deployment rollout, which are the actual prerequisites for the next step.
2026-04-23 07:32:19 +00:00
micqdf 4b7517c9c5 fix: health-check external-secrets addon via HelmRelease only
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 17m22s
The external-secrets Kustomization was still using wait=true, which makes Flux
hold the addon in a failed state when the HelmRepository has transient fetch
errors even though the HelmRelease and runtime controller deployments are
healthy. Switch it to an explicit HelmRelease health check like the other
helm-backed addons.
2026-04-23 07:11:21 +00:00
micqdf f9bc53723f fix: make image pre-pull roles fully best effort
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 22m46s
The pre-pull roles were still blocking the playbook because they retried until
success and exhausted their retry budget during registry TLS timeouts. Keep the
image pulls as opportunistic cache warmers, but never let them fail the
bootstrap; log any missed images instead.
2026-04-23 06:41:21 +00:00
micqdf ee6417c18e fix: pre-pull core bootstrap images on cp1 before Flux bootstrap
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
Fresh clusters were repeatedly timing out while kubelet pulled the pause image,
k3s packaged component images, and Flux controller images onto the first
control plane. Pre-pull the core control-plane bootstrap images into
containerd on cp-1 so Flux and packaged addons start from a warm cache instead
of racing registry TLS timeouts.
2026-04-23 05:55:14 +00:00
micqdf 1156dc0203 fix: pre-pull kube-vip images before waiting for VIP
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Failing after 43m31s
The primary control plane was stalling because kubelet still had to pull both
the Rancher pause image and the kube-vip image before the DaemonSet pod could
become Ready. Pre-pull those images into containerd, extend the readiness wait,
and emit pod diagnostics if kube-vip still does not come up.
2026-04-23 03:55:52 +00:00
micqdf 4151027e01 fix: clean stale Tailscale node devices before bootstrap
Deploy Cluster / Terraform (push) Successful in 1m40s
Deploy Cluster / Ansible (push) Failing after 14m30s
Run the Tailscale cleanup role against the cluster hostnames before any node
reconnects to the tailnet. This removes stale offline cp/worker devices from
previous rebuilds so replacement VMs can reclaim their original hostnames
instead of getting -1 suffixes.
2026-04-23 03:25:17 +00:00
micqdf 9269e9df1b docs: add guide for deploying app repos to the cluster
Deploy Cluster / Terraform (push) Successful in 1m36s
Deploy Cluster / Ansible (push) Has been cancelled
Document the recommended two-repo model for application delivery, including
Flux attachment objects, Doppler/ExternalSecret wiring, Tailscale service
exposure, and the steps for enabling the suspended apps layer.
2026-04-23 02:43:00 +00:00
micqdf d9374bc209 fix: remove duplicate wait keys from helm addon kustomizations
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Has been cancelled
The repo-only Kustomization healthCheck change accidentally left the original
wait:true keys in the Rancher and Rancher backup Kustomizations, which broke
the infrastructure kustomize build. Remove the duplicate keys so Flux can
apply the HelmRelease-only health checks cleanly.
2026-04-23 02:20:57 +00:00