micqdf
2f54c99203
fix: allow flux ui auth secret read
Deploy Cluster / Terraform (push) Has been cancelled
Deploy Cluster / Ansible (push) Has been cancelled
2026-05-05 05:07:20 +00:00
micqdf
8d74661483
fix: use published weave gitops chart
Deploy Cluster / Terraform (push) Has been cancelled
Deploy Cluster / Ansible (push) Has been cancelled
2026-05-05 05:01:33 +00:00
micqdf
b62afcdf97
feat: add private flux ui
Deploy Cluster / Terraform (push) Waiting to run
Deploy Cluster / Ansible (push) Blocked by required conditions
2026-05-05 04:58:33 +00:00
micqdf
c62364fe67
feat: deploy microservices through traefik
Deploy Cluster / Terraform (push) Successful in 39s
Deploy Cluster / Ansible (push) Failing after 17m19s
2026-05-05 01:28:59 +00:00
micqdf
a04b8ad865
fix: vendor observability charts
Deploy Cluster / Terraform (push) Waiting to run
Deploy Cluster / Ansible (push) Blocked by required conditions
2026-05-04 10:49:46 +00:00
micqdf
f5473a9bec
fix: use local cache for flux oci charts
Deploy Cluster / Terraform (push) Waiting to run
Deploy Cluster / Ansible (push) Blocked by required conditions
2026-05-04 10:44:00 +00:00
micqdf
94e5b607c5
fix: use bundled rancher system charts
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Successful in 19m36s
2026-05-03 20:38:15 +00:00
micqdf
bac568d540
fix: sync rancher flux bootstrap secret
Deploy Cluster / Terraform (push) Has been cancelled
Deploy Cluster / Ansible (push) Has been cancelled
2026-05-03 20:25:17 +00:00
micqdf
e5c8d55530
fix: retry observability oci chart fetches
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Successful in 18m20s
2026-05-03 07:24:31 +00:00
micqdf
2636f14408
fix: decouple doppler store health gate
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Successful in 55m57s
2026-05-02 19:43:33 +00:00
micqdf
6ed0a29253
fix: force sync rancher bootstrap secrets
Deploy Cluster / Terraform (push) Successful in 33s
Deploy Cluster / Ansible (push) Failing after 49m21s
2026-05-02 18:27:05 +00:00
micqdf
ce5a05dcd4
fix: wait longer for doppler store validation
Deploy Cluster / Terraform (push) Successful in 34s
Deploy Cluster / Ansible (push) Failing after 36m43s
2026-05-02 17:05:54 +00:00
micqdf
524147dac3
fix: decouple observability secret health gate
Deploy Cluster / Terraform (push) Has been cancelled
Deploy Cluster / Ansible (push) Has been cancelled
Reconcile Observability / Observability (push) Has been cancelled
2026-05-02 00:05:55 +00:00
micqdf
bd71017a85
fix: shorten observability iteration loop
Deploy Cluster / Terraform (push) Has been cancelled
Deploy Cluster / Ansible (push) Has been cancelled
Reconcile Observability / Observability (push) Failing after 6m15s
2026-05-01 19:37:26 +00:00
micqdf
e9327b0c61
fix: stop preloading observability images everywhere
Deploy Cluster / Terraform (push) Successful in 34s
Deploy Cluster / Ansible (push) Failing after 54m12s
2026-05-01 07:52:35 +00:00
micqdf
d57e8c8fe8
fix: reset tailscale helm release directly
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 33m39s
2026-04-30 20:25:48 +00:00
micqdf
93a2a42917
fix: simplify tailscale operator health gate
Deploy Cluster / Terraform (push) Successful in 33s
Deploy Cluster / Ansible (push) Failing after 40m18s
2026-04-30 19:34:33 +00:00
micqdf
a33a993867
fix: harden cluster rebuild determinism
Deploy Grafana Content / Grafana Content (push) Failing after 1m14s
Deploy Cluster / Terraform (push) Failing after 4m59s
Deploy Cluster / Ansible (push) Has been skipped
2026-04-30 07:36:27 +00:00
micqdf
d050e8962a
fix: seed cert-manager images before flux
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 1h25m21s
2026-04-27 00:04:19 +00:00
micqdf
d925eeac3f
fix: remove Rancher backup workflow
Deploy Cluster / Terraform (push) Successful in 1m33s
Deploy Cluster / Ansible (push) Failing after 54m21s
2026-04-26 22:13:20 +00:00
micqdf
a2ed9555c0
fix: vendor critical bootstrap charts
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 20m0s
2026-04-26 21:01:01 +00:00
micqdf
0625eee297
fix: uninstall failed observability upgrades
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Failing after 42m47s
2026-04-26 18:46:07 +00:00
micqdf
499a3462e7
fix: seed observability dependencies
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-26 10:32:25 +00:00
micqdf
a6a630000a
fix: vendor Tailscale operator chart
Deploy Cluster / Terraform (push) Successful in 37s
Deploy Cluster / Ansible (push) Failing after 23m49s
2026-04-26 09:17:44 +00:00
micqdf
ff9e58d44f
fix: remove NFS chart fetch dependency
Deploy Cluster / Terraform (push) Successful in 1m37s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-26 07:48:11 +00:00
micqdf
46b2ff7d19
fix: harden final health checks
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Failing after 17m50s
2026-04-26 02:14:02 +00:00
micqdf
a4f1d179e9
fix: use Rancher registry for webhook image
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 26m36s
2026-04-26 01:35:16 +00:00
micqdf
b1eab6a0fa
fix: vendor Rancher chart for bootstrap
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 23:08:26 +00:00
micqdf
d0be48b65c
fix: gate Tailscale addon on Helm release
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 36m36s
2026-04-25 21:21:34 +00:00
micqdf
cdb26904d2
fix: retry Tailscale chart pulls during bootstrap
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 27m40s
2026-04-25 20:11:43 +00:00
micqdf
3c06e046c2
fix: warm External Secrets image before install
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 21m10s
2026-04-25 19:46:21 +00:00
micqdf
a6071c504b
fix: point Promtail at Loki service
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 03:43:23 +00:00
micqdf
757d88ed52
fix: use cached Promtail images when available
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Failing after 13m15s
2026-04-25 03:25:44 +00:00
micqdf
15defc686f
fix: allow slow Promtail image pulls
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 03:10:47 +00:00
micqdf
bc87a7ca43
fix: avoid immutable observability PVC changes
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 10m47s
2026-04-25 02:25:40 +00:00
micqdf
bfcf57bcc5
fix: enforce post-deploy health checks
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Has been cancelled
2026-04-25 02:22:16 +00:00
micqdf
7b2eca07ab
fix: pull external-secrets chart from OCI
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 9m41s
2026-04-24 15:24:58 +00:00
micqdf
4b7517c9c5
fix: health-check external-secrets addon via HelmRelease only
...
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 17m22s
The external-secrets Kustomization was still using wait=true, which makes Flux
hold the addon in a failed state when the HelmRepository has transient fetch
errors even though the HelmRelease and runtime controller deployments are
healthy. Switch it to an explicit HelmRelease health check like the other
helm-backed addons.
2026-04-23 07:11:21 +00:00
micqdf
d9374bc209
fix: remove duplicate wait keys from helm addon kustomizations
...
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Has been cancelled
The repo-only Kustomization healthCheck change accidentally left the original
wait:true keys in the Rancher and Rancher backup Kustomizations, which broke
the infrastructure kustomize build. Remove the duplicate keys so Flux can
apply the HelmRelease-only health checks cleanly.
2026-04-23 02:20:57 +00:00
micqdf
c570a476b5
fix: make helm-based addon kustomizations health-check HelmReleases only
...
Deploy Cluster / Terraform (push) Successful in 29s
Deploy Cluster / Ansible (push) Has been cancelled
These addon Kustomizations were using wait=true, which made Flux treat transient
HelmRepository fetch timeouts as addon failures even when the HelmRelease and
runtime workloads were healthy. Switch the affected Kustomizations to explicit
HelmRelease healthChecks so readiness reflects the actual deployed platform
state instead of repository fetch flakiness.
2026-04-23 02:15:45 +00:00
micqdf
a7f11ccf94
fix: give Rancher more time to pass startup probe during upgrades
...
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Successful in 18m59s
Rancher needs longer than the chart default 2-minute startup probe budget on
this cluster while it restores local catalogs and finishes API startup. Extend
the startup probe failure threshold so Helm upgrades can complete instead of
restarting the new pod before it becomes ready.
2026-04-23 01:44:25 +00:00
micqdf
55d7b8201e
fix: make Rancher image pre-pull best effort and disable managed SUC
...
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 32m19s
Docker Hub TLS handshakes are too flaky to make pre-pulling a hard bootstrap
requirement. Treat image pre-pull as opportunistic and disable Rancher's
managed system-upgrade-controller feature so that image is removed from the
critical install path while Rancher and its webhook converge.
2026-04-22 11:33:13 +00:00
micqdf
9c0523e880
fix: pre-pull Rancher images and reset Rancher release during bootstrap
...
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 27m30s
Rancher installs were stalling on transient Docker Hub TLS handshake timeouts
for rancher shell, webhook, and system-upgrade-controller images. Pre-pull the
required images onto all nodes after k3s comes up, extend the Rancher HelmRelease
timeout, and reset/force the Rancher HelmRelease before waiting on addon-rancher
so bootstrap can recover from stale failed remediation state.
2026-04-22 11:00:54 +00:00
micqdf
624cd5aab6
fix: point NFS provisioner at active Proxmox host export
...
Deploy Cluster / Terraform (push) Successful in 27s
Deploy Cluster / Ansible (push) Failing after 18m51s
The cluster nodes can reach the exported NFS path on 10.27.27.239, not
10.27.27.22. Update the storage addon and repo note so the NFS provisioner
mounts the live export and Flux health checks can converge.
2026-04-22 09:46:01 +00:00
micqdf
b1dae28aa5
feat: migrate cluster baseline from Hetzner to Proxmox
...
Deploy Cluster / Terraform (push) Failing after 52s
Deploy Cluster / Ansible (push) Has been skipped
Deploy Grafana Content / Grafana Content (push) Failing after 1m37s
Replace Hetzner infrastructure and cloud-provider assumptions with Proxmox
VM clones, kube-vip API HA, and NFS-backed storage. Update bootstrap,
Flux addons, CI workflows, and docs to target the new private Proxmox
baseline while preserving the existing Tailscale, Doppler, Flux, Rancher,
and B2 backup flows.
2026-04-22 03:02:13 +00:00
micqdf
7385c2263e
fix: add tailnet smoke checks and move Tailscale operator to stable
...
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m55s
Add a post-deploy smoke test that validates Tailscale DNS, proxy readiness,
reachability, and service responses for Rancher, Grafana, and Prometheus.
Move the operator to the stable Helm repo/version and align the baseline docs
with the current HA private-only architecture.
2026-04-18 19:59:13 +00:00
micqdf
60f466ab98
remove Weave GitOps addon
...
Deploy Cluster / Terraform (push) Successful in 41s
Deploy Cluster / Ansible (push) Successful in 5m37s
Drop the Flux UI addon and its Tailscale exposure because the UI lags the
current Flux APIs and reports misleading HelmRelease errors. Keep Flux managed
through the controllers themselves and use Rancher or the flux CLI for access.
2026-04-18 18:44:55 +00:00
micqdf
9126de1423
fix: Align Prometheus external URL with Tailscale service port
...
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Failing after 4m52s
Prometheus is exposed on port 9090 through the Tailscale LoadBalancer
service, so the configured external URL and repo docs should match the
actual address users reach after rebuilds.
2026-04-18 17:11:16 +00:00
micqdf
68dbd2e5b7
fix: Reserve Tailscale service hostnames and tag exposed proxies
...
Deploy Cluster / Terraform (push) Successful in 53s
Deploy Cluster / Ansible (push) Successful in 6m3s
Reserve grafana/prometheus/flux alongside rancher during rebuild cleanup so
stale tailnet devices do not force -1 hostnames. Tag the exposed Tailscale
services so operator-managed proxies are provisioned with explicit prod/service
tags from the tailnet policy.
2026-04-18 05:48:26 +00:00
micqdf
ceefcc3b29
cleanup: Remove obsolete port-forwarding, deferred Traefik files, and CI workaround
...
Deploy Cluster / Terraform (push) Successful in 2m21s
Deploy Cluster / Ansible (push) Successful in 13m9s
- Remove ansible/roles/private-access/ (replaced by Tailscale LB services)
- Remove deferred observability ingress/traefik files (replaced by direct Tailscale LBs)
- Remove orphaned kustomization-traefik-config.yaml (no backing directory)
- Simplify CI: remove SA patch + job deletion workaround for rancher-backup
(now handled by postRenderer in HelmRelease)
- Update AGENTS.md to reflect current architecture
2026-04-02 01:21:23 +00:00