89e53d9ec9
fix: Handle restricted B2 keys and safe JSON parsing in restore step
Deploy Cluster / Terraform (push) Successful in 52s
Deploy Cluster / Ansible (push) Successful in 20m48s
2026-03-31 01:43:04 +00:00
5a2551f40a
fix: Fix flux CLI download URL - use correct GitHub URL with v prefix on version
Deploy Cluster / Terraform (push) Successful in 51s
Deploy Cluster / Ansible (push) Failing after 21m52s
2026-03-30 03:11:40 +00:00
8c7b62c024
feat: Automate Rancher backup restore in CI pipeline
...
Deploy Cluster / Terraform (push) Successful in 2m18s
Deploy Cluster / Ansible (push) Failing after 6m28s
- Wait for Rancher and rancher-backup operator to be ready
- Patch default SA in cattle-resources-system (fixes post-install hook failure)
- Clean up failed patch-sa jobs
- Force reconcile rancher-backup HelmRelease
- Find latest backup from B2 using Backblaze API
- Create Restore CR to restore Rancher state from latest backup
- Wait for restore to complete before continuing
2026-03-30 01:56:29 +00:00
5269884408
feat: Auto-cleanup stale Tailscale devices before cluster boot
...
Deploy Cluster / Terraform (push) Successful in 2m17s
Deploy Cluster / Ansible (push) Failing after 6m35s
Adds tailscale-cleanup Ansible role that uses the Tailscale API to
delete offline devices matching reserved hostnames (e.g. rancher).
Runs during site.yml before Finalize to prevent hostname collisions
like rancher-1 on rebuild.
Requires TAILSCALE_API_KEY (API access token) passed as extra var.
2026-03-29 11:47:53 +00:00
9d601dc77c
feat: Add CloudNativePG with B2 backups for persistent Rancher database
...
Deploy Cluster / Terraform (push) Successful in 4m16s
Deploy Cluster / Ansible (push) Failing after 12m27s
- Add Local Path Provisioner for storage
- Add CloudNativePG operator (v1.27.0) via Flux
- Create PostgreSQL cluster with B2 (Backblaze) auto-backup/restore
- Update Rancher to use external PostgreSQL via CATTLE_DB_CATTLE_* env vars
- Add weekly pg_dump CronJob to B2 (Sundays 2AM)
- Add pre-destroy backup hook to destroy workflow
- Add B2 credentials to Doppler (B2_ACCOUNT_ID, B2_APPLICATION_KEY)
- Generate RANCHER_DB_PASSWORD in Doppler
Backup location: HetznerTerra/rancher-backups/
Retention: 14 backups
2026-03-25 23:06:45 +00:00
ff31cb4e74
Implement HA control plane with Load Balancer (3-3 topology)
...
Deploy Cluster / Terraform (push) Failing after 10s
Deploy Cluster / Ansible (push) Has been skipped
Major changes:
- Terraform: Scale to 3 control planes (cx23) + 3 workers (cx33)
- Terraform: Add Hetzner Load Balancer (lb11) for Kubernetes API
- Terraform: Add kube_api_lb_ip output
- Ansible: Add community.network collection to requirements
- Ansible: Update inventory to include LB endpoint
- Ansible: Configure secondary CPs and workers to join via LB
- Ansible: Add k3s_join_endpoint variable for HA joins
- Workflow: Add imports for cp-2, cp-3, and worker-3
- Docs: Update STABLE_BASELINE.md with HA topology and phase gates
Topology:
- 3 control planes (cx23 - 2 vCPU, 8GB RAM each)
- 3 workers (cx33 - 4 vCPU, 16GB RAM each)
- 1 Load Balancer (lb11) routing to all 3 control planes on port 6443
- Workers and secondary CPs join via LB endpoint for HA
Cost impact: +~€26/month (2 extra CPs + 1 extra worker + LB)
2026-03-23 02:39:39 +00:00
cadfedacf1
Fix providerID health check - use shell module for piped grep
Deploy Cluster / Terraform (push) Successful in 1m47s
Deploy Cluster / Ansible (push) Failing after 18m4s
2026-03-22 22:55:55 +00:00
561cd67b0c
Enable Hetzner CCM and CSI for cloud provider integration
...
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 3m21s
- Enable --kubelet-arg=cloud-provider=external on all nodes (control planes and workers)
- Activate CCM Kustomization with 10m timeout for Hetzner cloud-controller-manager
- Activate CSI Kustomization with dependsOn CCM and 10m timeout for hcloud-csi
- Update deploy workflow to wait for CCM/CSI readiness (600s timeout)
- Add providerID verification to post-deploy health checks
This enables proper cloud provider integration with Hetzner CCM for node
labeling and Hetzner CSI for persistent volume provisioning.
2026-03-22 22:26:21 +00:00
7b5d794dfc
fix: update health checks for deferred observability
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-03-22 01:04:27 +00:00
8643bbfc12
fix: defer observability to get clean baseline
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-03-22 01:03:55 +00:00
84f446c2e6
fix: restore observability timeouts to 5 minutes
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 8m38s
2026-03-22 00:43:37 +00:00
989848fa89
fix: increase observability timeouts to 10 minutes
Deploy Cluster / Terraform (push) Successful in 2m1s
Deploy Cluster / Ansible (push) Failing after 13m54s
2026-03-21 19:34:43 +00:00
56e5807474
fix: create doppler ClusterSecretStore after ESO is installed
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Failing after 8m31s
2026-03-21 19:19:43 +00:00
a01cf435d4
fix: skip ccm/csi waits for stable baseline - using k3s embedded
Deploy Cluster / Terraform (push) Successful in 37s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-21 18:40:53 +00:00
84f77c4a68
fix: use kubectl patch instead of apply for flux controller nodeSelector
Deploy Cluster / Terraform (push) Successful in 38s
Deploy Cluster / Ansible (push) Failing after 9m41s
2026-03-21 18:05:41 +00:00
2e4196688c
fix: bootstrap flux in phases - crds first, then resources
Deploy Cluster / Terraform (push) Successful in 38s
Deploy Cluster / Ansible (push) Failing after 3m19s
2026-03-21 17:42:39 +00:00
fcf7f139ff
fix: use public api endpoint for flux bootstrap
Deploy Cluster / Terraform (push) Successful in 41s
Deploy Cluster / Ansible (push) Failing after 2m16s
2026-03-21 00:07:51 +00:00
7139ae322d
fix: bootstrap flux during cluster deploy
Deploy Cluster / Terraform (push) Successful in 38s
Deploy Cluster / Ansible (push) Failing after 3m21s
2026-03-20 10:37:11 +00:00
522626a52b
refactor: simplify stable cluster baseline
Deploy Cluster / Terraform (push) Successful in 1m48s
Deploy Cluster / Ansible (push) Failing after 4m7s
2026-03-20 02:24:37 +00:00
3e41f71b1b
fix: harden terraform destroy workflow
Deploy Cluster / Terraform (push) Successful in 2m28s
Deploy Cluster / Ansible (push) Failing after 20m4s
2026-03-19 23:26:03 +00:00
6f2e056b98
feat: sync runtime secrets from doppler
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Successful in 9m56s
2026-03-09 00:25:41 +00:00
6177b581e4
fix: correct dashboard verification checks and retry helm upgrade lock
Deploy Cluster / Terraform (push) Successful in 44s
Deploy Grafana Content / Grafana Content (push) Successful in 1m29s
Deploy Cluster / Ansible (push) Failing after 11m11s
2026-03-04 08:48:30 +00:00
b1e21c4a4b
fix: speed up dashboards workflow firewall apply and set TF_VAR env
Deploy Cluster / Terraform (push) Successful in 43s
Deploy Grafana Content / Grafana Content (push) Failing after 1m22s
Deploy Cluster / Ansible (push) Failing after 9m2s
2026-03-04 03:54:56 +00:00
2f166ed9e7
feat: manage grafana content as code with fast dashboard workflow
Deploy Cluster / Terraform (push) Successful in 46s
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Grafana Content / Grafana Content (push) Has been cancelled
2026-03-04 03:36:01 +00:00
1c39274df7
feat: stabilize tailscale observability exposure with declarative proxy class
Deploy Cluster / Terraform (push) Successful in 54s
Deploy Cluster / Ansible (push) Successful in 22m19s
2026-03-04 01:37:00 +00:00
63247b79a6
fix: harden Tailscale operator rollout with preflight and diagnostics
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-02 21:39:47 +00:00
b30977a158
feat: deploy lightweight observability stack via Ansible
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-02 01:33:41 +00:00
d92bde78f4
chore: enforce CSI smoke test and add post-deploy health checks
Deploy Cluster / Terraform (push) Successful in 42s
Deploy Cluster / Ansible (push) Failing after 8m20s
2026-03-01 23:45:27 +00:00
2bc9749b81
feat: switch kubeconfig to tailnet endpoint and deploy Hetzner CSI
Deploy Cluster / Terraform (push) Successful in 51s
Deploy Cluster / Ansible (push) Successful in 3m12s
2026-03-01 17:12:12 +00:00
54717cccad
fix: allow current CI runner IP through firewall before Ansible
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Successful in 5m13s
2026-03-01 14:50:55 +00:00
fffd3876fb
fix: remove empty TF_VAR CIDR envs causing plan parse errors
Deploy Cluster / Terraform (push) Successful in 39s
Deploy Cluster / Ansible (push) Failing after 1m28s
2026-03-01 14:47:32 +00:00
86c38e385f
fix: remove CI tailscale dependency and allow runner CIDR exception
Deploy Cluster / Terraform (push) Failing after 31s
Deploy Cluster / Ansible (push) Has been skipped
2026-03-01 14:08:08 +00:00
d29a428f2d
fix: robust tailscaled startup in CI runner
Deploy Cluster / Terraform (push) Successful in 34s
Deploy Cluster / Ansible (push) Failing after 2m44s
2026-03-01 13:57:12 +00:00
a8ef173713
fix: start tailscaled daemon before tailscale up in CI
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Failing after 2m15s
2026-03-01 13:52:20 +00:00
41d0abda16
fix: auto-import existing Hetzner servers into Terraform state in CI
Deploy Cluster / Terraform (push) Failing after 21s
Deploy Cluster / Ansible (push) Has been skipped
2026-03-01 13:27:02 +00:00
011c220f59
fix: avoid server replacement; install tailscale via Ansible
Deploy Cluster / Terraform (push) Failing after 22s
Deploy Cluster / Ansible (push) Has been skipped
2026-03-01 04:51:19 +00:00
1eebfe77df
feat: integrate tailscale access and lock SSH/API to tailnet
Deploy Cluster / Terraform (push) Failing after 20s
Deploy Cluster / Ansible (push) Has been skipped
2026-03-01 04:04:56 +00:00
7230b2b6c8
fix: Use --break-system-packages for pip on Debian 12
Deploy Cluster / Terraform (push) Successful in 20s
Deploy Cluster / Ansible (push) Failing after 1m12s
2026-02-28 22:50:31 +00:00
f40a090c7c
fix: Install pip via apt before installing Python packages
Deploy Cluster / Terraform (push) Successful in 19s
Deploy Cluster / Ansible (push) Failing after 22s
2026-02-28 22:47:24 +00:00
19ba491c54
fix: Use system Python instead of setup-python action
Deploy Cluster / Terraform (push) Successful in 21s
Deploy Cluster / Ansible (push) Failing after 12s
2026-02-28 22:45:50 +00:00
34c2b6895e
fix: Use Python 3.12 instead of 3.11
Deploy Cluster / Terraform (push) Successful in 18s
Deploy Cluster / Ansible (push) Failing after 14s
2026-02-28 22:44:46 +00:00
2fcc8cff77
fix: Ansible fetches outputs directly from Terraform state instead of artifacts
Deploy Cluster / Terraform (push) Successful in 19s
Deploy Cluster / Ansible (push) Failing after 18s
2026-02-28 22:43:26 +00:00
683f994905
fix: Create outputs directory before saving terraform outputs
Deploy Cluster / Terraform (push) Successful in 2m34s
Deploy Cluster / Ansible (push) Failing after 3m48s
2026-02-28 22:27:24 +00:00
ebe86cfacf
fix: Typo in chmod path id_ed255 -> id_ed25519
Deploy Cluster / Terraform (push) Failing after 14s
Deploy Cluster / Ansible (push) Has been skipped
2026-02-28 21:27:37 +00:00
cbd0e0c2c8
fix: Write SSH keys to files before Terraform plan/apply
Deploy Cluster / Terraform (push) Failing after 13s
Deploy Cluster / Ansible (push) Has been skipped
2026-02-28 21:26:14 +00:00
4f0402decf
fix: Add TF_VAR_s3_endpoint and TF_VAR_s3_bucket env vars
Deploy Cluster / Terraform (push) Has been cancelled
Deploy Cluster / Ansible (push) Has been cancelled
2026-02-28 21:12:48 +00:00
109a6a241e
fix: Revert to endpoint for CLI backend config
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-02-28 21:01:58 +00:00
cd16545ad3
fix: Add skip_requesting_account_id and use endpoints.s3 for Backblaze B2
Deploy Cluster / Terraform (push) Failing after 9s
Deploy Cluster / Ansible (push) Has been skipped
2026-02-28 20:58:40 +00:00
2ce0cc018e
fix: Combine workflows for Gitea compatibility, use artifact v3
Deploy Cluster / Terraform (push) Failing after 14s
Deploy Cluster / Ansible (push) Has been skipped
2026-02-28 20:28:25 +00:00
3b3084b997
feat: Add HA Kubernetes cluster with Terraform + Ansible
...
Terraform / Validate (push) Failing after 17s
Terraform / Plan (push) Has been skipped
Terraform / Apply (push) Has been skipped
- 3x CX23 control plane nodes (HA)
- 4x CX33 worker nodes
- k3s with embedded etcd
- Hetzner CCM for load balancers
- Gitea CI/CD workflows
- Backblaze B2 for Terraform state
2026-02-28 20:24:55 +00:00