Commit Graph

244 Commits

Author SHA1 Message Date
89364e8f37 fix: Add dependsOn for rancher-backup operator to wait for CRDs
Some checks failed
Deploy Cluster / Terraform (push) Successful in 50s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-29 22:57:22 +00:00
20d7a6f777 fix: Install rancher-backup CRD chart before operator
Some checks failed
Deploy Cluster / Terraform (push) Successful in 50s
Deploy Cluster / Ansible (push) Has been cancelled
The rancher-backup operator requires CRDs from the rancher-backup-crd
chart to be installed first.
2026-03-29 22:51:34 +00:00
22ce5fd6f4 feat: Add cert-manager as dependency for Rancher
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m59s
Rancher requires cert-manager when managing its own TLS (not tls:external).
Added cert-manager HelmRelease with CRDs enabled.
2026-03-29 22:36:30 +00:00
afb1782d38 fix: Separate Backup CRs into their own kustomization
All checks were successful
Deploy Cluster / Terraform (push) Successful in 41s
Deploy Cluster / Ansible (push) Successful in 5m57s
The Backup and Restore CRs need the rancher-backup CRDs to exist first.
Moved them to a separate kustomization that depends on the operator being ready.
2026-03-29 22:22:29 +00:00
48870433bf fix: Remove tls:external from Rancher HelmRelease
Some checks failed
Deploy Cluster / Terraform (push) Failing after 55s
Deploy Cluster / Ansible (push) Has been skipped
With Tailscale LoadBalancer, TLS is not actually terminated at the edge.
The Tailscale proxy does TCP passthrough, so Rancher must serve its own
TLS certs. Setting tls: external caused Rancher to listen HTTP-only,
which broke HTTPS access through Tailscale.
2026-03-29 22:19:23 +00:00
f2c506b350 refactor: Replace CNPG external DB with rancher-backup operator
All checks were successful
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 6m5s
Rancher 2.x uses embedded etcd, not an external PostgreSQL database.
The CATTLE_DB_CATTLE_* env vars are Rancher v1 only and were ignored.

- Remove all CNPG (CloudNativePG) cluster, operator, and related configs
- Remove external DB env vars from Rancher HelmRelease
- Remove rancher-db-password ExternalSecret
- Add rancher-backup operator HelmRelease (v106.0.2+up8.1.0)
- Add B2 credentials ExternalSecret for backup storage
- Add recurring Backup CR (daily at 03:00, 7 day retention)
- Add commented-out Restore CR for rebuild recovery
- Update Flux dependency graph accordingly
2026-03-29 21:53:16 +00:00
efdf13976a fix: Handle missing 'online' field in Tailscale API response
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m12s
Deploy Cluster / Ansible (push) Successful in 9m19s
2026-03-29 13:52:23 +00:00
5269884408 feat: Auto-cleanup stale Tailscale devices before cluster boot
Some checks failed
Deploy Cluster / Terraform (push) Successful in 2m17s
Deploy Cluster / Ansible (push) Failing after 6m35s
Adds tailscale-cleanup Ansible role that uses the Tailscale API to
delete offline devices matching reserved hostnames (e.g. rancher).
Runs during site.yml before Finalize to prevent hostname collisions
like rancher-1 on rebuild.

Requires TAILSCALE_API_KEY (API access token) passed as extra var.
2026-03-29 11:47:53 +00:00
6e5b0518be feat: Add kubeconfig refresh script and fix Ansible Finalize to use public IP
All checks were successful
Deploy Cluster / Terraform (push) Successful in 53s
Deploy Cluster / Ansible (push) Successful in 5m25s
- scripts/refresh-kubeconfig.sh fetches a fresh kubeconfig from CP1
- Ansible site.yml Finalize step now uses public IP instead of Tailscale
  hostname for the kubeconfig server address
- Updated AGENTS.md with kubeconfig refresh instructions
2026-03-29 03:31:36 +00:00
905d069e91 fix: Add serverName to CNPG externalClusters for B2 recovery
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m22s
CNPG uses the external cluster name (b2-backup) as the barman server
name by default, but the backups were stored under server name rancher-db.
2026-03-29 03:22:19 +00:00
25ba4b7115 fix: Add skipEmptyWalArchiveCheck annotation and B2 secret healthcheck to CNPG
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m22s
- Skip WAL archive emptiness check so recovery works when restoring over
  an existing backup archive in B2
- Add healthCheck for b2-credentials secret in CNPG kustomization to
  prevent recovery from starting before ExternalSecret has synced
2026-03-29 03:15:23 +00:00
6a593fd559 feat: Add B2 recovery bootstrap to CNPG cluster
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m6s
Deploy Cluster / Ansible (push) Successful in 8m16s
2026-03-29 00:22:24 +00:00
936f54a1b5 fix: Restore canonical Rancher tailnet hostname
All checks were successful
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 6m1s
2026-03-29 00:00:39 +00:00
c9df11e65f fix: Align Rancher tailnet hostname with live proxy
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 6m1s
2026-03-28 23:47:09 +00:00
a3c238fda9 fix: Apply Rancher server URL after chart install
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m43s
Deploy Cluster / Ansible (push) Successful in 10m39s
2026-03-28 23:12:59 +00:00
a15fa50302 fix: Use Doppler-backed Rancher bootstrap password
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m43s
2026-03-28 22:51:38 +00:00
0f4f0b09fb fix: Add Rancher DB password ExternalSecret
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m42s
2026-03-28 22:42:05 +00:00
4c002a870c fix: Remove invalid Rancher server-url manifest
Some checks failed
Deploy Cluster / Terraform (push) Successful in 51s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-28 22:39:31 +00:00
43d11ac7e6 docs: Add agent guidance and sync Rancher docs
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m33s
Deploy Cluster / Ansible (push) Successful in 9m44s
2026-03-28 22:13:37 +00:00
8c5edcf0a1 fix: Set Rancher server URL to tailnet hostname
All checks were successful
Deploy Cluster / Terraform (push) Successful in 1m0s
Deploy Cluster / Ansible (push) Successful in 6m27s
2026-03-28 04:07:44 +00:00
a81da0d178 feat: Expose Rancher via Tailscale hostname
All checks were successful
Deploy Cluster / Terraform (push) Successful in 52s
Deploy Cluster / Ansible (push) Successful in 6m42s
2026-03-28 03:59:02 +00:00
2a72527c79 fix: Switch Traefik from LoadBalancer to NodePort, remove unused Hetzner LB
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 6m25s
2026-03-28 03:21:19 +00:00
7cb3b84ecb feat: Replace custom pgdump job with CNPG ScheduledBackup
Some checks failed
Deploy Cluster / Terraform (push) Successful in 1m30s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-28 03:15:39 +00:00
d4930235fa fix: Point CNPG backups at the existing B2 bucket
All checks were successful
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 6m17s
2026-03-26 23:35:19 +00:00
ee8dc4b451 fix: Add Role for B2 credentials access
All checks were successful
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 6m29s
2026-03-26 23:04:40 +00:00
144d40e7ac feat: Add RBAC for CNP to read B2 credentials secret
All checks were successful
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 6m38s
2026-03-26 22:56:00 +00:00
cc14e32572 fix: Use gzip instead of lzop for backup compression
Some checks failed
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 22:51:10 +00:00
a207a5a7fd fix: Remove invalid encryption field from CNP backup config
Some checks failed
Deploy Cluster / Terraform (push) Successful in 40s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 22:49:29 +00:00
4e1772c175 feat: Add B2 backup configuration to CNP Cluster
Some checks failed
Deploy Cluster / Terraform (push) Successful in 1m38s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 22:47:31 +00:00
ff70b12084 chore: Add HTTP/HTTPS firewall rules for Load Balancer
All checks were successful
Deploy Cluster / Terraform (push) Successful in 52s
Deploy Cluster / Ansible (push) Successful in 6m56s
2026-03-26 22:36:13 +00:00
a3963c56e6 cleanup: Remove traefik-config, simplify traefik helmrelease
All checks were successful
Deploy Cluster / Terraform (push) Successful in 50s
Deploy Cluster / Ansible (push) Successful in 6m20s
2026-03-26 03:16:56 +00:00
612435c42c fix: Add Hetzner LB health check config to Traefik
Some checks failed
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 03:11:10 +00:00
ac42f671a2 fix: Remove addon-traefik-config dependency from flux-ui
Some checks failed
Deploy Cluster / Terraform (push) Successful in 50s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 03:05:58 +00:00
dbe7ec0468 fix: Remove expose boolean from traefik ports config
Some checks failed
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 03:01:13 +00:00
816ac8b3c0 fix: Use official Traefik helm repo instead of rancher-stable
Some checks failed
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 02:59:00 +00:00
6f7998639f fix: Use standard kustomize API in traefik addon
Some checks failed
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 02:56:52 +00:00
7a14f89ad1 fix: Correct traefik kustomization path and sourceRef
Some checks failed
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 02:55:37 +00:00
786901c5d7 fix: Correct traefik kustomization reference (directory not file)
Some checks failed
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 02:54:29 +00:00
46f3d1130b feat: Add Flux-managed Traefik HelmRelease with Hetzner LB config
Some checks failed
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 02:52:49 +00:00
2fe5a626d4 fix: Add Hetzner network zone annotation to Traefik LoadBalancer
All checks were successful
Deploy Cluster / Terraform (push) Successful in 52s
Deploy Cluster / Ansible (push) Successful in 6m20s
2026-03-26 02:30:43 +00:00
2ef68c8087 fix: Remove deprecated enablePodMonitor field in CNP Cluster
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m13s
Deploy Cluster / Ansible (push) Successful in 10m15s
2026-03-26 01:01:53 +00:00
e2cae18f5f fix: Remove backup config for initial deployment - add backup after DB is running
All checks were successful
Deploy Cluster / Terraform (push) Successful in 36s
Deploy Cluster / Ansible (push) Successful in 4m56s
2026-03-26 00:46:50 +00:00
e0c1e41ee9 fix: Remove bootstrap recovery - create fresh DB (recovery only needed after first backup)
Some checks failed
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 00:43:49 +00:00
63533de901 fix: Fix retentionPolicy format (14d not keep14)
Some checks failed
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 00:41:44 +00:00
1b39710f63 fix: Move retentionPolicy to correct location in backup spec
Some checks failed
Deploy Cluster / Terraform (push) Successful in 37s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 00:39:25 +00:00
8c034323dc fix: Fix Cluster CR with correct barmanObjectStore schema
Some checks failed
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 00:35:23 +00:00
5fa2b411ee fix: Fix Cluster CR schema - use barmanObjectStore instead of b2
Some checks failed
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-26 00:33:04 +00:00
3ea28e525f fix: Fix CNP operator image repository (cloudnative-pg not postgresql)
All checks were successful
Deploy Cluster / Terraform (push) Successful in 43s
Deploy Cluster / Ansible (push) Successful in 4m55s
2026-03-26 00:21:09 +00:00
4b95ba113d fix: Remove LPP helm (already installed by k3s), fix CNP chart version to 0.27.1
All checks were successful
Deploy Cluster / Terraform (push) Successful in 36s
Deploy Cluster / Ansible (push) Successful in 5m7s
2026-03-26 00:13:22 +00:00
13627bf81f fix: Split CNP operator from CNP cluster to fix CRD dependency
All checks were successful
Deploy Cluster / Terraform (push) Successful in 35s
Deploy Cluster / Ansible (push) Successful in 5m0s
- Move CNP operator HelmRelease to cnpg-operator folder
- Create addon-cnpg-operator kustomization (deploys operator first)
- Update addon-cnpg to dependOn addon-cnpg-operator
- Add addon-cnpg as dependency for addon-rancher (needs database)
2026-03-26 00:06:34 +00:00