Accept Proxmox API endpoints with or without /api2/json in CI and local
tfvars, and avoid running the dashboards workflow just because its own
workflow file changed during platform migrations.
Replace Hetzner infrastructure and cloud-provider assumptions with Proxmox
VM clones, kube-vip API HA, and NFS-backed storage. Update bootstrap,
Flux addons, CI workflows, and docs to target the new private Proxmox
baseline while preserving the existing Tailscale, Doppler, Flux, Rancher,
and B2 backup flows.
Update the baseline to treat Rancher backup and restore validation as part
of the accepted platform state, and capture the successful live drill run
performed on 2026-04-18.
Add a post-deploy smoke test that validates Tailscale DNS, proxy readiness,
reachability, and service responses for Rancher, Grafana, and Prometheus.
Move the operator to the stable Helm repo/version and align the baseline docs
with the current HA private-only architecture.
Drop the Flux UI addon and its Tailscale exposure because the UI lags the
current Flux APIs and reports misleading HelmRelease errors. Keep Flux managed
through the controllers themselves and use Rancher or the flux CLI for access.
The Tailscale cleanup role was deleting reserved service hostnames on later
deploy runs, which removed the live Rancher/Grafana/Prometheus/Flux proxy
nodes from the tailnet. Skip cleanup whenever the current cluster already has
those Tailscale services, while still allowing cleanup on fresh rebuilds.
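A minimal Ansible sketch of that guard (task names, the registered variable, and the role name are illustrative, not taken from the repo):

```yaml
# Detect whether the current cluster already owns Tailscale LoadBalancer services.
- name: Check for existing Tailscale LoadBalancer services
  ansible.builtin.command: >-
    kubectl get svc --all-namespaces
    -o jsonpath='{range .items[?(@.spec.loadBalancerClass=="tailscale")]}{.metadata.name}{"\n"}{end}'
  register: tailscale_services
  changed_when: false

# Skip cleanup when live proxies exist; fresh rebuilds fall through to cleanup.
- name: Run tailscale-cleanup
  ansible.builtin.include_role:
    name: tailscale-cleanup
  when: tailscale_services.stdout | trim | length == 0
```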
The deploy pipeline never uses the flux binary after installation, so the
GitHub release download only adds a flaky failure point. Remove the step and
keep the bootstrap path kubectl-only.
Prometheus is exposed on port 9090 through the Tailscale LoadBalancer
service, so the configured external URL and repo docs should match the
actual address users reach after rebuilds.
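Assuming the kube-prometheus-stack chart, the external URL would be set in the HelmRelease values roughly like this (the tailnet hostname mirrors the one used elsewhere in this repo):

```yaml
# HelmRelease values excerpt (kube-prometheus-stack is an assumption).
prometheus:
  prometheusSpec:
    # Must match the address users actually reach via the Tailscale LB on 9090.
    externalUrl: https://prometheus.silverside-gopher.ts.net:9090
```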
Reserve grafana/prometheus/flux alongside rancher during rebuild cleanup so
stale tailnet devices do not force -1 hostnames. Tag the exposed Tailscale
services so operator-managed proxies are provisioned with explicit prod/service
tags from the tailnet policy.
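The Tailscale Kubernetes operator reads tags from a Service annotation; a sketch (the specific tag names and namespace are assumptions, and the tags must be grantable to the operator in the tailnet ACL policy):

```yaml
# Illustrative Service excerpt: the operator-provisioned proxy joins the
# tailnet with these tags instead of defaulting.
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    tailscale.com/tags: tag:prod,tag:service  # assumed tag names
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
```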
The chart's post-install hook hardcodes rancher/kuberlr-kubectl, which
cannot download kubectl. Use Flux postRenderers to patch the job image
to bitnami/kubectl at render time.
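The postRenderer would look roughly like this on the HelmRelease (the hook Job's name is hypothetical; it must match whatever the chart actually emits):

```yaml
# HelmRelease excerpt: patch the post-install hook Job image at render time.
spec:
  postRenderers:
    - kustomize:
        patches:
          - target:
              kind: Job
              name: rancher-backup-patch-sa  # hypothetical hook job name
            patch: |
              - op: replace
                path: /spec/template/spec/containers/0/image
                value: bitnami/kubectl:latest
```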
The chart's post-install hook uses rancher/kuberlr-kubectl, which fails
to download kubectl. The service account's automountServiceAccountToken
is managed manually, so the hook is unnecessary.
Revert to the idiomatic Grafana chart approach: an ExternalSecret creates
the secret with admin-user/admin-password keys before Grafana's first start
on fresh cluster creation.
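A sketch of that ExternalSecret (the secret store name, namespace, and Doppler key names are assumptions; the Grafana chart consumes the resulting secret via its `admin.existingSecret` value):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: monitoring          # assumed namespace
spec:
  secretStoreRef:
    name: doppler                # assumed store name
    kind: ClusterSecretStore
  target:
    name: grafana-admin          # referenced by the chart's admin.existingSecret
  data:
    # Key names the Grafana chart expects for admin credentials.
    - secretKey: admin-user
      remoteRef:
        key: GRAFANA_ADMIN_USER      # assumed Doppler key
    - secretKey: admin-password
      remoteRef:
        key: GRAFANA_ADMIN_PASSWORD  # assumed Doppler key
```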
Prometheus needs the operator.prometheus.io/name label selector. Flux UI pods
are labeled gitops-server, not weave-gitops. Grafana now reads admin creds
from Doppler via an ExternalSecret instead of hardcoded values.
Replace Ansible port-forwarding + tailscale serve with direct Tailscale LB
services matching the existing Rancher pattern. Each service gets its own
tailnet hostname (grafana/prometheus/flux.silverside-gopher.ts.net).
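One such Service, sketched for Grafana (namespace, selector labels, and target port are assumptions based on common Grafana chart defaults):

```yaml
# One Tailscale LoadBalancer per service, mirroring the Rancher pattern.
apiVersion: v1
kind: Service
metadata:
  name: grafana-tailscale
  namespace: monitoring              # assumed namespace
  annotations:
    # Fixed device name -> grafana.silverside-gopher.ts.net
    tailscale.com/hostname: grafana
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app.kubernetes.io/name: grafana  # assumed pod labels
  ports:
    - port: 80
      targetPort: 3000               # Grafana's default container port
```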
- Wait for Rancher and rancher-backup operator to be ready
- Patch default SA in cattle-resources-system (fixes post-install hook failure)
- Clean up failed patch-sa jobs
- Force reconcile rancher-backup HelmRelease
- Find latest backup from B2 using Backblaze API
- Create Restore CR to restore Rancher state from latest backup
- Wait for restore to complete before continuing
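The Restore CR created in the final steps would look roughly like this (the backup filename and bucket details are placeholders; the filename is whatever the Backblaze API lookup returns):

```yaml
# Sketch of the Restore CR that replays the latest B2 backup.
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-latest
spec:
  # Latest backup discovered via the Backblaze B2 (S3-compatible) API.
  backupFilename: rancher-backup-2026-04-18T03-00-00Z.tar.gz  # placeholder
  storageLocation:
    s3:
      credentialSecretName: b2-credentials
      credentialSecretNamespace: cattle-resources-system
      bucketName: example-rancher-backups   # placeholder
      region: us-west-004                   # placeholder
      endpoint: s3.us-west-004.backblazeb2.com
```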
The S3 config caused the operator to try downloading kubectl, which fails in the container.
S3 credentials are instead configured in the Backup CR and its ExternalSecret.
Rancher now manages its own TLS (no longer tls:external), so it serves
HTTPS on port 443. The Tailscale LoadBalancer needs to expose both
HTTP (80) and HTTPS (443) targeting the corresponding container ports.
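The Service ports section then becomes (a fragment, assuming Rancher's pods listen on the standard 80/443 container ports):

```yaml
# Expose both protocols; Rancher now terminates TLS itself on 443.
ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
```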
The Backup and Restore CRs need the rancher-backup CRDs to exist first.
Moved them to a separate kustomization that depends on the operator being ready.
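A sketch of the split (Kustomization names and paths are assumptions; the key is `dependsOn`, so the CRs only apply once the operator's CRDs exist):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: rancher-backup-resources    # assumed name
  namespace: flux-system
spec:
  dependsOn:
    - name: rancher-backup-operator # must be Ready (CRDs installed) first
  interval: 10m
  path: ./kubernetes/rancher-backup/resources  # assumed path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```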
With Tailscale LoadBalancer, TLS is not actually terminated at the edge.
The Tailscale proxy does TCP passthrough, so Rancher must serve its own
TLS certs. Setting tls: external caused Rancher to listen HTTP-only,
which broke HTTPS access through Tailscale.
Rancher 2.x uses embedded etcd, not an external PostgreSQL database.
The CATTLE_DB_CATTLE_* env vars are Rancher v1 only and were ignored.
- Remove all CNPG (CloudNativePG) cluster, operator, and related configs
- Remove external DB env vars from Rancher HelmRelease
- Remove rancher-db-password ExternalSecret
- Add rancher-backup operator HelmRelease (v106.0.2+up8.1.0)
- Add B2 credentials ExternalSecret for backup storage
- Add recurring Backup CR (daily at 03:00, 7-day retention)
- Add commented-out Restore CR for rebuild recovery
- Update Flux dependency graph accordingly
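The recurring Backup CR from the list above, sketched with placeholder bucket details:

```yaml
# Daily Rancher backup to B2 via the rancher-backup operator.
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-daily
spec:
  schedule: "0 3 * * *"   # daily at 03:00
  retentionCount: 7       # keep seven backups
  storageLocation:
    s3:
      credentialSecretName: b2-credentials
      credentialSecretNamespace: cattle-resources-system
      bucketName: example-rancher-backups   # placeholder
      region: us-west-004                   # placeholder
      endpoint: s3.us-west-004.backblazeb2.com
```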
Add a tailscale-cleanup Ansible role that uses the Tailscale API to
delete offline devices matching reserved hostnames (e.g. rancher).
Runs during site.yml before Finalize to prevent hostname collisions
like rancher-1 on rebuild.
Requires TAILSCALE_API_KEY (API access token) passed as extra var.
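The role's core tasks could be sketched as follows (the hostname list and offline filtering are simplified; the real role's device-selection logic may differ):

```yaml
# List devices in the tailnet ("-" means the API key's default tailnet).
- name: List tailnet devices
  ansible.builtin.uri:
    url: "https://api.tailscale.com/api/v2/tailnet/-/devices"
    headers:
      Authorization: "Bearer {{ tailscale_api_key }}"
  register: tailnet_devices

# Delete devices whose base hostname is reserved, so a rebuilt node can
# reclaim the name instead of becoming e.g. rancher-1.
# NOTE: the offline check is omitted here for brevity; the real role only
# deletes devices that are no longer connected.
- name: Delete stale devices holding reserved hostnames
  ansible.builtin.uri:
    url: "https://api.tailscale.com/api/v2/device/{{ item.id }}"
    method: DELETE
    headers:
      Authorization: "Bearer {{ tailscale_api_key }}"
    status_code: 200
  loop: "{{ tailnet_devices.json.devices }}"
  when: (item.hostname | regex_replace('-\\d+$', ''))
        in ['rancher', 'grafana', 'prometheus', 'flux']
```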
- scripts/refresh-kubeconfig.sh fetches a fresh kubeconfig from CP1
- Ansible site.yml Finalize step now uses public IP instead of Tailscale
hostname for the kubeconfig server address
- Updated AGENTS.md with kubeconfig refresh instructions
- Skip WAL archive emptiness check so recovery works when restoring over
an existing backup archive in B2
- Add healthCheck for b2-credentials secret in CNPG kustomization to
prevent recovery from starting before ExternalSecret has synced
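The healthCheck from the second bullet, sketched as a Flux Kustomization excerpt (the namespace is an assumption):

```yaml
# Block reconciliation of the CNPG cluster until the synced secret exists.
spec:
  healthChecks:
    - apiVersion: v1
      kind: Secret
      name: b2-credentials
      namespace: cnpg-system   # assumed namespace
  timeout: 5m
```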