11 Commits

Author SHA1 Message Date
micqdf b1dae28aa5 feat: migrate cluster baseline from Hetzner to Proxmox
Deploy Cluster / Terraform (push) Failing after 52s
Deploy Cluster / Ansible (push) Has been skipped
Deploy Grafana Content / Grafana Content (push) Failing after 1m37s
Replace Hetzner infrastructure and cloud-provider assumptions with Proxmox
VM clones, kube-vip API HA, and NFS-backed storage. Update bootstrap,
Flux addons, CI workflows, and docs to target the new private Proxmox
baseline while preserving the existing Tailscale, Doppler, Flux, Rancher,
and B2 backup flows.
2026-04-22 03:02:13 +00:00
micqdf c3a2f25c94 docs: record validated Rancher restore drill
Deploy Cluster / Terraform (push) Successful in 2m11s
Deploy Cluster / Ansible (push) Successful in 10m9s
Update the baseline to treat Rancher backup and restore validation as part
of the accepted platform state, and capture the successful live drill run
performed on 2026-04-18.
2026-04-18 21:27:42 +00:00
micqdf 7385c2263e fix: add tailnet smoke checks and move Tailscale operator to stable
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 5m55s
Add a post-deploy smoke test that validates Tailscale DNS, proxy readiness,
reachability, and service responses for Rancher, Grafana, and Prometheus.
Move the operator to the stable Helm repo/version and align the baseline docs
with the current HA private-only architecture.
2026-04-18 19:59:13 +00:00
micqdf 936f54a1b5 fix: Restore canonical Rancher tailnet hostname
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 6m1s
2026-03-29 00:00:39 +00:00
micqdf c9df11e65f fix: Align Rancher tailnet hostname with live proxy
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 6m1s
2026-03-28 23:47:09 +00:00
micqdf 43d11ac7e6 docs: Add agent guidance and sync Rancher docs
Deploy Cluster / Terraform (push) Successful in 2m33s
Deploy Cluster / Ansible (push) Successful in 9m44s
2026-03-28 22:13:37 +00:00
micqdf ff31cb4e74 Implement HA control plane with Load Balancer (3-3 topology)
Deploy Cluster / Terraform (push) Failing after 10s
Deploy Cluster / Ansible (push) Has been skipped
Major changes:
- Terraform: Scale to 3 control planes (cx23) + 3 workers (cx33)
- Terraform: Add Hetzner Load Balancer (lb11) for Kubernetes API
- Terraform: Add kube_api_lb_ip output
- Ansible: Add community.network collection to requirements
- Ansible: Update inventory to include LB endpoint
- Ansible: Configure secondary CPs and workers to join via LB
- Ansible: Add k3s_join_endpoint variable for HA joins
- Workflow: Add imports for cp-2, cp-3, and worker-3
- Docs: Update STABLE_BASELINE.md with HA topology and phase gates

Topology:
- 3 control planes (cx23 - 2 vCPU, 8GB RAM each)
- 3 workers (cx33 - 4 vCPU, 16GB RAM each)
- 1 Load Balancer (lb11) routing to all 3 control planes on port 6443
- Workers and secondary CPs join via LB endpoint for HA

Cost impact: +~€26/month (2 extra CPs + 1 extra worker + LB)
2026-03-23 02:39:39 +00:00
micqdf 8b4a445b37 Update STABLE_BASELINE.md - CCM/CSI integration achieved
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Successful in 3m36s
Document the successful completion of Hetzner CCM and CSI integration:
- CCM deployed via Ansible before workers join (fixes uninitialized taint)
- CSI provides hcloud-volumes StorageClass for persistent storage
- Two consecutive rebuilds passed all phase gates
- PVC provisioning tested and working

Platform now has full cloud provider integration with persistent volumes.
2026-03-23 02:25:00 +00:00
micqdf 8643bbfc12 fix: defer observability to get clean baseline
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-03-22 01:03:55 +00:00
micqdf 894e6275b1 docs: update stable baseline to defer ccm/csi
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 8m35s
2026-03-21 18:41:36 +00:00
micqdf 522626a52b refactor: simplify stable cluster baseline
Deploy Cluster / Terraform (push) Successful in 1m48s
Deploy Cluster / Ansible (push) Failing after 4m7s
2026-03-20 02:24:37 +00:00