HetznerTerra

Author	SHA1	Message	Date
micqdf	a33a993867	fix: harden cluster rebuild determinism [CI: Deploy Grafana Content / Grafana Content (push) Failing after 1m14s; Deploy Cluster / Terraform (push) Failing after 4m59s; Deploy Cluster / Ansible (push) Has been skipped]	2026-04-30 07:36:27 +00:00
micqdf	f49b08f50c	fix: reinstall k3s on version drift [CI: Deploy Cluster / Terraform (push) Successful in 28s; Deploy Cluster / Ansible (push) Failing after 33m40s]	2026-04-30 06:03:53 +00:00
micqdf	d1c31cdb91	fix: rely on k3s service readiness instead of installer exit code. The k3s install script can return non-zero while systemd is still bringing the service up, especially on worker agents. Do not fail immediately on the installer command; wait for the service to become active and only emit install diagnostics if the later readiness check fails. [CI: Deploy Cluster / Terraform (push) Successful in 27s; Deploy Cluster / Ansible (push) Failing after 8m9s]	2026-04-22 04:14:31 +00:00
micqdf	b1dae28aa5	feat: migrate cluster baseline from Hetzner to Proxmox. Replace Hetzner infrastructure and cloud-provider assumptions with Proxmox VM clones, kube-vip API HA, and NFS-backed storage. Update bootstrap, Flux addons, CI workflows, and docs to target the new private Proxmox baseline while preserving the existing Tailscale, Doppler, Flux, Rancher, and B2 backup flows. [CI: Deploy Cluster / Terraform (push) Failing after 52s; Deploy Cluster / Ansible (push) Has been skipped; Deploy Grafana Content / Grafana Content (push) Failing after 1m37s]	2026-04-22 03:02:13 +00:00
micqdf	f36445d99a	Fix CNI: configure flannel to use the private network interface (enp7s0) instead of the public one [CI: Deploy Cluster / Terraform (push) Successful in 34s; Deploy Cluster / Ansible (push) Successful in 8m42s]	2026-03-25 01:44:33 +00:00
micqdf	0e52d8f159	Use Tailscale DNS names instead of IPs for TLS SANs. Changed from hardcoded Tailscale IPs to DNS names: k8s-cluster-cp-1.silverside-gopher.ts.net; k8s-cluster-cp-2.silverside-gopher.ts.net; k8s-cluster-cp-3.silverside-gopher.ts.net. This is more robust because Tailscale IPs change on rebuild, while DNS names remain consistent. After the next rebuild, the cluster is accessible via: kubectl --server=https://k8s-cluster-cp-1.silverside-gopher.ts.net:6443 [CI: Deploy Cluster / Terraform (push) Successful in 2m21s; Deploy Cluster / Ansible (push) Successful in 9m0s]	2026-03-23 23:50:48 +00:00
micqdf	4726db2b5b	Add Tailscale IPs to k3s TLS SANs for secure tailnet access. Changes: add tailscale_control_plane_ips list to k3s-server defaults; include all 3 control plane Tailscale IPs (100.120.55.97, 100.108.90.123, 100.92.149.85); update primary k3s install to add Tailscale IPs to TLS certificates; enable kubectl access via Tailscale without certificate errors. After the next deploy, the cluster will be accessible via: kubectl --server=https://100.120.55.97:6443 (or any CP Tailscale IP); kubectl --server=https://k8s-cluster-cp-1:6443 (via Tailscale DNS) [CI: Deploy Cluster / Terraform (push) Successful in 2m30s; Deploy Cluster / Ansible (push) Successful in 9m48s]	2026-03-23 23:04:00 +00:00
micqdf	952a80a742	Fix HA cluster join via Load Balancer private IP. Changes: use LB private IP (10.0.1.5) instead of public IP for cluster joins; add LB private IP to k3s TLS SANs on primary control plane, so secondary CPs and workers can verify certificates when joining via the LB. Fixes x509 certificate validation error when joining via LB public IP. [CI: Deploy Cluster / Terraform (push) Successful in 36s; Deploy Cluster / Ansible (push) Failing after 3m5s]	2026-03-23 02:56:41 +00:00
micqdf	ff31cb4e74	Implement HA control plane with Load Balancer (3-3 topology). Major changes: Terraform: scale to 3 control planes (cx23) + 3 workers (cx33); Terraform: add Hetzner Load Balancer (lb11) for Kubernetes API; Terraform: add kube_api_lb_ip output; Ansible: add community.network collection to requirements; Ansible: update inventory to include LB endpoint; Ansible: configure secondary CPs and workers to join via LB; Ansible: add k3s_join_endpoint variable for HA joins; Workflow: add imports for cp-2, cp-3, and worker-3; Docs: update STABLE_BASELINE.md with HA topology and phase gates. Topology: 3 control planes (cx23: 2 vCPU, 8GB RAM each); 3 workers (cx33: 4 vCPU, 16GB RAM each); 1 Load Balancer (lb11) routing to all 3 control planes on port 6443; workers and secondary CPs join via LB endpoint for HA. Cost impact: +~€26/month (2 extra CPs + 1 extra worker + LB) [CI: Deploy Cluster / Terraform (push) Failing after 10s; Deploy Cluster / Ansible (push) Has been skipped]	2026-03-23 02:39:39 +00:00
micqdf	8d1f9f4944	fix: add k3s reset logic for primary control plane [CI: Deploy Cluster / Terraform (push) Successful in 39s; Deploy Cluster / Ansible (push) Failing after 4m19s]	2026-03-21 16:10:17 +00:00
micqdf	d4fd43e2f5	refactor: simplify k3s-server bootstrap for	2026-03-21 15:48:33 +00:00
micqdf	9d2f30de32	fix: prepare k3s for external cloud provider [CI: Deploy Cluster / Terraform (push) Successful in 46s; Deploy Cluster / Ansible (push) Successful in 4m4s]	2026-03-17 01:21:23 +00:00
micqdf	f0dd31c552	fix: only manage kubeconfig on primary control plane [CI: Deploy Cluster / Terraform (push) Successful in 20s; Deploy Cluster / Ansible (push) Failing after 4m31s]	2026-03-01 03:02:37 +00:00
micqdf	b703cb269b	fix: bootstrap k3s HA on private network with dual SANs [CI: Deploy Cluster / Terraform (push) Successful in 2m31s; Deploy Cluster / Ansible (push) Failing after 4m38s]	2026-03-01 02:45:00 +00:00
micqdf	a5ea696e0f	chore: capture k3s secondary install diagnostics on failure [CI: Deploy Cluster / Terraform (push) Successful in 18s; Deploy Cluster / Ansible (push) Failing after 2m50s]	2026-03-01 02:05:07 +00:00
micqdf	2ae16414a0	fix: remove strict 9345 precheck for secondary join [CI: Deploy Cluster / Terraform (push) Successful in 20s; Deploy Cluster / Ansible (push) Failing after 2m46s]	2026-03-01 01:42:28 +00:00
micqdf	063d6dfcc0	fix: auto-reset broken secondary k3s servers and precheck join ports [CI: Deploy Cluster / Terraform (push) Successful in 22s; Deploy Cluster / Ansible (push) Failing after 4m37s]	2026-03-01 01:25:20 +00:00
micqdf	f699936172	fix: increase k3s readiness timeout and emit diagnostics on failure [CI: Deploy Cluster / Terraform (push) Successful in 21s; Deploy Cluster / Ansible (push) Has been cancelled]	2026-03-01 00:59:17 +00:00
micqdf	27b29322cd	fix: use private network IPs for k3s join and node addressing [CI: Deploy Cluster / Terraform (push) Successful in 24s; Deploy Cluster / Ansible (push) Failing after 8m13s]	2026-03-01 00:42:55 +00:00
micqdf	1db435cd42	fix: Use private IP for k3s HA cluster join and advertise [CI: Deploy Cluster / Terraform (push) Successful in 19s; Deploy Cluster / Ansible (push) Failing after 8m11s]	2026-03-01 00:32:03 +00:00
micqdf	691b3ed316	fix: Check for k3s service instead of binary for proper HA join detection [CI: Deploy Cluster / Terraform (push) Successful in 19s; Deploy Cluster / Ansible (push) Failing after 8m5s]	2026-02-28 23:16:39 +00:00
micqdf	3b3084b997	feat: Add HA Kubernetes cluster with Terraform + Ansible. 3x CX23 control plane nodes (HA); 4x CX33 worker nodes; k3s with embedded etcd; Hetzner CCM for load balancers; Gitea CI/CD workflows; Backblaze B2 for Terraform state [CI: Terraform / Validate (push) Failing after 17s; Terraform / Plan (push) Has been skipped; Terraform / Apply (push) Has been skipped]	2026-02-28 20:24:55 +00:00

22 Commits
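Commit d1c31cdb91 gates on service readiness rather than the installer's exit code. That pattern can be sketched in Python. This is a minimal illustration, not the repository's Ansible task. The function names (`unit_is_active`, `dump_diagnostics`, `wait_for_service`, `install_then_wait`) and the 300 s / 5 s timeout and interval defaults are hypothetical. The sketch assumes the default systemd unit names created by the k3s install script (`k3s` on servers, `k3s-agent` on agents). It uses `systemctl is-active --quiet` as the readiness check and `systemctl status` plus `journalctl -u` for diagnostics.

```python
import subprocess
import time
from typing import Callable


def unit_is_active(unit: str) -> bool:
    """Return True if systemd reports `unit` as active.

    `systemctl is-active --quiet UNIT` exits 0 only when the unit is active.
    """
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def dump_diagnostics(unit: str) -> None:
    """Print unit status and recent journal lines; their exit codes are ignored."""
    subprocess.run(["systemctl", "status", unit, "--no-pager"], check=False)
    subprocess.run(["journalctl", "-u", unit, "--no-pager", "-n", "100"], check=False)


def wait_for_service(
    is_active: Callable[[], bool],
    timeout_s: float = 300.0,  # hypothetical default, not taken from the repo
    interval_s: float = 5.0,   # hypothetical default, not taken from the repo
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_active() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_active():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def install_then_wait(
    unit: str,
    run_installer: Callable[[], object],
    is_active: Callable[[], bool],
    diagnostics: Callable[[str], None] = dump_diagnostics,
    **wait_kwargs,
) -> bool:
    """Run the installer, ignore its exit status, then gate on readiness."""
    run_installer()  # a non-zero installer exit is deliberately not treated as failure
    if wait_for_service(is_active, **wait_kwargs):
        return True
    diagnostics(unit)  # diagnostics are collected only when readiness never arrives
    return False
```

On a worker, a call might look like `install_then_wait("k3s-agent", run_installer=..., is_active=lambda: unit_is_active("k3s-agent"))`. The `run_installer` callable is left to the caller, because this log does not show how the repository invokes the install script. The checks, sleep and clock are injected so the polling logic can be exercised without a real systemd host.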