Commit Graph

16 Commits

Author SHA1 Message Date
4726db2b5b Add Tailscale IPs to k3s TLS SANs for secure tailnet access
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m30s
Deploy Cluster / Ansible (push) Successful in 9m48s
Changes:
- Add tailscale_control_plane_ips list to k3s-server defaults
- Include all 3 control plane Tailscale IPs (100.120.55.97, 100.108.90.123, 100.92.149.85)
- Update primary k3s install to add Tailscale IPs to TLS certificates
- Enables kubectl access via Tailscale without certificate errors

After next deploy, cluster will be accessible via:
- kubectl --server=https://100.120.55.97:6443 (or any CP tailscale IP)
- kubectl --server=https://k8s-cluster-cp-1:6443 (via tailscale DNS)
2026-03-23 23:04:00 +00:00
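The TLS SAN change in commit 4726db2b5b can be illustrated with a minimal sketch, assuming the Tailscale IPs end up as repeated `--tls-san` flags on the `k3s server` command line. The helper name `build_tls_san_args` and the variable `TAILSCALE_CP_IPS` are hypothetical; the repository's Ansible role may template the flags differently.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of addresses into repeated --tls-san flags.
build_tls_san_args() {
  args=""
  for addr in "$@"; do
    # Prefix a separating space only when args already holds a flag.
    args="${args:+$args }--tls-san $addr"
  done
  printf '%s\n' "$args"
}

# Control plane Tailscale IPs as listed in the commit message.
TAILSCALE_CP_IPS="100.120.55.97 100.108.90.123 100.92.149.85"

# Word splitting on the unquoted variable is intentional: one argument per IP.
TLS_SAN_ARGS=$(build_tls_san_args $TAILSCALE_CP_IPS)

# Print the resulting command instead of running it.
echo "k3s server $TLS_SAN_ARGS"
```

The script only prints the command, so it can be run safely on any machine to inspect the flags before they are applied to a real node.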
952a80a742 Fix HA cluster join via Load Balancer private IP
Some checks failed
Deploy Cluster / Terraform (push) Successful in 36s
Deploy Cluster / Ansible (push) Failing after 3m5s
Changes:
- Use LB private IP (10.0.1.5) instead of public IP for cluster joins
- Add LB private IP to k3s TLS SANs on primary control plane
- This allows secondary CPs and workers to verify certificates when joining via LB

Fixes x509 certificate validation error when joining via LB public IP.
2026-03-23 02:56:41 +00:00
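The join fix in commit 952a80a742 can be sketched as follows, assuming the primary control plane is started with `--cluster-init` and the other nodes point `--server` at the Load Balancer's private IP on port 6443. The helper `join_url`, the variable names, and the `<cluster-token>` placeholder are hypothetical; the actual role and its `k3s_join_endpoint` variable are not shown in this log.

```shell
#!/bin/sh
# Values taken from the commit message; variable names are illustrative.
LB_PRIVATE_IP="10.0.1.5"
API_PORT=6443

# Hypothetical helper: build the HTTPS join endpoint from host and port.
join_url() {
  printf 'https://%s:%s\n' "$1" "$2"
}

K3S_URL=$(join_url "$LB_PRIVATE_IP" "$API_PORT")

# Print the commands rather than running them.
# The primary adds the LB private IP as a SAN so joiners can verify its certificate.
echo "primary:   k3s server --cluster-init --tls-san $LB_PRIVATE_IP"
echo "secondary: k3s server --server $K3S_URL --token <cluster-token>"
echo "worker:    k3s agent --server $K3S_URL --token <cluster-token>"
```

Because the certificate must list every address that clients dial, joining through an address missing from the SAN list fails with the x509 validation error this commit describes.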
ff31cb4e74 Implement HA control plane with Load Balancer (3-3 topology)
Some checks failed
Deploy Cluster / Terraform (push) Failing after 10s
Deploy Cluster / Ansible (push) Has been skipped
Major changes:
- Terraform: Scale to 3 control planes (cx23) + 3 workers (cx33)
- Terraform: Add Hetzner Load Balancer (lb11) for Kubernetes API
- Terraform: Add kube_api_lb_ip output
- Ansible: Add community.network collection to requirements
- Ansible: Update inventory to include LB endpoint
- Ansible: Configure secondary CPs and workers to join via LB
- Ansible: Add k3s_join_endpoint variable for HA joins
- Workflow: Add imports for cp-2, cp-3, and worker-3
- Docs: Update STABLE_BASELINE.md with HA topology and phase gates

Topology:
- 3 control planes (cx23 - 2 vCPU, 8GB RAM each)
- 3 workers (cx33 - 4 vCPU, 16GB RAM each)
- 1 Load Balancer (lb11) routing to all 3 control planes on port 6443
- Workers and secondary CPs join via LB endpoint for HA

Cost impact: +~€26/month (2 extra CPs + 1 extra worker + LB)
2026-03-23 02:39:39 +00:00
8d1f9f4944 fix: add k3s reset logic for primary control plane
Some checks failed
Deploy Cluster / Terraform (push) Successful in 39s
Deploy Cluster / Ansible (push) Failing after 4m19s
2026-03-21 16:10:17 +00:00
d4fd43e2f5 refactor: simplify k3s-server bootstrap for
2026-03-21 15:48:33 +00:00
9d2f30de32 fix: prepare k3s for external cloud provider
All checks were successful
Deploy Cluster / Terraform (push) Successful in 46s
Deploy Cluster / Ansible (push) Successful in 4m4s
2026-03-17 01:21:23 +00:00
f0dd31c552 fix: only manage kubeconfig on primary control plane
Some checks failed
Deploy Cluster / Terraform (push) Successful in 20s
Deploy Cluster / Ansible (push) Failing after 4m31s
2026-03-01 03:02:37 +00:00
b703cb269b fix: bootstrap k3s HA on private network with dual SANs
Some checks failed
Deploy Cluster / Terraform (push) Successful in 2m31s
Deploy Cluster / Ansible (push) Failing after 4m38s
2026-03-01 02:45:00 +00:00
a5ea696e0f chore: capture k3s secondary install diagnostics on failure
Some checks failed
Deploy Cluster / Terraform (push) Successful in 18s
Deploy Cluster / Ansible (push) Failing after 2m50s
2026-03-01 02:05:07 +00:00
2ae16414a0 fix: remove strict 9345 precheck for secondary join
Some checks failed
Deploy Cluster / Terraform (push) Successful in 20s
Deploy Cluster / Ansible (push) Failing after 2m46s
2026-03-01 01:42:28 +00:00
063d6dfcc0 fix: auto-reset broken secondary k3s servers and precheck join ports
Some checks failed
Deploy Cluster / Terraform (push) Successful in 22s
Deploy Cluster / Ansible (push) Failing after 4m37s
2026-03-01 01:25:20 +00:00
f699936172 fix: increase k3s readiness timeout and emit diagnostics on failure
Some checks failed
Deploy Cluster / Terraform (push) Successful in 21s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-01 00:59:17 +00:00
27b29322cd fix: use private network IPs for k3s join and node addressing
Some checks failed
Deploy Cluster / Terraform (push) Successful in 24s
Deploy Cluster / Ansible (push) Failing after 8m13s
2026-03-01 00:42:55 +00:00
1db435cd42 fix: Use private IP for k3s HA cluster join and advertise
Some checks failed
Deploy Cluster / Terraform (push) Successful in 19s
Deploy Cluster / Ansible (push) Failing after 8m11s
2026-03-01 00:32:03 +00:00
691b3ed316 fix: Check for k3s service instead of binary for proper HA join detection
Some checks failed
Deploy Cluster / Terraform (push) Successful in 19s
Deploy Cluster / Ansible (push) Failing after 8m5s
2026-02-28 23:16:39 +00:00
3b3084b997 feat: Add HA Kubernetes cluster with Terraform + Ansible
Some checks failed
Terraform / Validate (push) Failing after 17s
Terraform / Plan (push) Has been skipped
Terraform / Apply (push) Has been skipped
- 3x CX23 control plane nodes (HA)
- 4x CX33 worker nodes
- k3s with embedded etcd
- Hetzner CCM for load balancers
- Gitea CI/CD workflows
- Backblaze B2 for Terraform state
2026-02-28 20:24:55 +00:00