Implement HA control plane with Load Balancer (3-3 topology)
Major changes: - Terraform: Scale to 3 control planes (cx23) + 3 workers (cx33) - Terraform: Add Hetzner Load Balancer (lb11) for Kubernetes API - Terraform: Add kube_api_lb_ip output - Ansible: Add community.network collection to requirements - Ansible: Update inventory to include LB endpoint - Ansible: Configure secondary CPs and workers to join via LB - Ansible: Add k3s_join_endpoint variable for HA joins - Workflow: Add imports for cp-2, cp-3, and worker-3 - Docs: Update STABLE_BASELINE.md with HA topology and phase gates Topology: - 3 control planes (cx23 - 2 vCPU, 8GB RAM each) - 3 workers (cx33 - 4 vCPU, 16GB RAM each) - 1 Load Balancer (lb11) routing to all 3 control planes on port 6443 - Workers and secondary CPs join via LB endpoint for HA Cost impact: +~€26/month (2 extra CPs + 1 extra worker + LB)
This commit is contained in:
@@ -4,8 +4,9 @@ This document defines the current engineering target for this repository.
|
||||
|
||||
## Topology
|
||||
|
||||
- 1 control plane
|
||||
- 2 workers
|
||||
- 3 control planes (HA etcd cluster)
|
||||
- 3 workers
|
||||
- Hetzner Load Balancer for Kubernetes API
|
||||
- private Hetzner network
|
||||
- Tailscale operator access
|
||||
|
||||
@@ -13,6 +14,8 @@ This document defines the current engineering target for this repository.
|
||||
|
||||
- Terraform infrastructure bootstrap
|
||||
- Ansible k3s bootstrap with external cloud provider
|
||||
- **HA control plane (3 nodes with etcd quorum)**
|
||||
- **Hetzner Load Balancer for Kubernetes API**
|
||||
- **Hetzner CCM deployed via Ansible (before workers join)**
|
||||
- **Hetzner CSI for persistent volumes (via Flux)**
|
||||
- Flux core reconciliation
|
||||
@@ -26,7 +29,6 @@ This document defines the current engineering target for this repository.
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- HA control plane
|
||||
- public ingress or DNS
|
||||
- public TLS
|
||||
- app workloads
|
||||
@@ -35,21 +37,28 @@ This document defines the current engineering target for this repository.
|
||||
|
||||
## Phase Gates
|
||||
|
||||
1. Terraform apply completes for the default topology.
|
||||
2. k3s server bootstrap completes with external cloud provider enabled.
|
||||
3. **CCM deployed via Ansible before workers join** (fixes uninitialized taint issue).
|
||||
4. Workers join successfully and all nodes show proper `providerID`.
|
||||
5. Flux source and infrastructure reconciliation are healthy.
|
||||
6. **CSI deploys and creates `hcloud-volumes` StorageClass**.
|
||||
7. **PVC provisioning tested and working** (validated with test pod).
|
||||
8. External Secrets sync required secrets.
|
||||
9. Tailscale private access works.
|
||||
10. Terraform destroy succeeds cleanly or via workflow retry.
|
||||
1. Terraform apply completes for HA topology (3 CP, 3 workers, 1 LB).
|
||||
2. Load Balancer is healthy with all 3 control plane targets.
|
||||
3. Primary control plane bootstraps with `--cluster-init`.
|
||||
4. Secondary control planes join via Load Balancer endpoint.
|
||||
5. **CCM deployed via Ansible before workers join** (fixes uninitialized taint issue).
|
||||
6. Workers join successfully via Load Balancer and all nodes show proper `providerID`.
|
||||
7. etcd reports 3 healthy members.
|
||||
8. Flux source and infrastructure reconciliation are healthy.
|
||||
9. **CSI deploys and creates `hcloud-volumes` StorageClass**.
|
||||
10. **PVC provisioning tested and working**.
|
||||
11. External Secrets sync required secrets.
|
||||
12. Tailscale private access works.
|
||||
13. Terraform destroy succeeds cleanly or via workflow retry.
|
||||
|
||||
## Success Criteria
|
||||
|
||||
✅ **ACHIEVED** - Two consecutive fresh rebuilds passed all phase gates with no manual fixes:
|
||||
✅ **ACHIEVED** - HA Cluster with CCM/CSI:
|
||||
- Build 1: Initial CCM/CSI deployment and validation (2026-03-23)
|
||||
- Build 2: Full destroy/rebuild cycle successful (2026-03-23)
|
||||
|
||||
The platform is now stable with cloud provider integration and persistent volume support.
|
||||
🔄 **IN PROGRESS** - HA Control Plane Validation:
|
||||
- Build 3: Deploy 3-3 topology with Load Balancer
|
||||
- Build 4: Destroy/rebuild to validate HA configuration
|
||||
|
||||
Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes.
|
||||
|
||||
Reference in New Issue
Block a user