Stable Private-Only Baseline
This document defines the current engineering target for this repository.
Topology
- 3 control planes (HA etcd cluster)
- 3 workers
- Hetzner Load Balancer for Kubernetes API
- private Hetzner network
- Tailscale operator access
- Rancher UI exposed only through Tailscale (rancher.silverside-gopher.ts.net)
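
The choice of 3 control planes follows from etcd's majority rule. The arithmetic can be sketched as below; the function names are illustrative and are not part of this repository.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster still holds quorum."""
    return members - quorum(members)


# Compare the cluster sizes relevant to this topology.
for n in (1, 2, 3):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

With 3 members the quorum is 2, so one control plane can be lost without losing the cluster. A 2-member cluster tolerates no failures, which is why it offers no advantage over a single node.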
In Scope
- Terraform infrastructure bootstrap
- Ansible k3s bootstrap with external cloud provider
- HA control plane (3 nodes with etcd quorum)
- Hetzner Load Balancer for Kubernetes API
- Hetzner CCM deployed via Ansible (before workers join)
- Hetzner CSI for persistent volumes (via Flux)
- Flux core reconciliation
- External Secrets Operator with Doppler
- Tailscale private access
- Persistent volume provisioning validated
Deferred for Later Phases
- Observability stack (complex Helm release needs separate debugging)
Out of Scope
- public ingress or DNS
- public TLS
- app workloads
- DR / backup strategy
- upgrade strategy
Phase Gates
- Terraform apply completes for HA topology (3 CP, 3 workers, 1 LB).
- Load Balancer is healthy with all 3 control plane targets.
- Primary control plane bootstraps with --cluster-init.
- Secondary control planes join via Load Balancer endpoint.
- CCM deployed via Ansible before workers join (fixes uninitialized taint issue).
- Workers join successfully via Load Balancer and all nodes show proper providerID.
- etcd reports 3 healthy members.
- Flux source and infrastructure reconciliation are healthy.
- CSI deploys and creates hcloud-volumes StorageClass.
- PVC provisioning tested and working.
- External Secrets sync required secrets.
- Tailscale private access works, including Rancher UI access.
- Terraform destroy succeeds cleanly or via workflow retry.
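
The providerID and uninitialized-taint gates could be checked mechanically. The sketch below is not this repository's validation code. It assumes the input is the parsed output of `kubectl get nodes -o json`, that the Hetzner CCM writes provider IDs with an `hcloud://` prefix, and that the taint in question is the standard `node.cloudprovider.kubernetes.io/uninitialized` key. The function name and the sample node names are hypothetical.

```python
# Taint the kubelet applies when started with an external cloud provider;
# the cloud controller manager removes it once the node is initialized.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def nodes_failing_ccm_gate(node_list: dict, prefix: str = "hcloud://") -> list[str]:
    """Return names of nodes that lack a providerID with the expected
    prefix or that still carry the uninitialized taint."""
    failing = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        provider_id = spec.get("providerID", "")
        taints = spec.get("taints") or []
        tainted = any(t.get("key") == UNINITIALIZED_TAINT for t in taints)
        if not provider_id.startswith(prefix) or tainted:
            failing.append(node["metadata"]["name"])
    return failing


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "cp-1"}, "spec": {"providerID": "hcloud://1001"}},
        {
            "metadata": {"name": "worker-1"},
            "spec": {"taints": [{"key": UNINITIALIZED_TAINT, "effect": "NoSchedule"}]},
        },
    ]
}

print(nodes_failing_ccm_gate(sample))
```

An empty result means every node passed. In the sample, `worker-1` is reported because it has no providerID and still carries the taint, which is the state workers end up in if they join before the CCM is running.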
Success Criteria
✅ ACHIEVED - HA Cluster with CCM/CSI:
- Build 1: Initial CCM/CSI deployment and validation (2026-03-23)
- Build 2: Full destroy/rebuild cycle successful (2026-03-23)
🔄 IN PROGRESS - HA Control Plane Validation:
- Build 3: Deploy 3-3 topology with Load Balancer
- Build 4: Destroy/rebuild to validate HA configuration
Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes.
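
The success rule can be stated as a small check over build history. This is a sketch under assumptions: the `Build` record, its field names, and the sample data are hypothetical and do not reflect recorded results.

```python
from dataclasses import dataclass


@dataclass
class Build:
    name: str
    gates_passed: bool   # every phase gate passed
    manual_fixes: bool   # any manual intervention was needed


def success_reached(builds: list[Build], required: int = 2) -> bool:
    """True if `required` consecutive builds passed all gates with no manual fixes."""
    streak = 0
    for build in builds:
        clean = build.gates_passed and not build.manual_fixes
        streak = streak + 1 if clean else 0
        if streak >= required:
            return True
    return False


# Hypothetical history: a manually fixed build resets the streak.
history = [
    Build("A", gates_passed=True, manual_fixes=True),
    Build("B", gates_passed=True, manual_fixes=False),
    Build("C", gates_passed=True, manual_fixes=False),
]
print(success_reached(history))
```

Treating a manually fixed build as a streak reset keeps the criterion strict: a rebuild that passes only after hand intervention does not count toward the two consecutive passes.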