66 lines
2.2 KiB
Markdown
66 lines
2.2 KiB
Markdown
# Stable Private-Only Baseline
|
|
|
|
This document defines the current engineering target for this repository.
|
|
|
|
## Topology
|
|
|
|
- 3 control planes (HA etcd cluster)
|
|
- 3 workers
|
|
- Hetzner Load Balancer for Kubernetes API
|
|
- private Hetzner network
|
|
- Tailscale operator access
|
|
- Rancher UI exposed only through Tailscale (`rancher.silverside-gopher.ts.net`)
|
|
|
|
## In Scope
|
|
|
|
- Terraform infrastructure bootstrap
|
|
- Ansible k3s bootstrap with external cloud provider
|
|
- **HA control plane (3 nodes with etcd quorum)**
|
|
- **Hetzner Load Balancer for Kubernetes API**
|
|
- **Hetzner CCM deployed via Ansible (before workers join)**
|
|
- **Hetzner CSI for persistent volumes (via Flux)**
|
|
- Flux core reconciliation
|
|
- External Secrets Operator with Doppler
|
|
- Tailscale private access
|
|
- Persistent volume provisioning validated
|
|
|
|
## Deferred for Later Phases
|
|
|
|
- Observability stack (deferred - complex helm release needs separate debugging)
|
|
|
|
## Out of Scope
|
|
|
|
- public ingress or DNS
|
|
- public TLS
|
|
- app workloads
|
|
- DR / backup strategy
|
|
- upgrade strategy
|
|
|
|
## Phase Gates
|
|
|
|
1. Terraform apply completes for HA topology (3 CP, 3 workers, 1 LB).
|
|
2. Load Balancer is healthy with all 3 control plane targets.
|
|
3. Primary control plane bootstraps with `--cluster-init`.
|
|
4. Secondary control planes join via Load Balancer endpoint.
|
|
5. **CCM deployed via Ansible before workers join** (fixes uninitialized taint issue).
|
|
6. Workers join successfully via Load Balancer and all nodes show proper `providerID`.
|
|
7. etcd reports 3 healthy members.
|
|
8. Flux source and infrastructure reconciliation are healthy.
|
|
9. **CSI deploys and creates `hcloud-volumes` StorageClass**.
|
|
10. **PVC provisioning tested and working**.
|
|
11. External Secrets sync required secrets.
|
|
12. Tailscale private access works, including Rancher UI access.
|
|
13. Terraform destroy succeeds cleanly or via workflow retry.
|
|
|
|
## Success Criteria
|
|
|
|
✅ **ACHIEVED** - HA Cluster with CCM/CSI:
|
|
- Build 1: Initial CCM/CSI deployment and validation (2026-03-23)
|
|
- Build 2: Full destroy/rebuild cycle successful (2026-03-23)
|
|
|
|
🔄 **IN PROGRESS** - HA Control Plane Validation:
|
|
- Build 3: Deploy 3-3 topology with Load Balancer
|
|
- Build 4: Destroy/rebuild to validate HA configuration
|
|
|
|
Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes.
|