HetznerTerra/STABLE_BASELINE.md
MichaelFisher1997 c9df11e65f
fix: Align Rancher tailnet hostname with live proxy
2026-03-28 23:47:09 +00:00

Stable Private-Only Baseline

This document defines the current engineering target for this repository.

Topology

  • 3 control plane nodes (HA etcd cluster)
  • 3 workers
  • Hetzner Load Balancer for Kubernetes API
  • private Hetzner network
  • Tailscale operator access
  • Rancher UI exposed only through Tailscale (rancher-1.silverside-gopher.ts.net)

In Scope

  • Terraform infrastructure bootstrap
  • Ansible k3s bootstrap with external cloud provider
  • HA control plane (3 nodes with etcd quorum)
  • Hetzner Load Balancer for Kubernetes API
  • Hetzner CCM deployed via Ansible (before workers join)
  • Hetzner CSI for persistent volumes (via Flux)
  • Flux core reconciliation
  • External Secrets Operator with Doppler
  • Tailscale private access
  • Persistent volume provisioning validated
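
The choice of three control plane nodes follows from etcd's majority quorum rule: a cluster stays writable only while more than half of its members are reachable. A minimal worked sketch of that arithmetic (the function names are illustrative and not part of this repository):

```python
def quorum_size(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while the cluster still has quorum."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Compare a single node, a pair, and the 3-node baseline topology.
    for n in (1, 2, 3):
        print(f"members={n} quorum={quorum_size(n)} tolerates={fault_tolerance(n)}")
```

With three members the quorum is two, so the baseline tolerates the loss of one control plane node. Two members would tolerate none, which is why an even-sized pair adds no resilience over a single node.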

Deferred for Later Phases

  • Observability stack (its complex Helm release needs separate debugging)

Out of Scope

  • public ingress or DNS
  • public TLS
  • app workloads
  • DR / backup strategy
  • upgrade strategy

Phase Gates

  1. Terraform apply completes for HA topology (3 CP, 3 workers, 1 LB).
  2. Load Balancer is healthy with all 3 control plane targets.
  3. Primary control plane bootstraps with --cluster-init.
  4. Secondary control planes join via Load Balancer endpoint.
  5. CCM is deployed via Ansible before workers join, so nodes do not stay stuck with the uninitialized cloud-provider taint.
  6. Workers join successfully via the Load Balancer, and every node reports a non-empty providerID set by the CCM.
  7. etcd reports 3 healthy members.
  8. Flux source and infrastructure reconciliation are healthy.
  9. CSI deploys and creates hcloud-volumes StorageClass.
  10. PVC provisioning is tested and working.
  11. External Secrets sync required secrets.
  12. Tailscale private access works, including Rancher UI access.
  13. Terraform destroy succeeds cleanly or via workflow retry.
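
Gates 5 and 6 could be checked mechanically against the output of `kubectl get nodes -o json`. The sketch below is one possible approach, not the repository's actual tooling. It assumes the standard Node fields `spec.providerID` and `spec.taints`, and it assumes that the Hetzner CCM writes provider IDs with an `hcloud://` prefix. The function name, node names, and server IDs are made up for illustration.

```python
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"
ASSUMED_PROVIDER_PREFIX = "hcloud://"  # assumption: prefix used by the Hetzner CCM


def node_problems(node_list: dict) -> dict:
    """Return {node name: [problems]}; an empty dict means gates 5 and 6 pass."""
    problems = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        spec = node.get("spec", {})
        issues = []

        # Gate 6: the CCM should have filled in spec.providerID.
        provider_id = spec.get("providerID", "")
        if not provider_id.startswith(ASSUMED_PROVIDER_PREFIX):
            issues.append(f"unexpected providerID: {provider_id!r}")

        # Gate 5: the taint remains until a cloud controller initializes the node.
        taint_keys = {taint.get("key") for taint in spec.get("taints") or []}
        if UNINITIALIZED_TAINT in taint_keys:
            issues.append("cloud provider has not initialized this node")

        if issues:
            problems[name] = issues
    return problems


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
SAMPLE = {
    "items": [
        {"metadata": {"name": "cp-1"}, "spec": {"providerID": "hcloud://1001"}},
        {
            "metadata": {"name": "worker-1"},
            "spec": {
                "taints": [
                    {"key": UNINITIALIZED_TAINT, "value": "true", "effect": "NoSchedule"}
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for name, issues in node_problems(SAMPLE).items():
        print(name, "->", "; ".join(issues))
```

In the sample, `cp-1` passes and `worker-1` is flagged twice, which is the state a worker would be in if it joined before the CCM was running.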

Success Criteria

ACHIEVED - HA Cluster with CCM/CSI:

  • Build 1: Initial CCM/CSI deployment and validation (2026-03-23)
  • Build 2: Full destroy/rebuild cycle successful (2026-03-23)

🔄 IN PROGRESS - HA Control Plane Validation:

  • Build 3: Deploy 3-3 topology with Load Balancer
  • Build 4: Destroy/rebuild to validate HA configuration

Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes.
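
The success rule above can be expressed as a small check over the build history. This is a sketch under stated assumptions: `BuildResult` and `baseline_achieved` are hypothetical names, the gate count of 13 comes from the Phase Gates list, and "two consecutive" is interpreted as any two back-to-back clean rebuilds in the recorded history.

```python
from dataclasses import dataclass
from typing import Sequence

TOTAL_GATES = 13  # number of entries in the Phase Gates list


@dataclass(frozen=True)
class BuildResult:
    name: str
    gates_passed: int
    manual_fixes: int = 0

    @property
    def clean(self) -> bool:
        """A build is clean if every gate passed and nobody intervened by hand."""
        return self.gates_passed == TOTAL_GATES and self.manual_fixes == 0


def baseline_achieved(history: Sequence[BuildResult], required: int = 2) -> bool:
    """True once `required` clean builds occur back to back in `history`."""
    streak = 0
    for build in history:
        # Any failed gate or manual fix resets the streak to zero.
        streak = streak + 1 if build.clean else 0
        if streak >= required:
            return True
    return False


if __name__ == "__main__":
    history = [
        BuildResult("Build 3", gates_passed=13),
        BuildResult("Build 4", gates_passed=13, manual_fixes=1),
    ]
    print(baseline_achieved(history))
```

A stricter reading would require the two most recent builds to be the clean ones; that variant only needs the check to look at the tail of the history instead of any streak.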