fix: add tailnet smoke checks and move Tailscale operator to stable
Add a post-deploy smoke test that validates Tailscale DNS, proxy readiness, reachability, and service responses for Rancher, Grafana, and Prometheus. Move the operator to the stable Helm repo/version and align the baseline docs with the current HA private-only architecture.
+14 -15
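The post-deploy smoke test named in the commit message is not shown in this view. The Python below is a rough sketch of what such a check could look like. For each tailnet hostname it does the following:

1. Resolves the name.
2. Confirms the answer falls inside Tailscale's 100.64.0.0/10 address block.
3. Runs `tailscale ping`.
4. Requests a health URL.

Several details are assumptions and are not taken from the repository:

- the URL schemes and health paths (`/ping` for Rancher, `/api/health` for Grafana, `/-/ready` for Prometheus);
- the `tailscale ping -c 3` invocation, and reading its exit status as pass/fail;
- the "any non-5xx status is reachable" rule;
- every function name.

```python
# Hypothetical tailnet smoke-check sketch; names, URLs, and CLI usage are assumptions.
import ipaddress
import socket
import subprocess
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List

# Tailscale hands out IPv4 tailnet addresses from the CGNAT block 100.64.0.0/10.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")

# Hostnames come from the baseline doc; schemes and health paths are assumed.
ENDPOINTS: Dict[str, str] = {
    "rancher": "https://rancher.silverside-gopher.ts.net/ping",
    "grafana": "https://grafana.silverside-gopher.ts.net/api/health",
    "prometheus": "http://prometheus.silverside-gopher.ts.net:9090/-/ready",
}


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def resolve_v4(host: str) -> List[str]:
    """Resolve host to IPv4 addresses via the system resolver (MagicDNS on a tailnet node)."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def tailscale_ping(host: str) -> bool:
    """Assumption: a zero exit status from `tailscale ping` means the peer answered."""
    try:
        proc = subprocess.run(
            ["tailscale", "ping", "-c", "3", host],
            capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.SubprocessError):
        return False  # CLI missing or the command timed out
    return proc.returncode == 0


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status; 4xx/5xx come back as codes instead of exceptions."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_checks(
    endpoints: Dict[str, str],
    resolve: Callable[[str], List[str]] = resolve_v4,
    ping: Callable[[str], bool] = tailscale_ping,
    status: Callable[[str], int] = http_status,
) -> List[CheckResult]:
    """For each endpoint run DNS -> tailnet range -> ping -> HTTP, stopping at its first failure."""
    results: List[CheckResult] = []
    for name, url in endpoints.items():
        host = urllib.parse.urlsplit(url).hostname or ""
        try:
            addrs = resolve(host)
        except OSError as err:  # socket.gaierror subclasses OSError
            results.append(CheckResult(name, False, f"DNS failed: {err}"))
            continue
        if not any(ipaddress.ip_address(a) in TAILNET_V4 for a in addrs):
            results.append(CheckResult(name, False, f"no tailnet address in {addrs}"))
            continue
        if not ping(host):
            results.append(CheckResult(name, False, "tailscale ping failed"))
            continue
        try:
            code = status(url)
        except OSError as err:  # urllib.error.URLError subclasses OSError
            results.append(CheckResult(name, False, f"HTTP failed: {err}"))
            continue
        # Assumption: any non-5xx answer counts as "reachable"; tighten to == 200 if preferred.
        results.append(CheckResult(name, code < 500, f"HTTP {code}"))
    return results


def main() -> int:
    results = run_checks(ENDPOINTS)
    for r in results:
        print(f"{'PASS' if r.ok else 'FAIL'} {r.name}: {r.detail}")
    return 0 if all(r.ok for r in results) else 1
```

In a CI step, `raise SystemExit(main())` would turn any failed check into a non-zero exit status. Passing `resolve`, `ping`, and `status` in as parameters keeps the check logic testable without a live tailnet.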
@@ -8,8 +8,11 @@ This document defines the current engineering target for this repository.
 - 3 workers
 - Hetzner Load Balancer for Kubernetes API
 - private Hetzner network
-- Tailscale operator access
-- Rancher UI exposed only through Tailscale (`rancher.silverside-gopher.ts.net`)
+- Tailscale operator access and service exposure
+- Rancher exposed through Tailscale (`rancher.silverside-gopher.ts.net`)
+- Grafana exposed through Tailscale (`grafana.silverside-gopher.ts.net`)
+- Prometheus exposed through Tailscale (`prometheus.silverside-gopher.ts.net:9090`)
 - `apps` Kustomization suspended by default
 
 ## In Scope
 
@@ -21,12 +24,15 @@ This document defines the current engineering target for this repository.
 - **Hetzner CSI for persistent volumes (via Flux)**
 - Flux core reconciliation
 - External Secrets Operator with Doppler
-- Tailscale private access
+- Tailscale private access and smoke-check validation
+- cert-manager
+- Rancher and rancher-backup
+- Observability stack (Grafana, Prometheus, Loki, Promtail)
+- Persistent volume provisioning validated
 
 ## Deferred for Later Phases
 
-- Observability stack (deferred - complex helm release needs separate debugging)
 - app workloads in `apps/`
 
 ## Out of Scope
 
@@ -49,17 +55,10 @@ This document defines the current engineering target for this repository.
 9. **CSI deploys and creates `hcloud-volumes` StorageClass**.
 10. **PVC provisioning tested and working**.
 11. External Secrets sync required secrets.
-12. Tailscale private access works, including Rancher UI access.
-13. Terraform destroy succeeds cleanly or via workflow retry.
+12. Tailscale private access works for Rancher, Grafana, and Prometheus.
+13. CI smoke checks pass for Tailscale DNS resolution, `tailscale ping`, and HTTP reachability.
+14. Terraform destroy succeeds cleanly or via workflow retry.
 
 ## Success Criteria
 
-✅ **ACHIEVED** - HA Cluster with CCM/CSI:
-- Build 1: Initial CCM/CSI deployment and validation (2026-03-23)
-- Build 2: Full destroy/rebuild cycle successful (2026-03-23)
-
-🔄 **IN PROGRESS** - HA Control Plane Validation:
-- Build 3: Deploy 3-3 topology with Load Balancer
-- Build 4: Destroy/rebuild to validate HA configuration
-
-Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes.
+Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes, no manual `kubectl` patching, and no manual Tailscale proxy recreation.
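The revised success criterion can be read as a predicate over a history of rebuilds. The Python below is a hypothetical sketch of that reading. The `BuildRecord` fields are illustrative assumptions, as are the default gate numbering (1 through 14, following the phase-gate list) and the `streak` parameter. None of these is defined by the repository.

```python
# Hypothetical sketch of the "two consecutive clean HA rebuilds" success criterion.
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class BuildRecord:
    """One HA rebuild. Field names are illustrative, not taken from the repository."""
    name: str
    gates: Dict[int, bool]              # phase-gate number -> passed?
    manual_fixes: int = 0
    manual_kubectl_patches: int = 0
    manual_proxy_recreations: int = 0   # Tailscale proxies recreated by hand

    def is_clean_pass(self, required_gates: Iterable[int]) -> bool:
        """True if every required gate passed (missing = failed) and nobody intervened by hand."""
        gates_ok = all(self.gates.get(g, False) for g in required_gates)
        hands_off = (
            self.manual_fixes == 0
            and self.manual_kubectl_patches == 0
            and self.manual_proxy_recreations == 0
        )
        return gates_ok and hands_off


def success_met(
    builds: Sequence[BuildRecord],
    required_gates: Sequence[int] = tuple(range(1, 15)),  # gates 1-14 (assumed numbering)
    streak: int = 2,
) -> bool:
    """Scan builds in order; any non-clean build resets the consecutive-pass count."""
    run = 0
    for build in builds:
        run = run + 1 if build.is_clean_pass(required_gates) else 0
        if run >= streak:
            return True
    return False
```

A failed or hands-on build resets the streak. Two clean rebuilds separated by one that needed `kubectl` patching therefore do not satisfy the criterion.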