docs: record validated Rancher restore drill
Update the baseline to treat Rancher backup and restore validation as part of the accepted platform state, and capture the successful live drill run performed on 2026-04-18.
+13 -2
@@ -27,6 +27,7 @@ This document defines the current engineering target for this repository.
 - Tailscale private access and smoke-check validation
 - cert-manager
 - Rancher and rancher-backup
+- Rancher backup/restore validation
 - Observability stack (Grafana, Prometheus, Loki, Promtail)
 - Persistent volume provisioning validated
 
@@ -39,7 +40,7 @@ This document defines the current engineering target for this repository.
 - public ingress or DNS
 - public TLS
 - app workloads
-- DR / backup strategy
+- cross-region / multi-cluster disaster recovery strategy
 - upgrade strategy
 
 ## Phase Gates
@@ -57,8 +58,18 @@ This document defines the current engineering target for this repository.
 11. External Secrets sync required secrets.
 12. Tailscale private access works for Rancher, Grafana, and Prometheus.
 13. CI smoke checks pass for Tailscale DNS resolution, `tailscale ping`, and HTTP reachability.
-14. Terraform destroy succeeds cleanly or via workflow retry.
+14. A fresh Rancher backup can be created and restored successfully.
+15. Terraform destroy succeeds cleanly or via workflow retry.
 
 ## Success Criteria
 
 Success requires two consecutive HA rebuilds passing all phase gates with no manual fixes, no manual `kubectl` patching, and no manual Tailscale proxy recreation.
+
+## Validated Drills
+
+- 2026-04-18: live Rancher backup/restore drill succeeded on the current cluster.
+- A fresh one-time backup was created, restored back onto the same cluster, and post-restore validation confirmed:
+- all nodes remained `Ready`
+- Flux infrastructure stayed healthy
+- Rancher backup/restore resources reported `Completed`
+- Rancher, Grafana, and Prometheus remained reachable through the Tailscale smoke checks
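The first two post-restore checks recorded in the drill (nodes `Ready`, Flux healthy) can be scripted rather than eyeballed. The following is a minimal sketch, not the repository's actual validation, and it assumes the cluster state has been exported with `kubectl get nodes -o json` and `kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A -o json`. Using Flux Kustomizations as the measure of "Flux infrastructure stayed healthy" is an assumption. Both Node and Flux Kustomization objects report health through a `Ready` status condition, so one hypothetical helper, `not_ready`, covers both. Whether rancher-backup's Backup and Restore objects expose `Completed` through a condition in the same way is not confirmed here, so they are deliberately left out.

```python
"""Post-restore readiness check (hypothetical helper, not part of the repository).

Assumes each input file holds the JSON printed by `kubectl get <kind> -o json`,
i.e. a List object with an "items" array of Kubernetes objects.
"""
import json
import sys
from typing import Any


def not_ready(object_list: dict[str, Any]) -> list[str]:
    """Return names of objects whose Ready condition is missing or not "True"."""
    failing = []
    for item in object_list.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "<unnamed>")
        # Namespaced objects (e.g. Flux Kustomizations) are reported as ns/name.
        if "namespace" in meta:
            name = f"{meta['namespace']}/{name}"
        conditions = item.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition is treated as a failure, not a pass.
        if ready is None or ready.get("status") != "True":
            failing.append(name)
    return failing


def main(paths: list[str]) -> int:
    """Check each exported JSON file; return 1 if anything is not Ready."""
    exit_code = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            failing = not_ready(json.load(fh))
        for name in failing:
            print(f"{path}: {name} is not Ready")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

For example, after `kubectl get nodes -o json > nodes.json`, running the script (saved under any name, e.g. the hypothetical `check_ready.py`) with `nodes.json` as its argument prints nothing and exits 0 when every node is Ready. Failing closed on a missing condition keeps a half-initialised object from passing the gate silently.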