-`STABLE_BASELINE.md` still contains stale Rancher backup/restore references; current workflows and addon manifests do not deploy or restore `rancher-backup`.
- Ansible setup: `ansible-galaxy collection install -r ansible/requirements.yml`, then from `ansible/` run `python3 generate_inventory.py` and `ansible-playbook site.yml --syntax-check`.
- Pushes to `main` run Gitea CI: Terraform fmt/init/validate/plan/apply, Proxmox cleanup/retry, Ansible bootstrap, Flux bootstrap, addon gates, Rancher gate, observability image seeding, health checks, tailnet smoke checks.
- Deploy and destroy workflows share `concurrency.group: prod-cluster`; destroy only requires workflow input `confirm: destroy` and has no backup gate.
- Keep `set -euo pipefail` in workflow shell blocks.
- Terraform retry cleanup has hard-coded target VMIDs/names in `.gitea/workflows/deploy.yml`; update it when changing node counts, names, or VMIDs.
- Fresh VMs have unreliable registry/chart egress, so critical images are prepared by `skopeo` on the runner and imported with `k3s ctr`; update the workflow archive lists when adding bootstrap-time images.
- CI applies `clusters/prod/flux-system/gotk-components.yaml` directly and then patches Flux controller deployments inline; changes only in `gotk-controller-cp1-patches.yaml` do not affect CI bootstrap.
- Vendored charts are intentional: `infrastructure/charts/{cert-manager,traefik,kube-prometheus-stack,tailscale-operator,rancher}`. Do not restore remote `HelmRepository` objects unless cluster-side chart fetch reliability is intentionally changed.
- External Secrets and Loki/Promtail use Flux `OCIRepository`; Rancher, Tailscale, cert-manager, Traefik, and kube-prometheus-stack use `GitRepository` chart paths.
- Use fully qualified `helmchart.source.toolkit.fluxcd.io/...` in scripts; K3s also has `helmcharts.helm.cattle.io`, so `helmchart/...` can target the wrong resource.
-`doppler-bootstrap` only creates the `external-secrets` namespace and Doppler token secret. The deploy workflow creates `ClusterSecretStore/doppler-hetznerterra` after ESO CRDs and webhook endpoints exist.
- The checked-in `infrastructure/addons/external-secrets/clustersecretstore-doppler-hetznerterra.yaml` is not included by that addon kustomization; do not assume Flux applies it.
- Keep Kubernetes manifests one object per file with kebab-case filenames.
## Gotchas
- Rancher chart `2.13.3` requires Kubernetes `<1.35.0-0`; K3s `latest` can break Rancher. Role defaults pin `v1.34.6+k3s1`; do not reintroduce a generated-inventory `k3s_version=latest` override.
- The repo no longer uses a cloud controller manager. `providerID`, Hetzner CCM/CSI, or Hetzner firewall/load-balancer logic is stale.
- Tailscale cleanup must only remove stale offline reserved hostnames before live service proxies exist; do not delete active `rancher`, `grafana`, `prometheus`, or `flux` devices.
- Proxmox endpoint should be the base URL, for example `https://100.105.0.115:8006/`; provider/workflow code strips `/api2/json` when needed.
- Current private URLs: Rancher `https://rancher.silverside-gopher.ts.net/`, Grafana `http://grafana.silverside-gopher.ts.net/`, Prometheus `http://prometheus.silverside-gopher.ts.net:9090/`.