# AGENTS.md

Repository guide for OpenCode sessions in this repo.

## Read First

- Trust manifests and workflows over prose when they conflict.
- Highest-value sources: `terraform/main.tf`, `terraform/variables.tf`, `ansible/site.yml`, `clusters/prod/flux-system/`, `infrastructure/addons/kustomization.yaml`, `.gitea/workflows/deploy.yml`, `.gitea/workflows/destroy.yml`, `README.md`, `STABLE_BASELINE.md`, `scripts/refresh-kubeconfig.sh`, `scripts/smoke-check-tailnet-services.sh`.

## Current Baseline

- HA private cluster: 3 control planes, 5 workers on Proxmox.
- Proxmox clones come from template `9000` on node `flex`; the API VIP is `10.27.27.40` via kube-vip.
- Storage is `nfs-subdir-external-provisioner` backed by `10.27.27.22:/TheFlash/k8s-nfs` with StorageClass `flash-nfs`.
- Tailscale is the private access path for Rancher and shared services.
- Rancher, Grafana, and Prometheus are exposed through Tailscale; Flux UI / Weave GitOps is removed.
- `apps/` is suspended by default.
- Rancher stores state in embedded etcd; backup/restore uses `rancher-backup` to B2.

## Common Commands

- Terraform: `terraform -chdir=terraform fmt -recursive`, `terraform -chdir=terraform validate`, `terraform -chdir=terraform plan -var-file=../terraform.tfvars`, `terraform -chdir=terraform apply -var-file=../terraform.tfvars`
- Ansible: `ansible-galaxy collection install -r ansible/requirements.yml`, `cd ansible && python3 generate_inventory.py`, `ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check`, `ansible-playbook ansible/site.yml`
- Flux/Kustomize: `kubectl kustomize infrastructure/addons/`, `kubectl kustomize clusters/prod/flux-system`
- Kubeconfig refresh: `scripts/refresh-kubeconfig.sh `
- Tailnet smoke check: `ssh ubuntu@ 'bash -s' < scripts/smoke-check-tailnet-services.sh`

## Workflow Rules

- Keep diffs small and validate only the directory you edited.
- Update manifests and docs together when behavior changes.
- Use `set -euo pipefail` in workflow shell blocks.
- CI deploy order is Terraform -> Ansible -> Flux bootstrap -> Rancher restore -> health checks.
- One object per Kubernetes YAML file; keep filenames kebab-case.
- If `kubectl` points at `localhost:8080` after a rebuild, refresh kubeconfig from the primary control-plane IP.
- Bootstrap assumptions that matter: SSH user is `ubuntu`, NIC is `ens18`, API join endpoint is the kube-vip address.

## Repo-Specific Gotchas

- `rancher-backup` uses a postRenderer to swap the broken hook image to `rancher/kubectl:v1.34.0`; do not put S3 config in HelmRelease values. Put it in the Backup CR.
- Tailscale cleanup only runs before service proxies exist; it removes stale offline `rancher`/`grafana`/`prometheus`/`flux` devices, then must stop so live proxies are not deleted.
- Keep the Tailscale operator on the stable Helm repo `https://pkgs.tailscale.com/helmcharts` at `1.96.5` unless you have a reason to change it.
- The repo no longer uses a cloud controller manager. If you see `providerID` or Hetzner-specific logic, it is stale.
- Current private URLs:
  - Rancher: `https://rancher.silverside-gopher.ts.net/`
  - Grafana: `http://grafana.silverside-gopher.ts.net/`
  - Prometheus: `http://prometheus.silverside-gopher.ts.net:9090/`

## Secrets

- Runtime secrets live in Doppler + External Secrets.
- Bootstrap and CI secrets stay in Gitea; never commit secrets, kubeconfigs, or private keys.
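## Appendix: Validation Helper Sketch

The "validate only the directory you edited" rule combined with the Common Commands above can be sketched as a small shell helper. This is illustrative, not a script that exists in this repo: the `validator_for` function name is hypothetical, and the commands it returns are the ones listed under Common Commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map an edited top-level directory to the validation
# command this guide prescribes for it. Unknown directories return nothing.
validator_for() {
  case "$1" in
    terraform)      echo "terraform -chdir=terraform validate" ;;
    ansible)        echo "ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check" ;;
    infrastructure) echo "kubectl kustomize infrastructure/addons/" ;;
    clusters)       echo "kubectl kustomize clusters/prod/flux-system" ;;
    *)              echo "" ;;
  esac
}

# Usage sketch: derive the touched top-level directory from the diff,
# then run only that directory's validator (commented out here).
# dir=$(git diff --name-only HEAD | cut -d/ -f1 | sort -u | head -n1)
# cmd=$(validator_for "$dir")
# [ -n "$cmd" ] && eval "$cmd"
```

Keeping the mapping in one place mirrors the guide's rule: a small diff triggers exactly one validator, not a full-repo check.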