# AGENTS.md
Repository guide for OpenCode sessions in this repo.
## Read First
- Trust manifests and workflows over prose when they conflict.
- Highest-value sources: `terraform/main.tf`, `terraform/variables.tf`, `ansible/site.yml`, `clusters/prod/flux-system/`, `infrastructure/addons/kustomization.yaml`, `.gitea/workflows/deploy.yml`, `.gitea/workflows/destroy.yml`, `README.md`, `STABLE_BASELINE.md`, `scripts/refresh-kubeconfig.sh`, `scripts/smoke-check-tailnet-services.sh`.
## Current Baseline
- HA private cluster: 3 control planes, 5 workers on Proxmox.
- Proxmox clones come from template `9000` on node `flex`; the API VIP is `10.27.27.40` via kube-vip.
- Storage is `nfs-subdir-external-provisioner` backed by `10.27.27.239:/TheFlash/k8s-nfs` with StorageClass `flash-nfs` (see the verification sketch after this list).
- Tailscale is the private access path for Rancher and shared services.
- Rancher, Grafana, and Prometheus are exposed through Tailscale; Flux UI / Weave GitOps is removed.
- `apps/` is suspended by default.
- Rancher stores state in embedded etcd; backup/restore uses `rancher-backup` to B2.
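A quick way to confirm the baseline is live, sketched under two assumptions not guaranteed by this repo: `showmount` (from `nfs-common`) is installed where you run it, and your kubeconfig already points at the cluster:

```bash
#!/usr/bin/env bash
set -euo pipefail

# The provisioner can only bind PVCs if this export is actually visible.
showmount -e 10.27.27.239 | grep -F '/TheFlash/k8s-nfs'

# kube-vip should answer on the API VIP; curl exits non-zero if the VIP is dead.
curl -sk --max-time 5 https://10.27.27.40:6443/healthz

# The StorageClass the provisioner registers.
kubectl get storageclass flash-nfs
```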
## Common Commands
- Terraform: `terraform -chdir=terraform fmt -recursive`, `terraform -chdir=terraform validate`, `terraform -chdir=terraform plan -var-file=../terraform.tfvars`, `terraform -chdir=terraform apply -var-file=../terraform.tfvars`
- Ansible: `ansible-galaxy collection install -r ansible/requirements.yml`, `cd ansible && python3 generate_inventory.py`, `ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check`, `ansible-playbook ansible/site.yml`
- Flux/Kustomize: `kubectl kustomize infrastructure/addons/<addon>`, `kubectl kustomize clusters/prod/flux-system`
- Kubeconfig refresh: `scripts/refresh-kubeconfig.sh <cp1-ip>`
- Tailnet smoke check: `ssh ubuntu@<cp1-ip> 'bash -s' < scripts/smoke-check-tailnet-services.sh`
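The per-tool commands above can be chained into one pre-push script; a sketch that assumes it runs from the repo root and that every subdirectory of `infrastructure/addons/` is a kustomization (`fmt` gains `-check` here so the run fails instead of rewriting files):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Terraform: formatting and static validation (plan/apply need ../terraform.tfvars).
terraform -chdir=terraform fmt -recursive -check
terraform -chdir=terraform validate

# Ansible: install collections, regenerate the inventory, syntax-check the play.
ansible-galaxy collection install -r ansible/requirements.yml
(cd ansible && python3 generate_inventory.py)
ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check

# Flux/Kustomize: render every addon overlay plus the cluster entrypoint.
for addon in infrastructure/addons/*/; do
  kubectl kustomize "$addon" > /dev/null
done
kubectl kustomize clusters/prod/flux-system > /dev/null
echo "all validations passed"
```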
## Workflow Rules
- Keep diffs small and validate only the directory you edited.
- Update manifests and docs together when behavior changes.
- Use `set -euo pipefail` in workflow shell blocks (see the sketch after this list).
- CI deploy order is Terraform -> Ansible -> Flux bootstrap -> Rancher restore -> health checks.
- One object per Kubernetes YAML file; keep filenames kebab-case.
- If `kubectl` points at `localhost:8080` after a rebuild, refresh kubeconfig from the primary control-plane IP.
- Bootstrap assumptions that matter: SSH user is `ubuntu`, NIC is `ens18`, API join endpoint is the kube-vip address.
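A sketch of how two of these rules combine in practice; `CP1_IP` is an illustrative variable, not something the workflows actually export:

```bash
set -euo pipefail   # required shape for every workflow shell block in this repo

# After a rebuild, a stale kubeconfig makes kubectl fall back to localhost:8080.
if ! kubectl cluster-info > /dev/null 2>&1; then
  scripts/refresh-kubeconfig.sh "${CP1_IP:?set CP1_IP to the primary control-plane IP}"
fi
kubectl get nodes
```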
## Repo-Specific Gotchas
- `rancher-backup` uses a postRenderer to swap the broken hook image to `rancher/kubectl:v1.34.0`; do not put S3 config in HelmRelease values. Put it in the Backup CR (see the sketch after this list).
- Tailscale cleanup only runs before service proxies exist; it removes stale offline `rancher`/`grafana`/`prometheus`/`flux` devices, then must stop so live proxies are not deleted.
- Keep the Tailscale operator on the stable Helm repo `https://pkgs.tailscale.com/helmcharts` at `1.96.5` unless you have a reason to change it.
- The repo no longer uses a cloud controller manager. If you see `providerID` or Hetzner-specific logic, it is stale.
- Current private URLs:
  - Rancher: https://rancher.silverside-gopher.ts.net/
  - Grafana: http://grafana.silverside-gopher.ts.net/
  - Prometheus: http://prometheus.silverside-gopher.ts.net:9090/
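What "S3 config in the Backup CR" looks like; the field layout follows the upstream `rancher-backup` CRD (`resources.cattle.io/v1`), but every name, bucket, endpoint, and schedule below is a placeholder, not this repo's real configuration:

```bash
kubectl apply -f - <<'EOF'
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-nightly                 # placeholder name
spec:
  resourceSetName: rancher-resource-set # default ResourceSet shipped with the chart
  schedule: "0 3 * * *"                 # placeholder schedule
  retentionCount: 7
  storageLocation:
    s3:
      credentialSecretName: b2-credentials          # placeholder Secret name
      credentialSecretNamespace: cattle-resources-system
      bucketName: example-rancher-backups           # placeholder bucket
      folder: rancher
      endpoint: s3.us-west-000.backblazeb2.com      # placeholder B2 endpoint
      region: us-west-000
EOF
```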
## Secrets
- Runtime secrets live in Doppler + External Secrets.
- Bootstrap and CI secrets stay in Gitea; never commit secrets, kubeconfigs, or private keys.
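The runtime-secret path in sketch form: an ExternalSecret resolving a Doppler-backed store into a cluster Secret. The store name, namespace, and keys are all placeholders; the real ones live in the manifests:

```bash
kubectl apply -f - <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin                 # placeholder
  namespace: monitoring               # placeholder
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: doppler                     # placeholder store name
  target:
    name: grafana-admin               # Secret created in-cluster
  data:
    - secretKey: admin-password
      remoteRef:
        key: GRAFANA_ADMIN_PASSWORD   # placeholder Doppler key
EOF
```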