AGENTS.md
Repository guide for agentic contributors working in this repo.
Scope
- Infrastructure repo for a Hetzner + k3s + Flux stack running Rancher.
- Primary areas: `terraform/`, `ansible/`, `clusters/`, `infrastructure/`, `apps/`, `.gitea/workflows/`.
- Treat `README.md` and `STABLE_BASELINE.md` as user-facing context, but prefer current manifests and workflows as source of truth.
- Keep changes small and reviewable; prefer the narrowest file set that solves the task.
Architecture
- Terraform provisions Hetzner servers, network, firewall, load balancer, SSH keys.
- Ansible bootstraps OS, installs k3s (with external cloud provider), deploys Hetzner CCM, Tailscale, Doppler token.
- Flux reconciles all cluster addons from this repo after Ansible hands off.
- Rancher stores state in embedded etcd (NOT an external DB). Backup/restore uses the `rancher-backup` operator to B2.
- cert-manager is required: the Tailscale LoadBalancer does L4 TCP passthrough, so Rancher serves its own TLS.
- Secrets flow: Doppler → `ClusterSecretStore` (`doppler-hetznerterra`) → `ExternalSecret` resources → k8s Secrets.
- Rancher is reachable only over Tailscale at `https://rancher.silverside-gopher.ts.net/`.
Important Files
- `terraform/main.tf`: provider and version pins
- `terraform/variables.tf`: input surface and defaults
- `terraform/firewall.tf`: firewall rules (tailnet CIDR, internal cluster ports)
- `ansible/site.yml`: ordered bootstrap playbook (roles: common → k3s-server → ccm → k3s-agent → private-access → doppler → tailscale-cleanup)
- `ansible/generate_inventory.py`: renders `ansible/inventory.ini` from Terraform outputs via Jinja2
- `clusters/prod/flux-system/`: Flux GitRepository and top-level Kustomization resources
- `infrastructure/addons/kustomization.yaml`: root addon graph with dependency ordering
- `infrastructure/addons/<addon>/`: each addon is a self-contained dir with its own `kustomization.yaml`
- `.gitea/workflows/deploy.yml`: canonical CI (terraform → ansible → flux bootstrap → rancher fix → B2 restore)
Build / Validate / Test
Terraform
- Format: `terraform -chdir=terraform fmt -recursive`
- Check formatting: `terraform -chdir=terraform fmt -check -recursive`
- Validate: `terraform -chdir=terraform validate`
- Plan (full): `terraform -chdir=terraform plan -var-file=../terraform.tfvars`
- Plan one resource: `terraform -chdir=terraform plan -var-file=../terraform.tfvars -target=hcloud_server.control_plane[0]`
- Apply: `terraform -chdir=terraform apply -var-file=../terraform.tfvars`
- State inspection: `terraform -chdir=terraform state list` / `terraform state show <address>`
Ansible
- Install collections: `ansible-galaxy collection install -r ansible/requirements.yml`
- Generate inventory: `cd ansible && python3 generate_inventory.py` (requires Terraform outputs)
- Syntax check: `ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check`
- Dry-run one host: `ansible-playbook -i ansible/inventory.ini ansible/site.yml --check --diff -l control_plane[0]`
- Full bootstrap: `ansible-playbook ansible/site.yml`
- Targeted: `ansible-playbook ansible/site.yml -t upgrade` or `-t reset`
- Dashboards only: `ansible-playbook ansible/dashboards.yml`
Python
- Syntax check: `python3 -m py_compile ansible/generate_inventory.py`
- Run: `cd ansible && python3 generate_inventory.py`
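For orientation, the inventory step can be sketched as a pure function over parsed `terraform output -json` data. This is a minimal illustration, not the contents of `generate_inventory.py`: the output names `control_plane_ips` and `worker_ips`, the `workers` group, and the host aliases are hypothetical, and plain f-strings stand in for the Jinja2 template so the sketch stays stdlib-only. The `control_plane` group name is taken from the `-l control_plane[0]` limit used above.

```python
import json


def render_inventory(tf_outputs: dict) -> str:
    """Render an INI inventory from parsed `terraform output -json` data."""
    # Terraform wraps each output as {"value": ..., "type": ..., "sensitive": ...}.
    control_plane = tf_outputs["control_plane_ips"]["value"]  # hypothetical output name
    workers = tf_outputs["worker_ips"]["value"]  # hypothetical output name

    lines = ["[control_plane]"]
    lines += [f"cp{i} ansible_host={ip}" for i, ip in enumerate(control_plane, start=1)]
    lines += ["", "[workers]"]
    lines += [f"worker{i} ansible_host={ip}" for i, ip in enumerate(workers, start=1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = json.loads(
        '{"control_plane_ips": {"value": ["10.0.0.2"]},'
        ' "worker_ips": {"value": ["10.0.0.3", "10.0.0.4"]}}'
    )
    print(render_inventory(sample), end="")
```

Keeping the rendering separate from the `terraform output` call makes it possible to check the script's logic without a live Terraform state.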
Kubernetes / Flux manifests
- Render single addon: `kubectl kustomize infrastructure/addons/<addon>`
- Render cluster bootstrap: `kubectl kustomize clusters/prod/flux-system`
- Validate only the directory you edited, not the whole repo.
Kubeconfig refresh
- Preferred: `scripts/refresh-kubeconfig.sh <cp1-public-ip>`
- Manual: `ssh -i ~/.ssh/infra root@<cp1-ip> "cat /etc/rancher/k3s/k3s.yaml" | sed 's/127.0.0.1/<cp1-ip>/g' > outputs/kubeconfig`
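The rewrite performed by the `sed` step in the manual command can be expressed as a small Python helper, which may be handy when scripting the refresh. This is only a sketch of that one substitution; the function name is hypothetical and it is not what `scripts/refresh-kubeconfig.sh` contains.

```python
def rewrite_kubeconfig(kubeconfig_text: str, cp1_ip: str) -> str:
    """Point a k3s kubeconfig at the control plane's reachable address."""
    # Literal replacement of the loopback address, matching the intent of
    # sed 's/127.0.0.1/<cp1-ip>/g' in the manual command.
    return kubeconfig_text.replace("127.0.0.1", cp1_ip)


if __name__ == "__main__":
    sample = "server: https://127.0.0.1:6443\n"
    print(rewrite_kubeconfig(sample, "203.0.113.10"), end="")
```

The result is still a kubeconfig with cluster credentials, so it belongs in the gitignored `outputs/` directory only.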
Code Style
General
- Match existing style in adjacent files. No new tools/frameworks unless the repo already uses them.
- Prefer ASCII. Keep diffs minimal. No unrelated cleanup.
- No comments unless the logic is non-obvious.
Terraform / HCL
- 2-space indent. `terraform {}` block first, then providers, locals, variables, resources, outputs.
- `snake_case` for variables, locals, resources. Descriptions on all variables/outputs.
- `sensitive = true` on secrets. Run `terraform fmt` instead of hand-formatting.
- Use `locals` for reused or non-trivial logic. Explicit `depends_on` only when required.
Ansible / YAML
- 2-space YAML indent. Descriptive task names in sentence case.
- Idempotent tasks: `changed_when: false` and `failed_when: false` for probes.
- `command`/`shell` only when no dedicated module fits. `shell` only for pipes/redirection/heredocs.
- `when` guards and `default(...)` filters over duplicated tasks.
- Role names and filenames: kebab-case. Variables: snake_case.
- Multi-line shell in workflows: `set -e` or `set -euo pipefail` for fail-fast.
Kubernetes / Flux YAML
- One object per file. Kebab-case filenames matching repo patterns: `helmrelease-*.yaml`, `kustomization-*.yaml`, `*-externalsecret.yaml`.
- Addon manifests live in `infrastructure/addons/<addon>/` with a `kustomization.yaml`.
- Flux graph objects in `clusters/prod/flux-system/`.
- Each addon gets a `kustomization-<addon>.yaml` entry in `infrastructure/addons/` with `dependsOn` for ordering.
- Quote strings with `:`, `*`, cron expressions, or shell-sensitive chars.
- Preserve existing labels/annotations unless the change specifically requires modifying them.
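When adding or reordering `dependsOn` entries, it can help to sanity-check the resulting graph before pushing. The sketch below is an illustration only, not tooling from this repo: it takes a hand-written mapping of addon name to its `dependsOn` list, rejects references to addons that are not defined, and relies on the standard library's `graphlib` to detect cycles. The dependency edges in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def reconcile_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return one valid ordering in which every addon follows its dependencies."""
    referenced = {dep for deps in depends_on.values() for dep in deps}
    unknown = referenced - set(depends_on)
    if unknown:
        raise ValueError(f"dependsOn references unknown addons: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical edges, for illustration only.
    graph = {
        "cert-manager": [],
        "rancher": ["cert-manager"],
        "rancher-backup": ["rancher"],
    }
    print(reconcile_order(graph))
```

Flux performs its own dependency resolution at reconcile time; a check like this only catches typos and cycles earlier.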
Python
- PEP 8. Imports ordered: stdlib, third-party, local. `snake_case` for functions/variables.
- Scripts small and explicit. Exit non-zero on failure. Clear subprocess error handling.
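One way to meet the "exit non-zero" and "clear subprocess error handling" guidelines is a thin wrapper around `subprocess.run`. This is a minimal sketch under the assumption that scripts shell out to other CLIs; the helper name `run` is hypothetical and not taken from the repo.

```python
import subprocess
import sys


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, exiting non-zero with context on failure."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(f"command not found: {cmd[0]}")
    except subprocess.CalledProcessError as exc:
        raise SystemExit(
            f"{' '.join(cmd)} failed with exit code {exc.returncode}: {exc.stderr.strip()}"
        )
    return result.stdout


def main() -> int:
    print(run([sys.executable, "--version"]).strip())
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

If a command line or its stderr could contain secret values, redact them before building the error message.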
Known Issues & Workarounds
- rancher-backup post-install job (`rancher-backup-patch-sa`) fails because `rancher/kuberlr-kubectl` can't download kubectl. CI patches the SA and deletes the failed job. Do NOT set an `s3` block in HelmRelease values; put S3 config in the Backup CR instead.
- B2 ExternalSecret must use key names `accessKey` and `secretKey` (not `aws_access_key_id`/`aws_secret_access_key`).
- Stale Tailscale devices: After cluster rebuild, delete stale offline `rancher` devices before booting. The `tailscale-cleanup` Ansible role handles this via the Tailscale API.
- Restricted B2 keys: `b2_authorize_account` may return `allowed.bucketId: null`. CI falls back to `b2_list_buckets` to resolve bucket ID by name.
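The restricted-key fallback in the last item can be sketched as follows. This illustrates the logic only and is not the CI implementation: the function name and the injected `list_buckets` callable are hypothetical, and the response shapes (`allowed.bucketId`, and a `buckets` list with `bucketId`/`bucketName`) are assumed from the field names mentioned above and the B2 native API.

```python
from typing import Callable, Optional


def resolve_bucket_id(
    auth_response: dict,
    bucket_name: str,
    list_buckets: Callable[[], dict],
) -> str:
    """Use allowed.bucketId when present, otherwise look the bucket up by name."""
    allowed = auth_response.get("allowed") or {}
    bucket_id: Optional[str] = allowed.get("bucketId")
    if bucket_id:
        return bucket_id
    # Fallback: only call b2_list_buckets when the authorize response has no ID.
    for bucket in list_buckets().get("buckets", []):
        if bucket.get("bucketName") == bucket_name:
            return bucket["bucketId"]
    raise LookupError(f"bucket {bucket_name!r} is not visible to this key")


if __name__ == "__main__":
    auth = {"allowed": {"bucketId": None}}
    listing = {"buckets": [{"bucketId": "abc123", "bucketName": "example-bucket"}]}
    print(resolve_bucket_id(auth, "example-bucket", lambda: listing))
```

Passing the listing call in as a callable keeps the HTTP request out of the decision logic, so the fallback can be exercised without credentials.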
Secrets / Security
- Never commit tokens, passwords, kubeconfigs, private keys, or generated secrets.
- Runtime secrets via Gitea secrets (CI), Doppler, or External Secrets Operator.
- `terraform.tfvars` and `outputs/` are gitignored.
- Never print secret values in logs or commits.
CI Pipeline (.gitea/workflows/deploy.yml)
- Terraform: fmt check → init → validate → import existing servers → plan → apply (main only)
- Ansible: install deps → generate inventory → run site.yml with extra vars (secrets injected from Gitea)
- Flux bootstrap: install kubectl/flux → rewrite kubeconfig → apply CRDs → apply graph → wait for addons
- Rancher post-install: wait for Rancher/backup operator → patch SA → clean failed jobs → force reconcile
- B2 restore: authorize B2 → find latest backup → create Restore CR → poll until ready
- Health checks: nodes, Flux objects, pods, storage class
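The "find latest backup" step of the B2 restore stage amounts to picking the newest matching object from a file listing. The sketch below is an assumption-laden illustration rather than the workflow's actual code: it expects entries shaped like those in a `b2_list_file_names` response (`fileName`, `uploadTimestamp` in milliseconds), and the `.tar.gz` suffix filter and the example file names are hypothetical.

```python
def latest_backup(files: list[dict], suffix: str = ".tar.gz") -> str:
    """Return the fileName of the most recently uploaded backup archive."""
    candidates = [f for f in files if f["fileName"].endswith(suffix)]
    if not candidates:
        raise LookupError(f"no files ending in {suffix!r} found")
    # uploadTimestamp is compared numerically, so the largest value is the newest.
    return max(candidates, key=lambda f: f["uploadTimestamp"])["fileName"]


if __name__ == "__main__":
    listing = [
        {"fileName": "backup-a.tar.gz", "uploadTimestamp": 1700000000000},
        {"fileName": "backup-b.tar.gz", "uploadTimestamp": 1700086400000},
        {"fileName": "notes.txt", "uploadTimestamp": 1700090000000},
    ]
    print(latest_backup(listing))
```

Raising when nothing matches lets the pipeline fail before it creates a Restore CR that points at a missing file.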
Editing Practices
- Read target file and adjacent patterns before editing.
- Run the narrowest validation command after edits.
- If you make a live-cluster workaround, also update the declarative manifests so Flux can own it.
- Changes spanning Terraform + Ansible + Flux: update and verify each layer separately.
- Check `git status` before and after changes.
Cursor / Copilot Rules
- No `.cursor/rules/`, `.cursorrules`, or `.github/copilot-instructions.md` files exist.
- If added later, mirror their guidance here and treat them as authoritative.