docs: Add agent guidance and sync Rancher docs
All checks were successful
Deploy Cluster / Terraform (push) Successful in 2m33s
Deploy Cluster / Ansible (push) Successful in 9m44s

2026-03-28 22:13:37 +00:00
parent 8c5edcf0a1
commit 43d11ac7e6
3 changed files with 158 additions and 2 deletions

AGENTS.md Normal file

@@ -0,0 +1,149 @@
# AGENTS.md
Repository guide for agentic contributors working in this repo.
## Scope
- This is an infrastructure repository for a Hetzner + k3s + Flux stack.
- Primary areas: `terraform/`, `ansible/`, `clusters/`, `infrastructure/`, `.gitea/workflows/`.
- Treat `README.md` and `STABLE_BASELINE.md` as user-facing context, but prefer the repo's current manifests and workflows as the source of truth.
- Keep changes small and reviewable; prefer the narrowest file set that solves the task.
## Current Tooling
- Terraform for cloud infra and state-backed provisioning.
- Ansible for bootstrap, OS prep, k3s install, and pre-Flux prerequisites.
- Flux/Kustomize for cluster and addon reconciliation.
- Python for inventory generation (`ansible/generate_inventory.py`).
## Important Files
- `terraform/main.tf` - provider and version pins.
- `terraform/variables.tf` - input surface and defaults.
- `terraform/*.tf` - Hetzner network, firewall, servers, SSH, outputs.
- `ansible/site.yml` - ordered bootstrap playbook.
- `ansible/generate_inventory.py` - renders `ansible/inventory.ini` from Terraform outputs.
- `clusters/prod/flux-system/` - Flux source and top-level reconciliation graph.
- `infrastructure/addons/<addon>/` - Flux-managed addon manifests.
- `.gitea/workflows/*.yml` - CI/CD entry points and the best reference for expected commands.
## Build / Validate / Test
### Terraform
- Format all Terraform: `terraform -chdir=terraform fmt -recursive`
- Check formatting: `terraform -chdir=terraform fmt -check -recursive`
- Validate config: `terraform -chdir=terraform validate`
- Full plan: `terraform -chdir=terraform plan -var-file=../terraform.tfvars`
- Apply: `terraform -chdir=terraform apply -var-file=../terraform.tfvars`
- Destroy: `terraform -chdir=terraform destroy -var-file=../terraform.tfvars`
### Terraform, single-target / focused checks
- Plan one resource: `terraform -chdir=terraform plan -var-file=../terraform.tfvars -target='hcloud_server.control_plane[0]'` (quote the address so the shell does not treat `[0]` as a glob pattern).
- Import/check existing state: use `terraform -chdir=terraform state list` and `terraform -chdir=terraform state show <address>` before editing imports.
- If you touch only Terraform formatting, run `terraform -chdir=terraform fmt -check -recursive` first.
### Ansible
- Install collections: `ansible-galaxy collection install -r ansible/requirements.yml`
- Generate inventory: `cd ansible && python3 generate_inventory.py`
- Syntax check: `ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check`
- Dry-run one host: `ansible-playbook -i ansible/inventory.ini ansible/site.yml --check --diff -l 'control_plane[0]'` (quote the pattern so the shell does not treat `[0]` as a glob pattern).
- Run the bootstrap playbook: `ansible-playbook ansible/site.yml`
- Targeted maintenance: `ansible-playbook ansible/site.yml -t upgrade` or `-t reset`
- Dashboards only: `ansible-playbook ansible/dashboards.yml`
### Python
- Syntax check the inventory generator: `python3 -m py_compile ansible/generate_inventory.py`
- If you modify the script, run it after Terraform outputs exist: `cd ansible && python3 generate_inventory.py`.
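As a rough illustration of what rendering an inventory from Terraform outputs can look like, the sketch below turns data shaped like `terraform output -json` into INI text. The output names (`control_plane_ips`, `worker_ips`) and the `workers` group are assumptions made for this example; the real `ansible/generate_inventory.py` may use different names and structure.

```python
import json


def render_inventory(terraform_outputs: dict) -> str:
    """Render INI inventory text from parsed `terraform output -json` data.

    The output names used here are hypothetical; adjust them to the repo's real outputs.
    """
    groups = {
        "control_plane": terraform_outputs["control_plane_ips"]["value"],
        "workers": terraform_outputs["worker_ips"]["value"],
    }
    lines = []
    for group, hosts in groups.items():
        lines.append(f"[{group}]")
        lines.extend(hosts)
        lines.append("")  # blank line between groups
    return "\n".join(lines)


# Example input shaped like `terraform output -json` (addresses are placeholders).
sample = json.loads(
    '{"control_plane_ips": {"value": ["10.0.1.10"]},'
    ' "worker_ips": {"value": ["10.0.1.20", "10.0.1.21"]}}'
)
print(render_inventory(sample))
```

Keeping the rendering step a pure function of the parsed outputs makes it possible to check the script's logic without a live Terraform state.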
### Kubernetes / Flux manifests
- Render a single addon: `kubectl kustomize infrastructure/addons/<addon>`
- Render cluster bootstrap objects: `kubectl kustomize clusters/prod/flux-system`
- Prefer validating the exact directory you edited, not the whole repo, unless the change is cross-cutting.
- For Flux changes, verify the relevant `Kustomization`/`HelmRelease`/`ExternalSecret` manifests render cleanly before committing.
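One possible way to pick the exact directories to render is sketched below: it maps a list of changed file paths to the narrowest `kubectl kustomize` targets described above. The helper names and the example file names are hypothetical and not part of this repo.

```python
from pathlib import PurePosixPath


def kustomize_targets(changed_paths: list[str]) -> list[str]:
    """Map changed files to the narrowest directories worth rendering."""
    targets = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if parts[:2] == ("infrastructure", "addons") and len(parts) > 3:
            targets.add("/".join(parts[:3]))  # infrastructure/addons/<addon>
        elif parts[:3] == ("clusters", "prod", "flux-system"):
            targets.add("clusters/prod/flux-system")
    return sorted(targets)


def render_commands(changed_paths: list[str]) -> list[list[str]]:
    """Build one `kubectl kustomize <dir>` command per target directory."""
    return [["kubectl", "kustomize", target] for target in kustomize_targets(changed_paths)]


# Hypothetical changed files, e.g. taken from `git status`.
changed = [
    "infrastructure/addons/rancher/helmrelease-rancher.yaml",
    "infrastructure/addons/rancher/kustomization.yaml",
    "clusters/prod/flux-system/kustomization-addons.yaml",
    "README.md",
]
for command in render_commands(changed):
    print(" ".join(command))
```

Files outside the two manifest trees produce no target, so a docs-only change renders nothing.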
## Code Style
### General
- Match the existing style in adjacent files.
- Prefer ASCII unless the file already uses Unicode or a Unicode character is necessary.
- Do not introduce new tools, frameworks, or abstractions unless the repo already uses them.
- Keep diffs minimal and avoid unrelated cleanup.
### Terraform / HCL
- Use 2-space indentation.
- Keep `terraform {}` blocks first, then providers, locals, variables, resources, and outputs in a logical order.
- Name variables, locals, and resources in `snake_case`.
- Keep descriptions on variables and outputs.
- Mark sensitive values with `sensitive = true`.
- Use aligned `=` formatting when practical; run `terraform fmt` instead of hand-formatting.
- Use explicit `depends_on` only when required.
- Keep logic in `locals` if it is reused or non-trivial.
### Ansible / YAML
- Use 2-space YAML indentation.
- Use descriptive task names in sentence case (e.g. `Install k3s server`).
- Keep tasks idempotent; use `changed_when: false` for read-only probes and checks, and add `failed_when: false` only where a non-zero exit is an expected result.
- Use `command`/`shell` only when a dedicated module is not a better fit.
- Use `shell` only when you need pipes, redirection, heredocs, or shell expansion.
- Prefer `when` guards and `default(...)` filters over duplicating tasks.
- Keep role names and file names kebab-case; keep variables snake_case.
- For multi-line shell snippets in workflows or tasks, use `set -e` or `set -euo pipefail` when the command sequence should fail fast.
### Kubernetes / Flux YAML
- Keep one Kubernetes object per file unless the repo already groups a small set of tightly related objects.
- Use kebab-case filenames that match the repo pattern (`helmrelease-*.yaml`, `kustomization-*.yaml`, `*-externalsecret.yaml`).
- Keep addon manifests under `infrastructure/addons/<addon>/` with a nested `kustomization.yaml`.
- Keep Flux graph objects in `clusters/prod/flux-system/`.
- Quote strings that contain `:`, `*`, cron expressions, or shell-sensitive characters.
- Preserve existing labels/annotations unless the change specifically requires modifying them.
### Python
- Follow PEP 8 style and keep imports ordered: stdlib, third-party, local.
- Use `snake_case` for functions and variables.
- Keep scripts small and explicit; exit non-zero on failure.
- Prefer clear subprocess error handling over silent failures.
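A minimal sketch of the subprocess guidance above, assuming a small helper (the name `run_checked` is hypothetical) that wraps `subprocess.run` and exits non-zero with a readable message instead of failing silently:

```python
import subprocess
import sys


def run_checked(cmd: list[str]) -> str:
    """Run a command and return its stdout; exit non-zero with a clear message on failure."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        sys.exit(f"error: command not found: {cmd[0]}")
    except subprocess.CalledProcessError as exc:
        # Surface the failing command and its stderr instead of swallowing the error.
        sys.exit(f"error: {' '.join(cmd)} exited {exc.returncode}: {exc.stderr.strip()}")
    return result.stdout
```

A script could call it as `run_checked(["terraform", "-chdir=terraform", "output", "-json"])`; passing a string to `sys.exit` prints the message to stderr and sets exit status 1.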
## Editing Practices
- Read the target file and adjacent patterns before editing.
- Preserve user changes; do not overwrite unrelated diffs.
- Prefer `apply_patch` for small single-file edits.
- Use scripting only when it is cleaner than repeated manual edits.
- Keep comments minimal and only add them for non-obvious logic.
## Secrets / Security
- Never commit tokens, passwords, kubeconfigs, private keys, or generated secrets.
- Use Gitea secrets, Doppler, or External Secrets for runtime secrets.
- Avoid printing secret values in logs, comments, or commit messages.
- If you must inspect a secret locally, only verify shape/length or compare values indirectly.
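One way to check shape and length or compare values indirectly is sketched below. The helper names are hypothetical, and the truncated SHA-256 fingerprint is just one possible choice for this illustration, not something the repo prescribes.

```python
import hashlib
import hmac


def describe_secret(value: str) -> dict:
    """Summarize a secret without revealing it: length plus a short SHA-256 fingerprint."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return {"length": len(value), "fingerprint": digest[:12]}


def secrets_match(a: str, b: str) -> bool:
    """Compare two secrets in constant time without printing either one."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))
```

Even a truncated hash can leak information about short or guessable values, so fingerprints are best kept out of logs and commit messages as well.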
## Workflow Expectations
- Read the target file and nearby patterns before editing.
- Check `git status` before and after your changes.
- Run the narrowest relevant validation command after edits.
- If you make a live-cluster workaround, also update the declarative manifests so Flux can own it.
- Do not overwrite user changes you did not make.
- If a change spans Terraform + Ansible + Flux, update and verify each layer separately.
## CI / Workflow Notes
- CI currently uses `.gitea/workflows/deploy.yml`, `.gitea/workflows/destroy.yml`, and `.gitea/workflows/dashboards.yml` as the canonical automation references.
- The workflows run `terraform fmt -check -recursive`, `terraform validate`, Terraform plan/apply, Ansible bootstrap, and targeted Flux bootstrap steps.
- If you change workflow behavior, keep the repo docs and the workflow commands in sync.
## Cursor / Copilot Rules
- No `.cursor/rules/`, `.cursorrules`, or `.github/copilot-instructions.md` files were present when this file was created.
- If those files are added later, mirror their guidance here and treat them as authoritative.


@@ -11,7 +11,7 @@ Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible
 | **Total Cost** | €28.93/mo |
 | **K8s** | k3s (latest, HA) |
 | **Addons** | Hetzner CCM + CSI + Prometheus + Grafana + Loki |
-| **Access** | SSH/API restricted to Tailnet |
+| **Access** | SSH/API and Rancher UI restricted to Tailnet |
 | **Bootstrap** | Terraform + Ansible |
 ### Cluster Resources
@@ -239,6 +239,12 @@ Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed
 - Ansible is limited to cluster bootstrap, private-access setup, and prerequisite secret creation for Flux-managed addons.
 - `addon-flux-ui` is optional for the stable-baseline phase and is not a blocker for rebuild success.
+### Rancher access
+- Rancher is private-only and exposed through Tailscale at `https://rancher.silverside-gopher.ts.net/dashboard/`.
+- The public Hetzner load balancer path is not used for Rancher.
+- Rancher uses the CNPG-backed PostgreSQL cluster in `cnpg-cluster`.
 ### Stable baseline acceptance
 A rebuild is considered successful only when all of the following pass without manual intervention:


@@ -9,6 +9,7 @@ This document defines the current engineering target for this repository.
 - Hetzner Load Balancer for Kubernetes API
 - private Hetzner network
 - Tailscale operator access
+- Rancher UI exposed only through Tailscale
 ## In Scope
@@ -48,7 +49,7 @@ This document defines the current engineering target for this repository.
 9. **CSI deploys and creates `hcloud-volumes` StorageClass**.
 10. **PVC provisioning tested and working**.
 11. External Secrets sync required secrets.
-12. Tailscale private access works.
+12. Tailscale private access works, including Rancher UI access.
 13. Terraform destroy succeeds cleanly or via workflow retry.
 ## Success Criteria