diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..eb50746
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,149 @@
+# AGENTS.md
+
+Repository guide for agentic contributors working in this repo.
+
+## Scope
+
+- This is an infrastructure repository for a Hetzner + k3s + Flux stack.
+- Primary areas: `terraform/`, `ansible/`, `clusters/`, `infrastructure/`, `.gitea/workflows/`.
+- Treat `README.md` and `STABLE_BASELINE.md` as user-facing context, but prefer the repo's current manifests and workflows as the source of truth.
+- Keep changes small and reviewable; prefer the narrowest file set that solves the task.
+
+## Current Tooling
+
+- Terraform for cloud infra and state-backed provisioning.
+- Ansible for bootstrap, OS prep, k3s install, and pre-Flux prerequisites.
+- Flux/Kustomize for cluster and addon reconciliation.
+- Python for inventory generation (`ansible/generate_inventory.py`).
+
+## Important Files
+
+- `terraform/main.tf` - provider and version pins.
+- `terraform/variables.tf` - input surface and defaults.
+- `terraform/*.tf` - Hetzner network, firewall, servers, SSH, outputs.
+- `ansible/site.yml` - ordered bootstrap playbook.
+- `ansible/generate_inventory.py` - renders `ansible/inventory.ini` from Terraform outputs.
+- `clusters/prod/flux-system/` - Flux source and top-level reconciliation graph.
+- `infrastructure/addons//` - Flux-managed addon manifests.
+- `.gitea/workflows/*.yml` - CI/CD entry points and the best reference for expected commands.
+
+## Build / Validate / Test
+
+### Terraform
+
+- Format all Terraform: `terraform -chdir=terraform fmt -recursive`
+- Check formatting: `terraform -chdir=terraform fmt -check -recursive`
+- Validate config: `terraform -chdir=terraform validate`
+- Full plan: `terraform -chdir=terraform plan -var-file=../terraform.tfvars`
+- Apply: `terraform -chdir=terraform apply -var-file=../terraform.tfvars`
+- Destroy: `terraform -chdir=terraform destroy -var-file=../terraform.tfvars`
+
+### Terraform, single-target / focused checks
+
+- Plan one resource: `terraform -chdir=terraform plan -var-file=../terraform.tfvars -target='hcloud_server.control_plane[0]'`
+- Import/check existing state: use `terraform state list` and `terraform state show ` before editing imports.
+- If you touch only Terraform formatting, run `terraform fmt -check -recursive` first.
+
+### Ansible
+
+- Install collections: `ansible-galaxy collection install -r ansible/requirements.yml`
+- Generate inventory: `cd ansible && python3 generate_inventory.py`
+- Syntax check: `ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check`
+- Dry-run one host: `ansible-playbook -i ansible/inventory.ini ansible/site.yml --check --diff -l 'control_plane[0]'`
+- Run the bootstrap playbook: `ansible-playbook ansible/site.yml`
+- Targeted maintenance: `ansible-playbook ansible/site.yml -t upgrade` or `-t reset`
+- Dashboards only: `ansible-playbook ansible/dashboards.yml`
+
+### Python
+
+- Syntax check the inventory generator: `python3 -m py_compile ansible/generate_inventory.py`
+- If you modify the script, run it after Terraform outputs exist: `cd ansible && python3 generate_inventory.py`.
+
+### Kubernetes / Flux manifests
+
+- Render a single addon: `kubectl kustomize infrastructure/addons/`
+- Render cluster bootstrap objects: `kubectl kustomize clusters/prod/flux-system`
+- Prefer validating the exact directory you edited, not the whole repo, unless the change is cross-cutting.
+- For Flux changes, verify the relevant `Kustomization`/`HelmRelease`/`ExternalSecret` manifests render cleanly before committing.
+
+## Code Style
+
+### General
+
+- Match the existing style in adjacent files.
+- Prefer ASCII unless the file already uses Unicode or a Unicode character is necessary.
+- Do not introduce new tools, frameworks, or abstractions unless the repo already uses them.
+- Keep diffs minimal and avoid unrelated cleanup.
+
+### Terraform / HCL
+
+- Use 2-space indentation.
+- Keep `terraform {}` blocks first, then providers, locals, variables, resources, and outputs in a logical order.
+- Name variables, locals, and resources in `snake_case`.
+- Keep descriptions on variables and outputs.
+- Mark sensitive values with `sensitive = true`.
+- Use aligned `=` formatting when practical; run `terraform fmt` instead of hand-formatting.
+- Prefer explicit `depends_on` only when required.
+- Keep logic in `locals` if it is reused or non-trivial.
+
+### Ansible / YAML
+
+- Use 2-space YAML indentation.
+- Use descriptive task names in sentence case (e.g. `Install k3s server`).
+- Keep tasks idempotent; use `changed_when: false` and `failed_when: false` for probes and checks.
+- Use `command`/`shell` only when a dedicated module is not a better fit.
+- Use `shell` only when you need pipes, redirection, heredocs, or shell expansion.
+- Prefer `when` guards and `default(...)` filters over duplicating tasks.
+- Keep role names and file names kebab-case; keep variables snake_case.
+- For multi-line shell snippets in workflows or tasks, use `set -e` or `set -euo pipefail` when the command sequence should fail fast.
+
+### Kubernetes / Flux YAML
+
+- Keep one Kubernetes object per file unless the repo already groups a small set of tightly related objects.
+- Use kebab-case filenames that match the repo pattern (`helmrelease-*.yaml`, `kustomization-*.yaml`, `*-externalsecret.yaml`).
+- Keep addon manifests under `infrastructure/addons//` with a nested `kustomization.yaml`.
+- Keep Flux graph objects in `clusters/prod/flux-system/`.
+- Quote strings that contain `:`, `*`, cron expressions, or shell-sensitive characters.
+- Preserve existing labels/annotations unless the change specifically needs them.
+
+### Python
+
+- Follow PEP 8 style and keep imports ordered: stdlib, third-party, local.
+- Use `snake_case` for functions and variables.
+- Keep scripts small and explicit; exit non-zero on failure.
+- Prefer clear subprocess error handling over silent failures.
+
+## Editing Practices
+
+- Read the target file and adjacent patterns before editing.
+- Preserve user changes; do not overwrite unrelated diffs.
+- Prefer `apply_patch` for small single-file edits.
+- Use scripting only when it is cleaner than repeated manual edits.
+- Keep comments minimal and only add them for non-obvious logic.
+
+## Secrets / Security
+
+- Never commit tokens, passwords, kubeconfigs, private keys, or generated secrets.
+- Use Gitea secrets, Doppler, or External Secrets for runtime secrets.
+- Avoid printing secret values in logs, comments, or commit messages.
+- If you must inspect a secret locally, only verify shape/length or compare values indirectly.
+
+## Workflow Expectations
+
+- Read the target file and nearby patterns before editing.
+- Check `git status` before and after your changes.
+- Run the narrowest relevant validation command after edits.
+- If you make a live-cluster workaround, also update the declarative manifests so Flux can own it.
+- Do not overwrite user changes you did not make.
+- If a change spans Terraform + Ansible + Flux, update and verify each layer separately.
+
+## CI / Workflow Notes
+
+- CI currently uses `.gitea/workflows/deploy.yml`, `.gitea/workflows/destroy.yml`, and `.gitea/workflows/dashboards.yml` as the canonical automation references.
+- The workflows run `terraform fmt -check -recursive`, `terraform validate`, Terraform plan/apply, Ansible bootstrap, and targeted Flux bootstrap steps.
+- If you change workflow behavior, keep the repo docs and the workflow commands in sync.
+
+## Cursor / Copilot Rules
+
+- No `.cursor/rules/`, `.cursorrules`, or `.github/copilot-instructions.md` files were present when this file was created.
+- If those files are added later, mirror their guidance here and treat them as authoritative.
diff --git a/README.md b/README.md
index 0baa2f3..5e159c7 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible
 | **Total Cost** | €28.93/mo |
 | **K8s** | k3s (latest, HA) |
 | **Addons** | Hetzner CCM + CSI + Prometheus + Grafana + Loki |
-| **Access** | SSH/API restricted to Tailnet |
+| **Access** | SSH/API and Rancher UI restricted to Tailnet |
 | **Bootstrap** | Terraform + Ansible |
 
 ### Cluster Resources
@@ -239,6 +239,12 @@ Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed
 - Ansible is limited to cluster bootstrap, private-access setup, and prerequisite secret creation for Flux-managed addons.
 - `addon-flux-ui` is optional for the stable-baseline phase and is not a blocker for rebuild success.
 
+### Rancher access
+
+- Rancher is private-only and exposed through Tailscale at `https://rancher.silverside-gopher.ts.net/dashboard/`.
+- The public Hetzner load balancer path is not used for Rancher.
+- Rancher uses the CNPG-backed PostgreSQL cluster in `cnpg-cluster`.
+
 ### Stable baseline acceptance
 
 A rebuild is considered successful only when all of the following pass without manual intervention:
diff --git a/STABLE_BASELINE.md b/STABLE_BASELINE.md
index 1e953c0..ecfa1c4 100644
--- a/STABLE_BASELINE.md
+++ b/STABLE_BASELINE.md
@@ -9,6 +9,7 @@ This document defines the current engineering target for this repository.
 - Hetzner Load Balancer for Kubernetes API
 - private Hetzner network
 - Tailscale operator access
+- Rancher UI exposed only through Tailscale
 
 ## In Scope
 
@@ -48,7 +49,7 @@ This document defines the current engineering target for this repository.
 9. **CSI deploys and creates `hcloud-volumes` StorageClass**.
 10. **PVC provisioning tested and working**.
 11. External Secrets sync required secrets.
-12. Tailscale private access works.
+12. Tailscale private access works, including Rancher UI access.
 13. Terraform destroy succeeds cleanly or via workflow retry.
 
 ## Success Criteria
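
Note on the AGENTS.md Python guidance: the bullets about `ansible/generate_inventory.py` (render `ansible/inventory.ini` from Terraform outputs, exit non-zero on failure, prefer clear subprocess error handling) could look roughly like the sketch below. It is not the repository's actual script. The output names `control_plane_ips` and `worker_ips`, the inventory group names, and both function names are assumptions made for illustration; only the `terraform output -json` call and its `{"name": {"value": ...}}` shape come from the Terraform CLI.

```python
import json
import subprocess
import sys


def render_inventory(outputs: dict) -> str:
    """Render an INI-style inventory from parsed `terraform output -json` data.

    The output keys used here are hypothetical, not taken from the repo.
    """
    groups = {
        "control_plane": outputs["control_plane_ips"]["value"],
        "workers": outputs["worker_ips"]["value"],
    }
    lines = []
    for group, hosts in groups.items():
        lines.append(f"[{group}]")
        lines.extend(hosts)
        lines.append("")  # blank line after each group
    return "\n".join(lines)


def load_terraform_outputs(terraform_dir: str) -> dict:
    """Run `terraform output -json` in terraform_dir and return the parsed JSON."""
    try:
        result = subprocess.run(
            ["terraform", f"-chdir={terraform_dir}", "output", "-json"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The terraform binary is not installed or not on PATH.
        print("terraform executable not found", file=sys.stderr)
        sys.exit(1)
    if result.returncode != 0:
        # Surface the CLI error and exit non-zero instead of failing silently.
        print(result.stderr, file=sys.stderr)
        sys.exit(result.returncode)
    return json.loads(result.stdout)
```

Calling `render_inventory(load_terraform_outputs("terraform"))` after a successful apply would return text that could be written to `ansible/inventory.ini`, assuming outputs with those names exist. Keeping the rendering step a pure function means it can be checked without touching Terraform state.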