# AGENTS.md
Repository guide for OpenCode sessions in this repo.
## Read First
- Trust manifests and workflows over prose when they conflict.
- Highest-value sources: `terraform/main.tf`, `terraform/variables.tf`, `ansible/site.yml`, `clusters/prod/flux-system/`, `infrastructure/addons/kustomization.yaml`, `.gitea/workflows/deploy.yml`, `.gitea/workflows/destroy.yml`, `README.md`, `STABLE_BASELINE.md`, `scripts/refresh-kubeconfig.sh`, `scripts/smoke-check-tailnet-services.sh`.
## Current Baseline
- HA private cluster: 3 control planes, 5 workers on Proxmox.
- Proxmox clones come from template `9000` on node `flex`; the API VIP is `10.27.27.40` via kube-vip.
- Storage is `nfs-subdir-external-provisioner` backed by `10.27.27.239:/TheFlash/k8s-nfs` with StorageClass `flash-nfs` (see the verification sketch after this list).
- Tailscale is the private access path for Rancher and shared services.
- Rancher, Grafana, and Prometheus are exposed through Tailscale; Flux UI / Weave GitOps is removed.
- `apps/` is suspended by default.
- Rancher stores state in embedded etcd; backup/restore uses `rancher-backup` to B2.
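A quick way to confirm the baseline is live, sketched under two assumptions not guaranteed by this repo: `showmount` (from `nfs-common`) is installed where you run it, and your kubeconfig already points at the cluster:

```bash
#!/usr/bin/env bash
set -euo pipefail

# The provisioner can only bind PVCs if this export is actually visible.
showmount -e 10.27.27.239 | grep -F '/TheFlash/k8s-nfs'

# kube-vip should answer on the API VIP; curl exits non-zero if the VIP is dead.
curl -sk --max-time 5 https://10.27.27.40:6443/healthz

# The StorageClass the provisioner registers.
kubectl get storageclass flash-nfs
```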
## Common Commands
- Terraform: `terraform -chdir=terraform fmt -recursive`, `terraform -chdir=terraform validate`, `terraform -chdir=terraform plan -var-file=../terraform.tfvars`, `terraform -chdir=terraform apply -var-file=../terraform.tfvars`
- Ansible: `ansible-galaxy collection install -r ansible/requirements.yml`, `cd ansible && python3 generate_inventory.py`, `ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check`, `ansible-playbook ansible/site.yml`
- Flux/Kustomize: `kubectl kustomize infrastructure/addons/<addon>`, `kubectl kustomize clusters/prod/flux-system`
- Kubeconfig refresh: `scripts/refresh-kubeconfig.sh <cp1-ip>`
- Tailnet smoke check: `ssh ubuntu@<cp1-ip> 'bash -s' < scripts/smoke-check-tailnet-services.sh`
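The per-tool commands above can be chained into one pre-push script; a sketch that assumes it runs from the repo root and that every subdirectory of `infrastructure/addons/` is a kustomization (`fmt` gains `-check` here so the run fails instead of rewriting files):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Terraform: formatting and static validation (plan/apply need ../terraform.tfvars).
terraform -chdir=terraform fmt -recursive -check
terraform -chdir=terraform validate

# Ansible: install collections, regenerate the inventory, syntax-check the play.
ansible-galaxy collection install -r ansible/requirements.yml
(cd ansible && python3 generate_inventory.py)
ansible-playbook -i ansible/inventory.ini ansible/site.yml --syntax-check

# Flux/Kustomize: render every addon overlay plus the cluster entrypoint.
for addon in infrastructure/addons/*/; do
  kubectl kustomize "$addon" > /dev/null
done
kubectl kustomize clusters/prod/flux-system > /dev/null
echo "all validations passed"
```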
## Workflow Rules
- Keep diffs small and validate only the directory you edited.
- Update manifests and docs together when behavior changes.
- Use `set -euo pipefail` in workflow shell blocks (see the sketch after this list).
- CI deploy order is Terraform -> Ansible -> Flux bootstrap -> Rancher restore -> health checks.
- One object per Kubernetes YAML file; keep filenames kebab-case.
- If `kubectl` points at `localhost:8080` after a rebuild, refresh kubeconfig from the primary control-plane IP.
- Bootstrap assumptions that matter: SSH user is `ubuntu`, NIC is `ens18`, API join endpoint is the kube-vip address.
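A sketch of how two of these rules combine in practice; `CP1_IP` is an illustrative variable, not something the workflows actually export:

```bash
set -euo pipefail   # required shape for every workflow shell block in this repo

# After a rebuild, a stale kubeconfig makes kubectl fall back to localhost:8080.
if ! kubectl cluster-info > /dev/null 2>&1; then
  scripts/refresh-kubeconfig.sh "${CP1_IP:?set CP1_IP to the primary control-plane IP}"
fi
kubectl get nodes
```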
## Repo-Specific Gotchas
- `rancher-backup` uses a postRenderer to swap the broken hook image to `rancher/kubectl:v1.34.0`; do not put S3 config in HelmRelease values. Put it in the Backup CR (see the sketch after this list).
- Tailscale cleanup only runs before service proxies exist; it removes stale offline `rancher`/`grafana`/`prometheus`/`flux` devices, then must stop so live proxies are not deleted.
- Keep the Tailscale operator on the stable Helm repo `https://pkgs.tailscale.com/helmcharts` at `1.96.5` unless you have a reason to change it.
- The repo no longer uses a cloud controller manager. If you see `providerID` or Hetzner-specific logic, it is stale.
- Current private URLs:
  - Rancher: https://rancher.silverside-gopher.ts.net/
  - Grafana: http://grafana.silverside-gopher.ts.net/
  - Prometheus: http://prometheus.silverside-gopher.ts.net:9090/
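What "S3 config in the Backup CR" looks like; the field layout follows the upstream `rancher-backup` CRD (`resources.cattle.io/v1`), but every name, bucket, endpoint, and schedule below is a placeholder, not this repo's real configuration:

```bash
kubectl apply -f - <<'EOF'
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-nightly                 # placeholder name
spec:
  resourceSetName: rancher-resource-set # default ResourceSet shipped with the chart
  schedule: "0 3 * * *"                 # placeholder schedule
  retentionCount: 7
  storageLocation:
    s3:
      credentialSecretName: b2-credentials          # placeholder Secret name
      credentialSecretNamespace: cattle-resources-system
      bucketName: example-rancher-backups           # placeholder bucket
      folder: rancher
      endpoint: s3.us-west-000.backblazeb2.com      # placeholder B2 endpoint
      region: us-west-000
EOF
```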
## Secrets
- Runtime secrets live in Doppler + External Secrets.
- Bootstrap and CI secrets stay in Gitea; never commit secrets, kubeconfigs, or private keys.
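The runtime-secret path in sketch form: an ExternalSecret resolving a Doppler-backed store into a cluster Secret. The store name, namespace, and keys are all placeholders; the real ones live in the manifests:

```bash
kubectl apply -f - <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin                 # placeholder
  namespace: monitoring               # placeholder
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: doppler                     # placeholder store name
  target:
    name: grafana-admin               # Secret created in-cluster
  data:
    - secretKey: admin-password
      remoteRef:
        key: GRAFANA_ADMIN_PASSWORD   # placeholder Doppler key
EOF
```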