# Hetzner Kubernetes Cluster

Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible.

## Architecture
| Component | Details |
|---|---|
| Control Plane | 3x CX23 (HA) |
| Workers | 4x CX33 |
| Total Cost | €28.93/mo |
| K8s | k3s (latest, HA) |
| Addons | Hetzner CCM + CSI + Prometheus + Grafana + Loki |
| Access | SSH/API restricted to Tailnet |
| Bootstrap | Terraform + Ansible |
## Cluster Resources
- 22 vCPU total (6 CP + 16 workers)
- 44 GB RAM total (12 CP + 32 workers)
- 440 GB SSD storage
- 140 TB bandwidth allocation
## Prerequisites

### 1. Hetzner Cloud API Token

- Go to the Hetzner Cloud Console
- Select your project (or create a new one)
- Navigate to Security → API Tokens
- Click Generate API Token
- Set description: `k8s-cluster-terraform`
- Select permissions: Read & Write
- Click Generate API Token
- Copy the token immediately - it won't be shown again!
### 2. Backblaze B2 Bucket (for Terraform State)

- Go to Backblaze B2
- Click Create a Bucket
- Set bucket name: `k8s-terraform-state` (must be globally unique)
- Choose Private access
- Click Create Bucket
- Create an application key:
  - Go to App Keys → Add a New Application Key
  - Name: `terraform-state`
  - Allow access to: the `k8s-terraform-state` bucket only
  - Type: Read and Write
  - Copy keyID (access key) and applicationKey (secret key)
- Note your bucket's S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`)
### 3. SSH Key Pair

```shell
ssh-keygen -t ed25519 -C "k8s@hetzner" -f ~/.ssh/hetzner_k8s
```

### 4. Local Tools
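This section's tool list is inferred from the Setup steps below (Terraform, Ansible, Python 3, kubectl); a quick sanity check before starting:

```shell
# Check for the tools used elsewhere in this README.
# The list is inferred from the Setup steps, not an authoritative manifest.
for tool in terraform ansible-playbook python3 kubectl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```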
## Setup

### 1. Clone Repository

```shell
git clone <your-gitea-repo>/HetznerTerra.git
cd HetznerTerra
```
### 2. Configure Variables

```shell
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars`:

```hcl
hcloud_token = "your-hetzner-api-token"
ssh_public_key = "~/.ssh/hetzner_k8s.pub"
ssh_private_key = "~/.ssh/hetzner_k8s"
s3_access_key = "your-backblaze-key-id"
s3_secret_key = "your-backblaze-application-key"
s3_endpoint = "https://s3.eu-central-003.backblazeb2.com"
s3_bucket = "k8s-terraform-state"
tailscale_auth_key = "tskey-auth-..."
tailscale_tailnet = "yourtailnet.ts.net"
restrict_api_ssh_to_tailnet = true
tailnet_cidr = "100.64.0.0/10"
enable_nodeport_public = false
allowed_ssh_ips = []
allowed_api_ips = []
```
### 3. Initialize Terraform

```shell
cd terraform

# Create backend config file (or use CLI args)
cat > backend.hcl << EOF
endpoint = "https://s3.eu-central-003.backblazeb2.com"
bucket = "k8s-terraform-state"
access_key = "your-backblaze-key-id"
secret_key = "your-backblaze-application-key"
skip_requesting_account_id = true
EOF

terraform init -backend-config=backend.hcl
```
### 4. Plan and Apply

```shell
terraform plan -var-file=../terraform.tfvars
terraform apply -var-file=../terraform.tfvars
```
### 5. Generate Ansible Inventory

```shell
cd ../ansible
python3 generate_inventory.py
```
### 6. Bootstrap Cluster

```shell
ansible-playbook site.yml
```
### 7. Get Kubeconfig

```shell
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes
```
The kubeconfig endpoint is rewritten to the primary control plane's tailnet hostname (`k8s-cluster-cp-1.<your-tailnet>`).
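If you ever need to redo this rewrite by hand (for example after fetching a fresh kubeconfig from a node), a minimal sketch; the hostname is an example, and the repo's actual Ansible implementation may differ:

```shell
# Hypothetical manual equivalent of the automated kubeconfig rewrite.
# The hostname below is an example; substitute your tailnet name.
TAILNET_HOST="k8s-cluster-cp-1.yourtailnet.ts.net"
sed -i.bak "s|server: https://[^:]*:6443|server: https://${TAILNET_HOST}:6443|" outputs/kubeconfig
```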
## Gitea CI/CD
This repository includes Gitea workflows for:
- `terraform-plan`: Runs on PRs, shows planned changes
- `terraform-apply`: Runs on the main branch after merge
- `ansible-deploy`: Runs after terraform apply
- `dashboards`: Fast workflow that updates Grafana datasources/dashboards only
### Required Gitea Secrets

Set these in your Gitea repository settings (Settings → Secrets → Actions):

| Secret | Description |
|---|---|
| `HCLOUD_TOKEN` | Hetzner Cloud API token |
| `S3_ACCESS_KEY` | Backblaze B2 keyID |
| `S3_SECRET_KEY` | Backblaze B2 applicationKey |
| `S3_ENDPOINT` | Backblaze S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`) |
| `S3_BUCKET` | S3 bucket name (e.g., `k8s-terraform-state`) |
| `TAILSCALE_AUTH_KEY` | Tailscale auth key for node bootstrap |
| `TAILSCALE_TAILNET` | Tailnet domain (e.g., `yourtailnet.ts.net`) |
| `TAILSCALE_OAUTH_CLIENT_ID` | Tailscale OAuth client ID for the Kubernetes Operator |
| `TAILSCALE_OAUTH_CLIENT_SECRET` | Tailscale OAuth client secret for the Kubernetes Operator |
| `DOPPLER_HETZNERTERRA_SERVICE_TOKEN` | Doppler service token for `hetznerterra` runtime secrets |
| `GRAFANA_ADMIN_PASSWORD` | Optional admin password for Grafana (auto-generated if unset) |
| `RUNNER_ALLOWED_CIDRS` | Optional CIDR list for CI runner access if you choose to pass it via tfvars/secrets |
| `SSH_PUBLIC_KEY` | SSH public key content |
| `SSH_PRIVATE_KEY` | SSH private key content |
## GitOps (Flux)
This repo uses Flux for continuous reconciliation after Terraform + Ansible bootstrap.
### Stable private-only baseline

The current default target is a deliberately simplified baseline:

- 1 control plane node
- 2 worker nodes
- private Hetzner network only
- Tailscale for operator access
- Flux-managed core addons only

Detailed phase gates and success criteria live in `STABLE_BASELINE.md`.

This is the default until rebuilds are consistently green. High availability, public ingress, and app-layer expansion come later.
### Runtime secrets

Runtime cluster secrets are moving to Doppler + External Secrets Operator.

- Doppler project: `hetznerterra`
- Initial auth: service token via `DOPPLER_HETZNERTERRA_SERVICE_TOKEN`
- First synced secrets: `GRAFANA_ADMIN_PASSWORD`, `WEAVE_GITOPS_ADMIN_USERNAME`, `WEAVE_GITOPS_ADMIN_PASSWORD_BCRYPT_HASH`

Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed by Doppler.
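As an illustration, a synced secret under this model might look roughly like the following `ExternalSecret`; the store name, target secret name, and namespace are assumptions, not taken from this repo:

```yaml
# Hypothetical sketch: sync GRAFANA_ADMIN_PASSWORD from Doppler via
# External Secrets Operator. The store name "doppler-hetznerterra" and
# the target secret name are assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: doppler-hetznerterra
    kind: ClusterSecretStore
  target:
    name: grafana-admin
  data:
    - secretKey: admin-password
      remoteRef:
        key: GRAFANA_ADMIN_PASSWORD
```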
### Repository layout

- `clusters/prod/`: cluster entrypoint and Flux reconciliation objects
- `clusters/prod/flux-system/`: `GitRepository` source and top-level `Kustomization` graph
- `infrastructure/`: infrastructure addon reconciliation graph
- `infrastructure/addons/*`: per-addon manifests for Flux-managed cluster addons
- `apps/`: application workload layer (currently scaffolded)
### Reconciliation graph

- `infrastructure` (top-level)
  - `addon-ccm`
  - `addon-csi` depends on `addon-ccm`
  - `addon-tailscale-operator`
  - `addon-observability`
  - `addon-observability-content` depends on `addon-observability`
- `apps` depends on `infrastructure`
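Each node in this graph is a Flux `Kustomization`; a minimal sketch of the `addon-csi` edge (field names per the Flux `kustomize.toolkit.fluxcd.io/v1` API; the `path` value is an assumption, not read from this repo):

```yaml
# Sketch: addon-csi reconciles only after addon-ccm reports Ready.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: addon-csi
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/addons/csi   # assumed path, verify in the repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: addon-ccm
```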
### Bootstrap notes

- Install Flux controllers in `flux-system`.
- Create the Flux deploy key/secret named `flux-system` in the `flux-system` namespace.
- Apply `clusters/prod/flux-system/` once to establish the source + reconciliation graph.
- Bootstrap-only Ansible creates prerequisite secrets; Flux manages addon lifecycle after bootstrap.
### Current addon status

- Core infrastructure addons are Flux-managed from `infrastructure/addons/`.
- Active Flux addons for the stable baseline: `addon-tailscale-operator`, `addon-tailscale-proxyclass`, `addon-external-secrets`.
- Deferred addons: `addon-ccm`, `addon-csi`, `addon-observability`, `addon-observability-content` (to be added after the baseline is stable).
- Ansible is limited to cluster bootstrap, private-access setup, and prerequisite secret creation for Flux-managed addons.
- `addon-flux-ui` is optional for the stable-baseline phase and is not a blocker for rebuild success.
### Stable baseline acceptance

A rebuild is considered successful only when all of the following pass without manual intervention:

- Terraform create succeeds for the default 1 control plane and 2 workers.
- Ansible bootstrap succeeds end-to-end.
- All nodes become `Ready`.
- Flux core reconciliation is healthy.
- External Secrets Operator is ready.
- Tailscale operator is ready.
- Terraform destroy succeeds cleanly or succeeds after workflow retries.
Note: Observability stack (Grafana/Prometheus) is deferred and will be added once the core platform baseline is stable.
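One way to spot-check these criteria from a tailnet-connected machine; the commands assume the `flux` CLI is installed, and the `external-secrets` and `tailscale-system` namespaces are assumptions to verify against your cluster:

```shell
# Spot-check stable-baseline acceptance criteria.
# Namespaces below are assumptions; verify with `kubectl get ns`.
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes                                # expect all nodes Ready
flux get kustomizations -A                       # expect all Ready=True
kubectl -n external-secrets get deploy           # ESO controllers available
kubectl -n tailscale-system get deploy operator  # Tailscale operator available
```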
## Observability Stack

Flux deploys a lightweight observability stack in the `observability` namespace:

- `kube-prometheus-stack` (Prometheus + Grafana)
- `loki`
- `promtail`
Grafana content is managed as code via ConfigMaps in `infrastructure/addons/observability-content/`.
Grafana and Prometheus are exposed through a single Tailscale front door backed by Traefik when the Tailscale Kubernetes Operator is healthy.
### Access Grafana and Prometheus

Preferred private access:

- Grafana: `http://k8s-cluster-cp-1.<your-tailnet>:30080/`
- Prometheus: `http://k8s-cluster-cp-1.<your-tailnet>:30990/`
- Flux UI: `http://k8s-cluster-cp-1.<your-tailnet>:30901/`
This access path is bootstrapped automatically by Ansible on `control_plane[0]` using persistent `kubectl port-forward` systemd services plus `tailscale serve`, so it survives cluster rebuilds.
Fallback: run a port-forward from a tailnet-connected machine:

```shell
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl -n observability port-forward svc/kube-prometheus-stack-grafana 3000:80
kubectl -n observability port-forward svc/kube-prometheus-stack-prometheus 9090:9090
```

Then open:

- Grafana: http://127.0.0.1:3000
- Prometheus: http://127.0.0.1:9090
- Grafana user: `admin`
- Grafana password: the value of the `GRAFANA_ADMIN_PASSWORD` secret (or the generated value shown in the Ansible output)
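If the generated value is lost, it can usually be read back from the chart's Grafana secret; the secret and key names below follow kube-prometheus-stack defaults and should be verified with `kubectl -n observability get secrets`:

```shell
# Read the Grafana admin password back out of the cluster.
# Secret/key names assume kube-prometheus-stack defaults; verify in your cluster.
kubectl -n observability get secret kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```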
### Verify Tailscale exposure

```shell
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl -n tailscale-system get pods
kubectl -n observability get svc kube-prometheus-stack-grafana kube-prometheus-stack-prometheus
kubectl -n observability describe svc kube-prometheus-stack-grafana | grep TailscaleProxyReady
kubectl -n observability describe svc kube-prometheus-stack-prometheus | grep TailscaleProxyReady
```

If `TailscaleProxyReady=False`, check:

```shell
kubectl -n tailscale-system logs deployment/operator --tail=100
```

Common cause: OAuth client missing tag/scopes permissions.
### Fast dashboard iteration workflow

Use the Deploy Grafana Content workflow when changing dashboard/data source templates. It avoids full cluster provisioning and only applies Grafana content resources:

- `ansible/roles/observability-content/templates/grafana-datasources.yaml.j2`
- `ansible/roles/observability-content/templates/grafana-dashboard-k8s-overview.yaml.j2`
- `ansible/dashboards.yml`
## File Structure

```
.
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── network.tf
│   ├── firewall.tf
│   ├── ssh.tf
│   ├── servers.tf
│   ├── outputs.tf
│   └── backend.tf
├── ansible/
│   ├── inventory.tmpl
│   ├── generate_inventory.py
│   ├── site.yml
│   ├── roles/
│   │   ├── common/
│   │   ├── k3s-server/
│   │   ├── k3s-agent/
│   │   ├── addon-secrets-bootstrap/
│   │   ├── observability-content/
│   │   └── observability/
│   └── ansible.cfg
├── .gitea/
│   └── workflows/
│       ├── terraform.yml
│       ├── ansible.yml
│       └── dashboards.yml
├── outputs/
├── terraform.tfvars.example
└── README.md
```
## Firewall Rules
| Port | Source | Purpose |
|---|---|---|
| 22 | Tailnet CIDR | SSH |
| 6443 | Tailnet CIDR + internal | Kubernetes API |
| 41641/udp | Any | Tailscale WireGuard |
| 9345 | 10.0.0.0/16 | k3s Supervisor (HA join) |
| 2379 | 10.0.0.0/16 | etcd Client |
| 2380 | 10.0.0.0/16 | etcd Peer |
| 8472 | 10.0.0.0/16 | Flannel VXLAN |
| 10250 | 10.0.0.0/16 | Kubelet |
| 30000-32767 | Optional | NodePorts (disabled by default) |
## Operations

### Scale Workers

Edit `terraform.tfvars`:

```hcl
worker_count = 5
```

Then:

```shell
terraform apply
ansible-playbook site.yml
```

### Upgrade k3s

```shell
ansible-playbook site.yml -t upgrade
```

### Destroy Cluster

```shell
terraform destroy
```
## Troubleshooting

### Check k3s Logs

```shell
ssh root@<control-plane-ip> journalctl -u k3s -f
```

### Reset k3s

```shell
ansible-playbook site.yml -t reset
```
## Cost Breakdown
| Resource | Quantity | Unit Price | Monthly |
|---|---|---|---|
| CX23 (Control Plane) | 3 | €2.99 | €8.97 |
| CX33 (Workers) | 4 | €4.99 | €19.96 |
| Backblaze B2 | ~1 GB | Free (first 10GB) | €0.00 |
| **Total** | | | **€28.93** |
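The monthly total can be sanity-checked from the unit prices:

```shell
# 3 control-plane nodes at 2.99 EUR plus 4 workers at 4.99 EUR per month.
awk 'BEGIN { printf "%.2f\n", 3 * 2.99 + 4 * 4.99 }'
# prints 28.93
```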
## Security Notes
- Control plane has HA (3 nodes, can survive 1 failure)
- Consider adding Hetzner load balancer for API server
- Rotate API tokens regularly
- Use network policies in Kubernetes
- Enable audit logging for production
## License
MIT