
Hetzner Kubernetes Cluster

Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible.

Architecture

Component      Details
Control Plane  3x CX23 (HA)
Workers        4x CX33
Total Cost     €28.93/mo
K8s            k3s (latest, HA)
Addons         Hetzner CCM + CSI + Prometheus + Grafana + Loki
Access         SSH/API restricted to Tailnet
Bootstrap      Terraform + Ansible

Cluster Resources

  • 22 vCPU total (6 CP + 16 workers)
  • 44 GB RAM total (12 CP + 32 workers)
  • 440 GB SSD storage
  • 140 TB bandwidth allocation
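The totals above can be re-derived from per-node specs. Note the per-node numbers below are assumptions inferred from the totals (CX23 = 2 vCPU / 4 GB / 40 GB / 20 TB, CX33 = 4 vCPU / 8 GB / 80 GB / 20 TB), not stated elsewhere in this README:

```shell
# Sanity-check the cluster totals from assumed per-node specs.
cp=3; workers=4
echo "vCPU:         $(( cp*2  + workers*4 ))"    # 22
echo "RAM (GB):     $(( cp*4  + workers*8 ))"    # 44
echo "SSD (GB):     $(( cp*40 + workers*80 ))"   # 440
echo "Traffic (TB): $(( (cp + workers)*20 ))"    # 140
```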

Prerequisites

1. Hetzner Cloud API Token

  1. Go to Hetzner Cloud Console
  2. Select your project (or create a new one)
  3. Navigate to Security → API Tokens
  4. Click Generate API Token
  5. Set description: k8s-cluster-terraform
  6. Select permissions: Read & Write
  7. Click Generate API Token
  8. Copy the token immediately - it won't be shown again!

2. Backblaze B2 Bucket (for Terraform State)

  1. Go to Backblaze B2
  2. Click Create a Bucket
  3. Set bucket name: k8s-terraform-state (must be globally unique)
  4. Choose Private access
  5. Click Create Bucket
  6. Create application key:
    • Go to App Keys → Add a New Application Key
    • Name: terraform-state
    • Allow access to: k8s-terraform-state bucket only
    • Type: Read and Write
    • Copy keyID (access key) and applicationKey (secret key)
  7. Note your bucket's S3 endpoint (e.g., https://s3.eu-central-003.backblazeb2.com)

3. SSH Key Pair

ssh-keygen -t ed25519 -C "k8s@hetzner" -f ~/.ssh/hetzner_k8s

4. Local Tools
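This section's tool list isn't spelled out, but the steps below use at least Terraform, Ansible, kubectl, and Python 3. A quick preflight check (tool names inferred from the workflow in this README):

```shell
# Report which of the required CLI tools are on PATH.
for tool in terraform ansible-playbook kubectl python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok:      $tool"
  else
    echo "missing: $tool"
  fi
done
```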

Setup

1. Clone Repository

git clone <your-gitea-repo>/HetznerTerra.git
cd HetznerTerra

2. Configure Variables

cp terraform.tfvars.example terraform.tfvars

Edit terraform.tfvars:

hcloud_token = "your-hetzner-api-token"

ssh_public_key  = "~/.ssh/hetzner_k8s.pub"
ssh_private_key = "~/.ssh/hetzner_k8s"

s3_access_key = "your-backblaze-key-id"
s3_secret_key = "your-backblaze-application-key"
s3_endpoint   = "https://s3.eu-central-003.backblazeb2.com"
s3_bucket     = "k8s-terraform-state"

tailscale_auth_key = "tskey-auth-..."
tailscale_tailnet  = "yourtailnet.ts.net"

restrict_api_ssh_to_tailnet = true
tailnet_cidr                = "100.64.0.0/10"
enable_nodeport_public      = false

allowed_ssh_ips = []
allowed_api_ips = []

3. Initialize Terraform

cd terraform

# Create backend config file (or use CLI args)
cat > backend.hcl << EOF
endpoint                    = "https://s3.eu-central-003.backblazeb2.com"
bucket                      = "k8s-terraform-state"
access_key                  = "your-backblaze-key-id"
secret_key                  = "your-backblaze-application-key"
skip_requesting_account_id  = true
EOF

terraform init -backend-config=backend.hcl

4. Plan and Apply

terraform plan -var-file=../terraform.tfvars
terraform apply -var-file=../terraform.tfvars

5. Generate Ansible Inventory

cd ../ansible
python3 generate_inventory.py

6. Bootstrap Cluster

ansible-playbook site.yml

7. Get Kubeconfig

export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes

Kubeconfig endpoint is rewritten to the primary control-plane tailnet hostname (k8s-cluster-cp-1.<your-tailnet>).
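To confirm the rewrite took effect, inspect the kubeconfig's server field. A self-contained sketch using a fabricated kubeconfig with a hypothetical tailnet hostname; against the real cluster, grep outputs/kubeconfig instead:

```shell
# Fabricate a minimal kubeconfig to illustrate the check
# (hostname example.ts.net is a placeholder).
cat > /tmp/demo-kubeconfig <<'EOF'
apiVersion: v1
kind: Config
clusters:
- name: k8s-cluster
  cluster:
    server: https://k8s-cluster-cp-1.example.ts.net:6443
EOF

# The server field should point at the primary control-plane tailnet name.
grep 'server:' /tmp/demo-kubeconfig
```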

Gitea CI/CD

This repository includes Gitea workflows for:

  • terraform-plan: Runs on PRs, shows planned changes
  • terraform-apply: Runs on main branch after merge
  • ansible-deploy: Runs after terraform apply
  • dashboards: Fast workflow that updates Grafana datasources/dashboards only

Required Gitea Secrets

Set these in your Gitea repository settings (Settings → Actions → Secrets):

Secret                              Description
HCLOUD_TOKEN                        Hetzner Cloud API token
S3_ACCESS_KEY                       Backblaze B2 keyID
S3_SECRET_KEY                       Backblaze B2 applicationKey
S3_ENDPOINT                         Backblaze S3 endpoint (e.g., https://s3.eu-central-003.backblazeb2.com)
S3_BUCKET                           S3 bucket name (e.g., k8s-terraform-state)
TAILSCALE_AUTH_KEY                  Tailscale auth key for node bootstrap
TAILSCALE_TAILNET                   Tailnet domain (e.g., yourtailnet.ts.net)
TAILSCALE_OAUTH_CLIENT_ID           Tailscale OAuth client ID for the Kubernetes Operator
TAILSCALE_OAUTH_CLIENT_SECRET       Tailscale OAuth client secret for the Kubernetes Operator
DOPPLER_HETZNERTERRA_SERVICE_TOKEN  Doppler service token for hetznerterra runtime secrets
GRAFANA_ADMIN_PASSWORD              Optional admin password for Grafana (auto-generated if unset)
RUNNER_ALLOWED_CIDRS                Optional CIDR list for CI runner access if you choose to pass it via tfvars/secrets
SSH_PUBLIC_KEY                      SSH public key content
SSH_PRIVATE_KEY                     SSH private key content

GitOps (Flux)

This repo uses Flux for continuous reconciliation after Terraform + Ansible bootstrap.

Stable private-only baseline

The current default target is a deliberately simplified baseline:

  • 1 control plane node
  • 2 worker nodes
  • private Hetzner network only
  • Tailscale for operator access
  • Flux-managed core addons only

Detailed phase gates and success criteria live in STABLE_BASELINE.md.

This is the default until rebuilds are consistently green. High availability, public ingress, and app-layer expansion come later.

Runtime secrets

Runtime cluster secrets are moving to Doppler + External Secrets Operator.

  • Doppler project: hetznerterra
  • Initial auth: service token via DOPPLER_HETZNERTERRA_SERVICE_TOKEN
  • First synced secrets:
    • GRAFANA_ADMIN_PASSWORD
    • WEAVE_GITOPS_ADMIN_USERNAME
    • WEAVE_GITOPS_ADMIN_PASSWORD_BCRYPT_HASH

Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed by Doppler.
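For orientation, a Doppler-backed sync via External Secrets Operator looks roughly like the manifest below. This is a sketch, not a file from this repo: the store name doppler-hetznerterra and the target secret name are assumptions; only GRAFANA_ADMIN_PASSWORD comes from the list above.

```shell
# Print an illustrative ExternalSecret manifest (not applied to any cluster).
cat <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin          # hypothetical name
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: doppler-hetznerterra # assumed store name, not from the repo
  target:
    name: grafana-admin
  data:
    - secretKey: admin-password
      remoteRef:
        key: GRAFANA_ADMIN_PASSWORD
EOF
```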

Repository layout

  • clusters/prod/: cluster entrypoint and Flux reconciliation objects
  • clusters/prod/flux-system/: GitRepository source and top-level Kustomization graph
  • infrastructure/: infrastructure addon reconciliation graph
  • infrastructure/addons/*: per-addon manifests for Flux-managed cluster addons
  • apps/: application workload layer (currently scaffolded)

Reconciliation graph

  • infrastructure (top-level)
    • addon-ccm
    • addon-csi depends on addon-ccm
    • addon-tailscale-operator
    • addon-observability
    • addon-observability-content depends on addon-observability
  • apps depends on infrastructure
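The dependency edges above map to spec.dependsOn on Flux Kustomization objects. A minimal sketch of one edge (the path and interval are assumptions, not taken from the repo):

```shell
# Print an illustrative Flux Kustomization with a dependsOn edge
# (addon-csi waits for addon-ccm, matching the graph above).
cat <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: addon-csi
  namespace: flux-system
spec:
  dependsOn:
    - name: addon-ccm
  interval: 10m                        # assumed
  path: ./infrastructure/addons/csi    # assumed layout
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
EOF
```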

Bootstrap notes

  1. Install Flux controllers in flux-system.
  2. Create the Flux deploy key/secret named flux-system in flux-system namespace.
  3. Apply clusters/prod/flux-system/ once to establish source + reconciliation graph.
  4. Bootstrap-only Ansible creates prerequisite secrets; Flux manages addon lifecycle after bootstrap.
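Step 2's deploy-key secret conventionally carries identity, identity.pub, and known_hosts keys (Flux's standard SSH layout). A sketch of its shape, with placeholder values rather than real key material:

```shell
# Print the expected shape of the flux-system deploy-key secret.
# Values are placeholders; populate them from your generated deploy key.
cat <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: flux-system
  namespace: flux-system
stringData:
  identity: <private-key>
  identity.pub: <public-key>
  known_hosts: <git-host-key>
EOF
```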

Current addon status

  • Core infrastructure addons are Flux-managed from infrastructure/addons/.
  • Active Flux addons for stable baseline: addon-tailscale-operator, addon-tailscale-proxyclass, addon-external-secrets.
  • Deferred addons: addon-ccm, addon-csi, addon-observability, addon-observability-content (to be added after baseline is stable).
  • Ansible is limited to cluster bootstrap, private-access setup, and prerequisite secret creation for Flux-managed addons.
  • addon-flux-ui is optional for the stable-baseline phase and is not a blocker for rebuild success.

Stable baseline acceptance

A rebuild is considered successful only when all of the following pass without manual intervention:

  • Terraform create succeeds for the default 1 control plane and 2 workers.
  • Ansible bootstrap succeeds end-to-end.
  • All nodes become Ready.
  • Flux core reconciliation is healthy.
  • External Secrets Operator is ready.
  • Tailscale operator is ready.
  • Terraform destroy succeeds cleanly or succeeds after workflow retries.

Note: Observability stack (Grafana/Prometheus) is deferred and will be added once the core platform baseline is stable.

Observability Stack

Flux deploys a lightweight observability stack in the observability namespace:

  • kube-prometheus-stack (Prometheus + Grafana)
  • loki
  • promtail

Grafana content is managed as code via ConfigMaps in infrastructure/addons/observability-content/.

Grafana and Prometheus are exposed through a single Tailscale front door backed by Traefik when the Tailscale Kubernetes Operator is healthy.

Access Grafana and Prometheus

Preferred private access:

  • Grafana: http://k8s-cluster-cp-1.<your-tailnet>:30080/
  • Prometheus: http://k8s-cluster-cp-1.<your-tailnet>:30990/
  • Flux UI: http://k8s-cluster-cp-1.<your-tailnet>:30901/

This access path is bootstrapped automatically by Ansible on control_plane[0] using persistent kubectl port-forward systemd services plus tailscale serve, so it survives cluster rebuilds.

Fallback (port-forward from a tailnet-connected machine):

Run from a tailnet-connected machine:

export KUBECONFIG=$(pwd)/outputs/kubeconfig

kubectl -n observability port-forward svc/kube-prometheus-stack-grafana 3000:80
kubectl -n observability port-forward svc/kube-prometheus-stack-prometheus 9090:9090

Then open http://localhost:3000 (Grafana) and http://localhost:9090 (Prometheus).

Grafana user: admin
Grafana password: value of the GRAFANA_ADMIN_PASSWORD secret (or the generated value shown in the Ansible output)
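If you need to recover the password from the cluster, it can be read from the chart's admin secret. The secret name below assumes kube-prometheus-stack defaults; adjust it if your Helm release name differs:

```shell
# Read the Grafana admin password from the in-cluster secret
# (requires cluster access via the kubeconfig above).
kubectl -n observability get secret kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d && echo
```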

Verify Tailscale exposure

export KUBECONFIG=$(pwd)/outputs/kubeconfig

kubectl -n tailscale-system get pods
kubectl -n observability get svc kube-prometheus-stack-grafana kube-prometheus-stack-prometheus
kubectl -n observability describe svc kube-prometheus-stack-grafana | grep TailscaleProxyReady
kubectl -n observability describe svc kube-prometheus-stack-prometheus | grep TailscaleProxyReady

If TailscaleProxyReady=False, check:

kubectl -n tailscale-system logs deployment/operator --tail=100

Common cause: OAuth client missing tag/scopes permissions.

Fast dashboard iteration workflow

Use the Deploy Grafana Content workflow when changing dashboard/data source templates. It avoids full cluster provisioning and only applies Grafana content resources:

  • ansible/roles/observability-content/templates/grafana-datasources.yaml.j2
  • ansible/roles/observability-content/templates/grafana-dashboard-k8s-overview.yaml.j2
  • ansible/dashboards.yml

File Structure

.
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── network.tf
│   ├── firewall.tf
│   ├── ssh.tf
│   ├── servers.tf
│   ├── outputs.tf
│   └── backend.tf
├── ansible/
│   ├── inventory.tmpl
│   ├── generate_inventory.py
│   ├── site.yml
│   ├── roles/
│   │   ├── common/
│   │   ├── k3s-server/
│   │   ├── k3s-agent/
│   │   ├── addon-secrets-bootstrap/
│   │   ├── observability-content/
│   │   └── observability/
│   └── ansible.cfg
├── .gitea/
│   └── workflows/
│       ├── terraform.yml
│       ├── ansible.yml
│       └── dashboards.yml
├── outputs/
├── terraform.tfvars.example
└── README.md

Firewall Rules

Port         Source                   Purpose
22           Tailnet CIDR             SSH
6443         Tailnet CIDR + internal  Kubernetes API
41641/udp    Any                      Tailscale WireGuard
9345         10.0.0.0/16              k3s Supervisor (HA join)
2379         10.0.0.0/16              etcd Client
2380         10.0.0.0/16              etcd Peer
8472         10.0.0.0/16              Flannel VXLAN
10250        10.0.0.0/16              Kubelet
30000-32767  Optional                 NodePorts (disabled by default)

Operations

Scale Workers

Edit terraform.tfvars:

worker_count = 5

Then:

terraform apply
ansible-playbook site.yml

Upgrade k3s

ansible-playbook site.yml -t upgrade

Destroy Cluster

terraform destroy

Troubleshooting

Check k3s Logs

ssh root@<control-plane-ip> journalctl -u k3s -f

Reset k3s

ansible-playbook site.yml -t reset

Costs Breakdown

Resource              Quantity  Unit Price          Monthly
CX23 (Control Plane)  3         €2.99               €8.97
CX33 (Workers)        4         €4.99               €19.96
Backblaze B2          ~1 GB     Free (first 10 GB)  €0.00
Total                                               €28.93/mo
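The monthly total can be re-derived from the line items:

```shell
# Re-compute the monthly total from per-item prices.
awk 'BEGIN {
  cp      = 3 * 2.99   # CX23 control-plane nodes
  workers = 4 * 4.99   # CX33 workers
  printf "CP: %.2f  Workers: %.2f  Total: %.2f\n", cp, workers, cp + workers
}'
# → CP: 8.97  Workers: 19.96  Total: 28.93
```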

Security Notes

  • Control plane has HA (3 nodes, can survive 1 failure)
  • Consider adding Hetzner load balancer for API server
  • Rotate API tokens regularly
  • Use network policies in Kubernetes
  • Enable audit logging for production

License

MIT
