MichaelFisher1997 f95e0051a5

feat: automate private tailnet access on cp1

2026-03-08 04:16:06 +00:00

Hetzner Kubernetes Cluster

Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible.

Architecture

Component	Details
Control Plane	3x CX23 (HA)
Workers	4x CX33
Total Cost	€28.93/mo
K8s	k3s (latest, HA)
Addons	Hetzner CCM + CSI + Prometheus + Grafana + Loki
Access	SSH/API restricted to Tailnet
Bootstrap	Terraform + Ansible

Cluster Resources

22 vCPU total (6 CP + 16 workers)
44 GB RAM total (12 CP + 32 workers)
440 GB SSD storage
140 TB bandwidth allocation
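The totals above can be re-derived from the node counts. The sketch below is only a sanity check: the per-node vCPU and RAM figures follow from the stated totals (6 / 3 and 16 / 4, 12 / 3 and 32 / 4), while the per-node disk and traffic figures are assumptions chosen to match the 440 GB and 140 TB totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    vcpu: int
    ram_gb: int
    disk_gb: int     # assumed per-node SSD size
    traffic_tb: int  # assumed per-node traffic allocation


# vCPU and RAM per node are implied by the totals in this README.
# Disk and traffic per node are assumptions, not values stated in the README.
CX23 = NodeType(vcpu=2, ram_gb=4, disk_gb=40, traffic_tb=20)
CX33 = NodeType(vcpu=4, ram_gb=8, disk_gb=80, traffic_tb=20)

# (node count, node type): control plane first, then workers.
FLEET = [(3, CX23), (4, CX33)]


def total(attr: str) -> int:
    """Sum one attribute across every node in the fleet."""
    return sum(count * getattr(node, attr) for count, node in FLEET)


totals = {attr: total(attr) for attr in ("vcpu", "ram_gb", "disk_gb", "traffic_tb")}
print(totals)
```

Changing a count in FLEET shows how the totals move when the cluster is resized.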

Prerequisites

1. Hetzner Cloud API Token

Go to Hetzner Cloud Console
Select your project (or create a new one)
Navigate to Security → API Tokens
Click Generate API Token
Set description: k8s-cluster-terraform
Select permissions: Read & Write
Click Generate API Token
Copy the token immediately - it won't be shown again!

2. Backblaze B2 Bucket (for Terraform State)

Go to Backblaze B2
Click Create a Bucket
Set bucket name: k8s-terraform-state (must be globally unique)
Choose Private access
Click Create Bucket
Create application key:
- Go to App Keys → Add a New Application Key
- Name: terraform-state
- Allow access to: k8s-terraform-state bucket only
- Type: Read and Write
- Copy keyID (access key) and applicationKey (secret key)
Note your bucket's S3 endpoint (e.g., https://s3.eu-central-003.backblazeb2.com)

3. SSH Key Pair

ssh-keygen -t ed25519 -C "k8s@hetzner" -f ~/.ssh/hetzner_k8s

4. Local Tools

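Terraform >= 1.6 (the backend config in the Setup section uses skip_requesting_account_id, which older releases do not accept)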
Ansible >= 2.9
Python 3 with jinja2 and pyyaml

Setup

1. Clone Repository

git clone <your-gitea-repo>/HetznerTerra.git
cd HetznerTerra

2. Configure Variables

cp terraform.tfvars.example terraform.tfvars

Edit terraform.tfvars:

hcloud_token = "your-hetzner-api-token"

ssh_public_key  = "~/.ssh/hetzner_k8s.pub"
ssh_private_key = "~/.ssh/hetzner_k8s"

s3_access_key = "your-backblaze-key-id"
s3_secret_key = "your-backblaze-application-key"
s3_endpoint   = "https://s3.eu-central-003.backblazeb2.com"
s3_bucket     = "k8s-terraform-state"

tailscale_auth_key = "tskey-auth-..."
tailscale_tailnet  = "yourtailnet.ts.net"

restrict_api_ssh_to_tailnet = true
tailnet_cidr                = "100.64.0.0/10"
enable_nodeport_public      = false

allowed_ssh_ips = []
allowed_api_ips = []
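Before running Terraform, it can help to confirm that no example value was left in place. The following is a minimal sketch, not part of this repository: it scans simple `name = "value"` lines and flags values that still look like the placeholders shown above. The two heuristics (a value starting with "your" or ending with "...") are assumptions based on the example values.

```python
import re

# Matches simple `name = "value"` assignments; lists, booleans and comments are skipped.
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')


def find_placeholders(tfvars_text: str) -> list[str]:
    """Return variable names whose string value still looks like a template placeholder."""
    suspicious = []
    for line in tfvars_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name, value = match.groups()
        # Heuristics (assumed): example values start with "your" or end with "...".
        if value.startswith("your") or value.endswith("..."):
            suspicious.append(name)
    return suspicious


sample = '''
hcloud_token = "your-hetzner-api-token"
s3_bucket     = "k8s-terraform-state"
tailscale_auth_key = "tskey-auth-..."
tailscale_tailnet  = "yourtailnet.ts.net"
restrict_api_ssh_to_tailnet = true
'''
print(find_placeholders(sample))
```

To check a real file, pass the contents of terraform.tfvars instead of the sample string.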

3. Initialize Terraform

cd terraform

# Create backend config file (or use CLI args)
cat > backend.hcl << EOF
endpoint                    = "https://s3.eu-central-003.backblazeb2.com"
bucket                      = "k8s-terraform-state"
access_key                  = "your-backblaze-key-id"
secret_key                  = "your-backblaze-application-key"
skip_requesting_account_id  = true
EOF

terraform init -backend-config=backend.hcl
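As an alternative to typing credentials into a heredoc, backend.hcl could be rendered from environment variables. This is a sketch under assumptions: the variable names mirror the Gitea secrets listed later in this README, and using them locally is a choice, not a requirement of the repository. Values are written verbatim, so a value containing a double quote would need escaping.

```python
# Maps backend.hcl keys (from the example above) to environment variable names.
# The environment variable names are an assumption borrowed from the Gitea secrets table.
BACKEND_ENV = {
    "endpoint": "S3_ENDPOINT",
    "bucket": "S3_BUCKET",
    "access_key": "S3_ACCESS_KEY",
    "secret_key": "S3_SECRET_KEY",
}


def render_backend_hcl(env: dict[str, str]) -> str:
    """Build backend.hcl content, failing early if a variable is missing or empty."""
    missing = [var for var in BACKEND_ENV.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    lines = [f'{key} = "{env[var]}"' for key, var in BACKEND_ENV.items()]
    lines.append("skip_requesting_account_id = true")
    return "\n".join(lines) + "\n"


example_env = {
    "S3_ENDPOINT": "https://s3.eu-central-003.backblazeb2.com",
    "S3_BUCKET": "k8s-terraform-state",
    "S3_ACCESS_KEY": "your-backblaze-key-id",
    "S3_SECRET_KEY": "your-backblaze-application-key",
}
print(render_backend_hcl(example_env))
```

In practice you would pass dict(os.environ) and write the result to backend.hcl before running terraform init.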

4. Plan and Apply

terraform plan -var-file=../terraform.tfvars
terraform apply -var-file=../terraform.tfvars

5. Generate Ansible Inventory

cd ../ansible
python3 generate_inventory.py
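The actual generate_inventory.py is not reproduced here. As a rough illustration of what an inventory generator of this kind does, the sketch below turns data shaped like `terraform output -json` into an INI-style inventory. The output names control_plane_ips and worker_ips and the workers group name are hypothetical; only the control_plane group name appears elsewhere in this README.

```python
import json


def render_inventory(tf_outputs: dict) -> str:
    """Render an INI-style Ansible inventory from `terraform output -json` data."""
    groups = {
        "control_plane": tf_outputs["control_plane_ips"]["value"],  # hypothetical output name
        "workers": tf_outputs["worker_ips"]["value"],  # hypothetical output and group name
    }
    sections = ["\n".join([f"[{group}]", *hosts]) for group, hosts in groups.items()]
    return "\n\n".join(sections) + "\n"


# `terraform output -json` wraps every output in an object with a "value" key.
# The addresses below are made-up sample data.
sample = json.loads('''
{
  "control_plane_ips": {"sensitive": false, "type": ["list", "string"],
                        "value": ["10.0.0.2", "10.0.0.3", "10.0.0.4"]},
  "worker_ips": {"sensitive": false, "type": ["list", "string"],
                 "value": ["10.0.0.10", "10.0.0.11"]}
}
''')
print(render_inventory(sample))
```

The real script uses inventory.tmpl, so treat this only as a picture of the data flow from Terraform outputs to Ansible hosts.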

6. Bootstrap Cluster

ansible-playbook site.yml

7. Get Kubeconfig

# outputs/ sits at the repository root, so leave ansible/ first
cd ..
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes

The kubeconfig endpoint is rewritten to the tailnet hostname of the primary control-plane node (k8s-cluster-cp-1.<your-tailnet>).
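To illustrate what such a rewrite amounts to, here is a minimal sketch that points every `server:` entry of a kubeconfig at a given hostname. It is not the code the playbook uses; the function name is hypothetical, and port 6443 is taken from the firewall table in this README.

```python
import re


def rewrite_server(kubeconfig_text: str, hostname: str, port: int = 6443) -> str:
    """Point every `server:` entry in a kubeconfig at the given hostname."""
    return re.sub(
        r"(?m)^([ \t]*server:[ \t]*)https://\S+$",
        rf"\g<1>https://{hostname}:{port}",
        kubeconfig_text,
    )


# A trimmed kubeconfig fragment; k3s writes a loopback address by default.
sample = (
    "clusters:\n"
    "- cluster:\n"
    "    server: https://127.0.0.1:6443\n"
    "  name: default\n"
)
print(rewrite_server(sample, "k8s-cluster-cp-1.yourtailnet.ts.net"))
```

A text substitution keeps the sketch dependency-free; a YAML-aware rewrite would be more robust for unusual files.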

Gitea CI/CD

This repository includes Gitea workflows for:

terraform-plan: Runs on PRs, shows planned changes
terraform-apply: Runs on main branch after merge
ansible-deploy: Runs after terraform apply
dashboards: Fast workflow that updates Grafana datasources/dashboards only

Required Gitea Secrets

Set these in your Gitea repository settings (Settings → Secrets → Actions):

Secret	Description
`HCLOUD_TOKEN`	Hetzner Cloud API token
`S3_ACCESS_KEY`	Backblaze B2 keyID
`S3_SECRET_KEY`	Backblaze B2 applicationKey
`S3_ENDPOINT`	Backblaze S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`)
`S3_BUCKET`	S3 bucket name (e.g., `k8s-terraform-state`)
`TAILSCALE_AUTH_KEY`	Tailscale auth key for node bootstrap
`TAILSCALE_TAILNET`	Tailnet domain (e.g., `yourtailnet.ts.net`)
`TAILSCALE_OAUTH_CLIENT_ID`	Tailscale OAuth client ID for Kubernetes Operator
`TAILSCALE_OAUTH_CLIENT_SECRET`	Tailscale OAuth client secret for Kubernetes Operator
`GRAFANA_ADMIN_PASSWORD`	Optional admin password for Grafana (auto-generated if unset)
`RUNNER_ALLOWED_CIDRS`	Optional CIDR list for CI runner access if you choose to pass it via tfvars/secrets
`SSH_PUBLIC_KEY`	SSH public key content
`SSH_PRIVATE_KEY`	SSH private key content
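A small preflight check can catch a missing secret before a workflow fails halfway through. The sketch below assumes the secrets are exposed to the job as environment variables with the same names; it treats the two secrets described as optional in the table as non-blocking.

```python
# Names copied from the table above.
REQUIRED = [
    "HCLOUD_TOKEN",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
    "S3_ENDPOINT",
    "S3_BUCKET",
    "TAILSCALE_AUTH_KEY",
    "TAILSCALE_TAILNET",
    "TAILSCALE_OAUTH_CLIENT_ID",
    "TAILSCALE_OAUTH_CLIENT_SECRET",
    "SSH_PUBLIC_KEY",
    "SSH_PRIVATE_KEY",
]
OPTIONAL = ["GRAFANA_ADMIN_PASSWORD", "RUNNER_ALLOWED_CIDRS"]


def missing_secrets(env: dict[str, str]) -> list[str]:
    """Return required secret names that are absent or empty in the given environment."""
    return [name for name in REQUIRED if not env.get(name)]


# Example with only two secrets configured.
partial_env = {"HCLOUD_TOKEN": "x", "S3_BUCKET": "k8s-terraform-state"}
print(missing_secrets(partial_env))
```

In a workflow step, pass dict(os.environ) and exit non-zero when the returned list is not empty.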

GitOps (Flux)

This repo now includes a Flux GitOps layout for phased migration from imperative Ansible applies to continuous reconciliation.

Repository layout

clusters/prod/: cluster entrypoint and Flux reconciliation objects
clusters/prod/flux-system/: GitRepository source and top-level Kustomization graph
infrastructure/: infrastructure addon reconciliation graph
infrastructure/addons/*: per-addon manifests (observability + observability-content migrated)
apps/: application workload layer (currently scaffolded)

Reconciliation graph

infrastructure (top-level)
- addon-ccm
- addon-csi depends on addon-ccm
- addon-tailscale-operator
- addon-observability
- addon-observability-content depends on addon-observability
apps depends on infrastructure
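The addon dependencies above imply an ordering: an addon is reconciled only after everything it depends on. The sketch below models just the addon level with the standard library's graphlib (Python 3.9+) to show one valid order. It illustrates the dependency idea only and is not how Flux itself is implemented; the apps layer, which waits for infrastructure as a whole, is left out.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of addons it depends on, mirroring the graph above.
DEPENDS_ON = {
    "addon-ccm": set(),
    "addon-csi": {"addon-ccm"},
    "addon-tailscale-operator": set(),
    "addon-observability": set(),
    "addon-observability-content": {"addon-observability"},
}

# static_order() yields dependencies before the addons that need them.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Addons without a dependency between them have no fixed relative position, so several orders are equally valid.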

Bootstrap notes

Install Flux controllers in flux-system.
Create the Flux deploy key/secret named flux-system in flux-system namespace.
Apply clusters/prod/flux-system/ once to establish source + reconciliation graph.
Unsuspend addon Kustomization objects one-by-one as each addon is migrated from Ansible.

Current migration status

addon-observability-content is now GitOps-managed from infrastructure/addons/observability-content/.
addon-observability is now GitOps-managed from infrastructure/addons/observability/ using Flux HelmRelease resources for:
- kube-prometheus-stack
- loki
- promtail
Remaining addons stay suspended until migrated.
During transition, avoid applying Grafana content from both Flux and Ansible at the same time.

Ansible site.yml skips the observability and observability-content roles when observability_gitops_enabled=true, which is the default.

Observability Stack

Flux deploys a lightweight observability stack in the observability namespace:

kube-prometheus-stack (Prometheus + Grafana)
loki
promtail

Grafana content is managed as code via ConfigMaps in infrastructure/addons/observability-content/ (Flux), migrated from ansible/roles/observability-content/.

Grafana and Prometheus are exposed through a single Tailscale front door backed by Traefik when the Tailscale Kubernetes Operator is healthy.

Access Grafana and Prometheus

Preferred private access:

Grafana: http://k8s-cluster-cp-1.<your-tailnet>:30080/
Prometheus: http://k8s-cluster-cp-1.<your-tailnet>:30990/
Flux UI: http://k8s-cluster-cp-1.<your-tailnet>:30901/

This access path is bootstrapped automatically by Ansible on control_plane[0] using persistent kubectl port-forward systemd services plus tailscale serve, so it survives cluster rebuilds.

Fallback: port-forward from a tailnet-connected machine:

export KUBECONFIG=$(pwd)/outputs/kubeconfig

# Each port-forward blocks, so run the two commands in separate terminals
kubectl -n observability port-forward svc/kube-prometheus-stack-grafana 3000:80
kubectl -n observability port-forward svc/kube-prometheus-stack-prometheus 9090:9090

Then open:

Grafana: http://127.0.0.1:3000
Prometheus: http://127.0.0.1:9090

Grafana user: admin
Grafana password: value of the GRAFANA_ADMIN_PASSWORD secret (or the generated value shown in the Ansible output)
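Once the port-forwards are running, a quick probe can confirm that both services answer. This sketch assumes the local ports from the fallback commands above and uses the health endpoints that Grafana (/api/health) and Prometheus (/-/healthy) provide; the helper names are hypothetical.

```python
from urllib.request import urlopen

# Local addresses match the port-forward commands above.
HEALTH_ENDPOINTS = {
    "grafana": ("http://127.0.0.1:3000", "/api/health"),
    "prometheus": ("http://127.0.0.1:9090", "/-/healthy"),
}


def health_url(service: str) -> str:
    """Return the full health-check URL for a known service."""
    base, path = HEALTH_ENDPOINTS[service]
    return base + path


def is_healthy(service: str, timeout: float = 3.0) -> bool:
    """Return True if the service answers its health endpoint with HTTP 200."""
    try:
        with urlopen(health_url(service), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, or an HTTP error status
        return False


print(health_url("grafana"))
print(health_url("prometheus"))
```

Calling is_healthy("grafana") returns False while the port-forward is down, which makes it usable in a retry loop.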

Verify Tailscale exposure

export KUBECONFIG=$(pwd)/outputs/kubeconfig

kubectl -n tailscale-system get pods
kubectl -n observability get svc kube-prometheus-stack-grafana kube-prometheus-stack-prometheus
kubectl -n observability describe svc kube-prometheus-stack-grafana | grep TailscaleProxyReady
kubectl -n observability describe svc kube-prometheus-stack-prometheus | grep TailscaleProxyReady

If TailscaleProxyReady=False, check:

kubectl -n tailscale-system logs deployment/operator --tail=100

A common cause is an OAuth client that lacks the required tag or scope permissions.

Fast dashboard iteration workflow

Use the Deploy Grafana Content workflow when changing dashboard or data source templates. It skips full cluster provisioning and applies only the Grafana content resources, which are defined in:

ansible/roles/observability-content/templates/grafana-datasources.yaml.j2
ansible/roles/observability-content/templates/grafana-dashboard-k8s-overview.yaml.j2
ansible/dashboards.yml

File Structure

.
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── network.tf
│   ├── firewall.tf
│   ├── ssh.tf
│   ├── servers.tf
│   ├── outputs.tf
│   └── backend.tf
├── ansible/
│   ├── inventory.tmpl
│   ├── generate_inventory.py
│   ├── site.yml
│   ├── roles/
│   │   ├── common/
│   │   ├── k3s-server/
│   │   ├── k3s-agent/
│   │   ├── ccm/
│   │   ├── csi/
│   │   ├── tailscale-operator/
│   │   ├── observability-content/
│   │   └── observability/
│   └── ansible.cfg
├── .gitea/
│   └── workflows/
│       ├── terraform.yml
│       ├── ansible.yml
│       └── dashboards.yml
├── outputs/
├── terraform.tfvars.example
└── README.md

Firewall Rules

Port	Source	Purpose
22	Tailnet CIDR	SSH
6443	Tailnet CIDR + internal	Kubernetes API
41641/udp	Any	Tailscale WireGuard
9345	10.0.0.0/16	k3s Supervisor (HA join)
2379	10.0.0.0/16	etcd Client
2380	10.0.0.0/16	etcd Peer
8472/udp	10.0.0.0/16	Flannel VXLAN
10250	10.0.0.0/16	Kubelet
30000-32767	Optional	NodePorts (disabled by default)
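The table can be read as a lookup from port to allowed source networks. The sketch below encodes it with the standard ipaddress module to answer "would this source reach this port?". It is a reading aid, not the Terraform firewall definition: it ignores the protocol, assumes "internal" means 10.0.0.0/16, uses the tailnet_cidr value from terraform.tfvars, and leaves NodePorts closed as in the default configuration.

```python
import ipaddress

TAILNET = ipaddress.ip_network("100.64.0.0/10")  # tailnet_cidr from terraform.tfvars
PRIVATE = ipaddress.ip_network("10.0.0.0/16")    # assumed meaning of "internal"
ANY = ipaddress.ip_network("0.0.0.0/0")

# Port -> networks allowed to connect, following the table above.
RULES = {
    22: [TAILNET],
    6443: [TAILNET, PRIVATE],
    41641: [ANY],
    9345: [PRIVATE],
    2379: [PRIVATE],
    2380: [PRIVATE],
    8472: [PRIVATE],
    10250: [PRIVATE],
}


def is_allowed(source_ip: str, port: int) -> bool:
    """Return True if the source address falls in a network allowed for the port."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in RULES.get(port, []))


# Sample addresses: one tailnet address and one public documentation address.
for source, port in [("100.100.1.1", 22), ("203.0.113.5", 22), ("10.0.1.5", 6443)]:
    print(source, port, is_allowed(source, port))
```

Ports that are absent from RULES, such as the NodePort range, are rejected for every source.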

Operations

Scale Workers