Files
HetznerTerra/README.md
MichaelFisher1997 b30977a158
Some checks failed
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Has been cancelled
feat: deploy lightweight observability stack via Ansible
2026-03-02 01:33:41 +00:00

314 lines
7.6 KiB
Markdown

# Hetzner Kubernetes Cluster
Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible.
## Architecture
| Component | Details |
|-----------|---------|
| **Control Plane** | 3x CX23 (HA) |
| **Workers** | 4x CX33 |
| **Total Cost** | €28.93/mo |
| **K8s** | k3s (latest, HA) |
| **Addons** | Hetzner CCM + CSI + Prometheus + Grafana + Loki |
| **Access** | SSH/API restricted to Tailnet |
| **Bootstrap** | Terraform + Ansible |
### Cluster Resources
- 22 vCPU total (6 CP + 16 workers)
- 44 GB RAM total (12 CP + 32 workers)
- 440 GB SSD storage
- 140 TB bandwidth allocation
## Prerequisites
### 1. Hetzner Cloud API Token
1. Go to [Hetzner Cloud Console](https://console.hetzner.com/)
2. Select your project (or create a new one)
3. Navigate to **Security****API Tokens**
4. Click **Generate API Token**
5. Set description: `k8s-cluster-terraform`
6. Select permissions: **Read & Write**
7. Click **Generate API Token**
8. **Copy the token immediately** - it won't be shown again!
### 2. Backblaze B2 Bucket (for Terraform State)
1. Go to [Backblaze B2](https://secure.backblaze.com/b2_buckets.htm)
2. Click **Create a Bucket**
3. Set bucket name: `k8s-terraform-state` (must be globally unique)
4. Choose **Private** access
5. Click **Create Bucket**
6. Create application key:
- Go to **App Keys****Add a New Application Key**
- Name: `terraform-state`
- Allow access to: `k8s-terraform-state` bucket only
- Type: **Read and Write**
- Copy **keyID** (access key) and **applicationKey** (secret key)
7. Note your bucket's S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`)
### 3. SSH Key Pair
```bash
ssh-keygen -t ed25519 -C "k8s@hetzner" -f ~/.ssh/hetzner_k8s
```
### 4. Local Tools
- [Terraform](https://terraform.io/downloads) >= 1.0
- [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html) >= 2.9
- Python 3 with `jinja2` and `pyyaml`
## Setup
### 1. Clone Repository
```bash
git clone <your-gitea-repo>/HetznerTerra.git
cd HetznerTerra
```
### 2. Configure Variables
```bash
cp terraform.tfvars.example terraform.tfvars
```
Edit `terraform.tfvars`:
```hcl
hcloud_token = "your-hetzner-api-token"
ssh_public_key = "~/.ssh/hetzner_k8s.pub"
ssh_private_key = "~/.ssh/hetzner_k8s"
s3_access_key = "your-backblaze-key-id"
s3_secret_key = "your-backblaze-application-key"
s3_endpoint = "https://s3.eu-central-003.backblazeb2.com"
s3_bucket = "k8s-terraform-state"
tailscale_auth_key = "tskey-auth-..."
tailscale_tailnet = "yourtailnet.ts.net"
restrict_api_ssh_to_tailnet = true
tailnet_cidr = "100.64.0.0/10"
enable_nodeport_public = false
allowed_ssh_ips = []
allowed_api_ips = []
```
### 3. Initialize Terraform
```bash
cd terraform
# Create backend config file (or use CLI args)
cat > backend.hcl << EOF
endpoint = "https://s3.eu-central-003.backblazeb2.com"
bucket = "k8s-terraform-state"
access_key = "your-backblaze-key-id"
secret_key = "your-backblaze-application-key"
skip_requesting_account_id = true
EOF
terraform init -backend-config=backend.hcl
```
### 4. Plan and Apply
```bash
terraform plan -var-file=../terraform.tfvars
terraform apply -var-file=../terraform.tfvars
```
### 5. Generate Ansible Inventory
```bash
cd ../ansible
python3 generate_inventory.py
```
### 6. Bootstrap Cluster
```bash
ansible-playbook site.yml
```
### 7. Get Kubeconfig
```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes
```
Kubeconfig endpoint is rewritten to the primary control-plane tailnet hostname (`k8s-cluster-cp-1.<your-tailnet>`).
## Gitea CI/CD
This repository includes Gitea workflows for:
- **terraform-plan**: Runs on PRs, shows planned changes
- **terraform-apply**: Runs on main branch after merge
- **ansible-deploy**: Runs after terraform apply
### Required Gitea Secrets
Set these in your Gitea repository settings (**Settings** → **Secrets****Actions**):
| Secret | Description |
|--------|-------------|
| `HCLOUD_TOKEN` | Hetzner Cloud API token |
| `S3_ACCESS_KEY` | Backblaze B2 keyID |
| `S3_SECRET_KEY` | Backblaze B2 applicationKey |
| `S3_ENDPOINT` | Backblaze S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`) |
| `S3_BUCKET` | S3 bucket name (e.g., `k8s-terraform-state`) |
| `TAILSCALE_AUTH_KEY` | Tailscale auth key for node bootstrap |
| `TAILSCALE_TAILNET` | Tailnet domain (e.g., `yourtailnet.ts.net`) |
| `GRAFANA_ADMIN_PASSWORD` | Optional admin password for Grafana (auto-generated if unset) |
| `RUNNER_ALLOWED_CIDRS` | Optional CIDR list for CI runner access if you choose to pass it via tfvars/secrets |
| `SSH_PUBLIC_KEY` | SSH public key content |
| `SSH_PRIVATE_KEY` | SSH private key content |
## Observability Stack
The Ansible playbook deploys a lightweight observability stack in the `observability` namespace:
- `kube-prometheus-stack` (Prometheus + Grafana)
- `loki`
- `promtail`
Services are kept internal for tailnet-first access.
### Access Grafana and Prometheus
Run from a tailnet-connected machine:
```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl -n observability port-forward svc/kube-prometheus-stack-grafana 3000:80
kubectl -n observability port-forward svc/kube-prometheus-stack-prometheus 9090:9090
```
Then open:
- Grafana: http://127.0.0.1:3000
- Prometheus: http://127.0.0.1:9090
Grafana user: `admin`
Grafana password: value of `GRAFANA_ADMIN_PASSWORD` secret (or the generated value shown by Ansible output)
## File Structure
```
.
├── terraform/
│ ├── main.tf
│ ├── variables.tf
│ ├── network.tf
│ ├── firewall.tf
│ ├── ssh.tf
│ ├── servers.tf
│ ├── outputs.tf
│ └── backend.tf
├── ansible/
│ ├── inventory.tmpl
│ ├── generate_inventory.py
│ ├── site.yml
│ ├── roles/
│ │ ├── common/
│ │ ├── k3s-server/
│ │ ├── k3s-agent/
│ │ ├── ccm/
│ │ ├── csi/
│ │ └── observability/
│ └── ansible.cfg
├── .gitea/
│ └── workflows/
│ ├── terraform.yml
│ └── ansible.yml
├── outputs/
├── terraform.tfvars.example
└── README.md
```
## Firewall Rules
| Port | Source | Purpose |
|------|--------|---------|
| 22 | Tailnet CIDR | SSH |
| 6443 | Tailnet CIDR + internal | Kubernetes API |
| 41641/udp | Any | Tailscale WireGuard |
| 9345 | 10.0.0.0/16 | k3s Supervisor (HA join) |
| 2379 | 10.0.0.0/16 | etcd Client |
| 2380 | 10.0.0.0/16 | etcd Peer |
| 8472 | 10.0.0.0/16 | Flannel VXLAN |
| 10250 | 10.0.0.0/16 | Kubelet |
| 30000-32767 | Optional | NodePorts (disabled by default) |
## Operations
### Scale Workers
Edit `terraform.tfvars`:
```hcl
worker_count = 5
```
Then:
```bash
terraform apply
ansible-playbook site.yml
```
### Upgrade k3s
```bash
ansible-playbook site.yml -t upgrade
```
### Destroy Cluster
```bash
terraform destroy
```
## Troubleshooting
### Check k3s Logs
```bash
ssh root@<control-plane-ip> journalctl -u k3s -f
```
### Reset k3s
```bash
ansible-playbook site.yml -t reset
```
## Costs Breakdown
| Resource | Quantity | Unit Price | Monthly |
|----------|----------|------------|---------|
| CX23 (Control Plane) | 3 | €2.99 | €8.97 |
| CX33 (Workers) | 4 | €4.99 | €19.96 |
| Backblaze B2 | ~1 GB | Free (first 10GB) | €0.00 |
| **Total** | | | **€28.93/mo** |
## Security Notes
- Control plane has HA (3 nodes, can survive 1 failure)
- Consider adding Hetzner load balancer for API server
- Rotate API tokens regularly
- Use network policies in Kubernetes
- Enable audit logging for production
## License
MIT