feat: migrate cluster baseline from Hetzner to Proxmox
Replace Hetzner infrastructure and cloud-provider assumptions with Proxmox VM clones, kube-vip API HA, and NFS-backed storage. Update bootstrap, Flux addons, CI workflows, and docs to target the new private Proxmox baseline while preserving the existing Tailscale, Doppler, Flux, Rancher, and B2 backup flows.
@@ -1,30 +1,28 @@
-# Hetzner Kubernetes Cluster
+# Proxmox Kubernetes Cluster
 
-Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible.
+Production-ready private Kubernetes cluster on Proxmox using Terraform, Ansible, and Flux.
 
 ## Architecture
 
 | Component | Details |
 |-----------|---------|
-| **Control Plane** | 3x CX23 (HA) |
-| **Workers** | 3x CX33 |
+| **Control Plane** | 3x Proxmox VMs (2 vCPU / 4 GiB / 32 GiB) |
+| **Workers** | 5x Proxmox VMs (4 vCPU / 8 GiB / 64 GiB) |
 | **K8s** | k3s (latest, HA) |
-| **Addons** | Hetzner CCM + CSI + Prometheus + Grafana + Loki |
+| **Addons** | NFS provisioner + Prometheus + Grafana + Loki + Rancher |
 | **Access** | SSH/API and private services restricted to Tailnet |
 | **Bootstrap** | Terraform + Ansible + Flux |
 
 ## Prerequisites
 
-### 1. Hetzner Cloud API Token
+### 1. Proxmox API Token
 
-1. Go to [Hetzner Cloud Console](https://console.hetzner.com/)
-2. Select your project (or create a new one)
-3. Navigate to **Security** → **API Tokens**
-4. Click **Generate API Token**
-5. Set description: `k8s-cluster-terraform`
-6. Select permissions: **Read & Write**
-7. Click **Generate API Token**
-8. **Copy the token immediately** - it won't be shown again!
+Create an API token for the Proxmox VE user used by Terraform. The repo expects the `bpg/proxmox` provider with:
+
+- endpoint: `https://100.105.0.115:8006/`
+- node: `flex`
+- clone source: template `9000` (`ubuntu-2404-k8s-template`)
+- auth: API token
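
As a sketch of how these expectations map onto the `bpg/proxmox` provider configuration (the wiring below is illustrative, not copied from this repo's `terraform/` directory):

```hcl
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
    }
  }
}

provider "proxmox" {
  # Endpoint and token ID match the values this README uses elsewhere;
  # the secret would come from terraform.tfvars or CI secrets.
  endpoint  = "https://100.105.0.115:8006/"
  api_token = "terraform-prov@pve!k8s-cluster=<token-secret>"
  insecure  = true # typical for a self-signed PVE certificate on a private endpoint
}
```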
 
 ### 2. Backblaze B2 Bucket (for Terraform State)
@@ -44,7 +42,7 @@ Production-ready Kubernetes cluster on Hetzner Cloud using Terraform and Ansible
 ### 3. SSH Key Pair
 
 ```bash
-ssh-keygen -t ed25519 -C "k8s@hetzner" -f ~/.ssh/hetzner_k8s
+ssh-keygen -t ed25519 -C "k8s@proxmox" -f ~/.ssh/infra
 ```
 
 ### 4. Local Tools
@@ -71,10 +69,12 @@ cp terraform.tfvars.example terraform.tfvars
 Edit `terraform.tfvars`:
 
 ```hcl
-hcloud_token = "your-hetzner-api-token"
+proxmox_endpoint         = "https://100.105.0.115:8006/"
+proxmox_api_token_id     = "terraform-prov@pve!k8s-cluster"
+proxmox_api_token_secret = "your-proxmox-token-secret"
 
-ssh_public_key  = "~/.ssh/hetzner_k8s.pub"
-ssh_private_key = "~/.ssh/hetzner_k8s"
+ssh_public_key  = "~/.ssh/infra.pub"
+ssh_private_key = "~/.ssh/infra"
 
 s3_access_key = "your-backblaze-key-id"
 s3_secret_key = "your-backblaze-application-key"
@@ -84,12 +84,7 @@ s3_bucket = "k8s-terraform-state"
 tailscale_auth_key = "tskey-auth-..."
 tailscale_tailnet  = "yourtailnet.ts.net"
 
-restrict_api_ssh_to_tailnet = true
-tailnet_cidr = "100.64.0.0/10"
-enable_nodeport_public = false
-
-allowed_ssh_ips = []
-allowed_api_ips = []
+kube_api_vip = "10.27.27.40"
 ```
 
 ### 3. Initialize Terraform
@@ -152,7 +147,9 @@ Set these in your Gitea repository settings (**Settings** → **Secrets** → **
 | Secret | Description |
 |--------|-------------|
-| `HCLOUD_TOKEN` | Hetzner Cloud API token |
+| `PROXMOX_ENDPOINT` | Proxmox API endpoint (for example `https://100.105.0.115:8006/`) |
+| `PROXMOX_API_TOKEN_ID` | Proxmox API token ID |
+| `PROXMOX_API_TOKEN_SECRET` | Proxmox API token secret |
 | `S3_ACCESS_KEY` | Backblaze B2 keyID |
 | `S3_SECRET_KEY` | Backblaze B2 applicationKey |
 | `S3_ENDPOINT` | Backblaze S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`) |
@@ -163,7 +160,6 @@ Set these in your Gitea repository settings (**Settings** → **Secrets** → **
 | `TAILSCALE_OAUTH_CLIENT_SECRET` | Tailscale OAuth client secret for Kubernetes Operator |
 | `DOPPLER_HETZNERTERRA_SERVICE_TOKEN` | Doppler service token for `hetznerterra` runtime secrets |
 | `GRAFANA_ADMIN_PASSWORD` | Optional admin password for Grafana (auto-generated if unset) |
-| `RUNNER_ALLOWED_CIDRS` | Optional CIDR list for CI runner access if you choose to pass it via tfvars/secrets |
 | `SSH_PUBLIC_KEY` | SSH public key content |
 | `SSH_PRIVATE_KEY` | SSH private key content |
@@ -176,8 +172,8 @@ This repo uses Flux for continuous reconciliation after Terraform + Ansible boot
 The current default target is the HA private baseline:
 
 - `3` control plane nodes
-- `3` worker nodes
-- private Hetzner network only
+- `5` worker nodes
+- private Proxmox network only
 - Tailscale for operator and service access
 - Flux-managed platform addons with `apps` suspended by default
@@ -207,8 +203,7 @@ Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed
 ### Reconciliation graph
 
 - `infrastructure` (top-level)
-  - `addon-ccm`
-  - `addon-csi` depends on `addon-ccm`
+  - `addon-nfs-storage`
   - `addon-tailscale-operator`
   - `addon-observability`
   - `addon-observability-content` depends on `addon-observability`
@@ -224,7 +219,7 @@ Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed
 ### Current addon status
 
 - Core infrastructure addons are Flux-managed from `infrastructure/addons/`.
-- Active Flux addons for the current baseline: `addon-ccm`, `addon-csi`, `addon-cert-manager`, `addon-external-secrets`, `addon-tailscale-operator`, `addon-tailscale-proxyclass`, `addon-observability`, `addon-observability-content`, `addon-rancher`, `addon-rancher-config`, `addon-rancher-backup`, `addon-rancher-backup-config`.
+- Active Flux addons for the current baseline: `addon-nfs-storage`, `addon-cert-manager`, `addon-external-secrets`, `addon-tailscale-operator`, `addon-tailscale-proxyclass`, `addon-observability`, `addon-observability-content`, `addon-rancher`, `addon-rancher-config`, `addon-rancher-backup`, `addon-rancher-backup-config`.
 - `apps` remains suspended until workload rollout is explicitly enabled.
 - Ansible is limited to cluster bootstrap, prerequisite secret creation, pre-proxy Tailscale cleanup, and kubeconfig finalization.
 - Weave GitOps / Flux UI is no longer deployed; use Rancher or the `flux` CLI for Flux operations.
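
The NFS-backed storage referenced above is typically surfaced to workloads as a default StorageClass. A hypothetical sketch of what an NFS provisioner addon renders (class name, provisioner string, and parameters are illustrative, not taken from `infrastructure/addons/`):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
  annotations:
    # Make NFS the default class so PVCs without an explicit class bind to it.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: cluster.local/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"  # delete backing subdirectories with the PV
reclaimPolicy: Delete
```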
@@ -232,14 +227,14 @@ Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed
 ### Rancher access
 
 - Rancher is private-only and exposed through Tailscale at `https://rancher.silverside-gopher.ts.net/`.
-- The public Hetzner load balancer path is not used for Rancher.
+- Rancher and the Kubernetes API stay private; kube-vip provides the API VIP on the LAN.
 - Rancher stores state in embedded etcd; no external database is used.
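
kube-vip is usually configured through environment variables on a control-plane static pod or DaemonSet. A hypothetical excerpt using this baseline's VIP (`10.27.27.40`; the interface name is illustrative, and the real manifest laid down at bootstrap may differ):

```yaml
env:
  - name: vip_interface
    value: "eth0"          # LAN interface carrying 10.27.27.0/24
  - name: address
    value: "10.27.27.40"   # kube_api_vip from terraform.tfvars
  - name: cp_enable
    value: "true"          # advertise the control-plane VIP
  - name: vip_leaderelection
    value: "true"          # one control-plane node holds the VIP at a time
```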
 
 ### Stable baseline acceptance
 
 A rebuild is considered successful only when all of the following pass without manual intervention:
 
-- Terraform create succeeds for the default `3` control planes and `3` workers.
+- Terraform create succeeds for the default `3` control planes and `5` workers.
 - Ansible bootstrap succeeds end-to-end.
 - All nodes become `Ready`.
 - Flux core reconciliation is healthy.
@@ -323,9 +318,6 @@ It avoids full cluster provisioning and only applies Grafana content resources:
 ├── terraform/
 │   ├── main.tf
 │   ├── variables.tf
-│   ├── network.tf
-│   ├── firewall.tf
-│   ├── ssh.tf
 │   ├── servers.tf
 │   ├── outputs.tf
 │   └── backend.tf
@@ -353,17 +345,19 @@ It avoids full cluster provisioning and only applies Grafana content resources:
 ## Firewall Rules
 
+This repo no longer manages cloud firewalls. Access control is expected to be handled on your LAN infrastructure and through Tailscale.
+
+Important cluster-local ports still in use:
+
 | Port | Source | Purpose |
 |------|--------|---------|
-| 22 | Tailnet CIDR | SSH |
-| 6443 | Tailnet CIDR + internal | Kubernetes API |
-| 41641/udp | Any | Tailscale WireGuard |
-| 9345 | 10.0.0.0/16 | k3s Supervisor (HA join) |
-| 2379 | 10.0.0.0/16 | etcd Client |
-| 2380 | 10.0.0.0/16 | etcd Peer |
-| 8472 | 10.0.0.0/16 | Flannel VXLAN |
-| 10250 | 10.0.0.0/16 | Kubelet |
-| 30000-32767 | Optional | NodePorts (disabled by default) |
+| 22 | Admin hosts / CI | SSH |
+| 6443 | 10.27.27.0/24 + VIP | Kubernetes API |
+| 9345 | 10.27.27.0/24 | k3s Supervisor |
+| 2379 | 10.27.27.0/24 | etcd Client |
+| 2380 | 10.27.27.0/24 | etcd Peer |
+| 8472/udp | 10.27.27.0/24 | Flannel VXLAN |
+| 10250 | 10.27.27.0/24 | Kubelet |
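
Since firewalls now live on the LAN side, the table above could be expressed as host rules there; a hypothetical nftables sketch (table/chain names are illustrative, and this repo does not manage these rules):

```nft
table inet k8s_lan {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iif "lo" accept
        # Cluster-internal TCP ports: API, k3s supervisor, etcd, kubelet
        ip saddr 10.27.27.0/24 tcp dport { 6443, 9345, 2379, 2380, 10250 } accept
        # Flannel VXLAN overlay
        ip saddr 10.27.27.0/24 udp dport 8472 accept
        # SSH; narrow the source to admin hosts / CI runners as appropriate
        tcp dport 22 accept
    }
}
```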
 
 ## Operations
@@ -399,7 +393,7 @@ terraform destroy
 ### Check k3s Logs
 
 ```bash
-ssh root@<control-plane-ip> journalctl -u k3s -f
+ssh ubuntu@<control-plane-ip> sudo journalctl -u k3s -f
 ```
 
 ### Reset k3s
@@ -408,19 +402,10 @@ ssh root@<control-plane-ip> journalctl -u k3s -f
 ansible-playbook site.yml -t reset
 ```
 
-## Costs Breakdown
-
-| Resource | Quantity | Unit Price | Monthly |
-|----------|----------|------------|---------|
-| CX23 (Control Plane) | 3 | €2.99 | €8.97 |
-| CX33 (Workers) | 4 | €4.99 | €19.96 |
-| Backblaze B2 | ~1 GB | Free (first 10GB) | €0.00 |
-| **Total** | | | **€28.93/mo** |
-
 ## Security Notes
 
 - Control plane has HA (3 nodes, can survive 1 failure)
-- Consider adding Hetzner load balancer for API server
+- Kubernetes API HA is provided by kube-vip on `10.27.27.40`
 - Rotate API tokens regularly
 - Use network policies in Kubernetes
 - Enable audit logging for production