# Proxmox Kubernetes Cluster

Production-ready private Kubernetes cluster on Proxmox using Terraform, Ansible, and Flux.

## Architecture

| Component | Details |
|-----------|---------|
| **Control Plane** | 3x Proxmox VMs (2 vCPU / 4 GiB / 32 GiB) |
| **Workers** | 5x Proxmox VMs (4 vCPU / 8 GiB / 64 GiB) |
| **K8s** | k3s (latest, HA) |
| **Addons** | NFS provisioner + Prometheus + Grafana + Loki + Rancher |
| **Access** | SSH/API and private services restricted to Tailnet |
| **Bootstrap** | Terraform + Ansible + Flux |

## Prerequisites

### 1. Proxmox API Token

Create an API token for the Proxmox VE user used by Terraform. The repo expects the `bpg/proxmox` provider with:

- endpoint: `https://100.105.0.115:8006/`
- node: `flex`
- clone source: template `9000` (`ubuntu-2404-k8s-template`)
- auth: API token
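
In Terraform, those expectations translate into a provider block along these lines. This is a sketch only; the variable names and the version constraint are assumptions, so check `terraform/variables.tf` for the real ones:

```hcl
# Sketch of a bpg/proxmox provider configuration matching the values above.
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = ">= 0.50" # illustrative constraint
    }
  }
}

provider "proxmox" {
  # e.g. https://100.105.0.115:8006/
  endpoint  = var.proxmox_endpoint
  # bpg/proxmox expects "token-id=token-secret" in a single string
  api_token = "${var.proxmox_api_token_id}=${var.proxmox_api_token_secret}"
  # self-signed Proxmox certificate on a private network
  insecure  = true
}
```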

### 2. Backblaze B2 Bucket (for Terraform State)

1. Go to [Backblaze B2](https://secure.backblaze.com/b2_buckets.htm)
2. Click **Create a Bucket**
3. Set bucket name: `k8s-terraform-state` (must be globally unique)
4. Choose **Private** access
5. Click **Create Bucket**
6. Create application key:
   - Go to **App Keys** → **Add a New Application Key**
   - Name: `terraform-state`
   - Allow access to: `k8s-terraform-state` bucket only
   - Type: **Read and Write**
   - Copy **keyID** (access key) and **applicationKey** (secret key)
7. Note your bucket's S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`)

### 3. SSH Key Pair

```bash
ssh-keygen -t ed25519 -C "k8s@proxmox" -f ~/.ssh/infra
```

### 4. Local Tools

- [Terraform](https://terraform.io/downloads) >= 1.0
- [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html) >= 2.9
- Python 3 with `jinja2` and `pyyaml`

## Setup

### 1. Clone Repository

```bash
git clone <your-gitea-repo>/HetznerTerra.git
cd HetznerTerra
```

### 2. Configure Variables

```bash
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars`:

```hcl
proxmox_endpoint         = "https://100.105.0.115:8006/"
proxmox_api_token_id     = "terraform-prov@pve!k8s-cluster"
proxmox_api_token_secret = "your-proxmox-token-secret"

ssh_public_key  = "~/.ssh/infra.pub"
ssh_private_key = "~/.ssh/infra"

s3_access_key = "your-backblaze-key-id"
s3_secret_key = "your-backblaze-application-key"
s3_endpoint   = "https://s3.eu-central-003.backblazeb2.com"
s3_bucket     = "k8s-terraform-state"

tailscale_auth_key = "tskey-auth-..."
tailscale_tailnet  = "yourtailnet.ts.net"

kube_api_vip = "10.27.27.40"
```
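
Before running Terraform, it can be worth checking that no assignment was missed. The helper below is hypothetical (not part of the repo) and simply scans the tfvars text for the keys listed above:

```python
# Hypothetical pre-flight check: verify terraform.tfvars assigns every key
# the bootstrap expects before Terraform reports them one at a time.
import re

REQUIRED_KEYS = [
    "proxmox_endpoint", "proxmox_api_token_id", "proxmox_api_token_secret",
    "ssh_public_key", "ssh_private_key",
    "s3_access_key", "s3_secret_key", "s3_endpoint", "s3_bucket",
    "tailscale_auth_key", "tailscale_tailnet", "kube_api_vip",
]

def missing_keys(tfvars_text: str) -> list[str]:
    """Return the required keys that have no `key = ...` assignment."""
    assigned = set(re.findall(r"^\s*(\w+)\s*=", tfvars_text, flags=re.MULTILINE))
    return [key for key in REQUIRED_KEYS if key not in assigned]

# A snippet with only two keys set still lacks the rest:
assert "kube_api_vip" in missing_keys('proxmox_endpoint = "x"\ns3_bucket = "y"')
```

Pointing `missing_keys` at the contents of `terraform.tfvars` from the repo root flags any absent key before `terraform plan` does.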

### 3. Initialize Terraform

```bash
cd terraform

# Create backend config file (or use CLI args)
cat > backend.hcl << EOF
endpoint                   = "https://s3.eu-central-003.backblazeb2.com"
bucket                     = "k8s-terraform-state"
access_key                 = "your-backblaze-key-id"
secret_key                 = "your-backblaze-application-key"
skip_requesting_account_id = true
EOF

terraform init -backend-config=backend.hcl
```

### 4. Plan and Apply

```bash
terraform plan -var-file=../terraform.tfvars
terraform apply -var-file=../terraform.tfvars
```

### 5. Generate Ansible Inventory

```bash
cd ../ansible
python3 generate_inventory.py
```
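
The exact shape of the generated inventory depends on `inventory.tmpl` and the Terraform outputs, but it would typically look roughly like this — group names, hostnames, and addresses here are purely illustrative:

```ini
# Hypothetical inventory shape; the real one is produced by generate_inventory.py.
[control_plane]
k8s-cp-1 ansible_host=10.27.27.41
k8s-cp-2 ansible_host=10.27.27.42
k8s-cp-3 ansible_host=10.27.27.43

[workers]
k8s-worker-1 ansible_host=10.27.27.51
k8s-worker-2 ansible_host=10.27.27.52

[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/infra
```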

### 6. Bootstrap Cluster

```bash
ansible-playbook site.yml
```

### 7. Get Kubeconfig

```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes
```

Use `scripts/refresh-kubeconfig.sh <cp1-public-ip>` to refresh the kubeconfig against the primary control-plane IP after rebuilds.
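
The core of that refresh can be sketched as a rewrite of the kubeconfig's `server` URL. The helper below is hypothetical and mirrors only the retargeting step, not any fetching over SSH the real script may do:

```python
# Hypothetical helper: point a kubeconfig's server URL at a new control-plane
# address (e.g. the kube-vip VIP) after a rebuild.
import re

def retarget_kubeconfig(kubeconfig_text: str, address: str, port: int = 6443) -> str:
    """Rewrite every `server: https://...` entry to target address:port."""
    return re.sub(
        r"(server:\s*https://)[^\s]+",
        rf"\g<1>{address}:{port}",
        kubeconfig_text,
    )

sample = "    server: https://10.27.27.41:6443"
print(retarget_kubeconfig(sample, "10.27.27.40"))
# → server: https://10.27.27.40:6443
```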

## Gitea CI/CD

This repository includes Gitea workflows for:

- **deploy**: End-to-end Terraform + Ansible + Flux bootstrap + restore + health checks
- **destroy**: Cluster teardown with backup-aware cleanup
- **dashboards**: Fast workflow that updates Grafana datasources/dashboards only

### Required Gitea Secrets

Set these in your Gitea repository settings (**Settings** → **Secrets** → **Actions**):

| Secret | Description |
|--------|-------------|
| `PROXMOX_ENDPOINT` | Proxmox API endpoint (for example `https://100.105.0.115:8006/`) |
| `PROXMOX_API_TOKEN_ID` | Proxmox API token ID |
| `PROXMOX_API_TOKEN_SECRET` | Proxmox API token secret |
| `S3_ACCESS_KEY` | Backblaze B2 keyID |
| `S3_SECRET_KEY` | Backblaze B2 applicationKey |
| `S3_ENDPOINT` | Backblaze S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`) |
| `S3_BUCKET` | S3 bucket name (e.g., `k8s-terraform-state`) |
| `TAILSCALE_AUTH_KEY` | Tailscale auth key for node bootstrap |
| `TAILSCALE_TAILNET` | Tailnet domain (e.g., `yourtailnet.ts.net`) |
| `TAILSCALE_OAUTH_CLIENT_ID` | Tailscale OAuth client ID for the Kubernetes Operator |
| `TAILSCALE_OAUTH_CLIENT_SECRET` | Tailscale OAuth client secret for the Kubernetes Operator |
| `DOPPLER_HETZNERTERRA_SERVICE_TOKEN` | Doppler service token for `hetznerterra` runtime secrets |
| `GRAFANA_ADMIN_PASSWORD` | Optional admin password for Grafana (auto-generated if unset) |
| `SSH_PUBLIC_KEY` | SSH public key content |
| `SSH_PRIVATE_KEY` | SSH private key content |

## GitOps (Flux)

This repo uses Flux for continuous reconciliation after Terraform + Ansible bootstrap.

### Stable private-only baseline

The current default target is the HA private baseline:

- `3` control plane nodes
- `5` worker nodes
- private Proxmox network only
- Tailscale for operator and service access
- Flux-managed platform addons with `apps` suspended by default

Detailed phase gates and success criteria live in `STABLE_BASELINE.md`.

This remains the default until rebuilds are consistently green; public ingress and app-layer expansion come later.

### Runtime secrets

Runtime cluster secrets are moving to Doppler + External Secrets Operator.

- Doppler project: `hetznerterra`
- Initial auth: service token via `DOPPLER_HETZNERTERRA_SERVICE_TOKEN`
- First synced secrets:
  - `GRAFANA_ADMIN_PASSWORD`

Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed by Doppler.
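
With External Secrets Operator, a synced secret is declared via an `ExternalSecret` that references a Doppler-backed store. This sketch assumes a `ClusterSecretStore` named `doppler-hetznerterra` and an `observability` target namespace — both illustrative, not taken from the repo:

```yaml
# Sketch of the Doppler → cluster sync for the Grafana admin password.
# Store name, secret names, and namespace are assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: doppler-hetznerterra
  target:
    name: grafana-admin
  data:
    - secretKey: admin-password
      remoteRef:
        key: GRAFANA_ADMIN_PASSWORD
```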

### Repository layout

- `clusters/prod/`: cluster entrypoint and Flux reconciliation objects
- `clusters/prod/flux-system/`: `GitRepository` source and top-level `Kustomization` graph
- `infrastructure/`: infrastructure addon reconciliation graph
- `infrastructure/addons/*`: per-addon manifests for Flux-managed cluster addons
- `apps/`: application workload layer (currently scaffolded)

### Reconciliation graph

- `infrastructure` (top-level)
  - `addon-nfs-storage`
  - `addon-tailscale-operator`
  - `addon-observability`
  - `addon-observability-content` depends on `addon-observability`
- `apps` depends on `infrastructure`
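
In Flux, each edge in this graph is a `dependsOn` entry on a `Kustomization` object. A sketch of the `apps` node (the path and interval are illustrative, not taken from the repo):

```yaml
# Sketch of how a dependency edge is expressed in the Flux Kustomization API.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps
  prune: true
  suspend: true # apps stay suspended in the current baseline
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure # reconcile only after infrastructure is healthy
```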

### Bootstrap notes

1. Install Flux controllers in `flux-system`.
2. Create the Flux deploy key/secret named `flux-system` in the `flux-system` namespace.
3. Apply `clusters/prod/flux-system/` once to establish the source + reconciliation graph.
4. Bootstrap-only Ansible creates prerequisite secrets; Flux manages the addon lifecycle after bootstrap.

### Current addon status

- Core infrastructure addons are Flux-managed from `infrastructure/addons/`.
- Active Flux addons for the current baseline: `addon-nfs-storage`, `addon-cert-manager`, `addon-external-secrets`, `addon-tailscale-operator`, `addon-tailscale-proxyclass`, `addon-observability`, `addon-observability-content`, `addon-rancher`, `addon-rancher-config`, `addon-rancher-backup`, `addon-rancher-backup-config`.
- `apps` remains suspended until workload rollout is explicitly enabled.
- Ansible is limited to cluster bootstrap, prerequisite secret creation, pre-proxy Tailscale cleanup, and kubeconfig finalization.
- Weave GitOps / the Flux UI is no longer deployed; use Rancher or the `flux` CLI for Flux operations.

### Rancher access

- Rancher is private-only and exposed through Tailscale at `https://rancher.silverside-gopher.ts.net/`.
- Rancher and the Kubernetes API stay private; kube-vip provides the API VIP on the LAN.
- Rancher stores state in embedded etcd; no external database is used.

### Stable baseline acceptance

A rebuild is considered successful only when all of the following pass without manual intervention:

- Terraform create succeeds for the default `3` control planes and `5` workers.
- Ansible bootstrap succeeds end-to-end.
- All nodes become `Ready`.
- Flux core reconciliation is healthy.
- External Secrets Operator is ready.
- Tailscale operator is ready.
- Tailnet smoke checks pass for Rancher, Grafana, and Prometheus.
- Terraform destroy succeeds cleanly or succeeds after workflow retries.

## Observability Stack

Flux deploys a lightweight observability stack in the `observability` namespace:

- `kube-prometheus-stack` (Prometheus + Grafana)
- `loki`
- `promtail`

Grafana content is managed as code via ConfigMaps in `infrastructure/addons/observability-content/`.

Grafana and Prometheus are exposed through dedicated Tailscale LoadBalancer services when the Tailscale Kubernetes Operator is healthy.
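
That exposure works by giving each Service the Tailscale `loadBalancerClass`, which the operator picks up and publishes on the tailnet. A sketch for Grafana — the selector labels and ports are assumptions based on common chart defaults, not the repo's manifests:

```yaml
# Sketch: exposing Grafana on the tailnet via the Tailscale Kubernetes Operator.
apiVersion: v1
kind: Service
metadata:
  name: grafana-tailscale
  namespace: observability
  annotations:
    tailscale.com/hostname: grafana # becomes grafana.<tailnet>.ts.net
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale # handled by the operator, not the cloud provider
  selector:
    app.kubernetes.io/name: grafana # illustrative; match the actual Grafana pods
  ports:
    - port: 80
      targetPort: 3000 # Grafana's container port
```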

### Access Grafana and Prometheus

Preferred private access:

- Grafana: `http://grafana.silverside-gopher.ts.net/`
- Prometheus: `http://prometheus.silverside-gopher.ts.net:9090/`

Fallback (port-forward from a tailnet-connected machine):

```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig

kubectl -n observability port-forward svc/kube-prometheus-stack-grafana 3000:80
kubectl -n observability port-forward svc/kube-prometheus-stack-prometheus 9090:9090
```

Then open:

- Grafana: http://127.0.0.1:3000
- Prometheus: http://127.0.0.1:9090

Grafana user: `admin`
Grafana password: value of the `GRAFANA_ADMIN_PASSWORD` secret (or the generated value shown in the Ansible output)

### Verify Tailscale exposure

```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig

kubectl -n tailscale-system get pods
kubectl -n cattle-system get svc rancher-tailscale
kubectl -n observability get svc grafana-tailscale prometheus-tailscale
kubectl -n cattle-system describe svc rancher-tailscale | grep TailscaleProxyReady
kubectl -n observability describe svc grafana-tailscale | grep TailscaleProxyReady
kubectl -n observability describe svc prometheus-tailscale | grep TailscaleProxyReady
```

If `TailscaleProxyReady=False`, check:

```bash
kubectl -n tailscale-system logs deployment/operator --tail=100
```

A common cause is an OAuth client missing the required tag/scope permissions.

### Fast dashboard iteration workflow

Use the `Deploy Grafana Content` workflow when changing dashboard/datasource templates. It avoids full cluster provisioning and only applies the Grafana content resources:

- `ansible/roles/observability-content/templates/grafana-datasources.yaml.j2`
- `ansible/roles/observability-content/templates/grafana-dashboard-k8s-overview.yaml.j2`
- `ansible/dashboards.yml`

## File Structure

```
.
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── servers.tf
│   ├── outputs.tf
│   └── backend.tf
├── ansible/
│   ├── inventory.tmpl
│   ├── generate_inventory.py
│   ├── site.yml
│   ├── roles/
│   │   ├── common/
│   │   ├── k3s-server/
│   │   ├── k3s-agent/
│   │   ├── addon-secrets-bootstrap/
│   │   ├── observability-content/
│   │   └── observability/
│   └── ansible.cfg
├── .gitea/
│   └── workflows/
│       ├── terraform.yml
│       ├── ansible.yml
│       └── dashboards.yml
├── outputs/
├── terraform.tfvars.example
└── README.md
```

## Firewall Rules

This repo no longer manages cloud firewalls. Access control is expected to be handled on your LAN infrastructure and through Tailscale.

Important cluster-local ports still in use:

| Port | Source | Purpose |
|------|--------|---------|
| 22 | Admin hosts / CI | SSH |
| 6443 | 10.27.27.0/24 + VIP | Kubernetes API |
| 9345 | 10.27.27.0/24 | k3s supervisor |
| 2379 | 10.27.27.0/24 | etcd client |
| 2380 | 10.27.27.0/24 | etcd peer |
| 8472/udp | 10.27.27.0/24 | Flannel VXLAN |
| 10250 | 10.27.27.0/24 | kubelet |

## Operations

### Scale Workers

Edit `terraform.tfvars` and set the desired worker count:

```hcl
worker_count = 5
```

Then:

```bash
terraform apply -var-file=../terraform.tfvars
ansible-playbook site.yml
```

### Upgrade k3s

```bash
ansible-playbook site.yml -t upgrade
```

### Destroy Cluster

```bash
terraform destroy -var-file=../terraform.tfvars
```

## Troubleshooting

### Check k3s Logs

```bash
ssh ubuntu@<control-plane-ip> sudo journalctl -u k3s -f
```

### Reset k3s

```bash
ansible-playbook site.yml -t reset
```

## Security Notes

- The control plane is HA (3 nodes; it can survive 1 failure)
- Kubernetes API HA is provided by kube-vip on `10.27.27.40`
- Rotate API tokens regularly
- Use network policies in Kubernetes
- Enable audit logging for production
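
For reference, kube-vip's control-plane mode is configured through a handful of environment variables on its pod. This fragment is a sketch only — the interface name and the deployment mode (static pod vs. DaemonSet) are assumptions, not taken from the repo:

```yaml
# Sketch of kube-vip settings behind the API VIP on 10.27.27.40.
env:
  - name: address
    value: "10.27.27.40"      # kube_api_vip from terraform.tfvars
  - name: vip_interface
    value: "eth0"             # illustrative; the VM NIC carrying 10.27.27.0/24
  - name: cp_enable
    value: "true"             # control-plane VIP mode
  - name: vip_arp
    value: "true"             # ARP-based advertisement on the LAN
  - name: vip_leaderelection
    value: "true"             # one control-plane node holds the VIP at a time
```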

## License

MIT