# Proxmox Kubernetes Cluster

Production-ready private Kubernetes cluster on Proxmox using Terraform, Ansible, and Flux.

## Architecture

| Component | Details |
|-----------|---------|
| **Control Plane** | 3x Proxmox VMs (2 vCPU / 4 GiB / 32 GiB) |
| **Workers** | 5x Proxmox VMs (4 vCPU / 8 GiB / 64 GiB) |
| **K8s** | k3s (latest, HA) |
| **Addons** | NFS provisioner + Prometheus + Grafana + Loki + Rancher |
| **Access** | SSH/API and private services restricted to Tailnet |
| **Bootstrap** | Terraform + Ansible + Flux |

## Prerequisites

### 1. Proxmox API Token

Create an API token for the Proxmox VE user used by Terraform. The repo expects the `bpg/proxmox` provider with:

- endpoint: `https://100.105.0.115:8006/`
- node: `flex`
- clone source: template `9000` (`ubuntu-2404-k8s-template`)
- auth: API token

### 2. Backblaze B2 Bucket (for Terraform State)

1. Go to [Backblaze B2](https://secure.backblaze.com/b2_buckets.htm)
2. Click **Create a Bucket**
3. Set bucket name: `k8s-terraform-state` (must be globally unique)
4. Choose **Private** access
5. Click **Create Bucket**
6. Create an application key:
   - Go to **App Keys** → **Add a New Application Key**
   - Name: `terraform-state`
   - Allow access to: `k8s-terraform-state` bucket only
   - Type: **Read and Write**
   - Copy the **keyID** (access key) and **applicationKey** (secret key)
7. Note your bucket's S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`)

### 3. SSH Key Pair

```bash
ssh-keygen -t ed25519 -C "k8s@proxmox" -f ~/.ssh/infra
```

### 4. Local Tools

- [Terraform](https://terraform.io/downloads) >= 1.0
- [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html) >= 2.9
- Python 3 with `jinja2` and `pyyaml`

## Setup

### 1. Clone Repository

```bash
git clone /HetznerTerra.git
cd HetznerTerra
```

### 2. Configure Variables

```bash
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars`:

```hcl
proxmox_endpoint         = "https://100.105.0.115:8006/"
proxmox_api_token_id     = "terraform-prov@pve!k8s-cluster"
proxmox_api_token_secret = "your-proxmox-token-secret"
ssh_public_key           = "~/.ssh/infra.pub"
ssh_private_key          = "~/.ssh/infra"
s3_access_key            = "your-backblaze-key-id"
s3_secret_key            = "your-backblaze-application-key"
s3_endpoint              = "https://s3.eu-central-003.backblazeb2.com"
s3_bucket                = "k8s-terraform-state"
tailscale_auth_key       = "tskey-auth-..."
tailscale_tailnet        = "yourtailnet.ts.net"
kube_api_vip             = "10.27.27.40"
```

### 3. Initialize Terraform

```bash
cd terraform

# Create backend config file (or use CLI args)
cat > backend.hcl << EOF
endpoint                   = "https://s3.eu-central-003.backblazeb2.com"
bucket                     = "k8s-terraform-state"
access_key                 = "your-backblaze-key-id"
secret_key                 = "your-backblaze-application-key"
skip_requesting_account_id = true
EOF

terraform init -backend-config=backend.hcl
```

### 4. Plan and Apply

```bash
terraform plan -var-file=../terraform.tfvars
terraform apply -var-file=../terraform.tfvars
```

### 5. Generate Ansible Inventory

```bash
cd ../ansible
python3 generate_inventory.py
```

### 6. Bootstrap Cluster

```bash
ansible-playbook site.yml
```

### 7. Get Kubeconfig

```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl get nodes
```

Use `scripts/refresh-kubeconfig.sh` to refresh the kubeconfig against the primary control-plane public IP after rebuilds.
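For reference, `generate_inventory.py` renders `inventory.tmpl` into an Ansible inventory from the Terraform outputs. The sketch below is hypothetical — the actual group names, host names, and IPs come from `inventory.tmpl` and your Terraform state; only `ansible_user=ubuntu` and the `~/.ssh/infra` key are taken from this README:

```ini
; Hypothetical generated inventory — real groups/hosts come from inventory.tmpl
[k3s_server]
cp-1 ansible_host=10.27.27.11
cp-2 ansible_host=10.27.27.12
cp-3 ansible_host=10.27.27.13

[k3s_agent]
worker-1 ansible_host=10.27.27.21
worker-2 ansible_host=10.27.27.22

[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/infra
```

If the generated inventory looks wrong, regenerate it after `terraform apply` rather than editing it by hand — it is derived state.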
## Gitea CI/CD

This repository includes Gitea workflows for:

- **deploy**: End-to-end Terraform + Ansible + Flux bootstrap + restore + health checks
- **destroy**: Cluster teardown with backup-aware cleanup
- **dashboards**: Fast workflow that updates Grafana datasources/dashboards only

### Required Gitea Secrets

Set these in your Gitea repository settings (**Settings** → **Secrets** → **Actions**):

| Secret | Description |
|--------|-------------|
| `PROXMOX_ENDPOINT` | Proxmox API endpoint (for example `https://100.105.0.115:8006/`) |
| `PROXMOX_API_TOKEN_ID` | Proxmox API token ID |
| `PROXMOX_API_TOKEN_SECRET` | Proxmox API token secret |
| `S3_ACCESS_KEY` | Backblaze B2 keyID |
| `S3_SECRET_KEY` | Backblaze B2 applicationKey |
| `S3_ENDPOINT` | Backblaze S3 endpoint (e.g., `https://s3.eu-central-003.backblazeb2.com`) |
| `S3_BUCKET` | S3 bucket name (e.g., `k8s-terraform-state`) |
| `TAILSCALE_AUTH_KEY` | Tailscale auth key for node bootstrap |
| `TAILSCALE_TAILNET` | Tailnet domain (e.g., `yourtailnet.ts.net`) |
| `TAILSCALE_OAUTH_CLIENT_ID` | Tailscale OAuth client ID for the Kubernetes Operator |
| `TAILSCALE_OAUTH_CLIENT_SECRET` | Tailscale OAuth client secret for the Kubernetes Operator |
| `DOPPLER_HETZNERTERRA_SERVICE_TOKEN` | Doppler service token for `hetznerterra` runtime secrets |
| `GRAFANA_ADMIN_PASSWORD` | Optional admin password for Grafana (auto-generated if unset) |
| `SSH_PUBLIC_KEY` | SSH public key content |
| `SSH_PRIVATE_KEY` | SSH private key content |

## GitOps (Flux)

This repo uses Flux for continuous reconciliation after the Terraform + Ansible bootstrap.

### Stable private-only baseline

The current default target is the HA private baseline:

- `3` control plane nodes
- `5` worker nodes
- private Proxmox network only
- Tailscale for operator and service access
- Flux-managed platform addons with `apps` suspended by default

Detailed phase gates and success criteria live in `STABLE_BASELINE.md`.
This is the default until rebuilds are consistently green. High availability, public ingress, and app-layer expansion come later.

### Runtime secrets

Runtime cluster secrets are moving to Doppler + External Secrets Operator.

- Doppler project: `hetznerterra`
- Initial auth: service token via `DOPPLER_HETZNERTERRA_SERVICE_TOKEN`
- First synced secrets:
  - `GRAFANA_ADMIN_PASSWORD`

Terraform/bootstrap secrets remain in Gitea Actions secrets and are not managed by Doppler.

### Repository layout

- `clusters/prod/`: cluster entrypoint and Flux reconciliation objects
- `clusters/prod/flux-system/`: `GitRepository` source and top-level `Kustomization` graph
- `infrastructure/`: infrastructure addon reconciliation graph
- `infrastructure/addons/*`: per-addon manifests for Flux-managed cluster addons
- `apps/`: application workload layer (currently scaffolded)

### Reconciliation graph

- `infrastructure` (top-level)
  - `addon-nfs-storage`
  - `addon-tailscale-operator`
  - `addon-observability`
  - `addon-observability-content` depends on `addon-observability`
- `apps` depends on `infrastructure`

### Bootstrap notes

1. Install the Flux controllers in `flux-system`.
2. Create the Flux deploy key/secret named `flux-system` in the `flux-system` namespace.
3. Apply `clusters/prod/flux-system/` once to establish the source + reconciliation graph.
4. Bootstrap-only Ansible creates prerequisite secrets; Flux manages the addon lifecycle after bootstrap.

### Current addon status

- Core infrastructure addons are Flux-managed from `infrastructure/addons/`.
- Active Flux addons for the current baseline: `addon-nfs-storage`, `addon-cert-manager`, `addon-external-secrets`, `addon-tailscale-operator`, `addon-tailscale-proxyclass`, `addon-observability`, `addon-observability-content`, `addon-rancher`, `addon-rancher-config`, `addon-rancher-backup`, `addon-rancher-backup-config`.
- `apps` remains suspended until workload rollout is explicitly enabled.
- Ansible is limited to cluster bootstrap, prerequisite secret creation, pre-proxy Tailscale cleanup, and kubeconfig finalization.
- Weave GitOps / Flux UI is no longer deployed; use Rancher or the `flux` CLI for Flux operations.

### Rancher access

- Rancher is private-only and exposed through Tailscale at `https://rancher.silverside-gopher.ts.net/`.
- Rancher and the Kubernetes API stay private; kube-vip provides the API VIP on the LAN.
- Rancher stores state in embedded etcd; no external database is used.

### Stable baseline acceptance

A rebuild is considered successful only when all of the following pass without manual intervention:

- Terraform create succeeds for the default `3` control planes and `5` workers.
- Ansible bootstrap succeeds end-to-end.
- All nodes become `Ready`.
- Flux core reconciliation is healthy.
- External Secrets Operator is ready.
- The Tailscale operator is ready.
- Tailnet smoke checks pass for Rancher, Grafana, and Prometheus.
- Terraform destroy succeeds cleanly, or succeeds after workflow retries.

## Observability Stack

Flux deploys a lightweight observability stack in the `observability` namespace:

- `kube-prometheus-stack` (Prometheus + Grafana)
- `loki`
- `promtail`

Grafana content is managed as code via ConfigMaps in `infrastructure/addons/observability-content/`.

Grafana and Prometheus are exposed through dedicated Tailscale LoadBalancer services when the Tailscale Kubernetes Operator is healthy.
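The Tailscale Kubernetes Operator exposes a Service on the tailnet when the Service uses `loadBalancerClass: tailscale`. The following is a hypothetical sketch of what the `grafana-tailscale` Service could look like — the selector, ports, and hostname annotation are assumptions for illustration, not copied from this repo's manifests:

```yaml
# Hypothetical sketch of a Tailscale-exposed Grafana Service.
# tailscale.com/hostname sets the tailnet DNS name
# (here: grafana -> grafana.silverside-gopher.ts.net).
apiVersion: v1
kind: Service
metadata:
  name: grafana-tailscale
  namespace: observability
  annotations:
    tailscale.com/hostname: grafana
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale        # picked up by the Tailscale Kubernetes Operator
  selector:
    app.kubernetes.io/name: grafana   # assumed kube-prometheus-stack Grafana labels
  ports:
    - name: http
      port: 80
      targetPort: 3000
```

The operator runs a proxy pod in `tailscale-system` for each such Service; the `TailscaleProxyReady` condition checked in the verification steps below reflects that proxy's status.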
### Access Grafana and Prometheus

Preferred private access:

- Grafana: `http://grafana.silverside-gopher.ts.net/`
- Prometheus: `http://prometheus.silverside-gopher.ts.net:9090/`

Fallback: port-forward from a tailnet-connected machine:

```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl -n observability port-forward svc/kube-prometheus-stack-grafana 3000:80
kubectl -n observability port-forward svc/kube-prometheus-stack-prometheus 9090:9090
```

Then open:

- Grafana: http://127.0.0.1:3000
- Prometheus: http://127.0.0.1:9090

Grafana user: `admin`
Grafana password: value of the `GRAFANA_ADMIN_PASSWORD` secret (or the generated value shown in the Ansible output)

### Verify Tailscale exposure

```bash
export KUBECONFIG=$(pwd)/outputs/kubeconfig
kubectl -n tailscale-system get pods
kubectl -n cattle-system get svc rancher-tailscale
kubectl -n observability get svc grafana-tailscale prometheus-tailscale
kubectl -n cattle-system describe svc rancher-tailscale | grep TailscaleProxyReady
kubectl -n observability describe svc grafana-tailscale | grep TailscaleProxyReady
kubectl -n observability describe svc prometheus-tailscale | grep TailscaleProxyReady
```

If `TailscaleProxyReady=False`, check the operator logs:

```bash
kubectl -n tailscale-system logs deployment/operator --tail=100
```

A common cause is an OAuth client missing tag/scope permissions.

### Fast dashboard iteration workflow

Use the `Deploy Grafana Content` workflow when changing dashboard/datasource templates. It avoids full cluster provisioning and only applies the Grafana content resources:

- `ansible/roles/observability-content/templates/grafana-datasources.yaml.j2`
- `ansible/roles/observability-content/templates/grafana-dashboard-k8s-overview.yaml.j2`
- `ansible/dashboards.yml`

## File Structure

```
.
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── servers.tf
│   ├── outputs.tf
│   └── backend.tf
├── ansible/
│   ├── inventory.tmpl
│   ├── generate_inventory.py
│   ├── site.yml
│   ├── roles/
│   │   ├── common/
│   │   ├── k3s-server/
│   │   ├── k3s-agent/
│   │   ├── addon-secrets-bootstrap/
│   │   ├── observability-content/
│   │   └── observability/
│   └── ansible.cfg
├── .gitea/
│   └── workflows/
│       ├── terraform.yml
│       ├── ansible.yml
│       └── dashboards.yml
├── outputs/
├── terraform.tfvars.example
└── README.md
```

## Firewall Rules

This repo no longer manages cloud firewalls. Access control is expected to be handled on your LAN infrastructure and through Tailscale.

Important cluster-local ports still in use:

| Port | Source | Purpose |
|------|--------|---------|
| 22 | Admin hosts / CI | SSH |
| 6443 | 10.27.27.0/24 + VIP | Kubernetes API |
| 9345 | 10.27.27.0/24 | k3s Supervisor |
| 2379 | 10.27.27.0/24 | etcd Client |
| 2380 | 10.27.27.0/24 | etcd Peer |
| 8472/udp | 10.27.27.0/24 | Flannel VXLAN |
| 10250 | 10.27.27.0/24 | Kubelet |

## Operations

### Scale Workers

Edit `terraform.tfvars`:

```hcl
worker_count = 5
```

Then:

```bash
terraform apply
ansible-playbook site.yml
```

### Upgrade k3s

```bash
ansible-playbook site.yml -t upgrade
```

### Destroy Cluster

```bash
terraform destroy
```

## Troubleshooting

### Check k3s Logs

```bash
ssh ubuntu@<node-ip>
sudo journalctl -u k3s -f
```

### Reset k3s

```bash
ansible-playbook site.yml -t reset
```

## Security Notes

- The control plane is HA (3 nodes; can survive 1 failure)
- Kubernetes API HA is provided by kube-vip on `10.27.27.40`
- Rotate API tokens regularly
- Use network policies in Kubernetes
- Enable audit logging for production

## License

MIT