# Kubeadm Cluster Layout (NixOS)
This folder defines role-based NixOS configs for a kubeadm cluster.
## Topology
- Control planes: `cp-1`, `cp-2`, `cp-3`
- Workers: `wk-1`, `wk-2`, `wk-3`
## What this provides
- Shared Kubernetes/node prerequisites in `modules/k8s-common.nix`
- Shared cluster defaults in `modules/k8s-cluster-settings.nix`
- Role-specific settings for control planes and workers
- Generated per-node host configs from `flake.nix` (no duplicated host files)
- Bootstrap helper commands on each node:
- `th-kubeadm-init`
- `th-kubeadm-join-control-plane`
- `th-kubeadm-join-worker`
- `th-kubeadm-status`
- A Python bootstrap controller for orchestration:
- `bootstrap/controller.py`
## Layered architecture
- `terraform/`: VM lifecycle only
- `nixos/kubeadm/modules/`: declarative node OS config only
- `nixos/kubeadm/bootstrap/controller.py`: imperative cluster reconciliation state machine
## Hardware config files
The flake automatically imports `hosts/hardware/<host>.nix` if present.
Copy each node's generated hardware config into this folder:
```bash
sudo nixos-generate-config
sudo cp /etc/nixos/hardware-configuration.nix ./hosts/hardware/cp-1.nix
```
Repeat for each node (`cp-2`, `cp-3`, `wk-1`, `wk-2`, `wk-3`).
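Before rebuilding, you can sanity-check which hardware configs are already in place. This loop is a hypothetical pre-flight helper, not part of the repo:

```shell
# Hypothetical pre-flight check: report which hosts/hardware/<host>.nix
# files still need to be copied in before a rebuild.
checked=0
missing=""
for node in cp-1 cp-2 cp-3 wk-1 wk-2 wk-3; do
  checked=$((checked + 1))
  [ -f "hosts/hardware/$node.nix" ] || missing="$missing $node"
done
echo "checked $checked nodes; missing:${missing:- none}"
```

Because the flake silently skips absent files, a check like this catches a missing hardware config before it bites at deploy time.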
## Deploy approach
Start with one node at a time while experimenting:
```bash
sudo nixos-rebuild switch --flake .#cp-1
```
For remote target-host workflows, use your preferred deploy wrapper later
(`nixos-rebuild --target-host ...` or deploy-rs/colmena).
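As a sketch, a full remote rollout could be driven by a simple loop. The `root@<node>` target and resolvable hostnames are assumptions; substitute your deploy user or wrapper:

```shell
# Print the per-node remote rebuild commands; run them one at a time
# (or pipe to `sh`) once you trust the rollout. Hostnames/SSH user assumed.
nodes="cp-1 cp-2 cp-3 wk-1 wk-2 wk-3"
for node in $nodes; do
  echo "nixos-rebuild switch --flake .#$node --target-host root@$node"
done
```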
## Bootstrap runbook (kubeadm + kube-vip + Flannel)
1. Apply Nix config on all nodes (`cp-*`, then `wk-*`).
2. On `cp-1` , run:
```bash
sudo th-kubeadm-init
```
This infers the control-plane VIP as `<node-subnet>.250` on `eth0`, creates the
kube-vip static pod manifest, and runs `kubeadm init`.
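The VIP inference amounts to replacing the node's last octet with the configured suffix. A minimal sketch of that logic (assumed behavior; the actual derivation lives in `th-kubeadm-init`):

```shell
# Sketch of the VIP derivation: drop the node IP's last octet and append
# the suffix (250 by default). The IP below is an example, not a real node.
node_ip="192.168.20.31"
vip_suffix="250"
vip="${node_ip%.*}.$vip_suffix"
echo "$vip"    # → 192.168.20.250
```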
3. Install Flannel from `cp-1` :
```bash
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.25.5/Documentation/kube-flannel.yml
```
4. Generate join commands on `cp-1` :
```bash
sudo kubeadm token create --print-join-command
sudo kubeadm init phase upload-certs --upload-certs
```
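The first command prints the base join command and the second prints a certificate key; for a control-plane join you combine the two. A sketch with placeholder values (not real tokens):

```shell
# Placeholder values only; substitute the real outputs from cp-1.
join_cmd="kubeadm join 192.168.20.250:6443 --token abc.def --discovery-token-ca-cert-hash sha256:0000"
cert_key="1111"
cp_join="$join_cmd --control-plane --certificate-key $cert_key"
echo "$cp_join"
```

The combined string is what gets quoted and passed to `th-kubeadm-join-control-plane`.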
5. Join `cp-2` and `cp-3` :
```bash
sudo th-kubeadm-join-control-plane '<kubeadm join ... --control-plane --certificate-key ...>'
```
6. Join workers:
```bash
sudo th-kubeadm-join-worker '<kubeadm join ...>'
```
7. Validate from a control plane:
```bash
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide
```
## Fresh bootstrap flow (recommended)
1. Copy and edit inventory:
```bash
cp ./scripts/inventory.example.env ./scripts/inventory.env
$EDITOR ./scripts/inventory.env
```
2. Rebuild all nodes and bootstrap a fresh cluster:
```bash
./scripts/rebuild-and-bootstrap.sh
```
Optional tuning env vars:
```bash
FAST_MODE=1 WORKER_PARALLELISM=3 REBUILD_TIMEOUT=45m REBUILD_RETRIES=2 ./scripts/rebuild-and-bootstrap.sh
```
- `FAST_MODE=1` skips pre-rebuild remote GC cleanup to reduce wall-clock time.
- Set `FAST_MODE=0` for a slower but more aggressive space cleanup pass.
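Internally, a flag like this is typically just a gate around the cleanup step. A sketch of the assumed behavior (not the script's actual source):

```shell
# Assumed FAST_MODE gate: skip the remote GC pass when FAST_MODE=1.
FAST_MODE="${FAST_MODE:-0}"
if [ "$FAST_MODE" = "1" ]; then
  action="skip remote GC cleanup"
else
  action="run remote GC cleanup"   # default: slower, frees more space
fi
echo "$action"
```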
### Bootstrap controller state
The controller stores checkpoints in two places:
- Remote (source of truth): `/var/lib/terrahome/bootstrap-state.json` on `cp-1`
- Local copy (workflow/debug artifact): `nixos/kubeadm/bootstrap/bootstrap-state-last.json`
This makes retries resumable and keeps failure context visible from CI.
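For debugging, the checkpoint can be inspected directly. The field names below are assumptions about the state file's shape (check `bootstrap/controller.py` for the real schema), and the remote copy would be read over SSH rather than from a literal:

```shell
# Example checkpoint payload with assumed field names; in practice:
#   ssh cp-1 sudo cat /var/lib/terrahome/bootstrap-state.json | jq .
state='{"phase":"join-workers","completed":["init","cni"],"failed":[]}'
phase=$(echo "$state" | jq -r '.phase')
echo "current phase: $phase"
```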
3. If you only want to reset Kubernetes state on existing VMs:
```bash
./scripts/reset-cluster-nodes.sh
```
For a full nuke/recreate lifecycle:
- run Terraform destroy/apply for VMs first,
- then run `./scripts/rebuild-and-bootstrap.sh` again.
Node lists now come directly from static Terraform outputs, so bootstrap no longer
depends on Proxmox guest-agent IP discovery or SSH subnet scanning.
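Concretely, the node lists can be recovered from `terraform output -json`. The payload below mimics that output's shape with made-up addresses (the output names are the ones referenced in the workflows section):

```shell
# Example of parsing the Terraform outputs; `outputs` stands in for
# `terraform output -json`, and the IPs are illustrative only.
outputs='{"control_plane_vm_ipv4":{"value":["192.168.20.31","192.168.20.32","192.168.20.33"]},"worker_vm_ipv4":{"value":["192.168.20.41","192.168.20.42","192.168.20.43"]}}'
cp_ips=$(echo "$outputs" | jq -r '.control_plane_vm_ipv4.value[]')
wk_ips=$(echo "$outputs" | jq -r '.worker_vm_ipv4.value[]')
printf 'control planes:\n%s\nworkers:\n%s\n' "$cp_ips" "$wk_ips"
```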
## Optional Gitea workflow automation
Primary flow:
- Push to `master` triggers `.gitea/workflows/terraform-apply.yml`
- That workflow now runs Terraform apply and then automatically performs a fresh kubeadm bootstrap
Manual dispatch workflows are available:
- `.gitea/workflows/kubeadm-bootstrap.yml`
- `.gitea/workflows/kubeadm-reset.yml`
Required repository secrets:
- Existing Terraform/backend secrets used by the current workflows (`B2_*`, `PM_API_TOKEN_SECRET`, `SSH_KEY_PUBLIC`)
- SSH private key: prefer `KUBEADM_SSH_PRIVATE_KEY`, falling back to the existing `SSH_KEY_PRIVATE`
Optional secrets:
- `KUBEADM_SSH_USER` (defaults to `micqdf`)
Node IPs are rendered directly from static Terraform outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets or SSH discovery fallbacks.
## Notes
- Scripts are intentionally triggered manually (predictable for homelab bring-up).
- If `.250` on the node subnet is already in use, change `controlPlaneVipSuffix`
in `modules/k8s-cluster-settings.nix` before bootstrap.