# Kubeadm Cluster Layout (NixOS)
This folder defines role-based NixOS configs for a kubeadm cluster.
## Topology
- Control planes: `cp-1`, `cp-2`, `cp-3`
- Workers: `wk-1`, `wk-2`, `wk-3`
## What this provides
- Shared Kubernetes/node prerequisites in `modules/k8s-common.nix`
- Shared cluster defaults in `modules/k8s-cluster-settings.nix`
- Role-specific settings for control planes and workers
- Generated per-node host configs from `flake.nix` (no duplicated host files)
- Bootstrap helper commands on each node:
- `th-kubeadm-init`
- `th-kubeadm-join-control-plane`
- `th-kubeadm-join-worker`
- `th-kubeadm-status`
- A Python bootstrap controller for orchestration:
- `bootstrap/controller.py`
## Layered architecture
- `terraform/`: VM lifecycle only
- `nixos/kubeadm/modules/`: declarative node OS config only
- `nixos/kubeadm/bootstrap/controller.py`: imperative cluster reconciliation state machine
## Hardware config files
The flake automatically imports `hosts/hardware/<host>.nix` if present.
Copy each node's generated hardware config into this folder:
```bash
sudo nixos-generate-config
sudo cp /etc/nixos/hardware-configuration.nix ./hosts/hardware/cp-1.nix
```
Repeat for each node (`cp-2`, `cp-3`, `wk-1`, `wk-2`, `wk-3`).
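Before rebuilding, you can sanity-check which hardware configs are already in place. This loop is a hypothetical pre-flight helper, not part of the repo:

```shell
# Hypothetical pre-flight check: report which hosts/hardware/<host>.nix
# files still need to be copied in before a rebuild.
checked=0
missing=""
for node in cp-1 cp-2 cp-3 wk-1 wk-2 wk-3; do
  checked=$((checked + 1))
  [ -f "hosts/hardware/$node.nix" ] || missing="$missing $node"
done
echo "checked $checked nodes; missing:${missing:- none}"
```

Because the flake silently skips absent files, a check like this catches a missing hardware config before it bites at deploy time.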
## Deploy approach
Start with one node at a time while experimenting:
```bash
sudo nixos-rebuild switch --flake .#cp-1
```
For remote target-host workflows, use your preferred deploy wrapper later
(`nixos-rebuild --target-host ...` or deploy-rs/colmena).
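As a sketch, a full remote rollout could be driven by a simple loop. The `root@<node>` target and resolvable hostnames are assumptions; substitute your deploy user or wrapper:

```shell
# Print the per-node remote rebuild commands; run them one at a time
# (or pipe to `sh`) once you trust the rollout. Hostnames/SSH user assumed.
nodes="cp-1 cp-2 cp-3 wk-1 wk-2 wk-3"
for node in $nodes; do
  echo "nixos-rebuild switch --flake .#$node --target-host root@$node"
done
```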
## Bootstrap runbook (kubeadm + kube-vip + Flannel)
1. Apply Nix config on all nodes (`cp-*`, then `wk-*`).
2. On `cp-1` , run:
```bash
sudo th-kubeadm-init
```
This infers the control-plane VIP as `<node-subnet>.250` on `eth0`, creates the
kube-vip static pod manifest, and runs `kubeadm init`.
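The VIP inference amounts to replacing the node's last octet with the configured suffix. A minimal sketch of that logic (assumed behavior; the actual derivation lives in `th-kubeadm-init`):

```shell
# Sketch of the VIP derivation: drop the node IP's last octet and append
# the suffix (250 by default). The IP below is an example, not a real node.
node_ip="192.168.20.31"
vip_suffix="250"
vip="${node_ip%.*}.$vip_suffix"
echo "$vip"    # → 192.168.20.250
```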
3. Install Flannel from `cp-1` :
```bash
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.25.5/Documentation/kube-flannel.yml
```
4. Generate join commands on `cp-1` :
```bash
sudo kubeadm token create --print-join-command
sudo kubeadm init phase upload-certs --upload-certs
```
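The first command prints the base join command and the second prints a certificate key; for a control-plane join you combine the two. A sketch with placeholder values (not real tokens):

```shell
# Placeholder values only; substitute the real outputs from cp-1.
join_cmd="kubeadm join 192.168.20.250:6443 --token abc.def --discovery-token-ca-cert-hash sha256:0000"
cert_key="1111"
cp_join="$join_cmd --control-plane --certificate-key $cert_key"
echo "$cp_join"
```

The combined string is what gets quoted and passed to `th-kubeadm-join-control-plane`.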
5. Join `cp-2` and `cp-3` :
```bash
sudo th-kubeadm-join-control-plane '<kubeadm join ... --control-plane --certificate-key ...>'
```
6. Join workers:
```bash
sudo th-kubeadm-join-worker '<kubeadm join ...>'
```
7. Validate from a control plane:
```bash
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide
```
## Fresh bootstrap flow (recommended)
1. Copy and edit inventory:
```bash
cp ./scripts/inventory.example.env ./scripts/inventory.env
$EDITOR ./scripts/inventory.env
```
2. Rebuild all nodes and bootstrap a fresh cluster:
```bash
./scripts/rebuild-and-bootstrap.sh
```
Optional tuning env vars:
```bash
FAST_MODE=1 WORKER_PARALLELISM=3 REBUILD_TIMEOUT=45m REBUILD_RETRIES=2 ./scripts/rebuild-and-bootstrap.sh
```
- `FAST_MODE=1` skips pre-rebuild remote GC cleanup to reduce wall-clock time.
- Set `FAST_MODE=0` for a slower but more aggressive space cleanup pass.
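Internally, a flag like this is typically just a gate around the cleanup step. A sketch of the assumed behavior (not the script's actual source):

```shell
# Assumed FAST_MODE gate: skip the remote GC pass when FAST_MODE=1.
FAST_MODE="${FAST_MODE:-0}"
if [ "$FAST_MODE" = "1" ]; then
  action="skip remote GC cleanup"
else
  action="run remote GC cleanup"   # default: slower, frees more space
fi
echo "$action"
```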
### Bootstrap controller state
The controller stores checkpoints in two places:
- Remote (source of truth): `/var/lib/terrahome/bootstrap-state.json` on `cp-1`
- Local copy (workflow/debug artifact): `nixos/kubeadm/bootstrap/bootstrap-state-last.json`
This makes retries resumable and keeps failure context visible from CI.
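For debugging, the checkpoint can be inspected directly. The field names below are assumptions about the state file's shape (check `bootstrap/controller.py` for the real schema), and the remote copy would be read over SSH rather than from a literal:

```shell
# Example checkpoint payload with assumed field names; in practice:
#   ssh cp-1 sudo cat /var/lib/terrahome/bootstrap-state.json | jq .
state='{"phase":"join-workers","completed":["init","cni"],"failed":[]}'
phase=$(echo "$state" | jq -r '.phase')
echo "current phase: $phase"
```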
3. If you only want to reset Kubernetes state on existing VMs:
```bash
./scripts/reset-cluster-nodes.sh
```
For a full nuke/recreate lifecycle:
- run Terraform destroy/apply for VMs first,
- then run `./scripts/rebuild-and-bootstrap.sh` again.
Node lists now come directly from static Terraform outputs, so bootstrap no longer
depends on Proxmox guest-agent IP discovery or SSH subnet scanning.
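Concretely, the node lists can be recovered from `terraform output -json`. The payload below mimics that output's shape with made-up addresses (the output names are the ones referenced in the workflows section):

```shell
# Example of parsing the Terraform outputs; `outputs` stands in for
# `terraform output -json`, and the IPs are illustrative only.
outputs='{"control_plane_vm_ipv4":{"value":["192.168.20.31","192.168.20.32","192.168.20.33"]},"worker_vm_ipv4":{"value":["192.168.20.41","192.168.20.42","192.168.20.43"]}}'
cp_ips=$(echo "$outputs" | jq -r '.control_plane_vm_ipv4.value[]')
wk_ips=$(echo "$outputs" | jq -r '.worker_vm_ipv4.value[]')
printf 'control planes:\n%s\nworkers:\n%s\n' "$cp_ips" "$wk_ips"
```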
## Optional Gitea workflow automation
Primary flow:
- Push to `master` triggers `.gitea/workflows/terraform-apply.yml`
- That workflow now runs Terraform apply and then automatically performs a fresh kubeadm bootstrap
Manual dispatch workflows are available:
- `.gitea/workflows/kubeadm-bootstrap.yml`
- `.gitea/workflows/kubeadm-reset.yml`
Required repository secrets:
- Existing Terraform/backend secrets used by the current workflows (`B2_*`, `PM_API_TOKEN_SECRET`, `SSH_KEY_PUBLIC`)
- SSH private key: prefer `KUBEADM_SSH_PRIVATE_KEY`, falling back to the existing `SSH_KEY_PRIVATE`
Optional secrets:
- `KUBEADM_SSH_USER` (defaults to `micqdf`)
Node IPs are rendered directly from static Terraform outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets or SSH discovery fallbacks.
## Notes
- Scripts are intentionally triggered manually (predictable for homelab bring-up).
- If `.250` on the node subnet is already in use, change `controlPlaneVipSuffix`
in `modules/k8s-cluster-settings.nix` before bootstrap.