# Kubeadm Cluster Layout (NixOS)

This folder defines role-based NixOS configs for a kubeadm cluster.

## Topology

- Control planes: `cp-1`, `cp-2`, `cp-3`
- Workers: `wk-1`, `wk-2`, `wk-3`

## What this provides

- Shared Kubernetes/node prerequisites in `modules/k8s-common.nix`
- Shared cluster defaults in `modules/k8s-cluster-settings.nix`
- Role-specific settings for control planes and workers
- Generated per-node host configs from `flake.nix` (no duplicated host files)
- Bootstrap helper commands on each node:
  - `th-kubeadm-init`
  - `th-kubeadm-join-control-plane`
  - `th-kubeadm-join-worker`
  - `th-kubeadm-status`
- A Python bootstrap controller for orchestration:
  - `bootstrap/controller.py`

## Layered architecture

- `terraform/`: VM lifecycle only
- `nixos/kubeadm/modules/`: declarative node OS config only
- `nixos/kubeadm/bootstrap/controller.py`: imperative cluster reconciliation state machine

## Hardware config files

The flake automatically imports `hosts/hardware/<hostname>.nix` if present. Copy each node's generated hardware config into this folder:

```bash
sudo nixos-generate-config
sudo cp /etc/nixos/hardware-configuration.nix ./hosts/hardware/cp-1.nix
```

Repeat for each node (`cp-2`, `cp-3`, `wk-1`, `wk-2`, `wk-3`).

## Deploy approach

Start with one node at a time while experimenting:

```bash
sudo nixos-rebuild switch --flake .#cp-1
```

For remote target-host workflows, use your preferred deploy wrapper later (`nixos-rebuild --target-host ...` or deploy-rs/colmena).

## Bootstrap runbook (kubeadm + kube-vip + Cilium)

1. Apply the Nix config on all nodes (`cp-*`, then `wk-*`).
2. On `cp-1`, run:

   ```bash
   sudo th-kubeadm-init
   ```

   This infers the control-plane VIP as `.250` on `eth0`, creates the kube-vip static pod manifest, and runs `kubeadm init`.
3. Install Cilium from `cp-1`:

   ```bash
   helm repo add cilium https://helm.cilium.io
   helm repo update
   helm upgrade --install cilium cilium/cilium \
     --namespace kube-system \
     --set kubeProxyReplacement=true
   ```
4. Generate join commands on `cp-1`:

   ```bash
   sudo kubeadm token create --print-join-command
   sudo kubeadm init phase upload-certs --upload-certs
   ```

5. Join `cp-2` and `cp-3`:

   ```bash
   sudo th-kubeadm-join-control-plane '<join-command>'
   ```

6. Join workers:

   ```bash
   sudo th-kubeadm-join-worker '<join-command>'
   ```

7. Validate from a control plane:

   ```bash
   kubectl get nodes -o wide
   kubectl -n kube-system get pods -o wide
   ```

## Repeatable rebuild flow (recommended)

1. Copy and edit the inventory:

   ```bash
   cp ./scripts/inventory.example.env ./scripts/inventory.env
   $EDITOR ./scripts/inventory.env
   ```

2. Rebuild all nodes and bootstrap/reconcile the cluster:

   ```bash
   ./scripts/rebuild-and-bootstrap.sh
   ```

   Optional tuning env vars:

   ```bash
   FAST_MODE=1 WORKER_PARALLELISM=3 REBUILD_TIMEOUT=45m REBUILD_RETRIES=2 ./scripts/rebuild-and-bootstrap.sh
   ```

   - `FAST_MODE=1` skips the pre-rebuild remote GC cleanup to reduce wall-clock time.
   - `FAST_MODE=0` runs a slower but more aggressive space cleanup pass.

3. If you only want to reset Kubernetes state on existing VMs:

   ```bash
   ./scripts/reset-cluster-nodes.sh
   ```

For a full nuke/recreate lifecycle:

- run Terraform destroy/apply for the VMs first,
- then run `./scripts/rebuild-and-bootstrap.sh` again.

Node lists are discovered from Terraform outputs, so workers/control planes added in Terraform are picked up automatically by the bootstrap/reconcile flow.

### Bootstrap controller state

The controller stores checkpoints in two places:

- Remote (source of truth): `/var/lib/terrahome/bootstrap-state.json` on `cp-1`
- Local copy (workflow/debug artifact): `nixos/kubeadm/bootstrap/bootstrap-state-last.json`

This makes retries resumable and keeps failure context visible from CI.
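The VIP inference described in step 2 of the runbook can be sketched in shell. This is a hypothetical illustration, not the actual `th-kubeadm-init` implementation: it assumes (per this README) that the VIP is the node's `eth0` address with the final octet replaced by `.250`, and `derive_vip` is an illustrative helper name.

```bash
# Hypothetical sketch of th-kubeadm-init's VIP inference: replace the final
# octet of the node's IPv4 address with the VIP suffix (default .250).
derive_vip() {
  local node_ip=$1        # the node's IPv4, e.g. the address on eth0
  local suffix=${2:-250}  # VIP suffix; .250 per this README's defaults
  echo "${node_ip%.*}.${suffix}"
}

# On a real node the input would come from the interface, e.g.:
#   node_ip=$(ip -4 -o addr show dev eth0 | awk '{print $4}' | cut -d/ -f1)
derive_vip 10.27.27.11   # prints 10.27.27.250
```

If `.250` collides with an existing host on your subnet, this is the octet that `controlPlaneVipSuffix` in `modules/k8s-cluster-settings.nix` controls.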
## Optional Gitea workflow automation

Primary flow:

- A push to `master` triggers `.gitea/workflows/terraform-apply.yml`.
- That workflow runs Terraform apply and then the kubeadm rebuild/bootstrap reconciliation automatically.

Manual dispatch workflows are also available:

- `.gitea/workflows/kubeadm-bootstrap.yml`
- `.gitea/workflows/kubeadm-reset.yml`

Required repository secrets:

- The existing Terraform/backend secrets used by current workflows (`B2_*`, `PM_API_TOKEN_SECRET`, `SSH_KEY_PUBLIC`)
- SSH private key: prefer `KUBEADM_SSH_PRIVATE_KEY`, with fallback to the existing `SSH_KEY_PRIVATE`

Optional secrets:

- `KUBEADM_SSH_USER` (defaults to `micqdf`)
- `KUBEADM_SUBNET_PREFIX` (e.g. `10.27.27`; used for the SSH-based IP discovery fallback)

Node IPs are auto-discovered from Terraform state outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets.

## Notes

- Scripts are intentionally manual-triggered (predictable for homelab bring-up).
- If `.250` on the node subnet is already in use, change `controlPlaneVipSuffix` in `modules/k8s-cluster-settings.nix` before bootstrap.
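The secret-resolution order above can be sketched in shell. The variable names mirror the workflow secrets listed in this README; treating them as plain environment variables (as workflow runners typically expose secrets) is an assumption, and the `/24` mask is illustrative:

```bash
# Sketch of the workflow's secret resolution, assuming secrets arrive as
# environment variables. Defaults and fallbacks follow this README.
ssh_user="${KUBEADM_SSH_USER:-micqdf}"                      # optional; default per README
ssh_key="${KUBEADM_SSH_PRIVATE_KEY:-${SSH_KEY_PRIVATE:-}}"  # preferred key, then fallback
subnet_prefix="${KUBEADM_SUBNET_PREFIX:-}"                  # only needed for SSH-based discovery

echo "SSH user: ${ssh_user}"
if [ -n "${subnet_prefix}" ]; then
  # /24 is an assumption for illustration; the real discovery logic lives
  # in the bootstrap scripts.
  echo "discovery subnet: ${subnet_prefix}.0/24"
fi
```

Because IP discovery prefers Terraform outputs, `KUBEADM_SUBNET_PREFIX` only matters when that primary path is unavailable.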