> Note: kube-vip expects an unsigned integer for `--log`. Replace `--log -4` with `--log 4` so manifest generation no longer fails during bootstrap.
# Kubeadm Cluster Layout (NixOS)
This folder defines role-based NixOS configs for a kubeadm cluster.
## Topology
- Control planes: `cp-1`, `cp-2`, `cp-3`
- Workers: `wk-1`, `wk-2`, `wk-3`
## What this provides
- Shared Kubernetes/node prerequisites in `modules/k8s-common.nix`
- Shared cluster defaults in `modules/k8s-cluster-settings.nix`
- Role-specific settings for control planes and workers
- Generated per-node host configs from `flake.nix` (no duplicated host files)
- Bootstrap helper commands on each node: `th-kubeadm-init`, `th-kubeadm-join-control-plane`, `th-kubeadm-join-worker`, `th-kubeadm-status`
- A Python bootstrap controller for orchestration: `bootstrap/controller.py`
## Layered architecture
- `terraform/`: VM lifecycle only
- `nixos/kubeadm/modules/`: declarative node OS config only
- `nixos/kubeadm/bootstrap/controller.py`: imperative cluster reconciliation state machine
## Hardware config files
The flake automatically imports `hosts/hardware/<host>.nix` if present.
Copy each node's generated hardware config into this folder:

```shell
sudo nixos-generate-config
sudo cp /etc/nixos/hardware-configuration.nix ./hosts/hardware/cp-1.nix
```

Repeat for each node (`cp-2`, `cp-3`, `wk-1`, `wk-2`, `wk-3`).
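The per-node copy step can be scripted. This is a minimal sketch that only builds the `scp` commands, assuming SSH access as the `micqdf` user (the default SSH user mentioned later in this README); `fetch_commands` is a hypothetical helper, not part of this repo:

```python
# Hypothetical helper: build one scp command per node that pulls the
# generated hardware config into hosts/hardware/. Node names match this
# repo's topology; the SSH user is an assumption.
NODES = ["cp-1", "cp-2", "cp-3", "wk-1", "wk-2", "wk-3"]

def fetch_commands(user="micqdf"):
    # remote generated config -> local per-host file
    return [
        ["scp",
         f"{user}@{node}:/etc/nixos/hardware-configuration.nix",
         f"./hosts/hardware/{node}.nix"]
        for node in NODES
    ]

for cmd in fetch_commands():
    print(" ".join(cmd))
```

Each command still needs the remote file to be root-readable, so you may prefer running `nixos-generate-config` yourself on each node first.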
## Deploy approach
Start with one node at a time while experimenting:

```shell
sudo nixos-rebuild switch --flake .#cp-1
```

For remote target-host workflows, use your preferred deploy wrapper later (`nixos-rebuild --target-host ...`, or deploy-rs/colmena).
## Bootstrap runbook (kubeadm + kube-vip + Cilium)
- Apply Nix config on all nodes (`cp-*`, then `wk-*`).
- On `cp-1`, run:

  ```shell
  sudo th-kubeadm-init
  ```

  This infers the control-plane VIP as `<node-subnet>.250` on `eth0`, creates the kube-vip static pod manifest, and runs `kubeadm init`.
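The VIP inference rule is simple enough to show in a few lines. This sketch only illustrates the rule described above; `infer_vip` is illustrative, not the actual `th-kubeadm-init` implementation:

```python
# Sketch: keep the first three octets of the node's eth0 IPv4 and append
# the .250 suffix (controlPlaneVipSuffix in k8s-cluster-settings.nix).
def infer_vip(node_ip: str, suffix: int = 250) -> str:
    octets = node_ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {node_ip}")
    return ".".join(octets[:3] + [str(suffix)])

print(infer_vip("10.27.27.11"))  # -> 10.27.27.250
```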
- Install Cilium from `cp-1`:

  ```shell
  helm repo add cilium https://helm.cilium.io
  helm repo update
  helm upgrade --install cilium cilium/cilium \
    --namespace kube-system \
    --set kubeProxyReplacement=true
  ```
- Generate join commands on `cp-1`:

  ```shell
  sudo kubeadm token create --print-join-command
  sudo kubeadm init phase upload-certs --upload-certs
  ```
- Join `cp-2` and `cp-3`:

  ```shell
  sudo th-kubeadm-join-control-plane '<kubeadm join ... --control-plane --certificate-key ...>'
  ```
- Join workers:

  ```shell
  sudo th-kubeadm-join-worker '<kubeadm join ...>'
  ```
- Validate from a control plane:

  ```shell
  kubectl get nodes -o wide
  kubectl -n kube-system get pods -o wide
  ```
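For scripted validation, the same check can be done against `kubectl get nodes -o json`. This sketch parses the Kubernetes NodeList JSON shape; `not_ready` is a hypothetical helper, not part of this repo:

```python
# Sketch: report nodes whose Ready condition is not "True", given the
# dict produced by `kubectl get nodes -o json`.
def not_ready(nodelist: dict) -> list:
    bad = []
    for node in nodelist.get("items", []):
        conds = {c["type"]: c["status"] for c in node["status"]["conditions"]}
        if conds.get("Ready") != "True":
            bad.append(node["metadata"]["name"])
    return bad

# Usage on a control plane:
#   import json, subprocess
#   nodes = json.loads(subprocess.check_output(["kubectl", "get", "nodes", "-o", "json"]))
#   print(not_ready(nodes) or "all nodes Ready")
```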
## Repeatable rebuild flow (recommended)
- Copy and edit inventory:

  ```shell
  cp ./scripts/inventory.example.env ./scripts/inventory.env
  $EDITOR ./scripts/inventory.env
  ```

- Rebuild all nodes and bootstrap/reconcile the cluster:

  ```shell
  ./scripts/rebuild-and-bootstrap.sh
  ```
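If you need the inventory values in your own scripts, a minimal reader works. This sketch assumes `inventory.env` is a shell-style `KEY=value` file, which may not match the exact format of `inventory.example.env`:

```python
# Sketch: parse a shell-style KEY=value env file into a dict, skipping
# blank lines and comments and stripping double quotes from values.
def read_env_file(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env
```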
Optional tuning env vars:

```shell
FAST_MODE=1 WORKER_PARALLELISM=3 REBUILD_TIMEOUT=45m REBUILD_RETRIES=2 ./scripts/rebuild-and-bootstrap.sh
```

- `FAST_MODE=1` skips the pre-rebuild remote GC cleanup to reduce wall-clock time.
- Set `FAST_MODE=0` for a slower but more aggressive space cleanup pass.
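The tuning variables above can be read with defaults in one place. In this sketch only the variable names come from this README; the default values shown are illustrative:

```python
# Sketch: collect the tuning env vars with illustrative defaults.
import os

def tuning(env=None) -> dict:
    env = os.environ if env is None else env
    return {
        "fast_mode": env.get("FAST_MODE", "0") == "1",
        "worker_parallelism": int(env.get("WORKER_PARALLELISM", "1")),
        "rebuild_timeout": env.get("REBUILD_TIMEOUT", "30m"),
        "rebuild_retries": int(env.get("REBUILD_RETRIES", "1")),
    }
```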
## Bootstrap controller state
The controller stores checkpoints in two places:

- Remote (source of truth): `/var/lib/terrahome/bootstrap-state.json` on `cp-1`
- Local copy (workflow/debug artifact): `nixos/kubeadm/bootstrap/bootstrap-state-last.json`

This makes retries resumable and keeps failure context visible from CI.
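The resumable-checkpoint idea can be sketched as atomic JSON writes plus a tolerant load. Field names and paths here are illustrative, not the controller's actual schema:

```python
# Sketch: write state atomically (temp file + rename) so a crash never
# leaves partial JSON, and treat a missing file as "nothing done yet".
import json
import os
import tempfile

def save_state(path: str, state: dict) -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX

def load_state(path: str) -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"completed": []}
```

On retry, a controller built this way skips any phase already listed under `completed` instead of re-running the whole bootstrap.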
If you only want to reset Kubernetes state on existing VMs:

```shell
./scripts/reset-cluster-nodes.sh
```

For a full nuke/recreate lifecycle:

- run Terraform destroy/apply for the VMs first,
- then run `./scripts/rebuild-and-bootstrap.sh` again.
Node lists are discovered from Terraform outputs, so adding new workers/control planes in Terraform is picked up automatically by the bootstrap/reconcile flow.
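Discovery from Terraform outputs can be sketched against `terraform output -json`. The output names (`control_plane_vm_ipv4`, `worker_vm_ipv4`) come from this README, but the assumption that each value is a name-to-IP map is mine:

```python
# Sketch: extract node IP lists from the JSON emitted by
# `terraform output -json`, assuming each output is a map of
# node name -> IPv4 address.
import json

def node_ips(tf_output_json: str) -> dict:
    out = json.loads(tf_output_json)
    return {
        "control_planes": list(out["control_plane_vm_ipv4"]["value"].values()),
        "workers": list(out["worker_vm_ipv4"]["value"].values()),
    }
```

Because the lists come straight from state, a worker added in Terraform shows up here on the next run with no README or inventory edits.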
## Optional Gitea workflow automation
Primary flow:

- Push to `master` triggers `.gitea/workflows/terraform-apply.yml`
- That workflow runs Terraform apply and then the kubeadm rebuild/bootstrap reconciliation automatically

Manual dispatch workflows are also available:

- `.gitea/workflows/kubeadm-bootstrap.yml`
- `.gitea/workflows/kubeadm-reset.yml`
Required repository secrets:

- Existing Terraform/backend secrets used by current workflows (`B2_*`, `PM_API_TOKEN_SECRET`, `SSH_KEY_PUBLIC`)
- SSH private key: prefer `KUBEADM_SSH_PRIVATE_KEY`, falling back to the existing `SSH_KEY_PRIVATE`

Optional secrets:

- `KUBEADM_SSH_USER` (defaults to `micqdf`)
- `KUBEADM_SUBNET_PREFIX` (optional, e.g. `10.27.27`; used for the SSH-based IP discovery fallback)
Node IPs are auto-discovered from the Terraform state outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets.
## Notes
- Scripts are intentionally manual-triggered (predictable for homelab bring-up).
- If `.250` on the node subnet is already in use, change `controlPlaneVipSuffix` in `modules/k8s-cluster-settings.nix` before bootstrap.