Kubeadm Cluster Layout (NixOS)

This folder defines role-based NixOS configs for a kubeadm cluster.

Topology

  • Control planes: cp-1, cp-2, cp-3
  • Workers: wk-1, wk-2, wk-3

What this provides

  • Shared Kubernetes/node prerequisites in modules/k8s-common.nix
  • Shared cluster defaults in modules/k8s-cluster-settings.nix
  • Role-specific settings for control planes and workers
  • Generated per-node host configs from flake.nix (no duplicated host files)
  • Bootstrap helper commands:
    • th-kubeadm-init
    • th-kubeadm-join-control-plane
    • th-kubeadm-join-worker
    • th-kubeadm-status

Hardware config files

The flake automatically imports hosts/hardware/<host>.nix if present. Copy each node's generated hardware config into hosts/hardware/, named after the host:

sudo nixos-generate-config
sudo cp /etc/nixos/hardware-configuration.nix ./hosts/hardware/cp-1.nix

Repeat for each node (cp-2, cp-3, wk-1, wk-2, wk-3).
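
Before rebuilding, it can help to check which nodes still lack a hardware file. The following is a minimal sketch, assuming it is run from this folder and that the node names match the Topology section. missing_hardware_configs is a hypothetical helper, not something this repo provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print every node that has no
# <dir>/<node>.nix file yet.
missing_hardware_configs() {
  dir=$1
  shift
  for node in "$@"; do
    [ -f "$dir/$node.nix" ] || printf '%s\n' "$node"
  done
}

# Node names come from the Topology section. The check is skipped when the
# directory does not exist (for example, when run from another folder).
if [ -d ./hosts/hardware ]; then
  missing_hardware_configs ./hosts/hardware cp-1 cp-2 cp-3 wk-1 wk-2 wk-3
fi
```

An empty result means every listed node already has a hardware config in place.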

Deploy approach

While experimenting, rebuild one node at a time:

sudo nixos-rebuild switch --flake .#cp-1

For remote deployments, switch to your preferred deploy wrapper later (nixos-rebuild --target-host ..., deploy-rs, or colmena).

Bootstrap runbook (kubeadm + kube-vip + Cilium)

  1. Apply Nix config on all nodes (cp-*, then wk-*).
  2. On cp-1, run:
sudo th-kubeadm-init

This infers the control-plane VIP as <node-subnet>.250 on eth0, creates the kube-vip static pod manifest, and runs kubeadm init.

  3. Install Cilium from cp-1:
helm repo add cilium https://helm.cilium.io
helm repo update
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true
  4. Generate the join command and the certificate key on cp-1:
sudo kubeadm token create --print-join-command
sudo kubeadm init phase upload-certs --upload-certs
  5. Join cp-2 and cp-3:
sudo th-kubeadm-join-control-plane '<kubeadm join ... --control-plane --certificate-key ...>'
  6. Join workers:
sudo th-kubeadm-join-worker '<kubeadm join ...>'
  7. Validate from a control plane:
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide
  8. Copy and edit inventory:
cp ./scripts/inventory.example.env ./scripts/inventory.env
$EDITOR ./scripts/inventory.env
  9. Rebuild all nodes and bootstrap/reconcile the cluster:
./scripts/rebuild-and-bootstrap.sh

Optional tuning env vars:

FAST_MODE=1 WORKER_PARALLELISM=3 REBUILD_TIMEOUT=45m REBUILD_RETRIES=2 ./scripts/rebuild-and-bootstrap.sh
  • FAST_MODE=1 skips pre-rebuild remote GC cleanup to reduce wall-clock time.
  • Set FAST_MODE=0 for a slower but more aggressive space cleanup pass.
  10. To reset only the Kubernetes state on existing VMs:
./scripts/reset-cluster-nodes.sh
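
Returning to the "Generate join commands" and "Join cp-2 and cp-3" steps of the runbook above: the control-plane join string is the worker join command plus two extra flags. The sketch below shows one way to assemble it. control_plane_join_command is a hypothetical helper, and reading the certificate key from the last output line of upload-certs is an assumption about kubeadm's output format that may not hold for every version.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): append the control-plane flags
# to the worker join command printed by kubeadm.
control_plane_join_command() {
  join_cmd=$1
  cert_key=$2
  printf '%s --control-plane --certificate-key %s\n' "$join_cmd" "$cert_key"
}

# On cp-1 (needs root and an initialised cluster) the two inputs could be
# gathered like this. Assumption: the certificate key is the last output line.
#   join_cmd="$(sudo kubeadm token create --print-join-command)"
#   cert_key="$(sudo kubeadm init phase upload-certs --upload-certs | tail -n 1)"
#   control_plane_join_command "$join_cmd" "$cert_key"
```

The resulting string is what gets passed, quoted, to th-kubeadm-join-control-plane. The unmodified join command goes to th-kubeadm-join-worker.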

For a full nuke/recreate lifecycle:

  • run Terraform destroy/apply for VMs first,
  • then run ./scripts/rebuild-and-bootstrap.sh again.

Node lists are discovered from Terraform outputs, so new workers or control planes added in Terraform are picked up automatically by the bootstrap/reconcile flow.

Optional Gitea workflow automation

Primary flow:

  • Push to master triggers .gitea/workflows/terraform-apply.yml
  • That workflow runs Terraform apply, then runs the kubeadm rebuild/bootstrap reconciliation automatically

Manual dispatch workflows are available:

  • .gitea/workflows/kubeadm-bootstrap.yml
  • .gitea/workflows/kubeadm-reset.yml

Required repository secrets:

  • Existing Terraform/backend secrets used by current workflows (B2_*, PM_API_TOKEN_SECRET, SSH_KEY_PUBLIC)
  • SSH private key: KUBEADM_SSH_PRIVATE_KEY is preferred; the existing SSH_KEY_PRIVATE is used as a fallback

Optional secrets:

  • KUBEADM_SSH_USER (defaults to micqdf)
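
The secret precedence described above can be sketched in shell as follows. This is an illustration of the prefer/fallback rule only, not the actual workflow code. resolve_ssh_key is a hypothetical helper. The micqdf default is the one stated in this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): choose the SSH private key,
# preferring the first argument and falling back to the second.
resolve_ssh_key() {
  preferred=$1
  fallback=$2
  if [ -n "$preferred" ]; then
    printf '%s\n' "$preferred"
  elif [ -n "$fallback" ]; then
    printf '%s\n' "$fallback"
  else
    echo "no SSH private key provided" >&2
    return 1
  fi
}

# KUBEADM_SSH_USER is optional and defaults to micqdf.
ssh_user="${KUBEADM_SSH_USER:-micqdf}"
echo "SSH user: $ssh_user"

# Possible use inside a job, without ever echoing the key itself:
#   key="$(resolve_ssh_key "${KUBEADM_SSH_PRIVATE_KEY:-}" "${SSH_KEY_PRIVATE:-}")"
```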

Node IPs are auto-discovered from Terraform state outputs (control_plane_vm_ipv4, worker_vm_ipv4), so you do not need per-node IP secrets.
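
To inspect the discovered IPs by hand, something like the sketch below may work. It assumes terraform and jq are installed, that it is run from the Terraform working directory, and that each output is either a list of IPs or a map of node name to IP. The real output shape is not documented here. node_ips_from_json is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): read a JSON list or map on
# stdin and print one IP per line. In jq, `.[]` yields the elements of a list
# and the values of a map alike.
node_ips_from_json() {
  jq -r '.[]'
}

# Possible use from the Terraform working directory:
#   terraform output -json control_plane_vm_ipv4 | node_ips_from_json
#   terraform output -json worker_vm_ipv4 | node_ips_from_json
```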

Notes

  • Scripts are intentionally triggered manually (predictable for homelab bring-up).
  • If .250 on the node subnet is already in use, change controlPlaneVipSuffix in modules/k8s-cluster-settings.nix before bootstrap.
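
The VIP convention (<node-subnet>.250) can be illustrated with a few lines of shell. This is a sketch assuming a /24 node subnet, so that swapping the last octet is enough. vip_for_node_ip is a hypothetical helper and the addresses are made-up examples. The real inference in th-kubeadm-init may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): replace the last octet of an
# IPv4 address with a suffix (default 250). Assumes a /24 subnet.
vip_for_node_ip() {
  node_ip=$1
  suffix=${2:-250}
  # ${node_ip%.*} strips the final ".<octet>" from the address.
  printf '%s.%s\n' "${node_ip%.*}" "$suffix"
}

vip_for_node_ip 192.168.1.31        # prints 192.168.1.250
vip_for_node_ip 192.168.1.31 240    # prints 192.168.1.240

# On a node, the eth0 address could be read with (assumption, not verified
# against th-kubeadm-init):
#   ip -4 -o addr show dev eth0 | awk '{print $4}' | cut -d/ -f1
```

Checking the derived address with a quick ping before bootstrap is a cheap way to confirm it is free.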