# Kubeadm Cluster Layout (NixOS)

This folder defines role-based NixOS configs for a kubeadm cluster.

## Topology

- Control planes: `cp-1`, `cp-2`, `cp-3`
- Workers: `wk-1`, `wk-2`, `wk-3`

## What this provides

- Shared Kubernetes/node prerequisites in `modules/k8s-common.nix`
- Shared cluster defaults in `modules/k8s-cluster-settings.nix`
- Role-specific settings for control planes and workers
- Generated per-node host configs from `flake.nix` (no duplicated host files)
- Bootstrap helper commands on each node:
  - `th-kubeadm-init`
  - `th-kubeadm-join-control-plane`
  - `th-kubeadm-join-worker`
  - `th-kubeadm-status`
- A Python bootstrap controller for orchestration:
  - `bootstrap/controller.py`

## Layered architecture

- `terraform/`: VM lifecycle only
- `nixos/kubeadm/modules/`: declarative node OS config only
- `nixos/kubeadm/bootstrap/controller.py`: imperative cluster reconciliation state machine

## Hardware config files

The flake automatically imports `hosts/hardware/<host>.nix` if present.
Copy each node's generated hardware config into this folder:

```bash
sudo nixos-generate-config
sudo cp /etc/nixos/hardware-configuration.nix ./hosts/hardware/cp-1.nix
```

Repeat for each node (`cp-2`, `cp-3`, `wk-1`, `wk-2`, `wk-3`).

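Assuming working SSH access to each node, the per-node copies can be pulled in one loop. This is a hypothetical helper, not part of the repo; the `micqdf` user matches the default named later in this README, and the flat `./hosts/hardware/` layout is the one the flake imports from:

```shell
# Hypothetical fetch loop: copy each node's generated hardware config
# into ./hosts/hardware/<host>.nix over SSH. Adjust user/hosts to your
# inventory; requires passwordless sudo on the targets.
for host in cp-1 cp-2 cp-3 wk-1 wk-2 wk-3; do
  ssh "micqdf@${host}" sudo cat /etc/nixos/hardware-configuration.nix \
    > "./hosts/hardware/${host}.nix"
done
```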
## Deploy approach

Start from one node at a time while experimenting:

```bash
sudo nixos-rebuild switch --flake .#cp-1
```

For remote target-host workflows, use your preferred deploy wrapper later
(`nixos-rebuild --target-host ...` or deploy-rs/colmena).

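Until a proper deploy wrapper is wired up, a plain `nixos-rebuild --target-host` loop is one option. A sketch, assuming SSH access as a sudo-capable user (the `micqdf` user is an assumption from this repo's defaults):

```shell
# Sketch: push each node's flake output over SSH from this folder.
# --use-remote-sudo runs the activation step under sudo on the target.
for host in cp-1 cp-2 cp-3 wk-1 wk-2 wk-3; do
  nixos-rebuild switch --flake ".#${host}" \
    --target-host "micqdf@${host}" --use-remote-sudo
done
```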
## Bootstrap runbook (kubeadm + kube-vip + Flannel)

1. Apply Nix config on all nodes (`cp-*`, then `wk-*`).
2. On `cp-1`, run:

   ```bash
   sudo th-kubeadm-init
   ```

   This infers the control-plane VIP as `<node-subnet>.250` on `eth0`, creates the
   kube-vip static pod manifest, and runs `kubeadm init`.

3. Install Flannel from `cp-1`:

   ```bash
   kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.25.5/Documentation/kube-flannel.yml
   ```

4. Generate join commands on `cp-1`:

   ```bash
   sudo kubeadm token create --print-join-command
   sudo kubeadm init phase upload-certs --upload-certs
   ```

5. Join `cp-2` and `cp-3`:

   ```bash
   sudo th-kubeadm-join-control-plane '<kubeadm join ... --control-plane --certificate-key ...>'
   ```

6. Join workers:

   ```bash
   sudo th-kubeadm-join-worker '<kubeadm join ...>'
   ```

7. Validate from a control plane:

   ```bash
   kubectl get nodes -o wide
   kubectl -n kube-system get pods -o wide
   ```

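The VIP inference in step 2 amounts to swapping the last octet of the node's `eth0` address for the configured suffix. A minimal sketch, assuming a /24 subnet; the actual `th-kubeadm-init` implementation may differ:

```shell
# Derive the kube-vip VIP from the node's primary address, assuming /24.
# In the real helper the address would come from `ip -4 addr show eth0`;
# it is hard-coded here for illustration.
node_ip="192.168.20.11"
vip_suffix="250"                   # controlPlaneVipSuffix in modules/k8s-cluster-settings.nix
vip="${node_ip%.*}.${vip_suffix}"  # strip the last octet, append the suffix
echo "$vip"                        # -> 192.168.20.250
```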
## Fresh bootstrap flow (recommended)

1. Copy and edit inventory:

   ```bash
   cp ./scripts/inventory.example.env ./scripts/inventory.env
   $EDITOR ./scripts/inventory.env
   ```

2. Rebuild all nodes and bootstrap a fresh cluster:

   ```bash
   ./scripts/rebuild-and-bootstrap.sh
   ```

   Optional tuning env vars:

   ```bash
   FAST_MODE=1 WORKER_PARALLELISM=3 REBUILD_TIMEOUT=45m REBUILD_RETRIES=2 ./scripts/rebuild-and-bootstrap.sh
   ```

   - `FAST_MODE=1` skips the pre-rebuild remote GC cleanup to reduce wall-clock time.
   - Set `FAST_MODE=0` for a slower but more aggressive space cleanup pass.

3. If you only want to reset Kubernetes state on existing VMs:

   ```bash
   ./scripts/reset-cluster-nodes.sh
   ```

### Bootstrap controller state

The controller stores checkpoints in two places:

- Remote (source of truth): `/var/lib/terrahome/bootstrap-state.json` on `cp-1`
- Local copy (workflow/debug artifact): `nixos/kubeadm/bootstrap/bootstrap-state-last.json`

This makes retries resumable and keeps failure context visible from CI.

For a full nuke/recreate lifecycle:

- run Terraform destroy/apply for the VMs first,
- then run `./scripts/rebuild-and-bootstrap.sh` again.

Node lists now come directly from static Terraform outputs, so bootstrap no longer
depends on Proxmox guest-agent IP discovery or SSH subnet scanning.

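For reference, those static outputs can be turned into plain IP lists without any discovery step. This is a sketch of the parsing only, not the bootstrap script's actual code; the JSON-array shape is what `terraform output -json` emits for a list output:

```shell
# control_plane_vm_ipv4 / worker_vm_ipv4 are JSON arrays, e.g. from:
#   terraform -chdir=terraform output -json control_plane_vm_ipv4
cp_json='["10.0.20.11","10.0.20.12","10.0.20.13"]'  # stand-in for the real output
# One IP per line using only POSIX tools (no jq dependency):
printf '%s\n' "$cp_json" | tr -d '[]"' | tr ',' '\n'
```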
## Optional Gitea workflow automation

Primary flow:

- Push to `master` triggers `.gitea/workflows/terraform-apply.yml`
- That workflow runs Terraform apply and then a fresh kubeadm bootstrap automatically

Manual dispatch workflows are available:

- `.gitea/workflows/kubeadm-bootstrap.yml`
- `.gitea/workflows/kubeadm-reset.yml`

Required repository secrets:

- Existing Terraform/backend secrets used by current workflows (`B2_*`, `PM_API_TOKEN_SECRET`, `SSH_KEY_PUBLIC`)
- SSH private key: prefer `KUBEADM_SSH_PRIVATE_KEY`, falling back to the existing `SSH_KEY_PRIVATE`

Optional secrets:

- `KUBEADM_SSH_USER` (defaults to `micqdf`)

Node IPs are rendered directly from static Terraform outputs (`control_plane_vm_ipv4`, `worker_vm_ipv4`), so you do not need per-node IP secrets or SSH discovery fallbacks.

## Notes

- Scripts are intentionally manual-triggered (predictable for homelab bring-up).
- If `.250` on the node subnet is already in use, change `controlPlaneVipSuffix`
  in `modules/k8s-cluster-settings.nix` before bootstrap.