17 Commits

Author SHA1 Message Date
5acb8370cc Merge pull request 'fix: parse terraform output JSON robustly in enroll step' (#24) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 16m5s
Reviewed-on: #24
2026-02-28 02:29:06 +00:00
f207f774de fix: parse terraform output JSON robustly in enroll step
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
Handle setup-terraform wrapper prefixes by decoding from first JSON object before reading VM outputs.
2026-02-28 02:21:57 +00:00
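The robust parse in this fix amounts to decoding from the first `{` with `json.JSONDecoder.raw_decode`, which ignores wrapper noise before and after the payload. A minimal sketch (function name and sample output are illustrative, not the repo's code):

```python
import json

def first_json_object(raw: str) -> dict:
    # setup-terraform's wrapper can prepend non-JSON lines to
    # `terraform output -json`, so skip to the first '{' and let
    # raw_decode stop at the end of that object, ignoring trailing noise.
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in terraform output")
    obj, _end = json.JSONDecoder().raw_decode(raw[start:])
    return obj

# Hypothetical noisy output for illustration.
noisy = '::debug::wrapper line\n{"llama_vm_ids": {"value": {"llama-0": 101}}}\ntrailing noise'
print(first_json_object(noisy)["llama_vm_ids"]["value"])  # {'llama-0': 101}
```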
1a309cbe4f Merge pull request 'feat: enroll tailscale via Proxmox guest agent by VMID' (#23) from stage into master
Some checks failed
Terraform Apply / Terraform Apply (push) Failing after 1m56s
Reviewed-on: #23
2026-02-28 02:16:58 +00:00
83d277d144 feat: enroll tailscale via Proxmox guest agent by VMID
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 19s
Replace SSH/IP-based enrollment with Proxmox API guest-agent execution using Terraform outputs, set per-VM hostnames from resource names, and reset cloned tailscale state before join for unique node identities.
2026-02-28 02:14:39 +00:00
5e1fd2e9f3 Merge pull request 'fix: make tailscale enrollment clone-safe and hostname-aware' (#22) from stage into master
All checks were successful
Terraform Apply / Terraform Apply (push) Successful in 1m54s
Reviewed-on: #22
2026-02-28 02:02:49 +00:00
3335020db5 fix: make tailscale enrollment clone-safe and hostname-aware
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Reset cloned tailscale state before first join, remove one-shot marker dependency, and allow workflow host entries in host=hostname format so nodes join with VM-aligned tailscale names.
2026-02-28 02:01:48 +00:00
9ce06671c9 Merge pull request 'fix: align VM boot disk and add Terraform safety workflows' (#21) from stage into master
All checks were successful
Terraform Apply / Terraform Apply (push) Successful in 1m59s
Reviewed-on: #21
2026-02-28 01:26:59 +00:00
a7f68c0c4b fix: tolerate extra output in destroy guard parser
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 3m34s
Parse the first JSON object from terraform show output to avoid failures when extra non-JSON lines are present.
2026-02-28 01:23:07 +00:00
d1a7ccc98c chore: serialize Terraform workflows to prevent races
Some checks failed
Terraform Plan / Terraform Plan (push) Failing after 3m34s
Add a global workflow concurrency group with queueing enabled so plan/apply/destroy runs do not overlap or contend for shared remote state.
2026-02-28 01:17:51 +00:00
afe19041d9 fix: make destroy guard parse tfplan JSON robustly
Some checks failed
Terraform Plan / Terraform Plan (push) Has been cancelled
Use terraform show with -no-color and resilient JSON extraction to avoid parser failures when workflow output includes non-JSON noise.
2026-02-28 01:16:19 +00:00
c9be2a2fc8 fix: align VM boot disk and add Terraform safety workflows
Some checks failed
Terraform Plan / Terraform Plan (push) Failing after 3m35s
Switch VM boot order/disks to scsi0 to match cloned NixOS template boot layout, add destroy guards to plan/apply workflows, and replace destroy workflow with a confirmed manual dispatch nuke flow that uses remote B2 state.
2026-02-28 01:10:31 +00:00
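The destroy guard this commit introduces reduces to counting `delete` actions in the plan's `resource_changes`. A hedged sketch of that check (the sample plan payload is hypothetical):

```python
import json

def count_planned_deletes(plan_json: str) -> int:
    # Mirror the workflow's guard: decode from the first '{' so stray
    # non-JSON lines in `terraform show -json` output are ignored, then
    # count every resource change whose actions include "delete".
    start = plan_json.find("{")
    if start == -1:
        raise ValueError("no JSON object in plan output")
    data = json.JSONDecoder().raw_decode(plan_json[start:])[0]
    return sum(
        1
        for rc in data.get("resource_changes", [])
        if "delete" in rc.get("change", {}).get("actions", [])
    )

# Hypothetical plan payload for illustration.
sample = json.dumps({
    "resource_changes": [
        {"change": {"actions": ["delete"]}},
        {"change": {"actions": ["create"]}},
        {"change": {"actions": ["delete", "create"]}},  # a replace counts too
    ]
})
print(count_planned_deletes(sample))  # 2
```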
5fc58dfc98 Merge pull request 'stage' (#20) from stage into master
All checks were successful
Terraform Apply / Terraform Apply (push) Successful in 4m28s
Reviewed-on: #20
2026-02-28 01:01:31 +00:00
1c4a27bca3 Merge branch 'master' into stage
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 16s
2026-02-28 01:00:47 +00:00
47f950d667 fix: update S3 backend config for Terraform init
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 17s
Use non-deprecated s3 endpoint settings, switch to use_path_style, and trim newline characters from B2 credentials when generating backend.hcl in CI.
2026-02-28 00:56:12 +00:00
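The credential trimming matters because a secret pasted into a forge often carries a trailing newline, which would land inside the generated `backend.hcl` value and break S3-compatible auth. A tiny sketch of the same cleanup the workflow does with `tr -d '\r\n'` (function name is ours):

```python
def clean_secret(value: str) -> str:
    # Strip CR/LF anywhere in the value, matching `tr -d '\r\n'`.
    return value.replace("\r", "").replace("\n", "")

print(clean_secret("0051234abcd\r\n"))  # 0051234abcd
```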
b0768db7a7 feat: store Terraform state in Backblaze B2
Some checks failed
Terraform Plan / Terraform Plan (push) Failing after 9s
Configure an s3 backend and initialize Terraform in CI with backend config from Gitea secrets so state persists across runs and apply operations stay consistent.
2026-02-28 00:52:40 +00:00
c0dd091b51 chore: align template base with live VM config
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 16s
Set NixOS stateVersion to 25.05 and include neovim in the default utility package set.
2026-02-28 00:44:08 +00:00
595df12b3e update: automate tailscale enrollment from Gitea secrets
All checks were successful
Terraform Plan / Terraform Plan (push) Successful in 16s
Add a first-boot tailscale enrollment service to the NixOS template and wire terraform-apply to inject TS auth key at runtime from secrets, so keys are not baked into templates or repo files.
2026-02-28 00:33:14 +00:00
5 changed files with 339 additions and 33 deletions

View File

@@ -5,6 +5,10 @@ on:
     branches:
       - master
 
+concurrency:
+  group: terraform-global
+  cancel-in-progress: false
+
 jobs:
   terraform:
     name: "Terraform Apply"
@@ -20,6 +24,21 @@ jobs:
           cat > secrets.auto.tfvars << EOF
           pm_api_token_secret = "${{ secrets.PM_API_TOKEN_SECRET }}"
           EOF
+          cat > backend.hcl << EOF
+          bucket = "${{ secrets.B2_TF_BUCKET }}"
+          key = "terraform.tfstate"
+          region = "us-east-005"
+          endpoints = {
+            s3 = "${{ secrets.B2_TF_ENDPOINT }}"
+          }
+          access_key = "$(printf '%s' "${{ secrets.B2_KEY_ID }}" | tr -d '\r\n')"
+          secret_key = "$(printf '%s' "${{ secrets.B2_APPLICATION_KEY }}" | tr -d '\r\n')"
+          skip_credentials_validation = true
+          skip_metadata_api_check = true
+          skip_region_validation = true
+          skip_requesting_account_id = true
+          use_path_style = true
+          EOF
 
       - name: Set up Terraform
         uses: hashicorp/setup-terraform@v2
@@ -28,12 +47,176 @@ jobs:
       - name: Terraform Init
         working-directory: terraform
-        run: terraform init
+        run: terraform init -reconfigure -backend-config=backend.hcl
 
       - name: Terraform Plan
         working-directory: terraform
-        run: terraform plan
+        run: terraform plan -out=tfplan
+
+      - name: Block accidental destroy
+        env:
+          ALLOW_TF_DESTROY: ${{ secrets.ALLOW_TF_DESTROY }}
+        working-directory: terraform
+        run: |
+          terraform show -json -no-color tfplan > tfplan.json
+          DESTROY_COUNT=$(python3 -c 'import json; raw=open("tfplan.json","rb").read().decode("utf-8","ignore"); start=raw.find("{"); data=json.JSONDecoder().raw_decode(raw[start:])[0]; print(sum(1 for rc in data.get("resource_changes", []) if "delete" in rc.get("change", {}).get("actions", [])))')
+          echo "Planned deletes: $DESTROY_COUNT"
+          if [ "$DESTROY_COUNT" -gt 0 ] && [ "${ALLOW_TF_DESTROY}" != "true" ]; then
+            echo "Destroy actions detected. Set ALLOW_TF_DESTROY=true to allow."
+            exit 1
+          fi
 
       - name: Terraform Apply
         working-directory: terraform
-        run: terraform apply -auto-approve
+        run: terraform apply -auto-approve tfplan
+
+      - name: Enroll VMs in Tailscale
+        env:
+          TS_AUTHKEY: ${{ secrets.TS_AUTHKEY }}
+          PM_API_TOKEN_SECRET: ${{ secrets.PM_API_TOKEN_SECRET }}
+        working-directory: terraform
+        run: |
+          if [ -z "$TS_AUTHKEY" ] || [ -z "$PM_API_TOKEN_SECRET" ]; then
+            echo "Skipping Tailscale enrollment (missing TS_AUTHKEY or PM_API_TOKEN_SECRET)."
+            exit 0
+          fi
+          PM_API_URL=$(awk -F'"' '/^pm_api_url/{print $2}' terraform.tfvars)
+          PM_API_TOKEN_ID=$(awk -F'"' '/^pm_api_token_id/{print $2}' terraform.tfvars)
+          TARGET_NODE=$(awk -F'"' '/^target_node/{print $2}' terraform.tfvars)
+          export PM_API_URL PM_API_TOKEN_ID TARGET_NODE
+          terraform output -json > tfoutputs.json
+          cat > enroll_tailscale.py <<'PY'
+          import json
+          import os
+          import ssl
+          import sys
+          import time
+          import urllib.parse
+          import urllib.request
+
+          api_url = os.environ["PM_API_URL"].rstrip("/")
+          if api_url.endswith("/api2/json"):
+              api_url = api_url[: -len("/api2/json")]
+          token_id = os.environ["PM_API_TOKEN_ID"].strip()
+          token_secret = os.environ["PM_API_TOKEN_SECRET"].strip()
+          target_node = os.environ["TARGET_NODE"].strip()
+          ts_authkey = os.environ["TS_AUTHKEY"]
+          if not token_id or not token_secret:
+              raise SystemExit("Missing Proxmox token id/secret")
+
+          raw_outputs = open("tfoutputs.json", "rb").read().decode("utf-8", "ignore")
+          start = raw_outputs.find("{")
+          if start == -1:
+              raise SystemExit("Could not find JSON payload in terraform output")
+          outputs = json.JSONDecoder().raw_decode(raw_outputs[start:])[0]
+
+          targets = []
+          for output_name in ("alpaca_vm_ids", "llama_vm_ids"):
+              mapping = outputs.get(output_name, {}).get("value", {})
+              if isinstance(mapping, dict):
+                  for hostname, vmid in mapping.items():
+                      targets.append((str(hostname), int(vmid)))
+          if not targets:
+              print("No VMs found in terraform outputs; skipping tailscale enrollment")
+              raise SystemExit(0)
+          print("Tailscale enrollment targets:", ", ".join(f"{h}:{v}" for h, v in targets))
+
+          ssl_ctx = ssl._create_unverified_context()
+          auth_header = f"PVEAPIToken={token_id}={token_secret}"
+
+          def api_request(method, path, data=None):
+              url = f"{api_url}{path}"
+              headers = {"Authorization": auth_header}
+              body = None
+              if data is not None:
+                  body = urllib.parse.urlencode(data, doseq=True).encode("utf-8")
+                  headers["Content-Type"] = "application/x-www-form-urlencoded"
+              req = urllib.request.Request(url, data=body, headers=headers, method=method)
+              with urllib.request.urlopen(req, context=ssl_ctx, timeout=30) as resp:
+                  payload = resp.read().decode("utf-8")
+              return json.loads(payload)
+
+          def wait_for_guest_agent(vmid, timeout_seconds=420):
+              deadline = time.time() + timeout_seconds
+              while time.time() < deadline:
+                  try:
+                      res = api_request("GET", f"/api2/json/nodes/{target_node}/qemu/{vmid}/agent/ping")
+                      if res.get("data") == "pong":
+                          return True
+                  except Exception:
+                      pass
+                  time.sleep(5)
+              return False
+
+          def exec_guest(vmid, command):
+              res = api_request(
+                  "POST",
+                  f"/api2/json/nodes/{target_node}/qemu/{vmid}/agent/exec",
+                  {
+                      "command": "/run/current-system/sw/bin/sh",
+                      "extra-args": ["-lc", command],
+                  },
+              )
+              pid = res["data"]["pid"]
+              for _ in range(120):
+                  status = api_request(
+                      "GET",
+                      f"/api2/json/nodes/{target_node}/qemu/{vmid}/agent/exec-status?pid={pid}",
+                  ).get("data", {})
+                  if status.get("exited"):
+                      return (
+                          int(status.get("exitcode", 1)),
+                          status.get("out-data", ""),
+                          status.get("err-data", ""),
+                      )
+                  time.sleep(2)
+              return (124, "", "Timed out waiting for guest command")
+
+          failures = []
+          safe_key = ts_authkey.replace("'", "'\"'\"'")
+          for hostname, vmid in targets:
+              print(f"\n== Enrolling {hostname} (vmid {vmid}) ==")
+              if not wait_for_guest_agent(vmid):
+                  failures.append(f"{hostname}: guest agent not ready")
+                  print(f"ERROR: guest agent not ready for vmid {vmid}")
+                  continue
+              safe_hostname = hostname.replace("'", "'\"'\"'")
+              cmd = (
+                  "set -e; "
+                  f"printf '%s' '{safe_key}' > /etc/tailscale/authkey; "
+                  f"printf '%s' '{safe_hostname}' > /etc/tailscale/hostname; "
+                  "chmod 600 /etc/tailscale/authkey; "
+                  f"hostnamectl set-hostname '{safe_hostname}' || true; "
+                  "systemctl restart tailscaled; "
+                  "systemctl start tailscale-firstboot.service; "
+                  "tailscale status || true"
+              )
+              exitcode, stdout, stderr = exec_guest(vmid, cmd)
+              if stdout:
+                  print(stdout)
+              if stderr:
+                  print(stderr, file=sys.stderr)
+              if exitcode != 0:
+                  failures.append(f"{hostname}: command failed exit {exitcode}")
+                  print(f"ERROR: tailscale enrollment failed for {hostname} (exit {exitcode})")
+
+          if failures:
+              print("\nEnrollment failures:")
+              for failure in failures:
+                  print(f"- {failure}")
+              raise SystemExit(1)
+          print("\nTailscale enrollment completed for all managed VMs")
+          PY
+          python3 enroll_tailscale.py
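The `safe_key`/`safe_hostname` substitution in the enrollment script above is standard POSIX single-quote escaping, sketched in isolation here (helper name is ours):

```python
def shell_single_quote(value: str) -> str:
    # Close the quote, emit a double-quoted literal quote, reopen:
    # ' -> '"'"'  — the same escaping the enrollment script applies
    # to the auth key and hostname before interpolating them.
    return "'" + value.replace("'", "'\"'\"'") + "'"

print(shell_single_quote("tskey-abc'123"))  # 'tskey-abc'"'"'123'
```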

View File

@@ -1,28 +1,65 @@
-name: Gitea Destroy Terraform
-run-name: ${{ gitea.actor }} triggered a Terraform Destroy 🧨
+name: Terraform Destroy
+run-name: ${{ gitea.actor }} requested Terraform destroy
 on:
-  workflow_dispatch: # Manual trigger
+  workflow_dispatch:
+    inputs:
+      confirm:
+        description: "Type NUKE to confirm destroy"
+        required: true
+        type: string
+      target:
+        description: "Destroy scope"
+        required: true
+        default: all
+        type: choice
+        options:
+          - all
+          - alpacas
+          - llamas
+
+concurrency:
+  group: terraform-global
+  cancel-in-progress: false
 
 jobs:
   destroy:
     name: "Terraform Destroy"
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    env:
+      TF_VAR_SSH_KEY: ${{ secrets.TF_VAR_SSH_KEY_PUBLIC }}
+      TF_VAR_TS_AUTHKEY: ${{ secrets.TF_VAR_TS_AUTHKEY }}
+      TF_VAR_PROXMOX_PASSWORD: ${{ secrets.TF_VAR_PROXMOX_PASSWORD }}
     steps:
+      - name: Validate confirmation phrase
+        run: |
+          if [ "${{ inputs.confirm }}" != "NUKE" ]; then
+            echo "Confirmation failed. You must type NUKE."
+            exit 1
+          fi
+
       - name: Checkout repository
         uses: actions/checkout@v4
+
+      - name: Create Terraform secret files
+        working-directory: terraform
+        run: |
+          cat > secrets.auto.tfvars << EOF
+          pm_api_token_secret = "${{ secrets.PM_API_TOKEN_SECRET }}"
+          EOF
+          cat > backend.hcl << EOF
+          bucket = "${{ secrets.B2_TF_BUCKET }}"
+          key = "terraform.tfstate"
+          region = "us-east-005"
+          endpoints = {
+            s3 = "${{ secrets.B2_TF_ENDPOINT }}"
+          }
+          access_key = "$(printf '%s' "${{ secrets.B2_KEY_ID }}" | tr -d '\r\n')"
+          secret_key = "$(printf '%s' "${{ secrets.B2_APPLICATION_KEY }}" | tr -d '\r\n')"
+          skip_credentials_validation = true
+          skip_metadata_api_check = true
+          skip_region_validation = true
+          skip_requesting_account_id = true
+          use_path_style = true
+          EOF
 
       - name: Set up Terraform
         uses: hashicorp/setup-terraform@v2
         with:
@@ -30,9 +67,27 @@ jobs:
       - name: Terraform Init
         working-directory: terraform
-        run: terraform init
+        run: terraform init -reconfigure -backend-config=backend.hcl
 
-      - name: Terraform Destroy
+      - name: Terraform Destroy Plan
         working-directory: terraform
-        run: terraform destroy -auto-approve
+        run: |
+          case "${{ inputs.target }}" in
+            all)
+              terraform plan -destroy -out=tfdestroy
+              ;;
+            alpacas)
+              terraform plan -destroy -target=proxmox_vm_qemu.alpacas -out=tfdestroy
+              ;;
+            llamas)
+              terraform plan -destroy -target=proxmox_vm_qemu.llamas -out=tfdestroy
+              ;;
+            *)
+              echo "Invalid destroy target: ${{ inputs.target }}"
+              exit 1
+              ;;
+          esac
+
+      - name: Terraform Destroy Apply
+        working-directory: terraform
+        run: terraform apply -auto-approve tfdestroy

View File

@@ -6,6 +6,10 @@ on:
       - stage
       - test
 
+concurrency:
+  group: terraform-global
+  cancel-in-progress: false
+
 jobs:
   terraform:
     name: "Terraform Plan"
@@ -22,6 +26,21 @@ jobs:
           cat > secrets.auto.tfvars << EOF
           pm_api_token_secret = "${{ secrets.PM_API_TOKEN_SECRET }}"
           EOF
+          cat > backend.hcl << EOF
+          bucket = "${{ secrets.B2_TF_BUCKET }}"
+          key = "terraform.tfstate"
+          region = "us-east-005"
+          endpoints = {
+            s3 = "${{ secrets.B2_TF_ENDPOINT }}"
+          }
+          access_key = "$(printf '%s' "${{ secrets.B2_KEY_ID }}" | tr -d '\r\n')"
+          secret_key = "$(printf '%s' "${{ secrets.B2_APPLICATION_KEY }}" | tr -d '\r\n')"
+          skip_credentials_validation = true
+          skip_metadata_api_check = true
+          skip_region_validation = true
+          skip_requesting_account_id = true
+          use_path_style = true
+          EOF
           echo "Created secrets.auto.tfvars:"
           cat secrets.auto.tfvars | sed 's/=.*/=***/'
           echo "Using token ID from terraform.tfvars:"
@@ -34,7 +53,7 @@ jobs:
       - name: Terraform Init
         working-directory: terraform
-        run: terraform init
+        run: terraform init -reconfigure -backend-config=backend.hcl
 
       - name: Terraform Format Check
         working-directory: terraform
@@ -48,6 +67,19 @@ jobs:
         working-directory: terraform
         run: terraform plan -out=tfplan
+
+      - name: Block accidental destroy
+        env:
+          ALLOW_TF_DESTROY: ${{ secrets.ALLOW_TF_DESTROY }}
+        working-directory: terraform
+        run: |
+          terraform show -json -no-color tfplan > tfplan.json
+          DESTROY_COUNT=$(python3 -c 'import json; raw=open("tfplan.json","rb").read().decode("utf-8","ignore"); start=raw.find("{"); data=json.JSONDecoder().raw_decode(raw[start:])[0]; print(sum(1 for rc in data.get("resource_changes", []) if "delete" in rc.get("change", {}).get("actions", [])))')
+          echo "Planned deletes: $DESTROY_COUNT"
+          if [ "$DESTROY_COUNT" -gt 0 ] && [ "${ALLOW_TF_DESTROY}" != "true" ]; then
+            echo "Destroy actions detected. Set ALLOW_TF_DESTROY=true to allow."
+            exit 1
+          fi
 
       - name: Upload Terraform Plan
         uses: actions/upload-artifact@v3
         with:

View File

@@ -39,6 +39,34 @@
   security.sudo.wheelNeedsPassword = false;
 
+  systemd.services.tailscale-firstboot = {
+    description = "One-time Tailscale enrollment";
+    after = [ "network-online.target" "tailscaled.service" ];
+    wants = [ "network-online.target" "tailscaled.service" ];
+    wantedBy = [ "multi-user.target" ];
+    serviceConfig = {
+      Type = "oneshot";
+      RemainAfterExit = true;
+    };
+    script = ''
+      if [ ! -s /etc/tailscale/authkey ]; then
+        exit 0
+      fi
+      key="$(cat /etc/tailscale/authkey)"
+      ts_hostname=""
+      if [ -s /etc/tailscale/hostname ]; then
+        ts_hostname="--hostname=$(cat /etc/tailscale/hostname)"
+      fi
+      rm -f /var/lib/tailscale/tailscaled.state
+      ${pkgs.tailscale}/bin/tailscale up --reset --auth-key="$key" $ts_hostname
+      rm -f /etc/tailscale/authkey
+      rm -f /etc/tailscale/hostname
+    '';
+  };
+
   environment.systemPackages = with pkgs; [
     btop
     curl
@@ -50,11 +78,13 @@
     htop
     jq
     ripgrep
+    tailscale
     tree
     unzip
     vim
+    neovim
     wget
   ];
 
-  system.stateVersion = "24.11";
+  system.stateVersion = "25.05";
 }

View File

@@ -1,4 +1,6 @@
 terraform {
+  backend "s3" {}
+
   required_providers {
     proxmox = {
       source = "Telmate/proxmox"
@@ -24,19 +26,21 @@ resource "proxmox_vm_qemu" "alpacas" {
   os_type = "cloud-init"
   agent = 1
+  cpu {
     sockets = var.sockets
     cores = var.cores
+  }
   memory = var.memory
   scsihw = "virtio-scsi-pci"
-  boot = "order=virtio0"
-  bootdisk = "virtio0"
+  boot = "order=scsi0"
+  bootdisk = "scsi0"
   ipconfig0 = "ip=dhcp"
   cicustom = "user=local:snippets/cloud_init_global.yaml"
   disks {
-    virtio {
-      virtio0 {
+    scsi {
+      scsi0 {
         disk {
           size = var.disk_size
           storage = var.storage
@@ -71,18 +75,20 @@ resource "proxmox_vm_qemu" "llamas" {
   os_type = "cloud-init"
   agent = 1
+  cpu {
     sockets = var.sockets
     cores = var.cores
+  }
   memory = var.memory
   scsihw = "virtio-scsi-pci"
-  boot = "order=virtio0"
-  bootdisk = "virtio0"
+  boot = "order=scsi0"
+  bootdisk = "scsi0"
   ipconfig0 = "ip=dhcp"
   cicustom = "user=local:snippets/cloud_init_global.yaml"
   disks {
-    virtio {
-      virtio0 {
+    scsi {
+      scsi0 {
         disk {
           size = var.disk_size
           storage = var.storage