cleanup: Remove obsolete port-forwarding, deferred Traefik files, and CI workaround
- Remove ansible/roles/private-access/ (replaced by Tailscale LB services)
- Remove deferred observability ingress/traefik files (replaced by direct Tailscale LBs)
- Remove orphaned kustomization-traefik-config.yaml (no backing directory)
- Simplify CI: remove SA patch + job deletion workaround for rancher-backup (now handled by postRenderer in HelmRelease)
- Update AGENTS.md to reflect current architecture
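The commit message says the CI workaround is now handled by a postRenderer in the rancher-backup HelmRelease. As background, a minimal sketch of what such a Flux `postRenderers` patch typically looks like — the target Job name and the replacement image tag here are assumptions for illustration, not taken from this repo:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: rancher-backup
  namespace: flux-system
spec:
  # chart, interval, and values omitted
  postRenderers:
    - kustomize:
        patches:
          - target:
              kind: Job
              name: rancher-backup-patch-sa   # hypothetical target name
            patch: |
              # swap the broken kuberlr image for plain rancher/kubectl
              - op: replace
                path: /spec/template/spec/containers/0/image
                value: rancher/kubectl:v1.29.2   # assumed tag
```

Because the patch runs at render time, the chart's post-install Job ships with a working image from the start, so no after-the-fact SA patching or job deletion is needed.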
--- a/.gitea/workflows/deploy.yml
+++ b/.gitea/workflows/deploy.yml
@@ -304,11 +304,8 @@ jobs:
           kubectl -n flux-system wait --for=condition=Ready kustomization/addon-ccm --timeout=600s
           kubectl -n flux-system wait --for=condition=Ready kustomization/addon-csi --timeout=600s
           kubectl -n flux-system wait --for=condition=Ready kustomization/addon-tailscale-operator --timeout=300s
-          # Observability stack deferred - complex helm release timing out, debug separately
-          # kubectl -n flux-system wait --for=condition=Ready kustomization/addon-observability --timeout=300s
-          # kubectl -n flux-system wait --for=condition=Ready kustomization/addon-observability-content --timeout=300s
 
-      - name: Wait for Rancher and fix backup operator
+      - name: Wait for Rancher and backup operator
         env:
           KUBECONFIG: outputs/kubeconfig
         run: |
@@ -320,15 +317,6 @@ jobs:
           echo "Waiting for rancher-backup operator..."
           kubectl -n flux-system wait --for=condition=Ready kustomization/addon-rancher-backup --timeout=600s || true
 
-          echo "Patching default SA in cattle-resources-system..."
-          kubectl patch serviceaccount default -n cattle-resources-system -p '{"automountServiceAccountToken": false}' || true
-
-          echo "Cleaning up failed patch-sa jobs..."
-          kubectl delete job -n cattle-resources-system rancher-backup-patch-sa --ignore-not-found=true || true
-
-          echo "Force reconciling rancher-backup HelmRelease..."
-          flux reconcile helmrelease rancher-backup -n flux-system --timeout=5m || true
-
       - name: Restore Rancher from latest B2 backup
         env:
           KUBECONFIG: outputs/kubeconfig
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -18,18 +18,19 @@ Repository guide for agentic contributors working in this repo.
 - **cert-manager** is required — Tailscale LoadBalancer does L4 TCP passthrough, so Rancher serves its own TLS.
 - **Secrets flow**: Doppler → `ClusterSecretStore` (doppler-hetznerterra) → `ExternalSecret` resources → k8s Secrets.
 - Rancher is reachable only over Tailscale at `https://rancher.silverside-gopher.ts.net/`.
+- Grafana, Prometheus, and Flux UI are also exposed via dedicated Tailscale LoadBalancer services at `http://grafana.silverside-gopher.ts.net/`, `http://prometheus.silverside-gopher.ts.net/`, `http://flux.silverside-gopher.ts.net:9001/`.
 
 ## Important Files
 
 - `terraform/main.tf` — provider and version pins
 - `terraform/variables.tf` — input surface and defaults
 - `terraform/firewall.tf` — firewall rules (tailnet CIDR, internal cluster ports)
-- `ansible/site.yml` — ordered bootstrap playbook (roles: common → k3s-server → ccm → k3s-agent → private-access → doppler → tailscale-cleanup)
+- `ansible/site.yml` — ordered bootstrap playbook (roles: common → k3s-server → ccm → k3s-agent → doppler → tailscale-cleanup)
 - `ansible/generate_inventory.py` — renders `ansible/inventory.ini` from Terraform outputs via Jinja2
 - `clusters/prod/flux-system/` — Flux GitRepository and top-level Kustomization resources
 - `infrastructure/addons/kustomization.yaml` — root addon graph with dependency ordering
 - `infrastructure/addons/<addon>/` — each addon is a self-contained dir with its own `kustomization.yaml`
-- `.gitea/workflows/deploy.yml` — canonical CI: terraform → ansible → flux bootstrap → rancher fix → B2 restore
+- `.gitea/workflows/deploy.yml` — canonical CI: terraform → ansible → flux bootstrap → B2 restore → health checks
 
 ## Build / Validate / Test
 
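The per-service Tailscale LoadBalancer pattern referenced above can be sketched as follows. Only the `loadBalancerClass: tailscale` and `tailscale.com/hostname` mechanics come from the Tailscale operator; the service name, selector labels, and Grafana container port are assumptions for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana-tailscale            # hypothetical name
  namespace: observability
  annotations:
    tailscale.com/hostname: grafana  # becomes grafana.<tailnet>.ts.net
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale       # fulfilled by the Tailscale operator
  selector:
    app.kubernetes.io/name: grafana  # assumed selector labels
  ports:
    - port: 80
      targetPort: 3000               # assumed Grafana container port
```

One such Service per app replaces both the kubectl port-forward systemd units and the shared Traefik ingress that this commit deletes.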
@@ -109,7 +110,7 @@ Repository guide for agentic contributors working in this repo.
 
 ## Known Issues & Workarounds
 
-- **rancher-backup post-install job** (`rancher-backup-patch-sa`) fails because `rancher/kuberlr-kubectl` can't download kubectl. CI patches the SA and deletes the failed job. Do NOT set `s3` block in HelmRelease values — put S3 config in the Backup CR instead.
+- **rancher-backup post-install job** (`rancher-backup-patch-sa`) uses a postRenderer in the HelmRelease to replace the broken `rancher/kuberlr-kubectl` image with `rancher/kubectl`. Do NOT set `s3` block in HelmRelease values — put S3 config in the Backup CR instead.
 - **B2 ExternalSecret** must use key names `accessKey` and `secretKey` (not `aws_access_key_id`/`aws_secret_access_key`).
 - **Stale Tailscale devices**: After cluster rebuild, delete stale offline `rancher` devices before booting. The `tailscale-cleanup` Ansible role handles this via the Tailscale API.
 - **Restricted B2 keys**: `b2_authorize_account` may return `allowed.bucketId: null`. CI falls back to `b2_list_buckets` to resolve bucket ID by name.
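The known-issues note above says to put S3 config in the Backup CR rather than HelmRelease values. A hedged sketch of a rancher-backup `Backup` CR with its S3 block — bucket name, endpoint, schedule, and secret name are invented, while the referenced Secret's `accessKey`/`secretKey` key names match the B2 ExternalSecret note:

```yaml
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-nightly              # hypothetical name
spec:
  schedule: "0 3 * * *"              # example schedule
  retentionCount: 7
  storageLocation:
    s3:
      credentialSecretName: b2-backup-credentials   # Secret holding accessKey/secretKey
      credentialSecretNamespace: cattle-resources-system
      bucketName: example-rancher-backups           # invented bucket
      folder: rancher
      endpoint: s3.us-west-004.backblazeb2.com      # example B2 S3 endpoint
```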
@@ -125,7 +126,7 @@ Repository guide for agentic contributors working in this repo.
 1. Terraform: fmt check → init → validate → import existing servers → plan → apply (main only)
 2. Ansible: install deps → generate inventory → run site.yml with extra vars (secrets injected from Gitea)
 3. Flux bootstrap: install kubectl/flux → rewrite kubeconfig → apply CRDs → apply graph → wait for addons
-4. Rancher post-install: wait for Rancher/backup operator → patch SA → clean failed jobs → force reconcile
+4. Rancher wait: wait for Rancher and backup operator to be ready
 5. B2 restore: authorize B2 → find latest backup → create Restore CR → poll until ready
 6. Health checks: nodes, Flux objects, pods, storage class
 
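Step 5's "find latest backup" can reduce to a plain lexicographic sort when backup object names embed sortable timestamps. A sketch with invented filenames — the real CI lists objects via the B2 CLI, which is not reproduced here:

```shell
#!/bin/sh
# Pick the newest backup from a list of object names whose embedded
# ISO-style timestamps sort lexicographically (example data is invented).
backups="rancher/rancher-2024-05-01T03-00-00.tar.gz
rancher/rancher-2024-06-12T03-00-00.tar.gz
rancher/rancher-2024-06-01T03-00-00.tar.gz"

latest=$(printf '%s\n' "$backups" | sort | tail -n 1)
echo "$latest"   # → rancher/rancher-2024-06-12T03-00-00.tar.gz
```

The chosen name would then be written into the `Restore` CR that the pipeline polls.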
--- a/ansible/roles/private-access/tasks/main.yml
+++ /dev/null
@@ -1,86 +0,0 @@
----
-- name: Create systemd unit for Grafana private access
-  template:
-    src: kubectl-port-forward.service.j2
-    dest: /etc/systemd/system/k8s-portforward-grafana.service
-    mode: "0644"
-  vars:
-    unit_description: Port-forward Grafana for Tailscale access
-    unit_namespace: observability
-    unit_target: svc/observability-kube-prometheus-stack-grafana
-    unit_local_port: 13080
-    unit_remote_port: 80
-
-- name: Create systemd unit for Prometheus private access
-  template:
-    src: kubectl-port-forward.service.j2
-    dest: /etc/systemd/system/k8s-portforward-prometheus.service
-    mode: "0644"
-  vars:
-    unit_description: Port-forward Prometheus for Tailscale access
-    unit_namespace: observability
-    unit_target: svc/observability-kube-prometh-prometheus
-    unit_local_port: 19090
-    unit_remote_port: 9090
-
-- name: Create systemd unit for Flux UI private access
-  template:
-    src: kubectl-port-forward.service.j2
-    dest: /etc/systemd/system/k8s-portforward-flux-ui.service
-    mode: "0644"
-  vars:
-    unit_description: Port-forward Flux UI for Tailscale access
-    unit_namespace: flux-system
-    unit_target: svc/flux-system-weave-gitops
-    unit_local_port: 19001
-    unit_remote_port: 9001
-
-- name: Create systemd unit for Rancher HTTP private access
-  template:
-    src: kubectl-port-forward.service.j2
-    dest: /etc/systemd/system/k8s-portforward-rancher.service
-    mode: "0644"
-  vars:
-    unit_description: Port-forward Rancher HTTP for Tailscale access
-    unit_namespace: cattle-system
-    unit_target: svc/cattle-system-rancher
-    unit_local_port: 19442
-    unit_remote_port: 80
-
-- name: Create systemd unit for Rancher HTTPS private access
-  template:
-    src: kubectl-port-forward.service.j2
-    dest: /etc/systemd/system/k8s-portforward-rancher-https.service
-    mode: "0644"
-  vars:
-    unit_description: Port-forward Rancher HTTPS for Tailscale access
-    unit_namespace: cattle-system
-    unit_target: svc/cattle-system-rancher
-    unit_local_port: 19443
-    unit_remote_port: 443
-
-- name: Reload systemd
-  systemd:
-    daemon_reload: true
-
-- name: Enable and start private access port-forward services
-  systemd:
-    name: "{{ item }}"
-    enabled: true
-    state: started
-  loop:
-    - k8s-portforward-grafana.service
-    - k8s-portforward-prometheus.service
-    - k8s-portforward-flux-ui.service
-    - k8s-portforward-rancher.service
-    - k8s-portforward-rancher-https.service
-
-- name: Configure Tailscale Serve for private access endpoints
-  shell: >-
-    tailscale serve reset &&
-    tailscale serve --bg --tcp={{ private_access_grafana_port }} tcp://127.0.0.1:13080 &&
-    tailscale serve --bg --tcp={{ private_access_prometheus_port }} tcp://127.0.0.1:19090 &&
-    tailscale serve --bg --tcp={{ private_access_flux_port }} tcp://127.0.0.1:19001 &&
-    tailscale serve --bg --tcp={{ private_access_rancher_port }} tcp://127.0.0.1:19442 &&
-    tailscale serve --bg --tcp=9443 tcp://127.0.0.1:19443
-  changed_when: true
--- a/ansible/roles/private-access/templates/kubectl-port-forward.service.j2
+++ /dev/null
@@ -1,13 +0,0 @@
-[Unit]
-Description={{ unit_description }}
-After=network-online.target k3s.service
-Wants=network-online.target
-
-[Service]
-Type=simple
-Restart=always
-RestartSec=5
-ExecStart=/usr/local/bin/kubectl -n {{ unit_namespace }} port-forward --address 127.0.0.1 {{ unit_target }} {{ unit_local_port }}:{{ unit_remote_port }}
-
-[Install]
-WantedBy=multi-user.target
@@ -1,18 +0,0 @@ (deleted: kustomization-traefik-config.yaml)
-apiVersion: kustomize.toolkit.fluxcd.io/v1
-kind: Kustomization
-metadata:
-  name: addon-traefik-config
-  namespace: flux-system
-spec:
-  interval: 10m
-  prune: true
-  sourceRef:
-    kind: GitRepository
-    name: platform
-  path: ./infrastructure/addons/traefik-config
-  wait: true
-  timeout: 5m
-  suspend: false
-  dependsOn:
-    - name: addon-tailscale-operator
-    - name: addon-tailscale-proxyclass
@@ -1,17 +0,0 @@ (deleted: Grafana Ingress manifest)
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
-  name: grafana
-  namespace: observability
-spec:
-  ingressClassName: traefik
-  rules:
-    - http:
-        paths:
-          - path: /grafana
-            pathType: Prefix
-            backend:
-              service:
-                name: observability-kube-prometheus-stack-grafana
-                port:
-                  number: 80
@@ -1,17 +0,0 @@ (deleted: Prometheus Ingress manifest)
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
-  name: prometheus
-  namespace: observability
-spec:
-  ingressClassName: traefik
-  rules:
-    - http:
-        paths:
-          - path: /prometheus
-            pathType: Prefix
-            backend:
-              service:
-                name: observability-kube-prometh-prometheus
-                port:
-                  number: 9090
@@ -1,27 +0,0 @@ (deleted: traefik-tailscale Service manifest)
-apiVersion: v1
-kind: Service
-metadata:
-  name: traefik-tailscale
-  namespace: kube-system
-  annotations:
-    tailscale.com/hostname: observability
-    tailscale.com/proxy-class: infra-stable
-spec:
-  type: LoadBalancer
-  loadBalancerClass: tailscale
-  selector:
-    app.kubernetes.io/instance: traefik-kube-system
-    app.kubernetes.io/name: traefik
-  ports:
-    - name: web
-      port: 80
-      protocol: TCP
-      targetPort: web
-    - name: websecure
-      port: 443
-      protocol: TCP
-      targetPort: websecure
-    - name: flux
-      port: 9001
-      protocol: TCP
-      targetPort: 9001