4965017b86
Fix Load Balancer network attachment
Deploy Cluster / Terraform (push) Successful in 54s
Deploy Cluster / Ansible (push) Failing after 3m44s
Add hcloud_load_balancer_network resource to attach LB to private network.
This is required before targets can use use_private_ip=true.
LB gets IP 10.0.1.5 on the private network.
2026-03-23 02:44:35 +00:00
b2b9c38b91
Fix Load Balancer output attribute - use ipv4 instead of ipv4_address
Deploy Cluster / Terraform (push) Failing after 1m37s
Deploy Cluster / Ansible (push) Has been skipped
2026-03-23 02:40:50 +00:00
ff31cb4e74
Implement HA control plane with Load Balancer (3-3 topology)
Deploy Cluster / Terraform (push) Failing after 10s
Deploy Cluster / Ansible (push) Has been skipped
Major changes:
- Terraform: Scale to 3 control planes (cx23) + 3 workers (cx33)
- Terraform: Add Hetzner Load Balancer (lb11) for Kubernetes API
- Terraform: Add kube_api_lb_ip output
- Ansible: Add community.network collection to requirements
- Ansible: Update inventory to include LB endpoint
- Ansible: Configure secondary CPs and workers to join via LB
- Ansible: Add k3s_join_endpoint variable for HA joins
- Workflow: Add imports for cp-2, cp-3, and worker-3
- Docs: Update STABLE_BASELINE.md with HA topology and phase gates
Topology:
- 3 control planes (cx23 - 2 vCPU, 8GB RAM each)
- 3 workers (cx33 - 4 vCPU, 16GB RAM each)
- 1 Load Balancer (lb11) routing to all 3 control planes on port 6443
- Workers and secondary CPs join via LB endpoint for HA
Cost impact: approximately +€26/month (2 extra control planes + 1 extra worker + Load Balancer)
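A rough sketch of why three control planes are the smallest count that survives a node loss. It assumes the control planes form a majority-quorum datastore such as k3s embedded etcd, which this log does not state; the function names are illustrative only.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


# Compare 1, 2, and 3 control planes (values are computed, not measured).
for count in (1, 2, 3):
    print(count, quorum_size(count), tolerated_failures(count))
```

Under that assumption, going from one to two control planes adds no failure tolerance, while three tolerates the loss of one.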
2026-03-23 02:39:39 +00:00
8b4a445b37
Update STABLE_BASELINE.md - CCM/CSI integration achieved
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Successful in 3m36s
Document the successful completion of Hetzner CCM and CSI integration:
- CCM deployed via Ansible before workers join (fixes uninitialized taint)
- CSI provides hcloud-volumes StorageClass for persistent storage
- Two consecutive rebuilds passed all phase gates
- PVC provisioning tested and working
Platform now has full cloud provider integration with persistent volumes.
2026-03-23 02:25:00 +00:00
e447795395
Install helm binary in ccm-deploy role before using it
Deploy Cluster / Terraform (push) Successful in 2m1s
Deploy Cluster / Ansible (push) Successful in 6m35s
The kubernetes.core.helm module requires helm CLI to be installed on
the target node. Added check and install step using the official
helm install script.
2026-03-23 00:07:39 +00:00
31b82c9371
Deploy CCM via Ansible before workers join to fix external cloud provider
Deploy Cluster / Terraform (push) Successful in 31s
Deploy Cluster / Ansible (push) Failing after 1m48s
This fixes the chicken-and-egg problem where workers with
--kubelet-arg=cloud-provider=external couldn't join because CCM wasn't
running yet to remove the node.cloudprovider.kubernetes.io/uninitialized taint.
Changes:
- Create ansible/roles/ccm-deploy/ to deploy CCM via Helm during Ansible phase
- Reorder site.yml: CCM deploys after secrets but before workers join
- CCM runs on control_plane[0] with proper tolerations for control plane nodes
- Add 10s pause after CCM ready to ensure it can process new nodes
- Workers can now successfully join with external cloud provider enabled
Flux still manages CCM for updates, but initial install happens in Ansible.
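A minimal sketch of how the taint described above could be detected from the parsed output of `kubectl get nodes -o json`. The helper name and the sample node names are hypothetical, and this is not the check the ccm-deploy role itself performs.

```python
from __future__ import annotations

UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def nodes_awaiting_ccm(node_list: dict) -> list[str]:
    """Return names of nodes that still carry the uninitialized taint."""
    pending = []
    for node in node_list.get("items", []):
        # `taints` is absent or null on nodes that have none.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == UNINITIALIZED_TAINT for t in taints):
            pending.append(node["metadata"]["name"])
    return pending


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "cp-1"}, "spec": {}},
        {
            "metadata": {"name": "worker-1"},
            "spec": {"taints": [{"key": UNINITIALIZED_TAINT,
                                 "value": "true",
                                 "effect": "NoSchedule"}]},
        },
    ]
}
print(nodes_awaiting_ccm(sample))
```

An empty result would indicate that the CCM has initialized every node it knows about.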
2026-03-22 23:58:03 +00:00
cadfedacf1
Fix providerID health check - use shell module for piped grep
Deploy Cluster / Terraform (push) Successful in 1m47s
Deploy Cluster / Ansible (push) Failing after 18m4s
2026-03-22 22:55:55 +00:00
561cd67b0c
Enable Hetzner CCM and CSI for cloud provider integration
Deploy Cluster / Terraform (push) Successful in 30s
Deploy Cluster / Ansible (push) Failing after 3m21s
- Enable --kubelet-arg=cloud-provider=external on all nodes (control planes and workers)
- Activate CCM Kustomization with 10m timeout for Hetzner cloud-controller-manager
- Activate CSI Kustomization with dependsOn CCM and 10m timeout for hcloud-csi
- Update deploy workflow to wait for CCM/CSI readiness (600s timeout)
- Add providerID verification to post-deploy health checks
This enables proper cloud provider integration with Hetzner CCM for node
labeling and Hetzner CSI for persistent volume provisioning.
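One way the providerID verification could look, sketched in Python rather than in the workflow's own shell. It assumes the Hetzner CCM sets `spec.providerID` to a value beginning with `hcloud://`; the function name and the sample node data are hypothetical.

```python
from __future__ import annotations


def nodes_missing_provider_id(node_list: dict,
                              prefix: str = "hcloud://") -> list[str]:
    """Return names of nodes whose providerID is unset or has another prefix."""
    missing = []
    for node in node_list.get("items", []):
        provider_id = node.get("spec", {}).get("providerID") or ""
        if not provider_id.startswith(prefix):
            missing.append(node["metadata"]["name"])
    return missing


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
nodes = {
    "items": [
        {"metadata": {"name": "cp-1"},
         "spec": {"providerID": "hcloud://12345"}},
        {"metadata": {"name": "worker-1"}, "spec": {}},
    ]
}
print(nodes_missing_provider_id(nodes))
```

A health check built on this would fail the deploy whenever the returned list is non-empty.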
2026-03-22 22:26:21 +00:00
4eebbca648
docs: update README for deferred observability baseline
Deploy Cluster / Terraform (push) Successful in 1m41s
Deploy Cluster / Ansible (push) Successful in 5m37s
2026-03-22 01:04:53 +00:00
7b5d794dfc
fix: update health checks for deferred observability
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-03-22 01:04:27 +00:00
8643bbfc12
fix: defer observability to get clean baseline
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-03-22 01:03:55 +00:00
84f446c2e6
fix: restore observability timeouts to 5 minutes
Deploy Cluster / Terraform (push) Successful in 32s
Deploy Cluster / Ansible (push) Failing after 8m38s
2026-03-22 00:43:37 +00:00
d446e86ece
fix: use static grafana password, remove externalsecret dependency
Deploy Cluster / Ansible (push) Has been cancelled
Deploy Cluster / Terraform (push) Has been cancelled
2026-03-22 00:43:21 +00:00
90c7f565e0
fix: remove tailscale ingress dependencies from observability
Deploy Cluster / Terraform (push) Successful in 39s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-22 00:42:35 +00:00
989848fa89
fix: increase observability timeouts to 10 minutes
Deploy Cluster / Terraform (push) Successful in 2m1s
Deploy Cluster / Ansible (push) Failing after 13m54s
2026-03-21 19:34:43 +00:00
56e5807474
fix: create doppler ClusterSecretStore after ESO is installed
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Failing after 8m31s
2026-03-21 19:19:43 +00:00
df0511148c
fix: unsuspend tailscale operator for stable baseline
Deploy Cluster / Terraform (push) Successful in 41s
Deploy Cluster / Ansible (push) Failing after 8m44s
2026-03-21 19:03:39 +00:00
894e6275b1
docs: update stable baseline to defer ccm/csi
Deploy Cluster / Terraform (push) Successful in 28s
Deploy Cluster / Ansible (push) Failing after 8m35s
2026-03-21 18:41:36 +00:00
a01cf435d4
fix: skip ccm/csi waits for stable baseline - using k3s embedded
Deploy Cluster / Terraform (push) Successful in 37s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-21 18:40:53 +00:00
84f77c4a68
fix: use kubectl patch instead of apply for flux controller nodeSelector
Deploy Cluster / Terraform (push) Successful in 38s
Deploy Cluster / Ansible (push) Failing after 9m41s
2026-03-21 18:05:41 +00:00
2e4196688c
fix: bootstrap flux in phases - crds first, then resources
Deploy Cluster / Terraform (push) Successful in 38s
Deploy Cluster / Ansible (push) Failing after 3m19s
2026-03-21 17:42:39 +00:00
8d1f9f4944
fix: add k3s reset logic for primary control plane
Deploy Cluster / Terraform (push) Successful in 39s
Deploy Cluster / Ansible (push) Failing after 4m19s
2026-03-21 16:10:17 +00:00
d4fd43e2f5
refactor: simplify k3s-server bootstrap for
2026-03-21 15:48:33 +00:00
48a80c362c
fix: disable external cloud-provider kubelet arg for stable baseline
Deploy Cluster / Terraform (push) Successful in 50s
Deploy Cluster / Ansible (push) Failing after 4m21s
2026-03-21 14:36:54 +00:00
fcf7f139ff
fix: use public api endpoint for flux bootstrap
Deploy Cluster / Terraform (push) Successful in 41s
Deploy Cluster / Ansible (push) Failing after 2m16s
2026-03-21 00:07:51 +00:00
7139ae322d
fix: bootstrap flux during cluster deploy
Deploy Cluster / Terraform (push) Successful in 38s
Deploy Cluster / Ansible (push) Failing after 3m21s
2026-03-20 10:37:11 +00:00
528a8dc210
fix: defer doppler store until eso is installed
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Failing after 24m34s
2026-03-20 09:30:17 +00:00
349f75729a
fix: bootstrap tailscale namespace before secret
Deploy Cluster / Terraform (push) Successful in 44s
Deploy Cluster / Ansible (push) Failing after 3m30s
2026-03-20 09:24:35 +00:00
522626a52b
refactor: simplify stable cluster baseline
Deploy Cluster / Terraform (push) Successful in 1m48s
Deploy Cluster / Ansible (push) Failing after 4m7s
2026-03-20 02:24:37 +00:00
5bd4c41c2d
fix: restore k3s agent bootstrap
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Failing after 18m16s
2026-03-20 01:50:16 +00:00
3e41f71b1b
fix: harden terraform destroy workflow
Deploy Cluster / Terraform (push) Successful in 2m28s
Deploy Cluster / Ansible (push) Failing after 20m4s
2026-03-19 23:26:03 +00:00
9d2f30de32
fix: prepare k3s for external cloud provider
Deploy Cluster / Terraform (push) Successful in 46s
Deploy Cluster / Ansible (push) Successful in 4m4s
2026-03-17 01:21:23 +00:00
08a3031276
refactor: retire imperative addon roles
Deploy Cluster / Terraform (push) Successful in 52s
Deploy Cluster / Ansible (push) Successful in 4m2s
2026-03-17 01:04:02 +00:00
e3ce91db62
fix: align flux ccm with live deployment
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Successful in 3m56s
2026-03-11 18:17:16 +00:00
bed8e4afc8
feat: migrate core addons toward flux
Deploy Cluster / Terraform (push) Successful in 49s
Deploy Cluster / Ansible (push) Successful in 4m6s
2026-03-11 17:43:35 +00:00
2d4de6cff8
fix: bootstrap doppler store outside flux
Deploy Cluster / Terraform (push) Successful in 43s
Deploy Cluster / Ansible (push) Successful in 9m42s
2026-03-09 02:58:26 +00:00
4a83d981c8
fix: skip dry-run validation for doppler store sync
Deploy Cluster / Terraform (push) Successful in 44s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-09 02:52:08 +00:00
d188a51ef6
fix: move doppler store manifests out of ignored path
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Has been cancelled
2026-03-09 02:45:46 +00:00
646ef16258
fix: stabilize flux and external secrets reconciliation
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 9m42s
2026-03-09 02:25:27 +00:00
6f2e056b98
feat: sync runtime secrets from doppler
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Successful in 9m56s
2026-03-09 00:25:41 +00:00
e10a70475f
fix: right-size flux observability workloads
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Successful in 9m37s
2026-03-08 05:17:22 +00:00
f95e0051a5
feat: automate private tailnet access on cp1
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Successful in 9m45s
2026-03-08 04:16:06 +00:00
7c15ac5846
feat: add flux ui on shared tailscale endpoint
Deploy Cluster / Terraform (push) Successful in 46s
Deploy Cluster / Ansible (push) Successful in 9m40s
2026-03-07 12:30:17 +00:00
4c104f74e8
feat: route observability through one tailscale endpoint
Deploy Cluster / Terraform (push) Successful in 51s
Deploy Cluster / Ansible (push) Successful in 9m33s
2026-03-07 01:04:03 +00:00
be04602bfb
fix: make flux bootstrap reachable from cluster
Deploy Cluster / Terraform (push) Successful in 47s
Deploy Cluster / Ansible (push) Successful in 9m59s
2026-03-07 00:38:29 +00:00
06c1356f1e
feat: expose flux observability services over tailscale
Deploy Cluster / Terraform (push) Successful in 46s
Deploy Cluster / Ansible (push) Successful in 9m14s
2026-03-05 00:43:29 +00:00
86fb5d5b90
fix: move observability gitops gating to role level
Deploy Cluster / Terraform (push) Successful in 44s
Deploy Cluster / Ansible (push) Successful in 9m17s
2026-03-05 00:17:25 +00:00
8b403cd1d6
feat: migrate observability stack to flux gitops
Deploy Cluster / Terraform (push) Successful in 45s
Deploy Cluster / Ansible (push) Failing after 1m11s
2026-03-04 23:38:40 +00:00
480a079dc8
fix: fail fast when loki datasource has no labels
Deploy Grafana Content / Grafana Content (push) Successful in 1m59s
Deploy Cluster / Terraform (push) Successful in 44s
Deploy Cluster / Ansible (push) Successful in 22m51s
2026-03-04 21:00:01 +00:00
ff8e32daf5
fix: add loki nodeport fallback for grafana datasource reachability
Deploy Grafana Content / Grafana Content (push) Successful in 2m18s
Deploy Cluster / Terraform (push) Successful in 48s
Deploy Cluster / Ansible (push) Successful in 22m59s
2026-03-04 19:39:16 +00:00