FALCO-BUNDLE: mount custom-rules + disable falcosidekick UI/Redis (F-1+F-2+F-3) by alexrf45 · Pull Request #49 · alexrf45/th0th

alexrf45 · 2026-05-28T21:00:42Z

Closes

F-1, F-2, F-3 from _docs/reviews/home-0ps-review-2026-05-28.md. All three findings touch the same Falco HelmRelease (HR) file, so they ship as one PR.

What's in this PR

F-1 — mount custom-rules ConfigMap

mounts.volumes + mounts.volumeMounts reference the existing falco-custom-rules ConfigMap, mounted at /etc/falco/rules.d
falco.rules_files already included that path (no change needed there; guarded by assertion in accept.sh)
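A minimal sketch of the values F-1 adds, emitted from a bash heredoc so it can be diffed against the HR. The ConfigMap name and mount path come from this PR. The volume name `custom-rules` and `readOnly: true` are assumptions, and the real HR presumably nests these keys under its own `spec.values`.

```bash
#!/usr/bin/env bash
# Print the F-1 values fragment. "custom-rules" (volume name) and readOnly are
# hypothetical; the ConfigMap name and mountPath are taken from the PR.
f1_values() {
  cat <<'EOF'
mounts:
  volumes:
    - name: custom-rules
      configMap:
        name: falco-custom-rules
  volumeMounts:
    - name: custom-rules
      mountPath: /etc/falco/rules.d
      readOnly: true
EOF
}

# Spot-check one running pod after Flux reconciles. Namespace, DaemonSet and
# container names follow the F-3 log command used in this PR.
f1_live_check() {
  kubectl -n security exec ds/security-falco-falcosecurity -c falco -- ls /etc/falco/rules.d
}
```

Because `falco.rules_files` already lists `/etc/falco/rules.d`, any rule file that shows up in `f1_live_check` output should be loaded without further config.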

F-2 — disable falcosidekick UI + Redis

falcosidekick.webui.enabled: false
falcosidekick.webui.redis.enabled: false ← path correction: the brief mentioned falcosidekick.config.redis.*, but in falcosidekick subchart 0.12.x Redis is actually nested under webui (confirmed against upstream values.yaml before committing)
Slack output preserved via falco.http_output.url → falco-falcosidekick:2801 (the chart routes via http_output → falcosidekick → Slack, not via a direct falcosidekick.config.slack.* block on the HR). Accept-test guards http_output.enabled + the URL target, plus a future-state guard for any falcosidekick.config.slack.webhookurl if someone later wires that direct path.
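A minimal sketch of the F-2 values as a bash heredoc, using the key paths this PR confirmed against the upstream falcosidekick 0.12.x values.yaml. The `http://` scheme on the URL is an assumption, and the fragment leaves out everything else the HR already sets.

```bash
#!/usr/bin/env bash
# Print the F-2 values fragment. Redis lives under webui in the 0.12.x subchart
# (per the PR); the http:// scheme on the URL is an assumption.
f2_values() {
  cat <<'EOF'
falcosidekick:
  webui:
    enabled: false      # no UI Deployment
    redis:
      enabled: false    # no Redis StatefulSet / 1Gi PVC
falco:
  http_output:
    enabled: true       # Falco -> falcosidekick -> Slack stays intact
    url: http://falco-falcosidekick:2801
EOF
}
```

Keeping `http_output` explicit means the accept-test guard on `http_output.enabled` and the URL target has something concrete to assert against.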

F-3 — modern_ebpf driver verified live

No HR change needed. Live DaemonSet log on security-falco-falcosecurity-4l8fk shows:
```
Thu May 28 20:27:02 2026: Opening 'syscall' source with modern BPF probe.
```
No kmod / legacy / fallback strings in the last 400 lines across the 6 DS pods on Talos (kernel 6.18.32-talos).
driver.kind: modern_ebpf is already set in the HR; the assertion guards it.
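A minimal bash sketch of the F-3 check, applying the same criteria the PR used by hand: the modern BPF probe line must appear and no kmod / legacy / fallback strings may. The pod label selector `app.kubernetes.io/name=falco` is an assumption. Unlike the manual pass (last 400 lines), it reads each pod's full log so the startup line is not missed on long-running pods.

```bash
#!/usr/bin/env bash
# Read Falco logs on stdin; succeed only if the modern eBPF probe line is
# present and no kmod / legacy / fallback string appears.
check_driver_logs() {
  local logs
  logs="$(cat)"
  if ! grep -qF "Opening 'syscall' source with modern BPF probe" <<<"$logs"; then
    echo "FAIL: modern BPF probe line not found" >&2
    return 1
  fi
  if grep -qiE 'kmod|legacy|fallback' <<<"$logs"; then
    echo "FAIL: kmod/legacy/fallback string found" >&2
    return 1
  fi
  echo "OK: modern_ebpf"
}

# Run the check once per DaemonSet pod. The label selector is hypothetical;
# confirm it with `kubectl -n security get pods --show-labels`.
check_all_pods() {
  local pod rc=0
  for pod in $(kubectl -n security get pods -l app.kubernetes.io/name=falco -o name); do
    echo -n "$pod: "
    kubectl -n security logs "$pod" -c falco | check_driver_logs || rc=1
  done
  return "$rc"
}
```

Checking per pod matters because `kubectl logs ds/...` reads from a single pod, so one healthy pod could hide a fallback on another node.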

Post-merge manual cleanup (NOT in this PR)

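The PV security-falco-falcosidekick-ui-redis-data (1Gi, Retain) will be orphaned once the chart removes the StatefulSet. Kubernetes does not delete volumeClaimTemplate PVCs when a StatefulSet is deleted (the default persistentVolumeClaimRetentionPolicy is Retain), so the PVC may linger too; once it is deleted, the PV moves to Released and its data stays on disk because reclaimPolicy: Retain. Follow-ups:

After Flux reconciles, confirm the StatefulSet is gone and delete the PVC if it lingered: kube dev -n security get sts,pvc | grep redis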
The underlying TrueNAS zvol (named per the dynamic-provisioner pattern) needs hand-deletion to reclaim the 1Gi. Locate via the PV's volumeAttributes.volume_id or volumeAttributes.iscsi_target_iqn.
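A minimal sketch of the pre-deletion lookup, using plain `kubectl` rather than the local `kube dev` wrapper. The PV name and the two attribute keys come from this PR; the assumption is that they sit under `.spec.csi.volumeAttributes`, as they would for a CSI-provisioned PV.

```bash
#!/usr/bin/env bash
# Inspect the orphaned Redis PV before touching TrueNAS.
PV="security-falco-falcosidekick-ui-redis-data"

pv_phase() {
  kubectl get pv "$PV" -o jsonpath='{.status.phase}'
}

# Only proceed once the PV is Released, i.e. its PVC has been deleted.
pv_ready_for_cleanup() {
  [[ "$(pv_phase)" == "Released" ]]
}

# Print the attributes the PR suggests for locating the zvol (one per line).
pv_zvol_hints() {
  kubectl get pv "$PV" \
    -o jsonpath='{.spec.csi.volumeAttributes.volume_id}{"\n"}{.spec.csi.volumeAttributes.iscsi_target_iqn}{"\n"}'
}
```

Record the hints first, then `kubectl delete pv` the object. With reclaimPolicy: Retain that removes only the Kubernetes object, so the zvol still has to be deleted on TrueNAS afterwards.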

Acceptance test

CI runs .claude/sprints/falco-bundle/accept.sh via .github/workflows/sprint-accept.yml. Local pass:

```
[accept:FALCO-BUNDLE] yamllint 1 files
[accept:FALCO-BUNDLE] kubectl kustomize _lib/controllers/falco
[accept:FALCO-BUNDLE] F-1 assert: mounts.volumes references configMap falco-custom-rules
[accept:FALCO-BUNDLE] F-1 assert: mounts.volumeMounts has /etc/falco/rules.d entry tied to the custom-rules volume
[accept:FALCO-BUNDLE] F-1 assert: falco.rules_files includes /etc/falco/rules.d
[accept:FALCO-BUNDLE] F-2 assert: falcosidekick.webui.enabled == false
[accept:FALCO-BUNDLE] F-2 assert: falcosidekick.webui.redis.enabled == false
[accept:FALCO-BUNDLE] F-2 assert: falco.http_output.enabled == true (alert plumbing preserved)
[accept:FALCO-BUNDLE] F-2 assert: falco.http_output.url targets falcosidekick
[accept:FALCO-BUNDLE] F-3 assert: driver.kind == modern_ebpf (static)
[accept:FALCO-BUNDLE] PASS
```
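A hypothetical assertion helper that reproduces the `[accept:FALCO-BUNDLE] ...` log format above. The real accept.sh is not shown in this PR, so the function names here are illustrative only; the `bash -n` helper mirrors the commit note about swapping yamllint for a syntax check on the script itself.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not the actual contents of accept.sh.
TAG="[accept:FALCO-BUNDLE]"

# assert "<message>" <command...>: log the message, run the command,
# and fail loudly if the command returns non-zero.
assert() {
  local msg="$1"; shift
  echo "$TAG $msg"
  if ! "$@"; then
    echo "$TAG FAIL: $msg" >&2
    return 1
  fi
}

# Parse a shell script without executing it.
assert_bash_syntax() {
  assert "bash -n $1" bash -n "$1"
}
```

Usage might look like `assert "F-3 assert: driver.kind == modern_ebpf (static)" grep -q 'kind: modern_ebpf' rendered.yaml`, although a YAML-aware query is more robust than grep against rendered manifests.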

Out of scope for this sprint

falco-custom-rules ConfigMap content — kept as the existing thin starter set. Future sprint expands the rule library.

Worktree

/tmp/sprints/falco-bundle (clean up after merge: git worktree remove /tmp/sprints/falco-bundle && git branch -D sprint/falco-bundle)

Generated by

/sprint-orchestrate FALCO-BUNDLE O-7, wave 1 (runs in parallel with O-7, which ships the alert work as separate PR #50).

🤖 Generated with Claude Code

…+F-2)

F-1: Mount the in-repo falco-custom-rules ConfigMap (security ns) into /etc/falco/rules.d via the chart's mounts.volumes/volumeMounts hooks. falco.rules_files already enumerated /etc/falco/rules.d, so this wires the Git-managed rule library into the running engine without a chart fork.

F-2: Disable the falcosidekick web UI Deployment and its bundled Redis StatefulSet (1Gi iSCSI PVC) that the falcosecurity/falco 8.0.0 chart turns on by default via the falcosidekick 0.12.x subchart. Alerts are surfaced via Falco → falcosidekick HTTP (the falco.http_output block is preserved), so the web UI + Redis add storage + attack surface for no win on a single-operator homelab. Heads-up: post-merge cleanup will leave an orphan security-falco-falcosidekick-ui-redis-data PV on the Retain-policy iSCSI StorageClass; manual TrueNAS zvol cleanup is needed to reclaim the 1Gi.

F-3: No HR change needed. Static driver.kind=modern_ebpf was already correct; runtime probe (k8sop dev kubectl -n security logs ds/security-falco-falcosecurity -c falco) confirmed the engine line 'Opening syscall source with modern BPF probe.' on all 6 pods. Logged in the PR body; no fallback observed.

Also tweak accept.sh: drop accept.sh from yamllint TOUCH_PATHS (it's a shell script, not YAML) and add a bash -n syntax check instead.

## Closes

**O-7 remaining alerts** from [_docs/reviews/home-0ps-review-2026-05-28.md](../blob/dev/_docs/reviews/home-0ps-review-2026-05-28.md). The two CNPG alerts (`CNPGBackupStale`, `CNPGDumpCronJobStale`) already exist; this PR adds the three remaining.

## What's in this PR

### `CertExpiringSoon` (2 variants)

- **Warning** when expiry < **14 days**, sustained 1h → routes to `slack-warning`
- **Critical** when expiry < **3 days**, sustained 1h → routes to `slack-critical`
- Metric: **`certmanager_certificate_expiration_timestamp_seconds`** ← **brief said `cert_manager_*` (underscore split); live cluster exports `certmanager_*` (no underscore).** Verified via Prometheus `/api/v1/label/__name__/values`.
- Labels: `severity`, `app: cert-manager` for runbook routing

### `PVCNearFull` (2 variants)

- **Warning** when `available/capacity < 0.10`, sustained 30m → `slack-warning`
- **Critical** when `available/capacity < 0.05`, sustained 10m → `slack-critical`
- Metrics: `kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes`
- **Excludes `*-dumps-pvc`** via `persistentvolumeclaim!~".*-dumps-pvc"`; S-6 dump PVCs fill the volume by design
- Labels: `severity`, `pvc: {{ $labels.persistentvolumeclaim }}`

### `GatusEndpointDown` (1 variant)

- **Warning** when `gatus_results_endpoint_success == 0` for 5m → `slack-warning`
- Metric verified live via in-cluster Prometheus API: gauge labelled by `key` (e.g. `applications_authentik`), `group`, `name`, `type`
- Labels: `severity=warning`, `app=gatus`, `endpoint: {{ $labels.key }}`

## Existing rules preserved (guarded by accept.sh)

Survival assertions confirm these are all still present and unchanged: `CNPGBackupStale`, `CNPGDumpCronJobStale`, `PodOOMKilled`, `FluxKustomizationNotReady`, `FluxHelmReleaseNotReady`, `FluxResourceSuspended`.

## Acceptance test

CI runs `.claude/sprints/o-7/accept.sh` via `.github/workflows/sprint-accept.yml`. Local pass:

```text
[accept:O-7] yamllint 2 files
[accept:O-7] kubectl kustomize _lib/observability/kube-prometheus-stack
[accept:O-7] assert: exactly one custom PrometheusRule renders
[accept:O-7] assert: existing rules CNPGBackupStale + CNPGDumpCronJobStale preserved
[accept:O-7] assert: new alert 'CertExpiringSoon' exists with severity label
[accept:O-7] assert: new alert 'PVCNearFull' exists with severity label
[accept:O-7] assert: new alert 'GatusEndpointDown' exists with severity label
[accept:O-7] assert: CertExpiringSoon expr references certmanager_certificate_expiration_timestamp_seconds
[accept:O-7] assert: CertExpiringSoon has a warning-severity variant
[accept:O-7] assert: CertExpiringSoon has a critical-severity variant
[accept:O-7] assert: PVCNearFull expr references kubelet_volume_stats_available_bytes + capacity_bytes
[accept:O-7] assert: PVCNearFull excludes *-dumps-pvc via persistentvolumeclaim filter
[accept:O-7] assert: PVCNearFull has a warning-severity variant
[accept:O-7] assert: PVCNearFull has a critical-severity variant
[accept:O-7] assert: GatusEndpointDown expr references gatus_results_endpoint_success
[accept:O-7] PASS
```

## Post-merge manual checks

- `kube dev -n monitoring exec deploy/monitoring-kube-prometheus-stack-prometheus -- wget -qO- "localhost:9090/api/v1/rules"`: confirm 3 new rule groups loaded (if no such Deployment exists, target the StatefulSet that the Prometheus Operator creates for Prometheus instead)
- Force a synthetic test via Alertmanager UI / silence to verify Slack routing on critical paths (optional; pattern matches existing CNPG alerts)

## Worktree

`/tmp/sprints/o-7` (clean up after merge: `git worktree remove /tmp/sprints/o-7 && git branch -D sprint/o-7`)

## Generated by

`/sprint-orchestrate FALCO-BUNDLE O-7`, wave 1 (parallel with FALCO-BUNDLE, separate PR #49).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
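A sketch of PromQL expressions that match the thresholds described above, built in bash so they can be evaluated ad hoc against the Prometheus HTTP API (`/api/v1/query`). These are illustrative and may differ from the expressions in the actual PrometheusRule; the `for:` durations (1h, 30m, 10m, 5m) belong on the rule, not in the expression. `PROM_URL` is an assumed variable pointing at a port-forward or in-cluster address.

```bash
#!/usr/bin/env bash
# Illustrative expressions only; the merged PrometheusRule may differ.

# Seconds-until-expiry threshold for a CertExpiringSoon variant.
cert_threshold_seconds() {
  local days="$1"
  echo $(( days * 24 * 3600 ))
}

cert_expiring_expr() {
  local days="$1"
  echo "certmanager_certificate_expiration_timestamp_seconds - time() < $(cert_threshold_seconds "$days")"
}

# Free-space ratio below the threshold, excluding the S-6 dump PVCs.
pvc_near_full_expr() {
  local ratio="$1"
  local sel='persistentvolumeclaim!~".*-dumps-pvc"'
  echo "kubelet_volume_stats_available_bytes{${sel}} / kubelet_volume_stats_capacity_bytes{${sel}} < ${ratio}"
}

gatus_down_expr() {
  echo 'gatus_results_endpoint_success == 0'
}

# Evaluate an instant query; PROM_URL defaults to a local port-forward.
prom_query() {
  local expr="$1"
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" --data-urlencode "query=${expr}"
}
```

For example, `prom_query "$(pvc_near_full_expr 0.10)"` lists the PVCs that would currently satisfy the warning condition. The 14-day and 3-day cert thresholds work out to 1209600 and 259200 seconds.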

…gate

Heavy observability+CI day. New review doc captures full state.

Closed since 2026-05-28: F-1+F-2+F-3 (PR #49 FALCO-BUNDLE), O-7 remaining alerts (PR #50 + 8a7a6f7 PVCNearFull fix), O-12 (CI lint workflow), O-13 (flux PodMonitor: pre-existing latent scrape bug, never caught data for 4 days), O-16 (kromgo configMapGenerator auto-rollout).

New: O-15 (kromgo flux_version "No Data"; KSM label allowlist needed; top of next sprint), O-17 (TargetDown × 2 firing on authentik metrics scrape), O-18 (tailscale-operator PodMonitor 0 targets since bootstrap). Hyg-2 (orphan Falco Redis PVC after F-2 descope), Hyg-3 (3 cluster-configs comment-warnings).

Recommended next sprint: O-15 + O-17 + O-18 as a 90-min observability cleanup bundle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

alexrf45 added 2 commits May 28, 2026 16:56

chore(sprint): add FALCO-BUNDLE acceptance test (df0208e)

alexrf45 mentioned this pull request May 28, 2026

O-7: cert expiry + PVC near-full + Gatus endpoint-down alerts #50 (Merged)

alexrf45 merged commit c492d45 into dev May 28, 2026
1 check passed

alexrf45 deleted the sprint/falco-bundle branch May 28, 2026 21:05

alexrf45 merged 2 commits into dev from sprint/falco-bundle
