
feat(cli): add kerno preflight command #97

Open
indenigrate wants to merge 8 commits into optiqor:main from indenigrate:feat/kerno-preflight

Contributor

@indenigrate indenigrate commented May 21, 2026

What

Adds the kerno preflight subcommand to validate host prerequisites (kernel version, BTF, capabilities, cgroups, etc.) before running the diagnostic engine. Includes a Helm pre-install hook Job and an optional DaemonSet init container.

Why

Fixes #44

When kerno fails to start due to host misconfigurations, users currently see cryptic Go/eBPF verifier errors. This provides a clean, upfront check that reports PASS/FAIL/WARN with actionable, copy-pasteable remediation hints.

How

  • Added internal/preflight package containing 10 checks. Uses golang.org/x/sys/unix for capability checks to avoid CGO and new dependencies.
  • Added internal/cli/preflight.go for the Cobra command with pretty terminal output and JSON output (--output json).
  • Added Helm pre-install hook (preflight-job.yaml) with hostNetwork: true to ensure accurate port checks.
  • Added optional initContainers block to the DaemonSet.
  • Added ExecStartPre=/usr/local/bin/kerno preflight to the systemd unit.
  • Added 19 fixture-based unit tests in checks_test.go that simulate filesystem mounts and synthetic capability masks.
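The "synthetic capability masks" idea in the last bullet can be illustrated with a small sketch. This is not the PR's code: the PR says it uses golang.org/x/sys/unix, while this sketch swaps in plain parsing of the CapEff line from /proc/<pid>/status so that it needs only the standard library. The function names parseCapEff and hasCap are hypothetical, and which capabilities kerno actually requires is an assumption here; the bit numbers themselves come from linux/capability.h.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in linux/capability.h.
// Which of these kerno actually checks is an assumption.
const (
	capSysAdmin = 21
	capPerfmon  = 38
	capBPF      = 39
)

// parseCapEff (hypothetical helper) extracts the effective capability mask
// from the text of /proc/<pid>/status. It takes the text as a parameter so
// tests can feed it synthetic input instead of reading the real file.
func parseCapEff(status string) (uint64, error) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	return 0, fmt.Errorf("CapEff line not found")
}

// hasCap reports whether capability bit c is set in mask.
func hasCap(mask uint64, c uint) bool {
	return mask&(1<<c) != 0
}

func main() {
	// Synthetic status fragment in which only bit 21 (CAP_SYS_ADMIN) is set.
	mask, err := parseCapEff("Name:\tkerno\nCapEff:\t0000000000200000\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("CAP_SYS_ADMIN:", hasCap(mask, capSysAdmin))
	fmt.Println("CAP_PERFMON:", hasCap(mask, capPerfmon))
	fmt.Println("CAP_BPF:", hasCap(mask, capBPF))
}
```

Keeping the mask check a pure function of its input is what makes fixture-style tests possible without root or a particular kernel.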

Testing

  • go build ./... passes

  • go test ./... passes

  • go vet ./... passes

  • golangci-lint run ./... passes

  • Tested locally with: ./bin/kerno preflight (to verify sudo hints) and sudo ./bin/kerno preflight --output json

  • N/A — pure docs/refactor (Did not modify BPF C code)

  • sudo ./bin/bpf-verify --read 5s confirms 6/6 programs still load

  • ./scripts/verify.sh passes (or specific phase: ./scripts/verify.sh quality)

Checklist

  • PR title follows Conventional Commits (feat(scope): subject)
  • All commits are DCO-signed (git commit -s)
  • No unrelated changes pulled in
  • Documentation updated where user-visible behavior changed
  • Added/updated tests for new code paths
  • If a new doctor rule, paired with a chaos scenario in scripts/verify.sh

@indenigrate indenigrate requested a review from btwshivam as a code owner May 21, 2026 19:50
@github-actions github-actions Bot added labels on May 21, 2026: documentation, testing, area/k8s, level:advanced (200+ lines or 6+ files, auto-applied), area/ops, area/community
@btwshivam
Member

Fix conflicts

Member

@btwshivam btwshivam left a comment


merge state is conflicting and the branch is on a stale base (merge-base bdfe9a6), so the diff drags in completion.go, doctor.go, kerno-mangen, install.sh, Makefile and others that diverge from main. rebase on upstream main first so the diff is just the preflight feature, then address the two inline notes. also note that the job validates one node, not every daemonset node, so a mixed-kernel cluster can pass preflight and still fail on some nodes. worth calling out in the docs.

    limits:
      cpu: 100m
      memory: 64Mi
  volumeMounts:
Member


the job mounts btf, proc, and cgroup, but CheckTracefs (checks.go:362) looks for /sys/kernel/tracing or /sys/kernel/debug/tracing, and neither is mounted here. the daemonset mounts /sys/kernel/debug (daemonset.yaml), so the actual daemon has it but this pre-install hook doesn't, which means the hook reports a tracefs problem on every healthy node and can block install. add the same /sys/kernel/debug (and tracing) mounts so the job's checks match what the daemon actually runs with.

Comment thread internal/preflight/checks.go Outdated

// CheckBTF verifies that /sys/kernel/btf/vmlinux is readable.
func CheckBTF(opts CheckOptions) Result {
	_, err := os.Stat(opts.BTFPath)
Member


os.Stat only proves the path exists, not that it's readable, but the message says "readable". a root-readable-only or zero-perm vmlinux would pass here and then fail at load. minor, but either open it (os.Open + close) or soften the message. also the Detail says "kernel >= 5.2" while CheckKernelVersion requires 5.8, pick one.

Signed-off-by: Devansh Soni <devanshsoni899@gmail.com>
- CheckBTF opens vmlinux instead of stat so a root-only or zero-perm
  file fails here rather than later at eBPF load; fix Detail to say
  kernel >= 5.8 to match CheckKernelVersion.
- Mount /sys/kernel/debug in the Helm preflight Job and DaemonSet init
  container so CheckTracefs matches what the daemon actually runs with,
  preventing a false tracefs WARN that can block install.
- Document in values.yaml that the pre-install hook validates only one
  node; initContainer gates every node on mixed-kernel clusters.

Signed-off-by: Devansh Soni <devanshsoni899@gmail.com>
@indenigrate indenigrate force-pushed the feat/kerno-preflight branch from 5b35e1e to ca4339d Compare June 1, 2026 11:57
@indenigrate
Contributor Author

Thanks for the review @btwshivam — addressed all points and force-pushed:

Rebase / conflicts. Rebased onto current main (96adb57). The only conflict was in root.go, resolved by keeping both preflightCmd and the newly-merged completionCmd in AddCommand. The diff is now just the preflight feature — no more completion.go, doctor.go, kerno-mangen, install.sh, or Makefile drift.

tracefs mounts (preflight-job.yaml). Added the /sys/kernel/debug hostPath volume + mount to both the pre-install hook Job and the DaemonSet init container, so CheckTracefs sees the same filesystem the daemon runs with. This stops the false tracefs WARN that could block install on healthy nodes.

CheckBTF (checks.go). Switched os.Stat to os.Open + Close so a root-only / zero-perm vmlinux fails here instead of later at load. The message now includes the underlying error, and the Detail says kernel >= 5.8 to match CheckKernelVersion.
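For reference, the 5.8 minimum that both checks now cite could be enforced with a comparison along these lines. This is a sketch under stated assumptions: parseRelease and atLeast are hypothetical names, the release strings are example inputs rather than output from a real host, and how CheckKernelVersion actually obtains and parses the release is not shown in this thread.

```go
package main

import "fmt"

// parseRelease (hypothetical helper) extracts the major and minor numbers
// from a uname-style release string such as "5.15.0-91-generic".
// Anything after the minor number is ignored.
func parseRelease(release string) (major, minor int, err error) {
	if _, err = fmt.Sscanf(release, "%d.%d", &major, &minor); err != nil {
		return 0, 0, fmt.Errorf("parse kernel release %q: %w", release, err)
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is greater than or equal to
// wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	// Example release strings; a real check would read the running kernel's.
	for _, rel := range []string{"5.15.0-91-generic", "5.4.0-150-generic", "6.1.0"} {
		major, minor, err := parseRelease(rel)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s meets 5.8 minimum: %v\n", rel, atLeast(major, minor, 5, 8))
	}
}
```

Comparing major and minor as integers avoids the string-comparison trap in which "5.15" would sort before "5.8".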

Mixed-kernel docs. Documented in values.yaml that the pre-install hook Job validates only one scheduled node (so a mixed-kernel cluster can pass yet still fail on other nodes), and that initContainer: true provides a per-node gate.

Verified: go build, go test ./internal/preflight ./internal/cli, go vet, golangci-lint (0 issues), and helm lint/template all pass.

@indenigrate indenigrate requested a review from btwshivam June 2, 2026 21:45
@btwshivam
Member

fix conflict



Development

Successfully merging this pull request may close these issues.

CLI: kerno preflight — validate prerequisites without running

2 participants