feat(ci): add 24-hour nightly soak test with leak detection #84
purvask2006-collab wants to merge 6 commits into
…qor#47)
- Add internal/ai/http_client.go: shared *http.Client builder
  * Honours HTTPS_PROXY/HTTP_PROXY/NO_PROXY env vars (Go default)
  * config.ai.proxy overrides env for explicit per-provider proxy
  * config.ai.ca_cert_file appended to system root pool (never replaces)
  * Actionable TLS error: cert subject + issuer + enterprise docs link
  * config.ai.insecure_skip_verify for dev only (loud stderr warning)
  * config.ai.timeout with 30s default
- Update anthropic.go, openai.go, ollama.go to use shared client instead of inline &http.Client{}
- Add config fields: ai.proxy, ai.ca_cert_file, ai.insecure_skip_verify, ai.timeout
- Add internal/ai/http_client_test.go: 7 tests covering custom CA success, wrong CA error, default case, bad CA file, insecure skip verify, explicit proxy, empty proxy
- Add docs/enterprise.md: 4 deployment scenarios, mitmproxy verification steps, openssl CA extraction one-liner
Closes optiqor#47
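The Go builder itself is not shown on this page. To make the trust-store and proxy rules above concrete, here is a rough Python analogue using only the standard library (ssl, urllib.request). It is not kerno's code: AIHTTPConfig and its fields are hypothetical stand-ins for the config.ai.* keys. The sketch shows the extra CA being added on top of the system roots rather than replacing them, an explicit proxy taking precedence over the HTTPS_PROXY/HTTP_PROXY environment variables, a loud stderr warning when verification is disabled, and the 30s default timeout.

```python
import ssl
import sys
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_TIMEOUT_S = 30.0  # the 30s default described in the commit message


@dataclass
class AIHTTPConfig:
    """Hypothetical mirror of config.ai.proxy / ca_cert_file / insecure_skip_verify / timeout."""
    proxy: str = ""
    ca_cert_file: str = ""
    insecure_skip_verify: bool = False
    timeout: Optional[float] = None


def build_tls_context(cfg: AIHTTPConfig) -> ssl.SSLContext:
    # Start from the platform's default trust store (the system roots).
    ctx = ssl.create_default_context()
    if cfg.ca_cert_file:
        # Adds the corporate CA to the roots already loaded; nothing is replaced.
        # A missing or unreadable file raises OSError here, at build time.
        ctx.load_verify_locations(cafile=cfg.ca_cert_file)
    if cfg.insecure_skip_verify:
        print("WARNING: TLS certificate verification is DISABLED (dev only)", file=sys.stderr)
        ctx.check_hostname = False  # must be cleared before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def resolve_proxies(cfg: AIHTTPConfig) -> Dict[str, str]:
    # An explicit per-provider proxy wins over HTTPS_PROXY / HTTP_PROXY.
    if cfg.proxy:
        return {"http": cfg.proxy, "https": cfg.proxy}
    return urllib.request.getproxies()  # reads the *_proxy environment variables


def build_opener(cfg: AIHTTPConfig):
    """Return an opener plus the timeout to pass to opener.open(url, timeout=...)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(resolve_proxies(cfg)),
        urllib.request.HTTPSHandler(context=build_tls_context(cfg)),
    )
    return opener, (cfg.timeout or DEFAULT_TIMEOUT_S)
```

Failing fast on a bad CA file at build time, rather than on the first request, is what lets a "bad CA file" test assert on a clear error.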
…or#47)
- Update anthropic.go, openai.go, ollama.go to use shared NewHTTPClient instead of inline &http.Client{}
- Add ai.proxy, ai.ca_cert_file, ai.insecure_skip_verify, ai.timeout fields to AIConfig in internal/config/config.go
Signed-off-by: purvask2006-collab <purvask2006@gmail.com>
…nvironments with authenticating proxies and MITM CA certificates.
Signed-off-by: purvask2006-collab <purvask2006@gmail.com>
- Add internal/doctor/baselines.go: sliding ring-buffer Tracker with sigma mode (normal metrics) and ratio mode (skewed/log-distributed)
- Add internal/doctor/baselines_test.go: 12 tests covering warmup suppression, stable-then-spike detection, 3x WARNING / 10x CRITICAL
- Extend internal/config/config.go with BaselinesConfigYAML + helper
- Wire Tracker into engine.go via WithBaselines()
- Add adaptive overlays to all four threshold rules in rules.go
- Add BaselineAnnotation field to Finding; render highlighted in render.go
- Update AI system prompt in prompt.go to reference baseline context
Static absolute floors are preserved alongside adaptive limits.
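The Tracker in baselines.go is not shown here. The idea can be sketched in Python under a few assumptions of our own: a fixed-size ring buffer (collections.deque with maxlen), no verdict until a warmup count of samples has been seen, sigma mode flagging values more than k standard deviations above the window mean, and ratio mode comparing a value against the window median with the 3x WARNING / 10x CRITICAL cut-offs from the test list above. The class name, parameters, and the use of the median are illustrative, not the project's implementation.

```python
import statistics
from collections import deque
from typing import Deque, Optional


class BaselineTracker:
    """Illustrative sliding-window baseline; not the Go Tracker from baselines.go."""

    def __init__(self, window: int = 60, warmup: int = 10,
                 mode: str = "sigma", sigma_k: float = 3.0) -> None:
        if mode not in ("sigma", "ratio"):
            raise ValueError(f"unknown mode {mode!r}")
        # deque(maxlen=...) behaves as a ring buffer: the oldest sample drops out.
        self.samples: Deque[float] = deque(maxlen=window)
        self.warmup = warmup
        self.mode = mode
        self.sigma_k = sigma_k

    def observe(self, value: float) -> Optional[str]:
        """Classify value against the current window, then add it to the window."""
        verdict = self._classify(value)
        self.samples.append(value)
        return verdict

    def _classify(self, value: float) -> Optional[str]:
        if len(self.samples) < self.warmup:
            return None  # warmup suppression: too little history to judge
        if self.mode == "sigma":
            # Roughly normal metrics: flag values far above the window mean.
            mean = statistics.fmean(self.samples)
            std = statistics.pstdev(self.samples)
            return "WARNING" if value > mean + self.sigma_k * std else None
        # Ratio mode for skewed / log-distributed metrics: compare to the median.
        median = statistics.median(self.samples)
        if median <= 0:
            return None
        ratio = value / median
        if ratio >= 10:
            return "CRITICAL"
        if ratio >= 3:
            return "WARNING"
        return None
```

A median-relative ratio is used for the skewed case because a single large outlier inflates the mean and standard deviation of a log-distributed metric, while the median stays put.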
🚀 First PR, welcome aboard! A few things to expect:
If you get stuck, reply here or jump to Discussions. We want this PR to land.
e364986 to 63c9a44
Signed-off-by: purvask2006-collab <purvask2006@gmail.com>
Don't spam PRs. First get your past PR merged.
Please focus on learning rather than blindly using AI for points.
Understood, and I apologize for the friction. I got ahead of myself trying to fix things and open this soak test infrastructure without realising I was cluttering the workflow and pulling in unrelated commits. I really want to learn the proper maintenance workflow here. I will stop opening new PRs, put this branch into a draft, and focus 100% of my attention on fixing the build, reverting the out-of-scope model changes, and getting the enterprise HTTP client PR cleanly merged first.
@btwshivam Here is why a standard test suite isn't enough for Kerno, and why I want to get this landed:
@purvask2006-collab Hello! Slow down, I understand your efforts. I'm saying: first focus on getting your open PRs merged, then we can jump on this calmly and without any distraction.
Also, Purva, always rebase your branch on main before opening a PR so changes from other branches don't get carried along.
I'll get back to your findings when I review this PR. For now, rebase it and remove the extra file changes.
Yes, thank you so much. I am working on it and will make the required changes.
Fix the conflicts and rebase with main, then I'll review it. Tag me, thanks!
Hi @btwshivam, got it. I am currently rebasing the branch on main, resolving the merge conflicts, and cleaning up any extra file changes as requested. I will tag you here as soon as the branch is clean and ready for your review.
What
Adds a nightly 24-hour soak test that continuously monitors kerno under
chaos load and asserts that RSS, goroutine count, file descriptors, and
event throughput stay within defined bounds across the full run.
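The PR text doesn't spell out the bounds themselves. As one possible reading of "stay within defined bounds", here is a Python sketch of a leak check: after a warm-up window, fit a least-squares slope to each metric and fail if it grows faster than a per-hour budget, and fail if throughput ever drops below a fraction of its median. The row keys (timestamp, rss_kib, open_fds, goroutines, events_per_s), the one-hour warm-up, and every threshold are hypothetical placeholders, not the PR's actual pass criteria or assertion script.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence

Row = Dict[str, float]  # one parsed CSV sample; keys are hypothetical


def slope_per_hour(ts: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against ts (in seconds), expressed per hour."""
    n = len(ts)
    if n < 2:
        return 0.0
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    den = sum((t - mean_t) ** 2 for t in ts)
    return 0.0 if den == 0 else num / den * 3600.0


def check_soak(rows: List[Row], warmup_s: float = 3600.0,
               max_growth_per_hour: Optional[Dict[str, float]] = None,
               min_throughput_ratio: float = 0.5) -> List[str]:
    """Return human-readable failures; an empty list means the run passed."""
    if max_growth_per_hour is None:
        # Hypothetical budgets: KiB/h for RSS, count/h for FDs and goroutines.
        max_growth_per_hour = {"rss_kib": 1024.0, "open_fds": 1.0, "goroutines": 1.0}
    if not rows:
        return ["no samples collected"]
    t0 = rows[0]["timestamp"]
    # Skip the start-up ramp (caches, pools) so it isn't mistaken for a leak.
    steady = [r for r in rows if r["timestamp"] - t0 >= warmup_s]
    if len(steady) < 2:
        return ["not enough samples after warm-up"]
    ts = [r["timestamp"] for r in steady]
    failures = []
    for col, budget in max_growth_per_hour.items():
        growth = slope_per_hour(ts, [r[col] for r in steady])
        if growth > budget:
            failures.append(f"{col} grows {growth:.1f}/h (budget {budget})")
    tput = [r["events_per_s"] for r in steady]
    floor = min_throughput_ratio * median(tput)
    if min(tput) < floor:
        failures.append(f"events_per_s fell to {min(tput):.1f} (< {floor:.1f})")
    return failures
```

Fitting a slope over the whole steady-state window is less sensitive to one noisy sample than comparing the first and last readings, which matters for RSS in a garbage-collected process.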
Why
Fixes #
How
The workflow builds kerno, grants BPF capabilities via setcap, then runs
`kerno start` and `kerno chaos --induce cascade --duration 86400s` in
parallel. Every 5 minutes, soak-watch.sh scrapes RSS from /proc, goroutine
count and heap profiles from the pprof endpoint, open FDs, pinned BPF map
count via bpftool, and event throughput from the Prometheus metrics endpoint.
All data is appended to a CSV. Pprof snapshots are saved at hours 1, 6, 12,
18, and 24. At the end of the run, a Python assertion script evaluates all
pass criteria and exits non-zero on any regression. The full CSV, logs, and
pprof dumps are uploaded as a GitHub Actions artifact on every run, whether
it passes or fails, so any failure can be investigated locally.
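For readers who want to see what one sampling pass involves, here is a rough Python sketch of the /proc side; soak-watch.sh itself is a shell script and is not shown. It reads resident set size from the VmRSS line of /proc/<pid>/status, counts open descriptors as the entries in /proc/<pid>/fd, parses the total from the first line of a Go goroutine profile as served by net/http/pprof with ?debug=1, and appends one CSV row, writing the header only when the file is new. The column names and function names are hypothetical, the bpftool and Prometheus scrapes are left out, and it only runs on Linux.

```python
import csv
import os
import re
import time
from typing import Dict

FIELDS = ["timestamp", "rss_kib", "open_fds", "goroutines"]  # hypothetical CSV columns


def read_rss_kib(pid: int) -> int:
    """Resident set size in KiB, from the 'VmRSS:  12345 kB' line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise ValueError(f"no VmRSS line for pid {pid} (kernel thread or exited?)")


def count_open_fds(pid: int) -> int:
    """Each entry in /proc/<pid>/fd is one open file descriptor."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_goroutine_total(profile_text: str) -> int:
    """Parse 'goroutine profile: total N' from a goroutine profile fetched with ?debug=1."""
    m = re.search(r"goroutine profile: total (\d+)", profile_text)
    if m is None:
        raise ValueError("not a debug=1 goroutine profile")
    return int(m.group(1))


def append_sample(csv_path: str, row: Dict[str, int]) -> None:
    """Append one row, writing the header only if the file is new or empty."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)


def take_sample(pid: int, goroutine_profile: str) -> Dict[str, int]:
    """Collect one row; the goroutine profile text is fetched by the caller."""
    return {
        "timestamp": int(time.time()),
        "rss_kib": read_rss_kib(pid),
        "open_fds": count_open_fds(pid),
        "goroutines": parse_goroutine_total(goroutine_profile),
    }
```

Appending one row per pass keeps the CSV useful even if the job is killed mid-run, since every completed sample is already on disk for the artifact upload.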
The local script (scripts/soak-watch.sh) accepts --duration and --interval
flags (plus --pid for the process to watch), so engineers can run a short
smoke soak (for example, 10 minutes) without waiting 24 hours.
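The --duration/--interval semantics can be illustrated with a small Python loop; this is a hypothetical analogue, not the shell script. It schedules sample i at start + i*interval on a monotonic clock, so a slow scrape shortens the next sleep instead of pushing every later sample back, and it takes the clock and sleep functions as parameters so the schedule can be tested without waiting. The defaults of 86400 s and 300 s follow the nightly run described above, and --pid mirrors the flag used in the smoke-test command under Testing; parse_args and run_loop are illustrative names.

```python
import argparse
import time
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Python analogue of soak-watch.sh flags (illustrative)")
    p.add_argument("--pid", type=int, required=True, help="PID of the kerno process to watch")
    p.add_argument("--duration", type=int, default=86400, help="total run length in seconds")
    p.add_argument("--interval", type=int, default=300, help="seconds between samples")
    args = p.parse_args(argv)
    if args.interval <= 0 or args.duration < 0:
        p.error("--interval must be > 0 and --duration >= 0")
    return args


def run_loop(duration_s: float, interval_s: float, sample: Callable[[int], None],
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Call sample(i) at start + i*interval for every tick within duration; return the count."""
    start = clock()
    i = 0
    while i * interval_s <= duration_s:
        # Fixed schedule: time spent inside sample() is subtracted from the next sleep.
        delay = start + i * interval_s - clock()
        if delay > 0:
            sleep(delay)
        sample(i)
        i += 1
    return i
```

With --duration 600 --interval 60 this takes 11 samples (t = 0, 60, ..., 600 s), enough to exercise the CSV and assertion path in a ten-minute smoke soak.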
Testing
- `go build ./...` passes
- `go test ./...` passes
- `go vet ./...` passes
- `golangci-lint run ./...` passes
- `bash scripts/soak-watch.sh --duration 600 --interval 60 --pid <kerno-pid>`
Checklist
- … (`feat(scope): subject`)
- … (`git commit -s`)
- … failure interpretation instructions