fix: clear all gh code-scanning CVEs + quality-audit findings by stxkxs · Pull Request #6 · nanohype/slack-knowledge-bot

stxkxs · 2026-06-20T02:53:54Z

See commit message for full details.

Summary

Security: clears all 5 HIGH gh code-scanning CVEs + 26 moderate OTel findings; both lockfiles now at 0 npm-audit vulnerabilities (was 28 + 7). Adds authenticated DB TLS (bundled RDS CA) and Pino log redaction.
Systems/quality: pg timeouts, graceful SIGTERM drain (+ terminationGracePeriodSeconds), audit-consumer early shutdown, Bedrock token metering, bounded cache, composite chunks PK, plus a batch of correctness/cleanup fixes.
Docs: all 6 docs rewritten off the dead ECS/CDK/Lambda model onto the actual EKS/Helm/ArgoCD/KEDA/landing-zone/Platform/ESO deployment.
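
A minimal sketch of what a bounded cache of this kind can look like, following the behavior the commit message describes (a size cap, oldest-evicted, expired-on-read). The class name `BoundedTtlCache`, its constructor parameters, and the injectable clock are hypothetical and not taken from the repository.

```typescript
// Illustrative only: names and signatures are assumptions, not the repo's code.
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class BoundedTtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired-on-read: stale entries are dropped when they are next looked up.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Deleting first moves a re-set key to the end of the Map's insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a cap of 10,000 entries, a churn of distinct keys evicts the oldest entry on each insert once the cap is reached, so memory stays bounded regardless of how many distinct users appear.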

Verification

130 tests pass · coverage 92.8/79/90/95 · typecheck/lint/format/build/chart-lint green · 0 vulnerabilities.

🤖 Two-pass run: quality audit + fix, then an adversarial regression-hunt of the diff (caught & fixed 4 self-introduced regressions before commit).

A two-pass quality-check-and-fix sweep: resolve every GitHub code-scanning vulnerability, then fix the findings from a 9-dimension quality audit (and the regressions that first pass introduced). Net result: 0 npm-audit vulnerabilities in both lockfiles, 130 tests green, coverage 92.8/79/90/95, typecheck/lint/format/build/chart-lint all clean.

─────────────────────── Supply chain / CVEs ───────────────────────

All 5 HIGH gh code-scanning CVEs were transitive deps; pinned via npm `overrides`: @grpc/grpc-js→1.14.4, form-data→4.0.6, protobufjs→7.6.1, ws→8.21.0 (all in-major patch/minor). The 26 MODERATE OpenTelemetry findings (@opentelemetry/core W3C-Baggage DoS) are fixed at the source by bumping @opentelemetry/auto-instrumentations-node 0.76→0.77, which pulls sdk-node 0.219 → core 2.8.0: a real dep bump rather than a fragile override. Dev-only vite HIGH + esbuild low cleared via overrides. The OAuth subpackage's AWS-SDK xml transitives (fast-xml-builder/-parser, @aws-sdk/xml-builder) bumped within range; its own vite/esbuild overrides added. Both lockfiles now report 0 vulnerabilities.

──────────────────────── Systems & reliability ────────────────────────

- pgvector Pool gains statement_timeout/query_timeout (5s), connectionTimeoutMillis (2s), idleTimeoutMillis (30s). The hottest user-path call was the one external call with no timeout, so a hung-but-alive Postgres could pin the Bolt handler slot.
- Graceful SIGTERM drain: the query handler now tracks in-flight queries in a set and exposes drainInFlight(deadline); shutdown ordering is app.stop() → drain (25s) → flushMetrics → close, so a deploy mid-query no longer drops the awaited compliance audit. The chart's main Deployment gains terminationGracePeriodSeconds: 45 to fit the drain.
- audit-consumer shutdown exits as soon as the receive loop drains (loopExited flag, 500ms tick) instead of always burning the full 30s.
- uncaughtException now logs + exits(1) for a clean restart instead of swallowing into an undefined process state; unhandledRejection stays swallowed (Bolt Socket-Mode reconnect noise) and is now clearly scoped.
- Email cache is bounded (max 10k, oldest-evicted, expired-on-read) so a churn of distinct users can't grow it without limit.
- Bedrock answer path now meters input/output/cache-read tokens from the response usage block, so the AI path's cost is observable per-query.

──────────────────────────── Security ────────────────────────────

- Postgres TLS now verifies the Aurora server cert against the bundled Amazon RDS global CA (certs/rds-global-bundle.pem, COPYed into the image) instead of blanket rejectUnauthorized:false, giving authenticated, MITM-resistant encryption. Secure by default (PG_SSL_REJECT_UNAUTHORIZED defaults true) with a documented escape hatch and graceful degrade if the CA is unreadable. Applied to both the app and the demo seeder.
- Pino logger gains a redact config mirroring the OAuth module's token field set + the realistic nested header/SDK-error shapes, so tokens and secrets never land in app logs.
- pii-scrubber email regex char-class bug fixed ([A-Z|a-z] → [A-Za-z], the pipe was matching a literal '|').

──────────────────────── Data model & schema ────────────────────────

chunks table keyed on a composite PRIMARY KEY (doc_id, chunk_index) instead of doc_id alone, which had capped each document to a single chunk row. initSchema runs its DDL on a dedicated client inside a transaction with SET LOCAL statement_timeout = 0, so the new hot-path Pool timeout can't kill an IVFFlat index build on a populated table. Seeder INSERT upserts on (doc_id, chunk_index) and its PGDATABASE default is corrected to the underscored name; its Bedrock embed call and Pool gain timeouts.
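
The pii-scrubber character-class fix listed under Security can be illustrated with a small sketch. Only the `[A-Z|a-z]` → `[A-Za-z]` change comes from the commit message; the surrounding email pattern is a common textbook form assumed here for illustration, not the scrubber's actual regex.

```typescript
// Assumed email pattern; only the final character class reflects the described bug.
// Inside [...], '|' is not alternation: it is a literal pipe added to the class.
const buggyEmail = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$/;
const fixedEmail = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

const suspicious = "user@example.c|m";

console.log(buggyEmail.test(suspicious)); // true: '|' is accepted as a TLD character
console.log(fixedEmail.test(suspicious)); // false: only letters are allowed in the TLD
console.log(fixedEmail.test("user@example.com")); // true: ordinary addresses still match
```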
─────────────────────── Config & correctness ───────────────────────

- BEDROCK_LLM_MODEL_ID corrected to the cross-region inference profile (us.anthropic.claude-sonnet-4-6) across chart/values.yaml, .env.example, and CLAUDE.md; the bare anthropic.* ID is not invocable on-demand and would have shipped a broken default to every deploy.
- Removed dead config (CRAWL_INTERVAL_MINUTES) and a dead export (scrubPiiFromObject); rewrote a comment that recommended the CI-banned vi.mock("ioredis") pattern; fixed a stale Node-version note.
- Coverage config honestly excludes the two bootstrap entry files (src/redis.ts, src/bin/audit-consumer.ts), which are only verifiable against real Redis / a live process, matching the existing index.ts precedent.

──────────────────────────── Documentation ────────────────────────────

All six docs (runbook, integrations, qa-playbook, rag-architecture, test-plan, threat-model) rewritten off the dead AWS ECS/ECR/CDK/Fargate/Lambda deployment model and onto the actual one: an eks-agent-platform Platform tenant whose Helm chart ArgoCD reconciles from git, with landing-zone substrate, the Platform CR's IRSA, External Secrets, and a KEDA-scaled audit-consumer Deployment. The runbook is fully reworked to kubectl/helm/argocd incident steps and the chart's PrometheusRule alerts.

──────────────────────────── Testing ────────────────────────────

Added tests for the graceful-drain/in-flight tracking and the Bedrock token metering; coverage rose to 92.8/79/90/95 against the 75/60/75/75 thresholds. No SDK module-mocking introduced (the grep-enforced ban holds); every new external touchpoint carries an explicit timeout.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
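
The graceful-drain behavior described under Systems & reliability could be sketched roughly as follows. The commit message names `drainInFlight(deadline)`, the in-flight set, and the ordering app.stop() → drain → flushMetrics → close; everything else here (the `InFlightTracker` class, the `track` method, the boolean return value, the `ShutdownSteps` shape) is a hypothetical illustration rather than the repository's implementation.

```typescript
// Illustrative sketch; class, method, and interface names are assumptions.
class InFlightTracker {
  private readonly inFlight = new Set<Promise<unknown>>();

  // Register a unit of work; it is removed from the set once it settles.
  track<T>(work: Promise<T>): Promise<T> {
    this.inFlight.add(work);
    const forget = (): void => {
      this.inFlight.delete(work);
    };
    work.then(forget, forget);
    return work;
  }

  get size(): number {
    return this.inFlight.size;
  }

  // Resolves true if all tracked work settled before the deadline, false on timeout.
  drainInFlight(deadlineMs: number): Promise<boolean> {
    if (this.inFlight.size === 0) return Promise.resolve(true);
    const settled = Promise.allSettled([...this.inFlight]);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), deadlineMs);
      void settled.then(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}

// Hypothetical hooks standing in for app.stop(), flushMetrics, and the final close.
interface ShutdownSteps {
  stopApp: () => Promise<void>;
  flushMetrics: () => Promise<void>;
  close: () => Promise<void>;
}

// Ordering from the commit message: stop intake, drain, flush metrics, close.
async function shutdown(
  tracker: InFlightTracker,
  steps: ShutdownSteps,
  drainMs: number = 25_000,
): Promise<void> {
  await steps.stopApp();
  await tracker.drainInFlight(drainMs);
  await steps.flushMetrics();
  await steps.close();
}
```

Stopping intake before draining means no new queries join the set while it empties, and the 25s drain budget sits inside the chart's 45s terminationGracePeriodSeconds, leaving headroom for the flush and close steps.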

stxkxs · 2026-06-20T02:55:45Z

Deferred items (tracked separately)

These quality-audit findings were deliberately not included here — each needs validation infra, a design decision, or a feature slice beyond an audit-fix:

Rate limiter: replace check-then-act with an atomic Redis Lua script #7 — Rate limiter TOCTOU → atomic Redis Lua (bug) — needs a live Redis to validate the script
WorkOS identity: replace O(n) directory scan with lookup-by-email or a sync webhook #8 — WorkOS O(n) directory scan → lookup-by-email / sync webhook — architectural
pgvector: forward-migration support + multi-chunk read-path dedup #9 — pgvector forward-migrations + multi-chunk read-path dedup — forward-looking (no ingester yet)
Generator: add a Bedrock model fallback chain #10 — Generator Bedrock model fallback chain — feature addition
Formatter: parameterize the hardcoded org name #11 — Formatter hardcoded org name → config — low / design call
OAuth module: wire the provider registry + share root toolchain config #12 — OAuth scaffold: wire the registry + share root toolchain config — vendored-module judgment call

CodeQL js/incomplete-url-substring-sanitization (HIGH) flagged two test fetch-fakes that routed by `url.includes("api.notion.com")`, a substring check that an attacker URL like `evil.com/api.notion.com` could bypass. These are test fixtures, not production security checks, but the pattern is worth correcting: route on the parsed hostname (`new URL(url).hostname === "api.notion.com"`) instead. Resolves the two open CodeQL alerts at src/connectors/acl-guard.test.ts and src/slack/query-handler.integration.test.ts.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
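
To make the difference concrete, here is a small comparison of the two routing checks. The function names are hypothetical, and the try/catch around `new URL` is an added assumption for handling unparseable input; the commit message only specifies the hostname comparison itself.

```typescript
// Substring routing: matches anywhere in the string, including the path.
function isNotionBySubstring(url: string): boolean {
  return url.includes("api.notion.com");
}

// Hostname routing: parse first, then compare the host exactly.
function isNotionByHostname(url: string): boolean {
  try {
    return new URL(url).hostname === "api.notion.com";
  } catch {
    // Assumption: treat strings that are not absolute URLs as non-matching.
    return false;
  }
}

const genuine = "https://api.notion.com/v1/search";
const lookalike = "https://evil.com/api.notion.com";

console.log(isNotionBySubstring(genuine), isNotionByHostname(genuine)); // true true
console.log(isNotionBySubstring(lookalike), isNotionByHostname(lookalike)); // true false
```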

stxkxs marked this pull request as ready for review June 20, 2026 03:44

stxkxs merged commit cade037 into main Jun 20, 2026
9 checks passed

stxkxs deleted the quality-audit-and-security-fixes branch June 20, 2026 03:48

stxkxs merged 2 commits into main from quality-audit-and-security-fixes
