fix: clear all gh code-scanning CVEs + quality-audit findings#6
Merged
Conversation
A two-pass quality-check-and-fix sweep: resolve every GitHub
code-scanning vulnerability, then fix the findings from a 9-dimension
quality audit (and the regressions that first pass introduced). Net
result: 0 npm-audit vulnerabilities in both lockfiles, 130 tests green,
coverage 92.8/79/90/95, typecheck/lint/format/build/chart-lint all clean.
─────────────────────── Supply chain / CVEs ───────────────────────
All 5 HIGH gh code-scanning CVEs were transitive deps; pinned via npm
`overrides`: @grpc/grpc-js→1.14.4, form-data→4.0.6, protobufjs→7.6.1,
ws→8.21.0 (all in-major patch/minor). The 26 MODERATE OpenTelemetry
findings (@opentelemetry/core W3C-Baggage DoS) are fixed at the source by
bumping @opentelemetry/auto-instrumentations-node 0.76→0.77, which pulls
sdk-node 0.219 → core 2.8.0 — a real dep bump rather than a fragile
override. Dev-only vite HIGH + esbuild low cleared via overrides. The
OAuth subpackage's AWS-SDK xml transitives (fast-xml-builder/-parser,
@aws-sdk/xml-builder) bumped within range; its own vite/esbuild overrides
added. Both lockfiles now report 0 vulnerabilities.
──────────────────────── Systems & reliability ────────────────────────
- pgvector Pool gains statement_timeout/query_timeout (5s),
connectionTimeoutMillis (2s), idleTimeoutMillis (30s) — the hottest
user-path call was the one external call with no timeout, so a
hung-but-alive Postgres could pin the Bolt handler slot.
- Graceful SIGTERM drain: the query handler now tracks in-flight queries
in a set and exposes drainInFlight(deadline); shutdown ordering is
app.stop() → drain (25s) → flushMetrics → close, so a deploy mid-query
no longer drops the awaited compliance audit. The chart's main
Deployment gains terminationGracePeriodSeconds: 45 to fit the drain.
- audit-consumer shutdown exits as soon as the receive loop drains
(loopExited flag, 500ms tick) instead of always burning the full 30s.
- uncaughtException now logs + exits(1) for a clean restart instead of
swallowing into an undefined process state; unhandledRejection stays
swallowed (Bolt Socket-Mode reconnect noise) and is now clearly scoped.
- Email cache is bounded (max 10k, oldest-evicted, expired-on-read) so a
churn of distinct users can't grow it without limit.
- Bedrock answer path now meters input/output/cache-read tokens from the
response usage block — the AI path's cost is observable per-query.
──────────────────────────── Security ────────────────────────────
- Postgres TLS now verifies the Aurora server cert against the bundled
Amazon RDS global CA (certs/rds-global-bundle.pem, COPYed into the
image) instead of blanket rejectUnauthorized:false — authenticated
encryption, MITM-resistant. Secure by default (PG_SSL_REJECT_UNAUTHORIZED
defaults true) with a documented escape hatch and graceful degrade if
the CA is unreadable. Applied to both the app and the demo seeder.
- Pino logger gains a redact config mirroring the OAuth module's token
field set + the realistic nested header/SDK-error shapes, so tokens and
secrets never land in app logs.
- pii-scrubber email regex char-class bug fixed ([A-Z|a-z] → [A-Za-z],
the pipe was matching a literal '|').
──────────────────────── Data model & schema ────────────────────────
chunks table keyed on a composite PRIMARY KEY (doc_id, chunk_index)
instead of doc_id alone, which had capped each document to a single chunk
row. initSchema runs its DDL on a dedicated client inside a transaction
with SET LOCAL statement_timeout = 0, so the new hot-path Pool timeout
can't kill an IVFFlat index build on a populated table. Seeder INSERT
upserts on (doc_id, chunk_index) and its PGDATABASE default is corrected
to the underscored name; its Bedrock embed call and Pool gain timeouts.
─────────────────────── Config & correctness ───────────────────────
- BEDROCK_LLM_MODEL_ID corrected to the cross-region inference profile
(us.anthropic.claude-sonnet-4-6) across chart/values.yaml, .env.example,
and CLAUDE.md — the bare anthropic.* ID is not invocable on-demand and
would have shipped a broken default to every deploy.
- Removed dead config (CRAWL_INTERVAL_MINUTES) and a dead export
(scrubPiiFromObject); rewrote a comment that recommended the
CI-banned vi.mock("ioredis") pattern; fixed a stale Node-version note.
- Coverage config honestly excludes the two bootstrap entry files
(src/redis.ts, src/bin/audit-consumer.ts) — only verifiable against
real Redis / a live process — matching the existing index.ts precedent.
──────────────────────────── Documentation ────────────────────────────
All six docs (runbook, integrations, qa-playbook, rag-architecture,
test-plan, threat-model) rewritten off the dead AWS ECS/ECR/CDK/Fargate/
Lambda deployment model and onto the actual one: an eks-agent-platform
Platform tenant whose Helm chart ArgoCD reconciles from git, with
landing-zone substrate, the Platform CR's IRSA, External Secrets, and a
KEDA-scaled audit-consumer Deployment. The runbook is fully reworked to
kubectl/helm/argocd incident steps and the chart's PrometheusRule alerts.
──────────────────────────── Testing ────────────────────────────
Added tests for the graceful-drain/in-flight tracking and the Bedrock
token metering; coverage rose to 92.8/79/90/95 against the 75/60/75/75
thresholds. No SDK module-mocking introduced (the grep-enforced ban
holds); every new external touchpoint carries an explicit timeout.
Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
This was referenced Jun 20, 2026
Member
Author
Deferred items (tracked separately)These quality-audit findings were deliberately not included here — each needs validation infra, a design decision, or a feature slice beyond an audit-fix:
|
CodeQL js/incomplete-url-substring-sanitization (HIGH) flagged two test
fetch-fakes that routed by `url.includes("api.notion.com")` — a substring
check an attacker URL like `evil.com/api.notion.com` could bypass. These
are test fixtures, not production security checks, but the pattern is worth
correcting: route on the parsed hostname (`new URL(url).hostname ===
"api.notion.com"`) instead. Resolves the two open CodeQL alerts at
src/connectors/acl-guard.test.ts and
src/slack/query-handler.integration.test.ts.
Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See commit message for full details.
Summary
terminationGracePeriodSeconds), audit-consumer early shutdown, Bedrock token metering, bounded cache, compositechunksPK, plus a batch of correctness/cleanup fixes.Verification
130 tests pass · coverage 92.8/79/90/95 · typecheck/lint/format/build/chart-lint green · 0 vulnerabilities.
🤖 Two-pass run: quality audit + fix, then an adversarial regression-hunt of the diff (caught & fixed 4 self-introduced regressions before commit).