Skip to content

fix(cli): non-interactive setup, git isolation, auto-commit, execute-only scans#279

Open
andreybavt wants to merge 8 commits into
mainfrom
finding-hidden-flags-issues
Open

fix(cli): non-interactive setup, git isolation, auto-commit, execute-only scans#279
andreybavt wants to merge 8 commits into
mainfrom
finding-hidden-flags-issues

Conversation

@andreybavt

Copy link
Copy Markdown
Contributor

Fixes four findings surfaced while using ktx as the context layer for a text-to-SQL benchmark, each as a scoped, test-covered commit. ktx setup --no-input now requires an explicit --llm-backend instead of silently defaulting to anthropic and failing with an error that never named the flag. Each ktx project now initializes its own git repo at the project dir — fixing wiki, semantic-layer, and raw-sources being silently unindexed when the project is nested inside another repo — and storage.git.auto_commit / memory.auto_commit: false are now honored by leaving ingest changes staged instead of committing. Warehouse connections gain scan_enabled: false (execute-only: usable by ktx sql / sql_execution but never scanned), and scripted setup with no --database-schema now warns instead of silently scanning every discovered dataset. Also syncs uv.lock to the released ktx-daemon/ktx-sl 0.10.0 versions; the full gate suite (type-check, build, dead-code, ~3.3k tests) passes.

Non-interactive setup (--no-input) silently defaulted the LLM backend to
anthropic, the one backend that cannot self-configure without an extra flag,
then failed with 'Missing Anthropic API key: pass --anthropic-api-key-env or
--anthropic-api-key-file.' — an error that never mentioned --llm-backend. A
user who passed --target claude-code had no way to discover the (hidden)
--llm-backend claude-code flag from the error.

- chooseBackend no longer silently picks anthropic in disabled mode; it fails
  with a message naming --llm-backend, listing every backend, and noting that
  claude-code/codex use local auth (no key).
- The anthropic and embedding credential errors now name --llm-backend /
  --embedding-backend so a keyless backend is discoverable from the error.
- --llm-backend and --embedding-backend use .choices(), so invalid values
  report the allowed set (and the bespoke parser fns are removed).

Only invocations that already failed change behavior; they now fail with an
actionable error instead of a cryptic one.
A ktx project assumes its config dir is its own git working-tree root: writes,
session worktrees, squash-merges, and reindex scans all resolve relative to it.
GitService.initialize() gated on checkIsRepo() (IN_TREE), which is also
satisfied by an *enclosing* repository — so a project nested inside another git
working tree silently operated against the outer repo. Worktree/ingest writes
landed at the outer root (e.g. <outer>/wiki/global/) while reindex scanned
<projectDir>/wiki/global/, so the wiki was seeded but never indexed:
wiki_search returned nothing and knowledge_pages stayed empty, with no error.
Semantic-layer and raw-sources had the same divergence.

Gate initialization on checkIsRepo('root') instead: require the repo root to be
the config dir itself, and initialize a dedicated repository there when it is
not (logging clearly when nesting inside an existing repo). This restores the
one-repo-per-project invariant at the shared git layer, fixing all artifacts at
once, and keeps ktx's commits out of the enclosing repository.
Both documented flags were read only for status display; every ingest path
squash-committed to main unconditionally, so setting either to false was a
silent no-op (the reported symptom: 'Memory ingest (external_ingest): ...'
commits despite memory.auto_commit: false).

Gate the commit at the squash-merge onto main — the one point where ingest work
becomes a permanent commit (intermediate session-worktree commits must still
happen for the squash to collapse). When auto-commit is off, apply the squash to
main's working tree and leave it staged instead of committing, so the run is
never silently discarded:

- GitService.stageSquashMergeIntoMain: shares the merge core with
  squashMergeIntoMain but stops before committing and returns the staged tree
  SHA (a valid diff/read ref).
- memory.auto_commit gates MemoryAgentService (its DB writes are eager, so the
  staged files stay consistent); the commit-message job is skipped.
- storage.git.auto_commit gates IngestBundleRunner; the wiki index is reconciled
  from the staged tree via the existing syncFromCommit (git diff/show accept a
  write-tree ref), and SL reindex already reads from files.

Config descriptions now state precisely what each flag gates and the staged
semantics when false.
…ect scans

A configured warehouse was always a scan/ingest target. The only way to use a
connection purely for SQL execution (ktx sql / sql_execution) was the leaky
workaround of an empty setup.database_connection_ids — which actually re-includes
every warehouse via the 'fall back to all' branch — so e.g. a BigQuery connection
meant only for read-only queries triggered a full-billing-project scan.

- Add a per-connection scan_enabled flag (default true) to warehouse connections.
  scan_enabled: false registers the connection for execution only and never as a
  scan target.
- Route every scan-target selection path through one predicate
  (isScanTargetWarehouse): both ingest (primaryWarehouseConnectionIds, including
  the all-warehouses fallback) and setup (configuredPrimaryConnectionIds) now
  exclude execute-only connections. Setup validates the credential but skips
  scope discovery and scan for them. Execution paths are untouched — the warehouse
  descriptor still resolves, so ktx sql / sql_execution keep working.
- Scripted setup with no --database-schema no longer silently scopes the scan to
  every discovered schema/dataset: it warns with the count and names how to narrow
  (--database-schema) or opt out (scan_enabled: false).
@vercel

vercel Bot commented Jun 9, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
ktx-docs-site Ready Ready Preview, Comment Jun 9, 2026 12:55pm

Request Review

The auto_commit:false path (stageSquashMergeIntoMain) leaves main staged, but the
shared squash helper assumed a clean target. A later ingest/memory run merging
into that dirty index would 'git commit' the prior run's staged files under the
new run's commit (contamination), and conflict cleanup's 'reset --hard HEAD'
would discard them (data loss).

Guard applySquashToIndex: if the target worktree has uncommitted tracked changes,
refuse before merging and return a 'dirty' result (untracked/gitignored files are
ignored — the squash never commits them). Callers surface it cleanly: the bundle
runner fails the run with an actionable message; the memory agent rolls back its
eager DB writes (like a conflict) so the DB never gets ahead of main. Main is
left untouched in every case.
…commands

scan_enabled:false promised the connection is 'never used as a scan/ingest
target,' but the predicate only gated automatic selection — explicit
ktx scan <id> / ktx ingest <id> still resolved the connection id and reached the
live-database introspection path, so an execute-only connection could still be
scanned or ingested.

Guard runKtxScan and runKtxIngest at entry: if the target connection is
execute-only, refuse with an actionable error (remove the flag to scan, or use
ktx sql to query) before doing any work. This makes the flag a single declaration
honored on every scan/ingest entry point, not just auto-selection.
The previous guard rejected any uncommitted tracked change (git status
--untracked-files=no), which also caught unstaged edits like a ktx.yaml that
setup writes during the flow and commits only after the context build — so the
guard wrongly blocked setup context builds and ingest (8 local-bundle-ingest
cases failed with 'uncommitted changes (ktx.yaml)').

The actual hazard is the index, not the working tree: 'git commit' captures only
staged changes, and the auto_commit:false residue is staged by 'git merge
--squash'. Narrow the check to 'git diff --cached' so only pre-existing staged
changes are refused; unstaged working-tree edits proceed untouched (never
committed by the squash). Adds a regression test that an unstaged tracked change
does not block the merge and is neither committed nor discarded.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant