fix(cli): non-interactive setup, git isolation, auto-commit, execute-only scans#279
Open
andreybavt wants to merge 8 commits into
Open
fix(cli): non-interactive setup, git isolation, auto-commit, execute-only scans#279andreybavt wants to merge 8 commits into
andreybavt wants to merge 8 commits into
Conversation
Non-interactive setup (--no-input) silently defaulted the LLM backend to anthropic, the one backend that cannot self-configure without an extra flag, then failed with 'Missing Anthropic API key: pass --anthropic-api-key-env or --anthropic-api-key-file.' — an error that never mentioned --llm-backend. A user who passed --target claude-code had no way to discover the (hidden) --llm-backend claude-code flag from the error. - chooseBackend no longer silently picks anthropic in disabled mode; it fails with a message naming --llm-backend, listing every backend, and noting that claude-code/codex use local auth (no key). - The anthropic and embedding credential errors now name --llm-backend / --embedding-backend so a keyless backend is discoverable from the error. - --llm-backend and --embedding-backend use .choices(), so invalid values report the allowed set (and the bespoke parser fns are removed). Only invocations that already failed change behavior; they now fail with an actionable error instead of a cryptic one.
A ktx project assumes its config dir is its own git working-tree root: writes,
session worktrees, squash-merges, and reindex scans all resolve relative to it.
GitService.initialize() gated on checkIsRepo() (IN_TREE), which is also
satisfied by an *enclosing* repository — so a project nested inside another git
working tree silently operated against the outer repo. Worktree/ingest writes
landed at the outer root (e.g. <outer>/wiki/global/) while reindex scanned
<projectDir>/wiki/global/, so the wiki was seeded but never indexed:
wiki_search returned nothing and knowledge_pages stayed empty, with no error.
Semantic-layer and raw-sources had the same divergence.
Gate initialization on checkIsRepo('root') instead: require the repo root to be
the config dir itself, and initialize a dedicated repository there when it is
not (logging clearly when nesting inside an existing repo). This restores the
one-repo-per-project invariant at the shared git layer, fixing all artifacts at
once, and keeps ktx's commits out of the enclosing repository.
Both documented flags were read only for status display; every ingest path squash-committed to main unconditionally, so setting either to false was a silent no-op (the reported symptom: 'Memory ingest (external_ingest): ...' commits despite memory.auto_commit: false). Gate the commit at the squash-merge onto main — the one point where ingest work becomes a permanent commit (intermediate session-worktree commits must still happen for the squash to collapse). When auto-commit is off, apply the squash to main's working tree and leave it staged instead of committing, so the run is never silently discarded: - GitService.stageSquashMergeIntoMain: shares the merge core with squashMergeIntoMain but stops before committing and returns the staged tree SHA (a valid diff/read ref). - memory.auto_commit gates MemoryAgentService (its DB writes are eager, so the staged files stay consistent); the commit-message job is skipped. - storage.git.auto_commit gates IngestBundleRunner; the wiki index is reconciled from the staged tree via the existing syncFromCommit (git diff/show accept a write-tree ref), and SL reindex already reads from files. Config descriptions now state precisely what each flag gates and the staged semantics when false.
…ect scans A configured warehouse was always a scan/ingest target. The only way to use a connection purely for SQL execution (ktx sql / sql_execution) was the leaky workaround of an empty setup.database_connection_ids — which actually re-includes every warehouse via the 'fall back to all' branch — so e.g. a BigQuery connection meant only for read-only queries triggered a full-billing-project scan. - Add a per-connection scan_enabled flag (default true) to warehouse connections. scan_enabled: false registers the connection for execution only and never as a scan target. - Route every scan-target selection path through one predicate (isScanTargetWarehouse): both ingest (primaryWarehouseConnectionIds, including the all-warehouses fallback) and setup (configuredPrimaryConnectionIds) now exclude execute-only connections. Setup validates the credential but skips scope discovery and scan for them. Execution paths are untouched — the warehouse descriptor still resolves, so ktx sql / sql_execution keep working. - Scripted setup with no --database-schema no longer silently scopes the scan to every discovered schema/dataset: it warns with the count and names how to narrow (--database-schema) or opt out (scan_enabled: false).
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
The auto_commit:false path (stageSquashMergeIntoMain) leaves main staged, but the shared squash helper assumed a clean target. A later ingest/memory run merging into that dirty index would 'git commit' the prior run's staged files under the new run's commit (contamination), and conflict cleanup's 'reset --hard HEAD' would discard them (data loss). Guard applySquashToIndex: if the target worktree has uncommitted tracked changes, refuse before merging and return a 'dirty' result (untracked/gitignored files are ignored — the squash never commits them). Callers surface it cleanly: the bundle runner fails the run with an actionable message; the memory agent rolls back its eager DB writes (like a conflict) so the DB never gets ahead of main. Main is left untouched in every case.
…commands scan_enabled:false promised the connection is 'never used as a scan/ingest target,' but the predicate only gated automatic selection — explicit ktx scan <id> / ktx ingest <id> still resolved the connection id and reached the live-database introspection path, so an execute-only connection could still be scanned or ingested. Guard runKtxScan and runKtxIngest at entry: if the target connection is execute-only, refuse with an actionable error (remove the flag to scan, or use ktx sql to query) before doing any work. This makes the flag a single declaration honored on every scan/ingest entry point, not just auto-selection.
The previous guard rejected any uncommitted tracked change (git status --untracked-files=no), which also caught unstaged edits like a ktx.yaml that setup writes during the flow and commits only after the context build — so the guard wrongly blocked setup context builds and ingest (8 local-bundle-ingest cases failed with 'uncommitted changes (ktx.yaml)'). The actual hazard is the index, not the working tree: 'git commit' captures only staged changes, and the auto_commit:false residue is staged by 'git merge --squash'. Narrow the check to 'git diff --cached' so only pre-existing staged changes are refused; unstaged working-tree edits proceed untouched (never committed by the squash). Adds a regression test that an unstaged tracked change does not block the merge and is neither committed nor discarded.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes four findings surfaced while using ktx as the context layer for a text-to-SQL benchmark, each as a scoped, test-covered commit.
ktx setup --no-inputnow requires an explicit--llm-backendinstead of silently defaulting toanthropicand failing with an error that never named the flag. Each ktx project now initializes its own git repo at the project dir — fixing wiki, semantic-layer, and raw-sources being silently unindexed when the project is nested inside another repo — andstorage.git.auto_commit/memory.auto_commit: falseare now honored by leaving ingest changes staged instead of committing. Warehouse connections gainscan_enabled: false(execute-only: usable byktx sql/sql_executionbut never scanned), and scripted setup with no--database-schemanow warns instead of silently scanning every discovered dataset. Also syncsuv.lockto the released ktx-daemon/ktx-sl 0.10.0 versions; the full gate suite (type-check, build, dead-code, ~3.3k tests) passes.