feat(duckdb): cross-database federation via derived DuckDB connection#295
Open
kevinmessiaen wants to merge 47 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…esolveStringReference Collapse the 5 remaining private copies in bigquery, clickhouse, mysql, snowflake, and sqlserver into the shared module. Fix a latent bug in the shared module where `~/path` was incorrectly sliced (dropping only `~`, leaving the leading `/` and making resolve() ignore homedir). Add a tilde-expansion test that caught the bug and now covers that branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e members Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ach url Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bypass assertSafeConnectionId for _ktx_federated in resolveLocalConnectionId and loadComputableSources, and resolve the compute dialect to 'duckdb' when connectionId is FEDERATED_CONNECTION_ID instead of falling through to the default postgres lookup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…erage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… member Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Also marks attachTypeForDriver, buildAttachStatements, and isReservedConnectionId @internal — all three are exported solely for unit-test access with no production cross-file consumer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…loads Collapse the parallel ATTACH_COMPATIBLE_DRIVERS set and ATTACH_TYPE_BY_DRIVER map into one map in federation.ts whose keys are the membership rule. Replace FederatedMember.config (read only via a type-erasing cast) with a typed url field extracted at derive time. Emit INSTALL/LOAD once per distinct driver type instead of once per member. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p id validation Wrap the federated DuckDB instance in its own try/finally so a failing connect() or a throwing connection.closeSync() no longer leaks the native instance. Route setup-sources connection-id validation through the canonical assertSafeConnectionId so the reserved _ktx_ prefix guard applies there too. Derive the federated dialect through sqlAnalysisDialectForDriver instead of a hardcoded literal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n FederatedMember Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nector resolvers Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s, supporting sqlite path: Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es _ktx_federated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h real DuckDB Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…utor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d executor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ifest re-emit Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ts in test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…collisions Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…federated MCP errors Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ling reads Dedup the federated driver ternary in local-query, derive the prefixed source.name from the already-built name, drop the duplicated error in federatedAttachTarget's exhaustive switch, inline the one-line cleanupConnector wrapper, and parallelize federatedSiblingTargets' shard reads (was sequential await-in-for on the scan hot path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ated parity Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surfaces the virtual federated connection in the output of `ktx connection list` so agents and users can discover cross-database querying when 2+ attach-compatible connections are configured. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drive runKtxSql with the real federated DuckDB executor against two on-disk sqlite files, stubbing only SQL validation. The test surfaced that the JSON output path could not serialize bigint values DuckDB returns for integer columns; printJson now coerces bigint to JSON numbers, matching the plain/pretty paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xecutor DuckDB returns integer columns as JS bigint, which JSON.stringify cannot serialize. The CLI --json path worked around this with a replacer, but the MCP sql_execution tool serializes via plain JSON.stringify and crashed on any federated query selecting an integer column. Coerce bigint to Number once in executeFederatedQuery so every consumer (CLI, MCP, ingest, SL) gets a JSON-safe result, and remove the now-redundant CLI replacer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… path - Replace the identity-valued ATTACH_TYPE_BY_DRIVER record with an ATTACH_COMPATIBLE_DRIVERS Set; the driver name doubles as the attach type, so the map encoded nothing beyond membership. - Switch federatedAttachTarget directly on the driver with a default throw, dropping the unreachable post-switch throw and its comment. - Route the MCP sql_execution standard-connection case through the shared executeProjectReadOnlySql instead of reimplementing the connector create/capability-check/execute/cleanup ceremony, so federated and standard connections share one execution path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The federation doc example URL and the federated-attach test fixtures use literal placeholder credentials that trip detect-secrets. Mark them with line-scoped pragma allowlist comments so a real secret added later is still caught. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds cross-database federation to ktx. When a project declares 2+ attach-compatible databases (postgres, mysql, sqlite), ktx derives a virtual `_ktx_federated` connection backed by an embedded DuckDB that `ATTACH`es each member read-only and runs cross-catalog joins. From the semantic layer's view there is one connection; DuckDB fans out to the real databases underneath. Live data, no copy.

Answers the question that motivated this: books in postgres, reviews in sqlite, joined in one query.

A federated query goes from `ktx.yaml` to returned rows for real configs; raw federated SQL works via the `ktx sql` CLI, the ingest path, and the MCP `sql_execution` path; and declared cross-DB joins survive re-scan.

Design (locked decisions)
- `deriveFederatedConnection(connections, projectDir)` computes a descriptor from declared state. Never persisted: no `ktx.yaml` entry, no flag, no `_ktx_federated/` directory. Recomputed every run.
- `connectionId` → DuckDB catalog alias (`connectionId.schema.table`), quoted so hyphenated ids attach correctly.
- Read-only is enforced in three places: `ATTACH ... READ_ONLY` (physical) + `assertReadOnlySql` (statement) + caller-side `validateReadOnly` with the duckdb dialect.
- `_ktx_` is a reserved connection-id namespace.
- Member configs resolve from `ktx.yaml` identically to the direct connectors (sqlite `path:`/`url:`/`~`/project-relative; postgres/mysql discrete-fields-or-URL, SSL, `search_path`).
- Every caller (`ktx sql`, ingest, and MCP `sql_execution`) routes through the shared `executeProjectReadOnlySql`, so the federated-vs-direct decision follows from the connection id, not from which caller invoked it. The CLI and the agent expose the identical set of choices.

What changed
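To make the derivation and catalog-alias decisions from the design list concrete, here is a minimal TypeScript sketch. It is not the code in `federation.ts`: the `Connection` and `FederatedDescriptor` shapes and the helpers `deriveFederated`, `quoteIdent`, and `qualifiedTable` are hypothetical names chosen for illustration, and the driver set is assumed from the summary (postgres, mysql, sqlite).

```typescript
// Hypothetical, simplified shapes; the PR's real types carry more fields.
type Connection = { id: string; driver: string };
type FederatedDescriptor = { id: string; members: Connection[] };

const FEDERATED_CONNECTION_ID = "_ktx_federated";

// Assumed membership rule, taken from the PR summary.
const ATTACH_COMPATIBLE_DRIVERS = new Set(["postgres", "mysql", "sqlite"]);

// Compute the virtual connection from declared state only.
// Returns null unless at least two members qualify; nothing is written to disk.
function deriveFederated(connections: Connection[]): FederatedDescriptor | null {
  const members = connections.filter((c) => ATTACH_COMPATIBLE_DRIVERS.has(c.driver));
  return members.length >= 2 ? { id: FEDERATED_CONNECTION_ID, members } : null;
}

// Double-quote an identifier and double any embedded quotes, so an id such as
// pg-books can be used as a DuckDB catalog alias.
function quoteIdent(id: string): string {
  return `"${id.replace(/"/g, '""')}"`;
}

// Build the fully-qualified connectionId.schema.table reference.
function qualifiedTable(connectionId: string, schema: string, table: string): string {
  return [connectionId, schema, table].map(quoteIdent).join(".");
}
```

Under these assumptions, a project with one postgres and one sqlite connection yields a descriptor with two members, while a project with a single attach-compatible connection yields `null`, which is why no flag or persisted state is needed.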
Core federation
- `context/connections/federation.ts`: derivation + `FEDERATED_CONNECTION_ID`; `FederatedMember` carries the full member connection config + `projectDir`; `federatedConnectionListing` exposes the virtual connection (id, members, usage hint) for discovery surfaces.
- `connectors/duckdb/federated-attach.ts` (new): resolves each member's DuckDB ATTACH target by reusing the canonical connector resolvers (`sqliteDatabasePathFromConfig`, `postgresPoolConfigFromConfig`, `mysqlConnectionPoolConfigFromConfig`). sqlite `path:` resolves end-to-end; SSL (`sslmode=require` / `ssl_mode=REQUIRED`) and postgres `search_path` are preserved for discrete-field configs.
- `connectors/duckdb/federated-executor.ts`: ATTACH read-only + execute, targets resolved via `federated-attach`. DuckDB returns integer columns as JS `bigint`; the executor coerces them to `number` once here so every consumer (CLI/MCP/ingest/SL) gets a JSON-safe result.

Unified query execution
- `context/connections/project-sql-executor.ts` (new): single shared `executeProjectReadOnlySql` that owns the `_ktx_federated` routing decision. The ingest executor (`ingest-query-executor.ts`), the MCP `sql_execution` port (`context/mcp/local-project-ports.ts`), and the `ktx sql` CLI command (`sql.ts`) all delegate to it. MCP federated errors are classified via `KtxQueryError` consistently with non-federated SQL.

`ktx sql` CLI parity

- `sql.ts`: `ktx sql -c _ktx_federated "<join>"` now runs federated cross-database queries, matching MCP. The command's forked connection-lookup + single-scan-connector path is removed and replaced by a call to `executeProjectReadOnlySql`; the local duplicate dialect helper and the up-front config guard are deleted (the shared connector factory raises the same "not configured" error for unknown ids). Direct `-c <member>` queries are unchanged. `KtxSqlQueryExecutionResult` gained an optional `headerTypes` so `--json` output is preserved.

Federated-connection discoverability
- `connection.ts` (CLI `ktx connection`) and `context/mcp/local-project-ports.ts` (MCP `connection_list`) both surface the `_ktx_federated` entry (id, member connection ids, and a short usage hint) via the one shared `federatedConnectionListing` builder, so an agent can discover that cross-database querying exists and how to address it. `members`/`hint` thread through `LocalConnectionInfo`, `KtxConnectionSummary`, and the MCP output schema as optional fields; `DUCKDB` is a list-only label and is not added to the driver→connection-type map. The `connection_list` tool description points agents at the federated id for cross-database joins.

Cross-DB join preservation through ingest
- `context/ingest/.../manifest.ts` + `context/scan/local-enrichment-artifacts.ts`: declared cross-DB joins to federated siblings survive a re-scan. The sibling-target set is derived from scanned member state at the producer and honored wherever a cross-DB `to:` is evaluated.

Semantic layer
- `context/sl/local-sl.ts`: read-time union of member dirs for `_ktx_federated`, with member-namespaced source names (`pg_books.books`) so two members owning a same-named table don't collide. Physical `source.table` is unchanged.
- `context/sl/local-query.ts`: duckdb dialect + federated id resolution; federated executed-plan metadata reports `duckdb`.
- `context/sl/source-files.ts`: reserve `_ktx_` prefix.

Setup / deps / docs
- `setup-databases.ts`: informational federation notice (no prompt, no persisted state).
- `@duckdb/node-api` dependency.
- `docs-site/.../concepts/cross-database-federation.mdx` + nav: documents the config shape, fully-qualified table naming, member-namespaced federated source names, and querying `_ktx_federated` directly via `ktx sql` and the MCP `sql_execution` tool.

Test plan
- `pnpm --filter @kaelio/ktx run type-check`: clean.
- `packages/cli`: `INSERT` rejected by read-only; hyphenated catalog ids attach and join; sqlite `path:` resolved end-to-end; the shared executor's federated path against real DuckDB; the MCP `sql_execution` path running a real `_ktx_federated` join; a production-path test proving a manual cross-DB join survives re-scan.
- `ktx sql` parity tests: member-direct execution, `_ktx_federated` routing to the shared executor (member connector not used), unknown-id error, `--json` `headerTypes` preserved; an end-to-end `ktx sql -c _ktx_federated` cross-file sqlite join through the real executor.
- `_ktx_federated` appears in `ktx connection` and MCP `connection_list` with members + hint when 2+ attach-compatible members exist, and is absent otherwise.
- Federated integer results serialize with `JSON.stringify` on both the executor result and the MCP `sql_execution` path (previously the MCP federated path threw `Do not know how to serialize a BigInt` on any integer column).

Follow-ups (not blocking)
- … `table:`/`to:` references in generated SQL where a reserved identifier could appear.
- Federated results carry no column types (`headerTypes`); the DuckDB executor produces none. Direct member queries still report types.
- Federated integer columns are coerced to `number` (consistent with the existing plain/pretty CLI output); a string-fallback for out-of-range integers is a possible follow-up.
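
As a rough illustration of the last follow-up, the sketch below shows one way a string fallback for out-of-range integers could look. It is an assumption about a possible future change, not the executor's current behavior (which, per the description, coerces every `bigint` to `number`); `toJsonSafe` and `coerceRows` are hypothetical names.

```typescript
// Hypothetical helper: convert a bigint cell into something JSON.stringify accepts.
// Values inside the safe-integer range become numbers; anything larger is
// emitted as a decimal string so no precision is lost silently.
function toJsonSafe(value: unknown): unknown {
  if (typeof value !== "bigint") return value;
  const inSafeRange =
    value >= BigInt(Number.MIN_SAFE_INTEGER) &&
    value <= BigInt(Number.MAX_SAFE_INTEGER);
  return inSafeRange ? Number(value) : value.toString();
}

// Apply the conversion once to every cell of a row-major result set,
// so each consumer receives an already JSON-safe structure.
function coerceRows(rows: unknown[][]): unknown[][] {
  return rows.map((row) => row.map(toJsonSafe));
}
```

The trade-off is that a column could then mix `number` and `string` values across rows, so consumers that expect a uniform type per column would need to handle both.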