fix(scan): use warehouse-backed direct scans #180

Open
jonathanhaaswriter wants to merge 4 commits into snowflake-sync-runtime-postgres-20260327 from snowflake-direct-scan-warehouse-20260327

@jonathanhaaswriter
Collaborator

Summary

  • switch direct scan and scan preflight from Snowflake-only wiring to the configured warehouse so sqlite/postgres backends can run scans without a Snowflake client
  • add shared warehouse asset-query helpers that preserve _cq_table, dedupe on latest _cq_sync_time, and honor incremental/keyset scan cursors on sqlite/postgres
  • add regression coverage for warehouse-backed direct scans plus sqlite incremental pagination/filter behavior
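
For reviewers, here is a minimal in-memory sketch of the semantics the second bullet describes: keep only the row with the latest _cq_sync_time per asset, then page through the result with a keyset cursor. It is not the code in this PR. AssetRow, LatestPerAsset, and NextPage are hypothetical names, the ID field is an assumed stable asset key, and the real helpers presumably express the same logic in SQL against sqlite/postgres.

```go
package main

import "sort"

// AssetRow is a hypothetical stand-in for one warehouse row.
type AssetRow struct {
	ID       string // assumed stable asset key (not named in the PR)
	Table    string // mirrors the _cq_table column
	SyncTime int64  // mirrors _cq_sync_time, here as Unix seconds
}

// LatestPerAsset keeps the row with the newest SyncTime for each ID
// and returns the survivors sorted by ID so they can be paged.
func LatestPerAsset(rows []AssetRow) []AssetRow {
	latest := make(map[string]AssetRow, len(rows))
	for _, r := range rows {
		if cur, ok := latest[r.ID]; !ok || r.SyncTime > cur.SyncTime {
			latest[r.ID] = r
		}
	}
	out := make([]AssetRow, 0, len(latest))
	for _, r := range latest {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// NextPage returns up to limit rows whose ID sorts strictly after afterID,
// plus the cursor to pass on the next call. The cursor is unchanged when
// no rows remain. The input must already be sorted by ID.
func NextPage(sorted []AssetRow, afterID string, limit int) ([]AssetRow, string) {
	if limit <= 0 {
		return nil, afterID
	}
	start := sort.Search(len(sorted), func(i int) bool { return sorted[i].ID > afterID })
	end := start + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	page := sorted[start:end]
	if len(page) == 0 {
		return nil, afterID
	}
	return page, page[len(page)-1].ID
}
```

Calling NextPage repeatedly with the returned cursor walks the deduplicated rows without OFFSET, which is the property the keyset cursor is meant to give on any backend.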

Validation

  • go test ./...
  • GOTOOLCHAIN=go1.26.1 go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.8.0 run --timeout 5m ./...
  • go run ./scripts/check_agent_sdk_contract_compat/main.go --require-baseline --base-ref=writer/main
  • make agent-sdk-docs-check
  • make agent-sdk-packages-check
  • go run ./scripts/check_api_contract_compat/main.go --require-baseline --base-ref=writer/main
  • make api-contract-docs-check
  • make config-docs-check
  • make devex-codegen-check
  • go test ./internal/app -run "Test(PreCommitHookRunsFastLintOnStagedGoFiles|PrePushHookRunsChangedDevexPreflight|DockerBuildCommandsPassGoVersionBuildArg|DevelopmentGuideDocumentsDevexPreflight|DevexScriptPlansRelevantChecks|DevexScriptChangedModeIncludesWorkspaceDiffSources)"
  • make openapi-check
  • go run ./scripts/check_report_contract_compat/main.go --require-baseline --base-ref=writer/main
  • make report-contract-docs-check
  • make vendor-check

@jonathanhaaswriter
Collaborator Author

I think the new direct-scan path loses watermark progress for delete-only incremental batches. The current flow exits early when GetAssets returns no rows and only persists watermarks when scanned > 0, but delete-only CDC batches still need to advance the cursor even when there is nothing left to fetch. Otherwise the scan can keep replaying the same delete-only window.

Can we preserve the old "advance watermark from change events even when no assets are returned" behavior?
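
To make the ask concrete, here is a rough sketch of the behavior I have in mind. The names (ChangeEvent, FetchFunc, ScanBatch) and the integer sequence watermark are assumptions for illustration and do not match the actual types in this branch. The point is only that the watermark is derived from the change events, not from the number of assets fetched.

```go
package main

// ChangeEvent is a hypothetical CDC record for one asset.
type ChangeEvent struct {
	AssetID string
	Seq     int64 // assumed monotonically increasing stream position
	Deleted bool
}

// FetchFunc is a hypothetical stand-in for the asset lookup (GetAssets in
// the real flow); it returns the assets that still exist for the given IDs.
type FetchFunc func(ids []string) ([]string, error)

// ScanBatch processes one incremental batch. It returns the watermark to
// persist and the number of assets scanned. The watermark advances to the
// highest event Seq seen, even when every event is a delete and nothing
// is fetched. On a fetch error the old watermark is returned unchanged.
func ScanBatch(watermark int64, events []ChangeEvent, fetch FetchFunc) (int64, int, error) {
	next := watermark
	var liveIDs []string
	for _, ev := range events {
		if ev.Seq <= watermark {
			continue // already covered by the persisted cursor
		}
		if ev.Seq > next {
			next = ev.Seq
		}
		if !ev.Deleted {
			liveIDs = append(liveIDs, ev.AssetID)
		}
	}

	scanned := 0
	if len(liveIDs) > 0 {
		assets, err := fetch(liveIDs)
		if err != nil {
			return watermark, 0, err
		}
		scanned = len(assets) // the real flow would run the scan here
	}

	// Deliberately no "scanned > 0" guard on the watermark.
	return next, scanned, nil
}
```

With this shape, a batch containing only deletes returns scanned == 0 but still moves the cursor past the delete window, so the next run does not replay it.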

@jonathanhaaswriter force-pushed the snowflake-sync-runtime-postgres-20260327 branch from 08d9da8 to 337654a on March 28, 2026 01:41
@jonathanhaaswriter force-pushed the snowflake-direct-scan-warehouse-20260327 branch from a3fb703 to 8567b47 on March 28, 2026 02:05
@jonathanhaaswriter force-pushed the snowflake-sync-runtime-postgres-20260327 branch from cb9d726 to bab4caf on April 1, 2026 17:50
@jonathanhaaswriter force-pushed the snowflake-direct-scan-warehouse-20260327 branch from f71f756 to 82f7f71 on April 1, 2026 17:55