feat: type-aware partitioning, SummingMergeTree + LowCardinality, --stdout by Maksim-Gr · Pull Request #30 · Maksim-Gr/clickforge

Maksim-Gr · 2026-06-19T10:58:42Z

Summary

Fixes one correctness bug and lands four low-risk quick wins that build on classifiers already present in the codebase. No new dependencies.

- Fix type-blind PARTITION BY: the table command now decides PARTITION BY toYYYYMM(...) from the column's inferred type (DateTime64) instead of its name. Previously a scan → table flow could emit PARTITION BY toYYYYMM(<String column>), which ClickHouse rejects. This happened because date-only strings are inferred as String, while the old check accepted any field whose name merely looked like a timestamp. Also removes the timestamp heuristic that was duplicated between generator and scanner.
- Auto-suggest SummingMergeTree: scan suggests it when numeric metrics and a dimension (id/timestamp) exist, with a narrow grouping key (one primary id + one timestamp) and the metric columns as SUM COLUMNS.
- LowCardinality(String) detection: inference tracks distinct string values up to a cap and flags low-cardinality columns (conservative thresholds: ≥20 records, ≤100 distinct, distinct < records/2); these are rendered as LowCardinality(String) across table/kafka/diff.
- --stdout flag: kafka/table/diff can print migrations (with -- up / -- down headers) instead of writing files.
- Docs: README updated; version bumped 0.4.0 → 0.5.0.
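
The LowCardinality thresholds listed in the summary can be illustrated with a minimal sketch. This is not the code from the PR: `StringStats`, `observe`, and the constant names are hypothetical, and reading "distinct < records/2" as a real-valued comparison (`distinct * 2 < records`) is an assumption.

```rust
use std::collections::HashSet;

// Thresholds taken from the PR summary; the constant names are hypothetical.
const MIN_RECORDS: usize = 20;
const MAX_DISTINCT: usize = 100;

/// Hypothetical per-column statistics gathered while scanning string values.
#[derive(Default)]
struct StringStats {
    records: usize,
    distinct: HashSet<String>,
    /// Set once more than MAX_DISTINCT values were seen; tracking stops then.
    overflowed: bool,
}

impl StringStats {
    /// Record one observed value, keeping at most MAX_DISTINCT distinct strings.
    fn observe(&mut self, value: &str) {
        self.records += 1;
        if self.overflowed || self.distinct.contains(value) {
            return;
        }
        if self.distinct.len() == MAX_DISTINCT {
            // Cap reached: the column cannot qualify, so free the memory.
            self.overflowed = true;
            self.distinct.clear();
        } else {
            self.distinct.insert(value.to_owned());
        }
    }

    /// Apply the three thresholds: >= 20 records, <= 100 distinct,
    /// and distinct < records / 2 (assumed real-valued, hence `* 2`).
    fn is_low_cardinality(&self) -> bool {
        !self.overflowed
            && self.records >= MIN_RECORDS
            && self.distinct.len() <= MAX_DISTINCT
            && self.distinct.len() * 2 < self.records
    }

    /// Type name to render in the generated DDL.
    fn clickhouse_type(&self) -> &'static str {
        if self.is_low_cardinality() {
            "LowCardinality(String)"
        } else {
            "String"
        }
    }
}
```

Capping the set keeps memory bounded on high-cardinality columns such as free-text fields, and the minimum record count avoids flagging a column from a sample too small to be representative.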

…tdout

Fix a correctness bug and land several quick wins that build on classifiers already present in the codebase.

- generator: decide PARTITION BY from the column's inferred type (DateTime64) instead of its name, so a scan -> table flow can no longer emit PARTITION BY toYYYYMM(<String column>), which ClickHouse rejects. Removes the timestamp heuristic that was duplicated between generator and scanner.
- scanner: suggest SummingMergeTree when numeric metrics and a dimension exist, with a narrow grouping key (one primary id + one timestamp) and the metric columns as SUM COLUMNS.
- inference/schema: detect low-cardinality String columns (capped distinct-value tracking, conservative thresholds) and render them as LowCardinality(String); applied across table/kafka/diff output.
- cli/main: add --stdout to kafka/table/diff to print migrations instead of writing files.
- docs: document the above in README; bump version 0.4.0 -> 0.5.0.

Adds 8 unit tests (18 -> 26). cargo fmt, clippy -D warnings, and tests pass.
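
The type-aware PARTITION BY decision described above could look roughly like the following. It is a sketch under stated assumptions rather than the generator's actual code: `InferredType`, `Column`, and `partition_clause` are hypothetical names, and choosing the first DateTime64 column is an assumed tie-break.

```rust
/// Hypothetical subset of the types the inference step might produce.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum InferredType {
    String,
    Int64,
    Float64,
    DateTime64,
}

/// Hypothetical column description: a name plus its inferred type.
struct Column {
    name: String,
    ty: InferredType,
}

/// Emit a PARTITION BY clause only when a column was inferred as DateTime64.
/// The column name is deliberately ignored, so a String column called
/// "timestamp" never ends up inside toYYYYMM(...).
fn partition_clause(columns: &[Column]) -> Option<String> {
    columns
        .iter()
        .find(|c| c.ty == InferredType::DateTime64) // assumed: first match wins
        .map(|c| format!("PARTITION BY toYYYYMM({})", c.name))
}
```

Returning `Option<String>` lets the caller omit the clause entirely when no suitable column exists, instead of falling back to a name-based guess.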

Maksim-Gr merged commit 5aa682c into main Jun 19, 2026
1 check passed

Maksim-Gr merged 1 commit into main from feat/quick-wins-partitioning-engines