Minimal dbt project to benchmark the DuckDB adapter / dbt models against the ClickBench dataset. The sections below explain where the raw data comes from, how to drive the benchmark runs via make, and how to pin specific dbt-duckdb builds.
- Run
make setup(downloads 14GB dataset) make clickbenchormake incrementalto run different benchmarks- View the results in your terminal, e.g.
| target_name | total_runtime_s | models_executed |
|---|---|---|
| clickbench_8GB | 35.905… | 43 |
| clickbench_24GB | 38.348… | 43 |
| clickbench_2GB | 55.512… | 43 |
- More RAM might have minimal effect on performance
- Multi-threading is always faster than single threading, but difference from 3-8 is minor
| model_name | target_name | model_runtime_s |
|---|---|---|
| table | incremental_multi_thread_8_8GB | 28.124788 |
| table | incremental_multi_thread_3_8GB | 31.391459 |
| microbatch_default | incremental_multi_thread_8_8GB | 71.4216 |
| microbatch_default | incremental_multi_thread_3_8GB | 72.648674 |
| incremental__del_ins_date_partition | incremental_multi_thread_3_8GB | 79.468605 |
| incremental__del_ins_date_partition | incremental_multi_thread_8_8GB | 90.72558 |
| incremental__del_ins_ukey_date | incremental_multi_thread_3_8GB | 96.76076 |
| incremental__del_ins_ukey_date | incremental_multi_thread_8_8GB | 96.99508 |
| incremental__del_ins_ukey | incremental_multi_thread_3_8GB | 103.33216 |
| incremental__del_ins_ukey | incremental_multi_thread_8_8GB | 114.056786 |
| table | incremental_single_thread_16GB | 146.28302 |
| table | incremental_single_thread_8GB | 147.62831 |
| microbatch_default | incremental_single_thread_8GB | 204.0397 |
| incremental__del_ins_date_partition | incremental_single_thread_16GB | 264.17905 |
| incremental__del_ins_date_partition | incremental_single_thread_8GB | 291.9784 |
| incremental__del_ins_ukey_date | incremental_single_thread_16GB | 293.086 |
| incremental__del_ins_ukey | incremental_single_thread_16GB | 294.80502 |
| incremental__del_ins_ukey | incremental_single_thread_8GB | 333.6303 |
| incremental__del_ins_ukey_date | incremental_single_thread_8GB | 335.9294 |
The staging source defined in models/staging/source.yml points DuckDB at the public ClickBench export:
read_parquet('https://datasets.clickhouse.com/hits_compatible/hits.parquet', binary_as_string=True) -- 14GB!
By default the stg_clickbench__hits model streams directly from this remote Parquet file and materializes as a table, so you can run the benchmarks immediately without re-downloading. If you prefer to store the file locally (for repeatable offline runs from Parquet or to avoid re-downloading ~14 GB), you can download it once and adjust the source to point at your local copy. Update models/staging/source.yml so the external_location points to read_parquet('data/clickbench/hits.parquet') instead of the remote URL.
All benchmark entry points live in the root Makefile so you do not have to remember the full dbt invocations. Run these from the project root after syncing dependencies with uv sync.
make setup— executesuv run dbt debugto validate the database setup and thenuv run dbt run -s stagingto download and materialize the staging layer.make clickbench— runsuv run dbt run -s marts.clickbench --full-refresh, rebuilding every mart model under theclickbenchsubdirectory.make microbatch— first performs a full refresh of the sharednormal_incrementalmodels, then rerunsnormal_incremental,microbatch, andmicrobatch_ukeywith the event-time window2013-07-01through2013-07-31.
pyproject.toml currently pulls a specific dbt-duckdb adapter with microbatch implementation (see: PR #644)
[project]
dependencies = [
"dbt-duckdb>=1.8.2",
]
[tool.uv.sources]
dbt-duckdb = { git = "https://github.com/gfrmin/dbt-duckdb", branch = "microbatch" }You have two knobs for testing other adapter builds:
-
Pin a released version by setting an exact requirement and removing the git override:
dependencies = [ "dbt-duckdb==1.8.3", ]
-
Follow a specific git ref by editing
[tool.uv.sources], for example:[tool.uv.sources] # either dbt-duckdb = { git = "https://github.com/duckdb/dbt-duckdb", rev = "1c0ffee" } # or dbt-duckdb = { path = "/Users/username/code/dbt-duckdb-from-me/", package = true }
After changing either value, run uv sync so the lockfile and virtual environment pick up the new adapter. Subsequent make targets will automatically use whatever version uv just installed.