Skip to content

Add pipeline trace logging (enable_pipeline_trace setting)#22

Open
ybtsdst wants to merge 5 commits into
v1.5-variegatafrom
claude/duckdb-pipeline-logging-EJBXj
Open

Add pipeline trace logging (enable_pipeline_trace setting)#22
ybtsdst wants to merge 5 commits into
v1.5-variegatafrom
claude/duckdb-pipeline-logging-EJBXj

Conversation

@ybtsdst
Copy link
Copy Markdown
Owner

@ybtsdst ybtsdst commented Apr 27, 2026

Summary

为 DuckDB pipeline 执行系统添加调试/可视化能力,通过 session 变量控制:

SET enable_pipeline_trace = true;            -- 总开关(必须;LOCAL scope)
SET pipeline_graph_output  = '/tmp/g.txt';   -- pipeline 依赖图;空 = stderr;append 模式
SET pipeline_trace_output  = '/tmp/t.json';  -- Chrome Trace JSON;空 = stderr;truncate 模式
SELECT ...;

三个 setting 互相独立。文件无法打开时自动回退 stderr 并打印警告前缀。graph 是纯文本 append(多查询累积);trace JSON 必须是单一合法 JSON,所以 truncate(每条 query 覆盖)。


Output

Pipeline graph (pipeline_graph_output):

=== Pipeline Graph ===
Pipeline #0: TableScan→HashJoin
  depends on: #1
Pipeline #1: TableScan→HashJoinBuild
======================

Chrome Trace JSON (pipeline_trace_output,可拖入 https://ui.perfetto.dev/):

每个并行 task 一条 event(不是每个 pipeline 一条),tid = worker thread id 的 hash,所以每个工作线程占 Perfetto 的一行,真实并行度可见。

{"traceEvents":[
  {"name":"#0[0]: TableScan→HashJoin","ph":"X","pid":0,"tid":1832912,"ts":0,"dur":12480},
  {"name":"#0[1]: TableScan→HashJoin","ph":"X","pid":0,"tid":1832913,"ts":40,"dur":12300},
  {"name":"#1[0]: TableScan→HashJoinBuild","ph":"X","pid":0,"tid":1832914,"ts":0,"dur":45120}
]}

Design

Per-task wall-clock capture (in PipelineTask)

PipelineTask {
  int64_t  task_start_ns = -1;   // 首次 ExecuteTask 进入时设;
                                 // 任务 yield + reschedule 不重置 → 包含 blocked 间隔
  uint64_t task_thread_hash = 0; // std::hash<thread::id>,首次进入时锁定
}

ExecuteTask 入口:若 task_start_ns < 0 则记录 start_ns + thread hash。返回 TASK_FINISHED 前调用 pipeline.RecordTaskTiming(start, end, hash)

Per-task storage (in Pipeline)

struct TaskTiming { int64_t start_ns; int64_t end_ns; uint64_t thread_hash; };

mutex                task_timings_lock;
vector<TaskTiming>   task_timings;     // 多 worker 并发 append
idx_t                pipeline_id = 0;  // 由 PipelineTracer::AssignIds 顺序赋

RecordTaskTiming 持锁 push_back。读路径只在 query end 单线程读一次,不需要 RCU/CAS。

Trace emit lifecycle (in Executor)

  • InitializeInternal:enable_pipeline_trace=trueAssignIds、记录 pipeline_trace_start_ns(query 起点)、打印 graph、把 pipelines 复制到 traced_pipelines(shared_ptr,所有权延续)。
  • ExecuteTask(query 完成路径):在 pipelines.clear() 之前 emit Chrome Trace,然后清 traced_pipelines
  • Reset:清 traced_pipelines + pipeline_trace_start_ns

traced_pipelinesvector<shared_ptr<Pipeline>> 复制而非 raw 引用,是因为 timing 数据存活在 Pipeline 对象上,要等 pipelines.clear() 之后才 emit,shared_ptr 持有保证 Pipeline 不被析构。

Overhead

  • Disabled:Executor::InitializeInternal 一个 bool 检查 + ExecuteTask 一个 traced_pipelines.empty() 检查,基本零开销。
  • Enabled:每个 task steady_clock::now() × 2 + 一次 thread::id hash + 一次 mutex push_back。task 数量级下完全忽略。

Changes

New files

File Description
src/include/duckdb/parallel/pipeline_tracer.hpp PipelineTracer
src/parallel/pipeline_tracer.cpp AssignIds / PrintGraph / PrintChromeTrace / WriteOutput

Modified files

File Change
src/include/duckdb/parallel/pipeline.hpp PipelineTask:task_start_ns+task_thread_hash;Pipeline:pipeline_idTaskTiming 结构、task_timings + task_timings_lockRecordTaskTimingGetPipelineId、friend PipelineTracer
src/parallel/pipeline.cpp PipelineTask::ExecuteTask 首次进入捕获 start/thread,TASK_FINISHEDRecordTaskTiming;Pipeline::RecordTaskTiming 加锁 append
src/include/duckdb/execution/executor.hpp traced_pipelinespipeline_trace_start_ns
src/parallel/executor.cpp InitializeInternal 末尾 AssignIds + 记 query start ns + 打印 graph + 备份 traced_pipelines;ExecuteTaskpipelines.clear() 前 emit Chrome Trace;Reset 清状态
src/include/duckdb/main/client_config.hpp enable_pipeline_tracepipeline_graph_outputpipeline_trace_output
src/include/duckdb/main/settings.hpp EnablePipelineTraceSettingPipelineGraphOutputSettingPipelineTraceOutputSetting
src/main/settings/custom_settings.cpp 上述三个 setting 的 SetLocal/ResetLocal/GetSetting
src/common/settings.json 注册三个 setting(LOCAL scope,custom_implementation)
src/main/config.cpp 注册三个 setting + bump alias indices
src/parallel/CMakeLists.txt pipeline_tracer.cpp 进 unity build

claude added 2 commits April 27, 2026 01:55
Adds two debugging capabilities to the pipeline execution system:

1. Static pipeline graph: after query initialization, prints all pipelines
   and their dependency relationships to stderr.

2. Runtime timing trace: records wall-clock start/end times for each
   pipeline and outputs a Chrome Trace JSON report to stderr on query
   completion. The JSON is loadable in https://ui.perfetto.dev/ or
   chrome://tracing for an interactive Gantt visualization.

Usage:
  SET enable_pipeline_trace = true;
  SELECT ...;   -- pipeline graph + Chrome trace JSON go to stderr

New files:
  src/include/duckdb/parallel/pipeline_tracer.hpp
  src/parallel/pipeline_tracer.cpp

Key changes:
  - Pipeline: add pipeline_id, start_time_ns, end_time_ns, MarkStart/MarkEnd
  - PipelineEvent: call MarkStart in Schedule(), MarkEnd in FinishEvent()
  - PipelineFinishEvent: call MarkEnd in FinishEvent() (last-call wins,
    so finalization time is captured for base pipelines)
  - Executor: assign IDs, save traced_pipelines ref, print graph at init
    and Chrome trace JSON before pipelines.clear()
  - Settings: add enable_pipeline_trace (local bool, default false)

https://claude.ai/code/session_01UV8WQWq1RyBsyTAxsbu4yG
client_config.hpp is already included transitively via client_context.hpp.

https://claude.ai/code/session_01UV8WQWq1RyBsyTAxsbu4yG
@ybtsdst ybtsdst changed the base branch from main to v1.5-variegata April 27, 2026 02:00
Allows directing each trace output to a dedicated file:

  SET pipeline_graph_output  = '/tmp/graph.txt';   -- pipeline dependency graph
  SET pipeline_trace_output  = '/tmp/trace.json';  -- Chrome Trace JSON (Perfetto)

When a path is empty (default), the respective output falls back to stderr.
If the file cannot be opened, a warning is prepended and stderr is used.

Changes:
  - PipelineTracer: add WriteOutput() helper (file vs stderr), update
    PrintGraph/PrintChromeTrace signatures to accept output_path
  - ClientConfig: add pipeline_graph_output / pipeline_trace_output fields
  - Settings: add PipelineGraphOutputSetting / PipelineTraceOutputSetting
  - settings.json: register both new settings
  - custom_settings.cpp: implement SetLocal/ResetLocal/GetSetting for both
  - executor.cpp: thread file paths through to tracer calls

https://claude.ai/code/session_01UV8WQWq1RyBsyTAxsbu4yG
Copy link
Copy Markdown
Owner Author

@ybtsdst ybtsdst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体看下来:结构干净、改动范围合理、关闭时零开销。以下按严重程度记录一些点。

可能有问题的

1. settings.jsondefault_value 字段
新加的三个 setting(enable_pipeline_tracepipeline_graph_outputpipeline_trace_output)都没写 default_value,而旁边 enable_object_cachehttp_logging_output 等都写了。即使 custom_implementation: true 真正默认值来自 ClientConfig 成员默认值,settings codegen / 文档生成可能仍依赖该字段。建议补:

"default_value": "false"   // enable_pipeline_trace
"default_value": ""        // 两个路径

2. MarkStart / MarkEnd 写竞争(理论上 data race)
start_time_nsend_time_ns 是 plain int64_t,跨线程写入(Schedule 在调度线程,FinishEvent 在 worker 线程,后续 PrintChromeTrace 在持 executor_lock 的线程读取)。实践上 event 完成边界 + executor_lock 提供 happens-before,但 TSAN 会报警。改成 atomic<int64_t>,或加注释说明依赖。

3. 文件写入用 trunc 而非 append
pipeline_tracer.cppstd::ios::trunc,同一 session 跑多个 query 会互相覆盖,只剩最后一条。要么改 append,要么在描述/help 里写明"每次 query 覆盖"。

4. PipelineEvent::Schedule() 抛异常时 MarkEnd 不会被调用
MarkStarttry 之外,如果 pipeline->Schedule(event) 抛异常,FinishEvent 不会跑,end_time_ns-1PrintChromeTrace 已过滤 <0 的 pipeline,所以不出错,但失败 pipeline 在 trace 里完全消失。可在 catch 里也补一次 MarkEnd 以保留存在性。

风格 / 小瑕疵

5. Chrome Trace 的 tid: pipeline_id 语义不准
每条 pipeline 实际跨多个 worker 线程并行执行。用 tid=pipeline_id 让每条 pipeline 在 Perfetto 里独占一行,可视化够用,但严格说不是线程。如果想要真实并行度,可按 worker thread tid 输出多个 events,但属于另一种设计取舍。

6. JSON 转义只处理了 \"
PhysicalOperator::GetName() 通常是字母,低风险,但严格 JSON 还需 escape \\ 和控制字符。

7. trace 文件缺 metadata
Chrome Trace 支持 displayTimeUnitotherData 等顶层字段。加 \"displayTimeUnit\":\"ns\" 能让 Perfetto 显示精度更准。可选。

8. friend class PipelineTracer 略冗余
pipeline_tracer.cpp 访问的字段都已有 public getter (GetPipelineId() 等)。要么用 getter 去掉 friend,要么删 getter 仅靠 friend,两者都留有点重。

设计上 OK 的点

  • enable_pipeline_trace=false 时仅一次 bool 检查,热路径开销可忽略。
  • traced_pipelinesshared_ptr 让 timing 数据活过 pipelines.clear(),合理。
  • MarkEnd last-call-wins 让 PipelineFinishEvent 覆盖 PipelineEvent,把 finalize 时间也包进 base pipeline,设计有想法。
  • 文件打不开时 fallback 到 stderr 并 warning,体验 OK。

bingtao.yin and others added 2 commits May 2, 2026 18:18
…end/trunc, Replace API

- src/main/config.cpp: register 3 new settings in internal_options[] (DUCKDB_LOCAL)
  and adjust setting_aliases[] indexes for entries shifted by the inserts.
  Equivalent to running scripts/generate_settings.py; done by hand to keep the
  diff minimal.
- src/include/duckdb/parallel/pipeline.hpp: make start_time_ns/end_time_ns atomic;
  MarkStart/MarkEnd run on different threads than PipelineTracer::PrintChromeTrace.
- src/parallel/pipeline_tracer.cpp:
  - WriteOutput takes an `append` flag; PrintGraph appends, PrintChromeTrace
    truncates (Chrome Trace JSON must remain a single valid object).
  - StringUtil::ReplaceAll -> StringUtil::Replace (the in-place ReplaceAll
    does not exist; Replace is the global, returning variant).
- src/include/duckdb/parallel/pipeline_tracer.hpp: document the append flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the tracer recorded a single (start, end) pair per Pipeline,
which collapsed all parallel PipelineTasks (created by LaunchScanTasks
according to MaxThreads()) into one logical event. Real parallelism
and per-worker timing were invisible.

Now each PipelineTask records its own lifetime:
- task_start_ns and task_thread_hash are captured on the first
  ExecuteTask() entry (handles PROCESS_PARTIAL yielding by checking
  task_start_ns < 0 sentinel, so re-entry doesn't overwrite the start);
- on TASK_FINISHED return, the task appends a TaskTiming record to
  Pipeline::task_timings under task_timings_lock.

PrintChromeTrace iterates task_timings and emits one Chrome Trace
event per task, with tid = thread_hash so each worker thread occupies
its own row in Perfetto, making real parallelism and load distribution
directly visible.

Pipeline-level MarkStart/MarkEnd and the start_time_ns/end_time_ns
atomics are removed; PipelineEvent::Schedule/FinishEvent and
PipelineFinishEvent::FinishEvent no longer touch timing. The graph
output (PrintGraph) is structural-only and is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants