feat: Add Configurable Nested KeyValue Support for ClickHouse JSON Export#293
brightsparc wants to merge 18 commits into streamfold:main
Signed-off-by: Mike Heffner <mikeh@fesnel.com>
Support Python 3.14 in pyo3
…lags Fix build without file receiver
After ingesting some data, I can indeed see the JSON values as a deep structure, but I wasn't able to find an elegant way to, say, get the last item in an array of input messages. It could be worth following up with the ClickHouse folks on the best way to implement this.
Release: Bump version to 0.1.7
… into nested-ch-attrs
Hey @brightsparc, as we discussed, I support increasing the depth we can handle here. A few things as I start to look at this:
Are you seeing any error messages from the exporter? I'll look at this some more over the next few days, but I may be busy. I'd like to support this, just need to clean it up a bit and identify where the gaps are.
@rjenkins I've pulled in those changes, and removed the docker file, so let me know if you think this is GTG
@brightsparc I can look again at this Mon |
Summary
This PR adds support for converting nested OpenTelemetry `KeyValueList` and `ArrayValue` structures to native JSON objects and arrays in the ClickHouse exporter, with a configurable depth limit and backwards-compatible default behavior.

Motivation
When exporting GenAI span attributes (e.g., `gen_ai.input.messages`, `gen_ai.output.messages`) that contain nested `KeyValueList` structures, the previous implementation serialized them as empty strings or raw protobuf JSON, losing the semantic structure. This made these fields difficult to query in ClickHouse.

Before (flat mode):
{"gen_ai.input.messages": ""}

After (nested mode enabled):
{"gen_ai.input.messages": [{"role": "user", "parts": [{"type": "text", "content": "Hello"}]}]}

Changes
1. Unified String Handling with `Cow<str>`
Before: two string variants (`Str` + `StrOwned`).
After: one string variant (`Cow<str>`).

Rationale: the `Cow` (Copy-on-Write) smart pointer handles both borrowed and owned strings:
- `Cow::Borrowed(&str)`: zero allocation, stores only a pointer and a length
- `Cow::Owned(String)`: takes ownership, same as `StrOwned` before

2. New `JsonType::Object` Variant
Added support for JSON objects (ClickHouse named tuples).
Serialization follows the ClickHouse RowBinary JSON specification.
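To illustrate changes 1 and 2 together, here is a minimal sketch of what an enum with a single `Cow`-backed string variant and an `Object` variant could look like. The enum below is a hypothetical stand-in, not the exporter's actual `JsonType` definition, and the helper functions (`borrowed_str`, `owned_str`, `user_message`) are invented for illustration. It does not attempt to show the RowBinary wire encoding.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for the exporter's JsonType; the real enum
// has more variants and may be shaped differently.
#[derive(Debug, PartialEq)]
enum JsonType<'a> {
    // One variant covers both borrowed and owned strings.
    Str(Cow<'a, str>),
    Array(Vec<JsonType<'a>>),
    // Ordered key/value pairs, analogous to a named tuple.
    Object(Vec<(Cow<'a, str>, JsonType<'a>)>),
}

// Borrowed case: no allocation, the value points into the input.
fn borrowed_str(s: &str) -> JsonType<'_> {
    JsonType::Str(Cow::Borrowed(s))
}

// Owned case: the value takes ownership of an already-built String.
fn owned_str(s: String) -> JsonType<'static> {
    JsonType::Str(Cow::Owned(s))
}

// Builds an object like {"role": "user", "content": <content>}.
fn user_message(content: &str) -> JsonType<'_> {
    JsonType::Object(vec![
        (Cow::Borrowed("role"), borrowed_str("user")),
        (Cow::Borrowed("content"), borrowed_str(content)),
    ])
}
```

Because `Cow<str>` compares by content, a borrowed and an owned value holding the same text are equal, so callers do not need to care which form a given string took.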
3. Configurable Nested Conversion
The maximum-depth setting is an option with three cases: `None` (flat mode, the default), `Some(0)`, and `Some(n)`.

4. Separate Code Paths for Performance
Rather than adding depth checks to every recursive call in the common case, we maintain two separate implementations:
- Flat mode (`anyvalue_to_jsontype_flat`): `KvlistValue` and nested arrays are emitted as JSON strings
- Nested mode (`anyvalue_to_jsontype_nested`):
  - `KvlistValue` → `JsonType::Object`
  - `ArrayValue` → `JsonType::Array` (recursive)

Performance Analysis
Why Separate Code Paths?
A single function with `if depth_enabled { check_depth() }` on every call would add overhead to the common flat case. By separating, the flat path compiles to tight code with no depth-check branches.
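The split can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's code: `Value` and `Json` are hypothetical stand-ins for the OpenTelemetry `AnyValue` and the exporter's `JsonType`, the function names are invented, and the way this sketch treats a depth of zero (everything structured is stringified) is an assumption rather than the documented meaning of `Some(0)`.

```rust
// Simplified stand-in for the OpenTelemetry AnyValue type (assumption).
#[derive(Debug)]
enum Value {
    Str(String),
    Array(Vec<Value>),
    KvList(Vec<(String, Value)>),
}

// Simplified stand-in for the exporter's JSON type (assumption).
#[derive(Debug, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// Minimal quoting: escapes only backslashes and double quotes.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

// Renders a value as JSON text; used when structure is collapsed to a string.
fn to_json_text(v: &Value) -> String {
    match v {
        Value::Str(s) => quote(s),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json_text).collect();
            format!("[{}]", parts.join(","))
        }
        Value::KvList(kvs) => {
            let parts: Vec<String> = kvs
                .iter()
                .map(|(k, v)| format!("{}:{}", quote(k), to_json_text(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

// Flat path: no depth parameter and no depth check anywhere.
fn convert_flat(v: &Value) -> Json {
    match v {
        Value::Str(s) => Json::Str(s.clone()),
        Value::Array(items) => Json::Array(
            items
                .iter()
                .map(|item| match item {
                    Value::Str(s) => Json::Str(s.clone()),
                    nested => Json::Str(to_json_text(nested)),
                })
                .collect(),
        ),
        Value::KvList(_) => Json::Str(to_json_text(v)),
    }
}

// Nested path: keeps structure until the remaining depth budget hits zero.
fn convert_nested(v: &Value, remaining: usize) -> Json {
    match v {
        Value::Str(s) => Json::Str(s.clone()),
        _ if remaining == 0 => Json::Str(to_json_text(v)),
        Value::Array(items) => Json::Array(
            items.iter().map(|i| convert_nested(i, remaining - 1)).collect(),
        ),
        Value::KvList(kvs) => Json::Object(
            kvs.iter()
                .map(|(k, val)| (k.clone(), convert_nested(val, remaining - 1)))
                .collect(),
        ),
    }
}

// The mode is chosen once at the top, not on every recursive call.
fn convert(v: &Value, max_depth: Option<usize>) -> Json {
    match max_depth {
        None => convert_flat(v),
        Some(n) => convert_nested(v, n),
    }
}
```

The design point is that `convert_flat` never sees a depth argument, so the depth bookkeeping is paid only by callers that opt into nested mode.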
Memory Layout
The discriminant for the `JsonType` variant plus the `Cow` tag fits in one cache line. The access pattern is identical to before.

Benchmarking Expectations
The nested mode slowdown is acceptable because:
CLI Configuration
New flag added to configure nested KV conversion:
- Flag: `--clickhouse-exporter-nested-kv-max-depth`
- Environment variable: `ROTEL_CLICKHOUSE_EXPORTER_NESTED_KV_MAX_DEPTH`

Transformer Configuration (Internal)
Wire Format
Flat Mode (unchanged)
Nested Mode (new)
Backwards Compatibility
- `None` = flat mode

Test Plan
- `test_anyvalue_arrayvalue_flat_mode`: `KvlistValue` → JSON string, nested `Array` → JSON string
- `test_anyvalue_arrayvalue_nested_mode`: `KvlistValue` → `Object`, nested `Array` → `Array`
- `cargo test`: 514 passed, 0 failed

Future Work