Skip to content

Latest commit

 

History

History
123 lines (95 loc) · 5.22 KB

File metadata and controls

123 lines (95 loc) · 5.22 KB

Message Pack Format

src/message_pack_format/ is the Rust-side source of truth for the MessagePack encode/decode contract. It is split into two sub-modules by audience:

  • portable/ — cross-language wire format shared with the Go counterpart sketchlib-go. Touching anything here is a protocol change and requires the Go side to be kept in lock-step (golden-byte tests catch drift).
  • native/ — thin trait shims over the existing serialize_to_bytes / deserialize_from_bytes methods on the pure-Rust generic sketch types in src/sketches/. The byte format is internal to Rust — Go never reads it, and the format is free to evolve without cross-language coordination.

The MessagePackCodec trait and unified Error type live at the module root so both worlds share the same encode/decode contract.

Core Types

Both live at the top of src/message_pack_format/ and are re-exported through the module root:

  • MessagePackCodec (in codec.rs) — the trait every codec-enabled type implements. Two methods: to_msgpack, from_msgpack. This is the canonical encode/decode entry point.
  • Error (in error.rs) — the unified encode/decode error type returned by both native and portable impls.

portable/ — Cross-Language Wire Format

One submodule per algorithm, with the filenames mirrored on the Go side:

  • countminsketch.rs, countminsketch_topk.rs, countsketch.rs, ddsketch.rs, hll.rs, kll.rs, hydra_kll.rs, set_aggregator.rs, delta_set_aggregator.rs

Each submodule owns:

  1. The wire-format-aligned runtime type and its delta companion (e.g. CountMinSketch, CountMinSketchDelta) — these are the types re-exported at the crate root.
  2. The wire DTO struct(s), when the runtime type needs a separate over-the-wire shape (e.g. borrow / owned pairs, byte-compatible field reordering with sketchlib-go).
  3. The MessagePackCodec impl for the runtime type.

The full list of re-exported wire-format-aligned types:

  • countminsketch.rsCountMinSketch, CountMinSketchDelta
  • countminsketch_topk.rsCountMinSketchWithHeap, CmsHeapItem
  • countsketch.rsCountSketch, CountSketchDelta
  • ddsketch.rsDdSketch, DdSketchDelta
  • hll.rsHllSketch, HllSketchDelta, HllVariant
  • kll.rsKllSketch, KllSketchData
  • hydra_kll.rsHydraKllSketch
  • set_aggregator.rsSetAggregator
  • delta_set_aggregator.rsDeltaResult

Use these for sketches that must cross a process / language boundary. For high-throughput local ingest, custom hashers, and framework composition, reach for the generic sketches in src/sketches/ instead:

Need Use
Local high-throughput ingest, custom hashers, framework composition src/sketches/
Cross-process / cross-language transfer matching sketchlib-go bytes src/message_pack_format/portable/

Types that act as their own DTO

CountSketch, DdSketch, and HllSketch derive Serialize / Deserialize directly because their public field layout already matches the wire shape. Their MessagePackCodec impls serialize the struct verbatim — no separate DTO is required.

Protocol invariants

  • The wire envelope must remain byte-compatible with sketchlib-go.
  • Adding, reordering, renaming, or retyping a field counts as a protocol change; bump the format version on both sides and add a golden-byte test before shipping.
  • DTOs that appear as nested fields in another wire type (e.g. KllSketchData inside HydraKllSketchWire) are part of the same protocol surface — treat them with the same care.

native/ — Rust-Internal Codec Shims

One submodule per generic sketch type in src/sketches/ whose serialization is exposed through MessagePackCodec:

  • countminsketch.rs, countsketch.rs, countsketch_topk.rs, ddsketch.rs, hll.rs, kll.rs, kll_dynamic.rs
  • kmv.rs (gated behind the experimental feature flag)

Each impl forwards to_msgpack / from_msgpack to the sketch's existing serialize_to_bytes / deserialize_from_bytes methods. The byte format is not part of the cross-language protocol — it is an internal Rust serialization that can evolve freely.

Use the native codecs when you want a single unified trait-based encode/decode entry point for the generic in-process sketch types, without going through a wire-format-aligned wrapper.

Choosing Between portable and native

Need Use
Send a sketch to Go (sketchlib-go) or any non-Rust consumer portable (the wire-format-aligned types re-exported at the crate root)
Persist or transport a sketch within an all-Rust pipeline native (works directly on the generic sketches types)
New sketch crossing the wire Add a portable/<name>.rs, mirror the filename in sketchlib-go, and add a golden-byte test
New internal-only sketch serialization Add a native/<name>.rs shim and you are done

Cross-Reference

  • Generated rustdoc for the trait and per-algorithm wire DTOs is the most up-to-date reference; build it with cargo doc --no-deps --all-features --open.