Hey, first of all thanks for the views work; the zero-copy borrowing has been a real win for scalar/string/bytes-heavy messages.
I ran into a case where views end up more expensive than I expected, and wanted to float a feature idea before working around it.
What I hit
Views are zero-copy for scalars/strings/bytes (they borrow &str/&[u8]), but nested-message and repeated fields are materialized eagerly during decode_view:
- MessageFieldView<V> is Option<Box<V>>: every present nested message gets a Box and is decoded recursively (the docs even call this out: "the inner view is boxed — recursive").
- RepeatedView<'a, T> is Vec<T>: a repeated field allocates a Vec and decodes every element up front.
So decode_view walks and allocates the entire sub-tree, even if I only read a couple of top-level fields off each nested message. For deep, wide trees that ends up allocating more than decoding the owned type would, and a lot more than a hand-rolled minimal decode.
Where it bites
My workload is a backfill stream: a big blob shaped like Outer { items: [ Item { payload: BigNestedMessage {...} } ] }, roughly 6000 deeply-nested BigNestedMessages in one decode. I only need ~4 fields per item (an id, a small secret blob, a timestamp, and a couple of presence checks deeper in the tree for classification). Everything else in those messages is irrelevant to me.
With eager views, decoding the outer message materializes all 6000 sub-trees. dhat, same workload/machine:
| decode strategy | total alloc churn |
| --- | --- |
| hand-written minimal-field structs (decode only the tags I need, skip the rest) | ~33 MB |
| generated views | ~80 MB (+144%) |
Leaf-first, the allocations are dominated by the per-item view decode: the innermost message view's _merge_into is ~21 MB / 48k allocations, its wrapper view ~17 MB / 24k. The minimal-struct path stays cheap only because it never decodes the fields it doesn't declare — which is exactly the boilerplate I was hoping views would let me delete.
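For context, here is a minimal sketch of what I mean by a minimal-field decode. It is not the code from my project and it does not use the library's API: it is a hand-rolled walk over the protobuf wire format, and the `ItemProjection` struct, its field numbers, and the `decode_projection` name are all made up for illustration. The point is only that undeclared tags are skipped without allocating.

```rust
/// Hypothetical projection: only these three tags are decoded.
#[derive(Debug, Default, PartialEq)]
struct ItemProjection<'a> {
    id: u64,          // assumed field 1, varint
    secret: &'a [u8], // assumed field 2, length-delimited, borrowed from the input
    timestamp: u64,   // assumed field 3, varint
}

/// Reads one base-128 varint and advances `buf` past it.
fn read_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for i in 0..10 {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // over-long varint
}

/// Decodes the declared tags and skips everything else without allocating.
/// Returns None on truncated or unsupported input (groups are not handled).
fn decode_projection<'a>(mut buf: &'a [u8]) -> Option<ItemProjection<'a>> {
    let mut out = ItemProjection::default();
    while !buf.is_empty() {
        let key = read_varint(&mut buf)?;
        match (key >> 3, key & 7) {
            (1, 0) => out.id = read_varint(&mut buf)?,
            (3, 0) => out.timestamp = read_varint(&mut buf)?,
            (number, 2) => {
                let len = usize::try_from(read_varint(&mut buf)?).ok()?;
                let payload = buf.get(..len)?;
                buf = &buf[len..];
                // Undeclared length-delimited fields (nested messages included)
                // are stepped over here rather than decoded.
                if number == 2 {
                    out.secret = payload;
                }
            }
            (_, 0) => {
                read_varint(&mut buf)?; // undeclared varint: read and discard
            }
            (_, 1) => buf = buf.get(8..)?, // undeclared fixed64
            (_, 5) => buf = buf.get(4..)?, // undeclared fixed32
            _ => return None,
        }
    }
    Some(out)
}
```

The cost per item is one linear scan of its bytes; a large nested message under an undeclared tag costs a length read and a slice advance.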
Idea: opt-in lazy views
A mode where decode_view records each nested/repeated field's byte range in a single top-level pass, and only decodes a sub-view when its accessor is actually called. Roughly:
- nested message: store the undecoded slice (Option<&'a [u8]>) and parse a fresh V<'a> on access. Returning the child by value (it's just a thin struct of borrowed refs / offsets) skips the Box and needs no interior mutability, which fits read-once traversal well.
- repeated message: hold the raw field bytes and decode elements lazily on iteration instead of pre-building a Vec<T>.
- scalars/strings/bytes stay borrowed exactly as today.
Then decode_view is O(top-level fields scanned once) and only the sub-trees you touch cost anything. For my case that's ~6000 cheap scans + a few field reads instead of 6000 full trees.
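To make the shape concrete, here is a rough sketch of the idea, written against the raw protobuf wire format rather than the library's internals. Everything in it is hypothetical: the `Outer`/`Item`/`Payload` schema and field numbers, the `Fields` scanner, and the `OuterView`/`ItemView`/`PayloadView` names are stand-ins, not proposed generated code. It also simplifies a duplicated nested field to last-one-wins, where real protobuf semantics would merge the occurrences.

```rust
/// Reads a base-128 varint from the front of `buf`; returns (value, bytes used).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

#[derive(Clone, Copy)]
enum Field<'a> {
    Varint(u64),
    Len(&'a [u8]), // undecoded bytes of a string, bytes, or nested message
    Fixed,         // fixed32/fixed64 payloads are skipped in this sketch
}

/// Single-pass scanner over one message's top-level fields; never recurses.
struct Fields<'a>(&'a [u8]);

impl<'a> Iterator for Fields<'a> {
    type Item = (u64, Field<'a>);
    fn next(&mut self) -> Option<Self::Item> {
        let buf = self.0;
        if buf.is_empty() {
            return None;
        }
        let (key, n) = read_varint(buf)?;
        let rest = &buf[n..];
        let (field, used) = match key & 7 {
            0 => {
                let (v, m) = read_varint(rest)?;
                (Field::Varint(v), m)
            }
            1 => (Field::Fixed, 8),
            5 => (Field::Fixed, 4),
            2 => {
                let (len, m) = read_varint(rest)?;
                let end = m.checked_add(usize::try_from(len).ok()?)?;
                (Field::Len(rest.get(m..end)?), end)
            }
            _ => return None, // groups or malformed input end the scan
        };
        self.0 = rest.get(used..)?;
        Some((key >> 3, field))
    }
}

/// Hypothetical `Payload { uint64 timestamp = 1; bytes secret = 2; }`.
struct PayloadView<'a> {
    timestamp: u64,
    secret: &'a [u8],
}

impl<'a> PayloadView<'a> {
    fn decode(buf: &'a [u8]) -> Self {
        let mut view = PayloadView { timestamp: 0, secret: &[] };
        for (number, field) in Fields(buf) {
            match (number, field) {
                (1, Field::Varint(v)) => view.timestamp = v,
                (2, Field::Len(bytes)) => view.secret = bytes,
                _ => {}
            }
        }
        view
    }
}

/// Hypothetical `Item { uint64 id = 1; Payload payload = 2; }`.
struct ItemView<'a> {
    id: u64,
    payload_bytes: Option<&'a [u8]>, // recorded during the scan, not decoded
}

impl<'a> ItemView<'a> {
    fn decode(buf: &'a [u8]) -> Self {
        let mut view = ItemView { id: 0, payload_bytes: None };
        for (number, field) in Fields(buf) {
            match (number, field) {
                (1, Field::Varint(v)) => view.id = v,
                (2, Field::Len(bytes)) => view.payload_bytes = Some(bytes),
                _ => {}
            }
        }
        view
    }

    /// Parses the child only when asked, and returns it by value (no Box).
    fn payload(&self) -> Option<PayloadView<'a>> {
        self.payload_bytes.map(PayloadView::decode)
    }
}

/// Hypothetical `Outer { repeated Item items = 1; }`: holds raw bytes only.
struct OuterView<'a> {
    raw: &'a [u8],
}

/// Lazy iterator over repeated items; no Vec is built up front.
struct Items<'a>(Fields<'a>);

impl<'a> Iterator for Items<'a> {
    type Item = ItemView<'a>;
    fn next(&mut self) -> Option<ItemView<'a>> {
        self.0.find_map(|(number, field)| match (number, field) {
            (1, Field::Len(bytes)) => Some(ItemView::decode(bytes)),
            _ => None,
        })
    }
}

impl<'a> OuterView<'a> {
    fn items(&self) -> Items<'a> {
        Items(Fields(self.raw))
    }
}
```

In this sketch nothing touches the heap: iterating `items()` scans each item's top-level fields once, and a payload is parsed only if `payload()` is called. The trade-off is that calling an accessor twice parses twice, which is why I framed it as a fit for read-once traversal.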
I think this has to be opt-in — it changes the accessor signatures for message/repeated fields (&V → V / an iterator), so a codegen flag like lazy_views(true) (or per-field) seems right. I don't have strong opinions on the exact shape and would defer to you.
Meanwhile
I'm falling back to hand-written minimal "projection" messages for this path — it works, but it's ~100 lines of parallel schema I now have to keep in sync with the real one, which is the exact maintenance burden views were supposed to remove.
Would you be open to something like this? Happy to prototype, benchmark, or send a PR if it's a direction you'd take.