Context
The only unsafe in the runtime crate outside view.rs (the OwnedView machinery) and the global registries is the set of decode fast paths in buffa/src/types.rs:
- decode_string: Vec::with_capacity(len) + set_len(len) before copy_to_slice, to skip zero-initialising a buffer that is about to be overwritten
- merge_string: String::as_mut_vec() + set_len(len) before copy_to_slice, with UTF-8 validated afterwards
- decode_bytes / merge_bytes: the same set_len-before-write pattern for Vec<u8>
The blocks are well commented and carry #[allow(clippy::uninit_vec)], but the claim-length-before-initialising idiom is exactly what that lint exists to flag, and it sits in the contested corner of the uninitialised-memory rules (a &mut [u8] over uninit bytes handed to copy_to_slice). It would be nice to get this file to zero unsafe without giving up the reason the fast paths exist.
Proposal
Rewrite the four functions to build the output from the source chunks instead of pre-claiming the length:
- Contiguous case (decoding from a slice or a single-chunk Bytes, i.e. essentially always): take buf.chunk()[..len] and use to_vec() / extend_from_slice / String::from_utf8(slice.to_vec()); for merge_string, validate the source slice first and push_str. Vec's own machinery allocates uninitialised and copies, so there is no zero-fill: the unsafe lives inside std, and avoiding it here does not cost a memset.
- Non-contiguous Buf fallback: loop over chunks with extend_from_slice (and validate UTF-8 once at the end for strings).
decode_bytes_to_bytes and the configurable string representations are unaffected.
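The two cases above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the crate's code: ChunkBuf is a hypothetical stand-in for the three bytes::Buf methods involved (so the example compiles without dependencies), DecodeError and TwoChunks are invented for illustration, and the function signatures are guesses at what types.rs exposes. It also assumes merge_string replaces the destination's contents.

```rust
/// Hypothetical stand-in for the subset of `bytes::Buf` used below, so the
/// sketch compiles without dependencies. Real code would use `bytes::Buf`.
pub trait ChunkBuf {
    fn remaining(&self) -> usize;
    /// Must return a non-empty slice while `remaining() > 0`.
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, cnt: usize);
}

impl ChunkBuf for &[u8] {
    fn remaining(&self) -> usize { self.len() }
    fn chunk(&self) -> &[u8] { self }
    fn advance(&mut self, cnt: usize) { *self = &self[cnt..]; }
}

/// Two-chunk source, only here to exercise the non-contiguous fallback.
pub struct TwoChunks<'a> { pub first: &'a [u8], pub second: &'a [u8] }

impl ChunkBuf for TwoChunks<'_> {
    fn remaining(&self) -> usize { self.first.len() + self.second.len() }
    fn chunk(&self) -> &[u8] {
        if self.first.is_empty() { self.second } else { self.first }
    }
    fn advance(&mut self, cnt: usize) {
        let n = cnt.min(self.first.len());
        self.first = &self.first[n..];
        self.second = &self.second[cnt - n..];
    }
}

/// Illustrative error type (an assumption, not the crate's).
#[derive(Debug, PartialEq)]
pub enum DecodeError { UnexpectedEof, InvalidUtf8 }

pub fn decode_bytes<B: ChunkBuf>(buf: &mut B, len: usize) -> Result<Vec<u8>, DecodeError> {
    // Check the length first so an untrusted `len` cannot force a huge allocation.
    if buf.remaining() < len { return Err(DecodeError::UnexpectedEof); }
    let chunk = buf.chunk();
    if chunk.len() >= len {
        // Contiguous case: one allocation plus one copy, no zero-fill.
        let out = chunk[..len].to_vec();
        buf.advance(len);
        return Ok(out);
    }
    // Non-contiguous fallback: append chunk by chunk.
    let mut out = Vec::with_capacity(len);
    while out.len() < len {
        let chunk = buf.chunk();
        let n = chunk.len().min(len - out.len());
        if n == 0 { return Err(DecodeError::UnexpectedEof); } // misbehaving source
        out.extend_from_slice(&chunk[..n]);
        buf.advance(n);
    }
    Ok(out)
}

pub fn decode_string<B: ChunkBuf>(buf: &mut B, len: usize) -> Result<String, DecodeError> {
    // UTF-8 is validated once, after all bytes are collected.
    String::from_utf8(decode_bytes(buf, len)?).map_err(|_| DecodeError::InvalidUtf8)
}

pub fn merge_string<B: ChunkBuf>(
    dst: &mut String,
    buf: &mut B,
    len: usize,
) -> Result<(), DecodeError> {
    if buf.remaining() < len { return Err(DecodeError::UnexpectedEof); }
    let chunk = buf.chunk();
    if chunk.len() >= len {
        // Validate the source slice first, then reuse dst's allocation.
        let s = std::str::from_utf8(&chunk[..len]).map_err(|_| DecodeError::InvalidUtf8)?;
        dst.clear();
        dst.push_str(s);
        buf.advance(len);
        return Ok(());
    }
    // Fallback: decode into a fresh String; dst is untouched on error.
    *dst = decode_string(buf, len)?;
    Ok(())
}
```

In this sketch the destination is only modified after validation succeeds, so a failed merge leaves the old value in place. Whether the contiguous branch matches the current fast path is exactly what the benchmarks need to show.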
Acceptance criteria
This is benchmark-gated, not assumed:
- Run the benchmarks/ harness on string/bytes-heavy datasets before and after (plus the in-crate criterion benches where relevant).
- If the contiguous-path numbers are within noise, land the safe version.
- If there is a real regression, fall back to keeping the fast path but modernising it to spare_capacity_mut() + write + set_len after the write: same performance as today, still unsafe, but no longer the claim-before-init pattern.
Either outcome should leave the SAFETY story of types.rs simpler than it is now; the preferred outcome deletes it entirely.
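For the fallback outcome, the write-then-set_len ordering might look like the sketch below. It is illustrative only: decode_bytes_spare is a hypothetical name, the source is simplified to a plain slice rather than a Buf, and the byte-by-byte MaybeUninit::write loop is one possible way to fill the spare capacity; whether it compiles down to a single copy would have to be confirmed by the same benchmarks.

```rust
/// Hypothetical helper: read `len` bytes from the front of `src` into a new
/// Vec without zero-filling it, initialising the bytes BEFORE claiming them.
pub fn decode_bytes_spare(src: &mut &[u8], len: usize) -> Option<Vec<u8>> {
    let data: &[u8] = *src;
    if data.len() < len {
        return None;
    }
    let (head, tail) = data.split_at(len);

    let mut out: Vec<u8> = Vec::with_capacity(len);
    // The spare capacity is exposed as `&mut [MaybeUninit<u8>]`, so no
    // `&mut [u8]` over uninitialised memory is ever created.
    let spare = &mut out.spare_capacity_mut()[..len];
    for (slot, byte) in spare.iter_mut().zip(head) {
        slot.write(*byte);
    }
    // SAFETY: `len <= out.capacity()` (from `with_capacity`), and the first
    // `len` elements were all initialised by the loop above.
    unsafe { out.set_len(len) };

    *src = tail;
    Some(out)
}
```

The only unsafe operation left is the final set_len, and its precondition is established by the lines directly above it, which is what makes the SAFETY comment shorter than in the claim-before-init version.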