Skip to content

types.rs: try chunk-based safe rewrites of the decode fast paths (benchmark-gated) #157

@iainmcgin

Description

@iainmcgin

Context

The only unsafe in the runtime crate outside view.rs (the OwnedView machinery) and the global registries is the set of decode fast paths in buffa/src/types.rs:

  • decode_stringVec::with_capacity(len) + set_len(len) before copy_to_slice, to skip zero-initialising a buffer that is about to be overwritten
  • merge_stringString::as_mut_vec() + set_len(len) before copy_to_slice, with UTF-8 validated afterwards
  • decode_bytes / merge_bytes — the same set_len-before-write pattern for Vec<u8>

The blocks are well commented and carry #[allow(clippy::uninit_vec)], but the claim-length-before-initialising idiom is exactly what that lint exists to flag, and it sits in the contested corner of the uninitialised-memory rules (a &mut [u8] over uninit bytes handed to copy_to_slice). It would be nice to get this file to zero unsafe without giving up the reason the fast paths exist.

Proposal

Rewrite the four functions to build the output from the source chunks instead of pre-claiming the length:

  • Contiguous case (decoding from a slice or a single-chunk Bytes, i.e. essentially always): take buf.chunk()[..len] and use to_vec() / extend_from_slice / String::from_utf8(slice.to_vec()); for merge_string, validate the source slice first and push_str. Vec's own machinery allocates uninitialised and copies, so there is no zero-fill — the unsafe moves into std rather than being paid for with a memset.
  • Non-contiguous Buf fallback: loop over chunks with extend_from_slice (and validate UTF-8 once at the end for strings).

decode_bytes_to_bytes and the configurable string representations are unaffected.

Acceptance criteria

This is benchmark-gated, not assumed:

  1. Run the benchmarks/ harness on string/bytes-heavy datasets before and after (plus the in-crate criterion benches where relevant).
  2. If the contiguous-path numbers are within noise, land the safe version.
  3. If there is a real regression, fall back to keeping the fast path but modernising it to spare_capacity_mut() + write + set_len after the write — same performance as today, still unsafe, but no longer the claim-before-init pattern.

Either outcome should leave the SAFETY story of types.rs simpler than it is now; the preferred outcome deletes it entirely.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions