Skip to content

Add niche to ColdString to make Option<ColdString> the same size as ColdString#8

Merged
tomtomwombat merged 5 commits into
tomtomwombat:mainfrom
luksan:non_null_niche
Jun 3, 2026
Merged

Add niche to ColdString to make Option<ColdString> the same size as ColdString#8
tomtomwombat merged 5 commits into
tomtomwombat:mainfrom
luksan:non_null_niche

Conversation

@luksan

@luksan luksan commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

This changes the type of the internal "encdoded" from *const u8 to NonNull in order to create a niche so that Option has the same size as ColdString. To handle storing of WIDTH NUL bytes the NULL pointer is mapped to a static [0u8; WIDTH] array.

@luksan luksan mentioned this pull request Apr 27, 2026
@tomtomwombat

Copy link
Copy Markdown
Owner

Can this handle "\0\0\0\0\0\0\0\0"?

@tomtomwombat

Copy link
Copy Markdown
Owner

I see, it represents "\0\0\0\0\0\0\0\0" as Self::PTR_TAG. Unfortunately, is_inline returns false for "\0\0\0\0\0\0\0\0". Is there a particular reason you picked Self::PTR_TAG to represent 8-null? Why not a new representation that's invalid for every other string, like usize::MAX? usize::MAX's first byte matches the inline tag 11111xxx, but is invalid UTF-8, so it can be treated as 8 null. There might be a better choice....

@luksan

luksan commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

Since rust doesn't support custom niches in types NonNull is the only reasonable internal type. That means that we can't store an 8 byte all NUL str inline, so it has to be that value that is special cased.

@tomtomwombat

tomtomwombat commented May 30, 2026

Copy link
Copy Markdown
Owner

I'm with you there. But, is Self::TAG_PTR the only reasonable 8-nul representation?

I'm suggesting

fn new_eight_nul() -> Self {
    // SAFETY: PTR_TAG is non-zero
    unsafe { Self::from_inline_buf(usize::MAX.to_ne_bytes()) }
}

fn inline_len(&self) -> usize {
    let addr = self.addr();
    match addr & Self::INLINE_TAG {
        Self::INLINE_TAG if addr != usize::MAX => (addr & Self::LEN_MASK).rotate_right(Self::ROT),
         _ => WIDTH,
    }
}

so new_eight_nul().is_inline() == true and we don't handle the 8 nul case in heap related functions: heap_ptr, decode_heap etc.

Would that work?

@luksan

luksan commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

And having an invalid utf8 string as representation for 8NUL means that we can't return an &str to it, so some kind of tag and indirection is needed. I don't see any disadvantage of considering as a pointer, and then doing null ptr checks, as compared to having a completely different codepath for the niches.

@tomtomwombat

Copy link
Copy Markdown
Owner

The advantage of using the internal usize::MAX for 8 null bytes is that it's easier to reason about since it's consistent with the other inline cases. The code path doesn't diverge much from standard inline case. Less so than the heap case.

https://github.com/luksan/cold-string/pull/1/changes

@luksan

luksan commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

Ok, I'll try to implement it that way and see how it looks. I might have some time this evening.

@tomtomwombat

tomtomwombat commented May 30, 2026

Copy link
Copy Markdown
Owner

I think you should be able to merge luksan#1 into your branch, and then it will update this PR automatically.

@tomtomwombat tomtomwombat force-pushed the non_null_niche branch 2 times, most recently from bf305e5 to 9e6237a Compare June 1, 2026 07:06
@luksan

luksan commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Why did you remove the packed attribute?

@tomtomwombat

Copy link
Copy Markdown
Owner

Why did you remove the packed attribute?

@luksan If I'm understanding https://doc.rust-lang.org/std/option/index.html#representation right, Rust does not guarantee niche optimizations (e.g across all platforms and Rust versions) for repr(packed) structs. However, Rust does guarantee it for NonNull and repr(transparent) structs that use NonNull for inner types. I'd like to have that guarantee for ColdString, partly because it makes documentation and tests easier: assert_eq!(size_of::<Option<ColdString>>(), size_of::<ColdString>()); is guaranteed to pass on all machines.

repr(packed) is in-compatible with repr(transparent).

Users who still want to use a packed version of ColdString can wrap it, and pack their wrapper:

use std::ptr::NonNull;

#[repr(transparent)]
struct ColdString {
    inner: NonNull<u8>,
}

#[repr(packed)]
struct PackedColdString {
    inner: ColdString,
    x: u8,
}

fn main() {
    assert_eq!(9, std::mem::size_of::<PackedColdString>());
}

@tomtomwombat tomtomwombat merged commit 76a4112 into tomtomwombat:main Jun 3, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants