refactor(encoding): shared FastLanes layout + RLE toLongs delegation #137
Merged
…math Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ticket Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
FL_CHUNK_SIZE/FL_ORDER + transposeIndex/iterateIndex/lanes were duplicated in the Delta encoder and decoder; the low-bit typeMask and the byteSize*8 width were also copy-pasted across Delta, Bitpacked and Patched. Pull the FastLanes layout into a shared core.encoding.FastLanes (CHUNK, transposeIndex, iterateIndex, lanes, lowMask) and add PType.bits() for the width. Cross-module (reader + writer) so the home is core, mirroring PTypeIO. Hot paths deliberately untouched: Bitpacked keeps its own FL_ORDER constant and the unrolled pack/unpack kernels are byte-identical — only the cold per-call typeMask / width setup now routes through FastLanes. Delta's transposeIndex/iterateIndex were already standalone static calls, so moving them across a class boundary does not change inlining. Pco is excluded (not FastLanes-family, perf-critical). Ground truth green both directions: RustWritesJavaReads (12), JavaWritesRustReads (213), JavaRoundTrip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
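A minimal sketch of what the shared cold-path helpers might look like. Only the names lowMask and PType.bits() come from the commit message; the enum PTypeSketch, the class FastLanesSketch, the typeMask helper and every method body here are assumptions for illustration, not the PR's actual code.

```java
/** Hypothetical stand-in for PType; only byteSize() and bits() matter here. */
enum PTypeSketch {
    U8(1), U16(2), U32(4), U64(8);

    private final int byteSize;

    PTypeSketch(int byteSize) {
        this.byteSize = byteSize;
    }

    int byteSize() {
        return byteSize;
    }

    /** Width in bits; replaces the copy-pasted byteSize * 8. */
    int bits() {
        return byteSize * Byte.SIZE;
    }
}

/** Hypothetical stand-in for the cold-path part of core.encoding.FastLanes. */
final class FastLanesSketch {
    private FastLanesSketch() {}

    /** Mask with the low {@code bits} bits set; bits must be in [0, 64]. */
    static long lowMask(int bits) {
        if (bits < 0 || bits > 64) {
            throw new IllegalArgumentException("bits out of range: " + bits);
        }
        // Java masks long shift distances to 6 bits, so a shift by 64 acts
        // like a shift by 0; the zero-width case needs its own branch.
        if (bits == 0) {
            return 0L;
        }
        return -1L >>> (64 - bits);
    }

    /** The per-call typeMask setup, expressed through the shared helpers. */
    static long typeMask(PTypeSketch type) {
        return lowMask(type.bits());
    }
}
```

Because this setup runs once per call rather than once per element, routing it through a shared class should not touch the unrolled kernels, which is the split the commit message describes.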
… PrimitiveArrays Rle.toLongs is a superset of PrimitiveArrays.toLongs: identical widen for the eight integer ptypes, plus F32/F64/F16 raw-bit packing unique to RLE. The integer half was a verbatim copy. Keep the float/f16 arms local and route every other ptype through PrimitiveArrays.toLongs via the switch default (floats are matched first, so the default only ever sees integers). Drops ~48 duplicated lines. Ground truth green: JavaWritesRustReads (213). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
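The delegation pattern described above can be sketched as follows. The class RleToLongsSketch, the Kind enum and widenIntegers are hypothetical stand-ins (the latter for PrimitiveArrays.toLongs); the real signatures are not shown in this PR page. Zero-extending the 32-bit float pattern is also an assumption, and F16 and the unsigned ptypes are left out to keep the sketch short.

```java
final class RleToLongsSketch {
    /** Hypothetical subset of the ptypes. */
    enum Kind { I32, I64, F32, F64 }

    private RleToLongsSketch() {}

    /** Stand-in for PrimitiveArrays.toLongs: widens integer arrays only. */
    static long[] widenIntegers(Kind kind, Object values) {
        switch (kind) {
            case I32: {
                int[] src = (int[]) values;
                long[] out = new long[src.length];
                for (int i = 0; i < src.length; i++) {
                    out[i] = src[i]; // sign-extending widen
                }
                return out;
            }
            case I64:
                return ((long[]) values).clone();
            default:
                throw new IllegalArgumentException("not an integer ptype: " + kind);
        }
    }

    /** Stand-in for Rle.toLongs: float arms stay local, the rest is delegated. */
    static long[] toLongs(Kind kind, Object values) {
        switch (kind) {
            case F32: {
                float[] src = (float[]) values;
                long[] out = new long[src.length];
                for (int i = 0; i < src.length; i++) {
                    // Raw bit pattern, zero-extended to 64 bits (assumed).
                    out[i] = Float.floatToRawIntBits(src[i]) & 0xFFFF_FFFFL;
                }
                return out;
            }
            case F64: {
                double[] src = (double[]) values;
                long[] out = new long[src.length];
                for (int i = 0; i < src.length; i++) {
                    out[i] = Double.doubleToRawLongBits(src[i]);
                }
                return out;
            }
            default:
                // Floats were matched above, so only integer kinds reach here.
                return widenIntegers(kind, values);
        }
    }
}
```

The float cases are matched before the default arm, so the delegate never sees a float ptype and can keep rejecting anything it does not know how to widen.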
Continues the encoding-helper dedup, behavior-preserving.
FastLanes (core)
FL_CHUNK_SIZE/FL_ORDER + transposeIndex/iterateIndex/lanes were duplicated in the Delta encoder + decoder; the low-bit typeMask and byteSize*8 width were copy-pasted across Delta, Bitpacked and Patched. Pulled into a shared core.encoding.FastLanes (CHUNK, transposeIndex, iterateIndex, lanes, lowMask) + PType.bits(). Cross-module (reader + writer) → home is core, mirroring PTypeIO.

Hot paths untouched by design: Bitpacked keeps its own FL_ORDER constant and its unrolled pack/unpack kernels are byte-identical; only the cold per-call typeMask/width setup routes through FastLanes. Delta's transposeIndex/iterateIndex were already standalone static calls, so the class-boundary move doesn't change inlining. Pco excluded (not FastLanes-family, perf-critical).

RLE toLongs delegation
Rle.toLongs is a superset of PrimitiveArrays.toLongs: identical integer widen + extra F32/F64/F16 raw-bit packing. Keep the float/f16 arms local; route every integer ptype through PrimitiveArrays.toLongs via the switch default. −48 LOC.

Follow-up (tracked in TODO.md)
transposeIndex/iterateIndex still carry per-call % and /; a separate bench-gated pass can strength-reduce that hot math. Out of scope here (this PR is behavior-preserving).

Ground truth green both directions: RustWritesJavaReads (12), JavaWritesRustReads (213), JavaRoundTrip. Unit: writer 1599 + reader 854.
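To illustrate what the strength-reduction follow-up could amount to, here is a hedged sketch. The body of transposeIndex is not shown in this PR, so the index formula, the FL_ORDER values and the class IndexMathSketch are all assumptions, loosely modeled on a 1024-element FastLanes-style transposed layout. The point is only that % and / by power-of-two constants can be rewritten as masks and shifts when the index is known to be non-negative.

```java
final class IndexMathSketch {
    // Assumed lane-order permutation; the PR only names the constant FL_ORDER.
    static final int[] FL_ORDER = {0, 4, 2, 6, 1, 5, 3, 7};

    private IndexMathSketch() {}

    /** Baseline: assumed index formula using % and / on every call. */
    static int transposeDivMod(int idx) {
        int lane = idx % 16;
        int order = (idx / 16) % 8;
        int row = idx / 128;
        return lane * 64 + FL_ORDER[order] * 8 + row;
    }

    /** Same mapping with masks and shifts; valid only for idx >= 0. */
    static int transposeShiftMask(int idx) {
        int lane = idx & 15;
        int order = (idx >>> 4) & 7;
        int row = idx >>> 7;
        return (lane << 6) + (FL_ORDER[order] << 3) + row;
    }
}
```

Whether the rewritten form is measurably faster depends on what the JIT already does with constant divisors, which is a reason to keep such a pass behind a benchmark, as the follow-up note proposes.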
🤖 Generated with Claude Code