refactor(encoding): shared FastLanes layout + RLE toLongs delegation #137

Merged: dfa1 merged 4 commits into main from refactor/fastlanes-common on Jun 22, 2026
@dfa1 dfa1 commented Jun 22, 2026

Continues the encoding-helper dedup, behavior-preserving.

FastLanes (core)

FL_CHUNK_SIZE/FL_ORDER and transposeIndex/iterateIndex/lanes were duplicated in the Delta encoder and decoder; the low-bit typeMask and the byteSize*8 width were copy-pasted across Delta, Bitpacked and Patched. Both sets are now pulled into a shared core.encoding.FastLanes (CHUNK, transposeIndex, iterateIndex, lanes, lowMask), with PType.bits() added for the width. The reader and the writer both use them, so the home is core, mirroring PTypeIO.

Hot paths untouched by design: Bitpacked keeps its own FL_ORDER constant and its unrolled pack/unpack kernels are byte-identical — only the cold per-call typeMask/width setup routes through FastLanes. Delta's transposeIndex/iterateIndex were already standalone static calls, so the class-boundary move doesn't change inlining. Pco excluded (not FastLanes-family, perf-critical).
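
For readers unfamiliar with the helpers named above, here is a minimal sketch of what the cold setup pieces might look like. It is not the project's source: the class name `FastLanesSketch`, the cut-down `PType` enum, the `CHUNK = 1024` value, and the `lanes`/`lowMask` signatures are all assumptions made for illustration.

```java
// Hypothetical sketch of the shared cold-path helpers; names and signatures are assumed.
public final class FastLanesSketch {

    /** Cut-down stand-in for the project's PType; only the width matters here. */
    public enum PType {
        U8(1), U16(2), U32(4), U64(8);

        public final int byteSize;

        PType(int byteSize) {
            this.byteSize = byteSize;
        }

        /** Width in bits, replacing the copy-pasted byteSize * 8. */
        public int bits() {
            return byteSize * 8;
        }
    }

    /** Assumed chunk size: FastLanes layouts conventionally work on 1024 values. */
    public static final int CHUNK = 1024;

    /** Assumed definition: number of lanes is the chunk size divided by the type width. */
    public static int lanes(PType type) {
        return CHUNK / type.bits();
    }

    /** Mask with the low {@code bits} bits set; 64 is special-cased because 1L << 64 wraps to 1L. */
    public static long lowMask(int bits) {
        if (bits < 0 || bits > 64) {
            throw new IllegalArgumentException("bits out of range: " + bits);
        }
        return bits == 64 ? -1L : (1L << bits) - 1;
    }

    private FastLanesSketch() {}
}
```

The 64-bit special case is the usual reason a mask helper like this gets copy-pasted rather than inlined: Java masks the shift count of a `long` shift to six bits, so `(1L << 64) - 1` evaluates to `0`, not all ones.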

RLE toLongs delegation

Rle.toLongs is a superset of PrimitiveArrays.toLongs: identical integer widen + extra F32/F64/F16 raw-bit packing. Keep the float/f16 arms local; route every integer ptype through PrimitiveArrays.toLongs via the switch default. −48 LOC.
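
The shape of the delegation can be sketched as follows. This is an illustration under assumptions, not the merged code: `RleToLongsSketch`, its `PType` enum, the `Object`-typed array parameter, and `integerToLongs` (standing in for PrimitiveArrays.toLongs) are hypothetical, and zero-extension of unsigned and float bit patterns is assumed rather than confirmed.

```java
import java.lang.reflect.Array;

// Hypothetical sketch: float arms stay local, every integer ptype falls to the default.
public final class RleToLongsSketch {

    public enum PType { I8, I16, I32, I64, U8, U16, U32, U64, F16, F32, F64 }

    /** Stand-in for PrimitiveArrays.toLongs: widens integer arrays only. */
    public static long[] integerToLongs(PType type, Object array) {
        long mask;
        switch (type) {
            case U8:  mask = 0xFFL; break;          // assumed zero-extension for unsigned types
            case U16: mask = 0xFFFFL; break;
            case U32: mask = 0xFFFF_FFFFL; break;
            case I8: case I16: case I32: case I64: case U64:
                mask = -1L; break;                  // keep the sign-extended value as-is
            default:
                throw new IllegalArgumentException("not an integer ptype: " + type);
        }
        int n = Array.getLength(array);
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            // Array.getLong widens byte/short/int/long elements with sign extension.
            out[i] = Array.getLong(array, i) & mask;
        }
        return out;
    }

    /** RLE variant: raw-bit packing for floats, delegation for everything else. */
    public static long[] toLongs(PType type, Object array) {
        switch (type) {
            case F32: {
                float[] in = (float[]) array;
                long[] out = new long[in.length];
                for (int i = 0; i < in.length; i++) {
                    out[i] = Float.floatToRawIntBits(in[i]) & 0xFFFF_FFFFL;
                }
                return out;
            }
            case F64: {
                double[] in = (double[]) array;
                long[] out = new long[in.length];
                for (int i = 0; i < in.length; i++) {
                    out[i] = Double.doubleToRawLongBits(in[i]);
                }
                return out;
            }
            case F16: {
                short[] in = (short[]) array;       // assumed: f16 carried as raw 16-bit patterns
                long[] out = new long[in.length];
                for (int i = 0; i < in.length; i++) {
                    out[i] = in[i] & 0xFFFFL;
                }
                return out;
            }
            default:
                // Floats are matched above, so only integer ptypes reach this arm.
                return integerToLongs(type, array);
        }
    }

    private RleToLongsSketch() {}
}
```

Because the float cases are listed before the default, the integer helper never sees a float ptype, which is what makes the delegation behavior-preserving in this sketch.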

Follow-up (tracked in TODO.md)

transposeIndex/iterateIndex still perform a % and a / on every call; a separate bench-gated pass can strength-reduce that hot math. Out of scope here (this PR is behavior-preserving).
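
As a rough illustration of what that strength reduction could look like, the sketch below puts a divide/modulo index function next to a shift/mask equivalent. The index formula is an assumption modeled on the transpose used by the public FastLanes layout (16 lanes, FL_ORDER of 8 entries, 1024-element chunk); it may not match this project's transposeIndex, and the class and method names are hypothetical.

```java
// Hypothetical sketch: the formula is assumed, not taken from this repository.
public final class IndexMathSketch {

    /** Assumed FastLanes row ordering. */
    private static final int[] FL_ORDER = {0, 4, 2, 6, 1, 5, 3, 7};

    /** Baseline using % and /, valid for 0 <= idx < 1024. */
    public static int transposeDivMod(int idx) {
        int lane = idx % 16;
        int order = (idx / 16) % 8;
        int row = idx / 128;
        return lane * 64 + FL_ORDER[order] * 8 + row;
    }

    /** Same mapping with power-of-two divisors rewritten as shifts and masks. */
    public static int transposeShiftMask(int idx) {
        int lane = idx & 15;            // idx % 16 for non-negative idx
        int order = (idx >>> 4) & 7;    // (idx / 16) % 8
        int row = idx >>> 7;            // idx / 128
        return (lane << 6) + (FL_ORDER[order] << 3) + row;
    }

    private IndexMathSketch() {}
}
```

The two forms agree only for non-negative indices. Since a JIT can already lower constant power-of-two division on its own, any gain is uncertain, which is a good reason to keep the change bench-gated as the follow-up proposes.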

Ground truth green both directions: RustWritesJavaReads (12), JavaWritesRustReads (213), JavaRoundTrip. Unit: writer 1599 + reader 854.

🤖 Generated with Claude Code

dfa1 and others added 4 commits June 22, 2026 23:10
…math

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ticket

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
FL_CHUNK_SIZE/FL_ORDER + transposeIndex/iterateIndex/lanes were duplicated in the
Delta encoder and decoder; the low-bit typeMask and the byteSize*8 width were also
copy-pasted across Delta, Bitpacked and Patched. Pull the FastLanes layout into a
shared core.encoding.FastLanes (CHUNK, transposeIndex, iterateIndex, lanes, lowMask)
and add PType.bits() for the width. Cross-module (reader + writer) so the home is
core, mirroring PTypeIO.

Hot paths deliberately untouched: Bitpacked keeps its own FL_ORDER constant and the
unrolled pack/unpack kernels are byte-identical — only the cold per-call typeMask /
width setup now routes through FastLanes. Delta's transposeIndex/iterateIndex were
already standalone static calls, so moving them across a class boundary does not
change inlining. Pco is excluded (not FastLanes-family, perf-critical).

Ground truth green both directions: RustWritesJavaReads (12), JavaWritesRustReads
(213), JavaRoundTrip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… PrimitiveArrays

Rle.toLongs is a superset of PrimitiveArrays.toLongs: identical widen for the eight
integer ptypes, plus F32/F64/F16 raw-bit packing unique to RLE. The integer half was
a verbatim copy. Keep the float/f16 arms local and route every other ptype through
PrimitiveArrays.toLongs via the switch default (floats are matched first, so the
default only ever sees integers). Drops ~48 duplicated lines.

Ground truth green: JavaWritesRustReads (213).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dfa1 dfa1 merged commit 7af0af2 into main Jun 22, 2026
6 checks passed
@dfa1 dfa1 deleted the refactor/fastlanes-common branch June 24, 2026 07:10