fix(registry): deterministic encoder order via TreeMap (fixes Windows build)#102
Merged
Conversation
…Windows build) VortexWriter.create(.., WriteRegistry) with no cascade selects the first encoder whose accepts() matches the dtype, iterating WriteRegistry.encoderMap().values(). Two coupled defects made that selection wrong and unstable: 1. The map was Map.copyOf(HashMap) — non-deterministic iteration order — so the chosen encoder varied across runs and platforms. 2. ChunkedEncodingEncoder.accepts(Primitive) returned true, but encode() requires a ChunkedData wrapper the dtype-based write path never produces. It is not in DEFAULT_CODECS or the cascade and nothing dtype-dispatches to it — it is only reachable here. On Windows the hash order put it first for a plain long[], throwing ClassCastException; other platforms happened to order a leaf encoder first. Fixes (root, not symptom): - Both WriteRegistry and ReadRegistry back their maps with a TreeMap ordered by encoding name (Comparator.comparing(id) — Enum.compareTo is final, so natural enum order is ordinal and a Comparator is needed). Order is now a pure function of the registered set, independent of registration sequence or ServiceLoader's unspecified iteration order. - ChunkedEncodingEncoder.accepts() returns false: it is carrier-dispatched (selected explicitly when a caller holds ChunkedData), never dtype-dispatched, so it must not be a first-match candidate. Its explicit-encode path is unchanged and still covered by ChunkedEncodingEncoderTest. Tests: - WriteRegistryTest pins name-ordered iteration regardless of registration sequence, and that loadAll() is sorted by name. - ReadRegistryTest covers that a TreeMap-backed registry retains all decoders. - create_withFullRegistry_roundTrips drives the exact failing combination — create(loadAll()) -> writeChunk -> read back — as an end-to-end regression guard, rather than dodging it with a hand-picked single-encoder registry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a7aa667 to
f870c9c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The failure
Windows CI (run 27878549743) threw in
VortexWriterTest.create_withWriteRegistry_roundTrips:Root cause (not the symptom)
VortexWriter.create(.., WriteRegistry)selects the first encoder whoseaccepts()matches the dtype, iteratingWriteRegistry.encoderMap().values(). That map wasMap.copyOf(HashMap)— non-deterministic iteration order — so encoder selection varied across runs and platforms. On Windows the hash order put the compositeChunkedEncodingEncoder(accepts(Primitive) == true, but it needs aChunkedDatawrapper) ahead ofPrimitiveEncodingEncoderfor a plainlong[]. Mac/Linux happened to order a working encoder first → green there, red on Windows.Fix
Back both
WriteRegistryandReadRegistrywith aTreeMap, so iteration follows the id enum's natural ordering — a total order that is a pure function of the registered set, independent of:ServiceLoader's unspecified iteration order,register/registerServiceLoadedcalls interleave (the "what if another is added afterloadAll()?" case).registerServiceLoadedno longer needs to sort its own batch —build()owns the order.Tests
WriteRegistryTest—encoderMapiterates in natural key order regardless of registration sequence;loadAll()order is sorted.ReadRegistryTest— a TreeMap-backed registry retains all decoders.create_withWriteRegistry_roundTripsnow registers just the encoder the column needs (notloadAll(), which pulls in composite encoders unsuited to first-match leaf dispatch).Verified locally:
./mvnw verify -pl reader,writer -am— 0 checkstyle, javadoc clean, all tests pass.🤖 Generated with Claude Code