fix(reader): width-agnostic numeric compare in zone-map pruning (#159) #160
Merged
ScanIterator's comparator caught ClassCastException and returned 0, which canPruneChunk reads as "cannot prune". Stats decode integers as Long and floats as Float/Double, so a filter value boxed at a different width (Integer for I64, Float for F32) threw internally and silently disabled pruning: a valid, selective predicate degraded to a full scan with no signal.

The comparison now keys off the *column* type, not the boxed operand:

- floating column -> Double.compare;
- unsigned int column -> Long.compareUnsigned (U64 stats/values store raw bits, so a value >= 2^63 is a negative Long; signed compare keeps/drops the wrong chunks). U8/U16/U32 zero-extend to a positive Long, unaffected;
- signed int column -> Long.compare.

Keying off the column also avoids routing an integer column through double-compare, which would lose precision past 2^53 and mis-prune.

Eq/Neq previously had their own inline comparator with the same swallow; they now route through the shared one.

A genuinely incomparable filter value (e.g. a String against a numeric column) now raises VortexException instead of a silent no-prune. This is a behaviour change, noted in the changelog.

Adds DType.isUnsigned() (exhaustive over the sealed set) to classify the column.

Coverage: ZoneMapPruningTest (27): BoxedWidth (Integer == Long, all six operators), Unsigned (U64 >= 2^63 keep/prune correctness), FloatWidths (F32 stat vs Double/Float filters), IntegerColumnFloatFilter (Double filter on I64 compares in the integer domain past 2^53), TypeMismatch (String throws). Plus DTypeIsUnsignedTest.

Closes #159.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
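The failure mode described in the commit message can be reproduced with plain JDK boxing. The sketch below is a hypothetical reconstruction, not the project's `ScanIterator` code: the class name `SwallowedCompareDemo` and the method `legacyCompare` are invented for illustration. It assumes the old comparator compared through `Comparable` and swallowed the resulting `ClassCastException`.

```java
// Hypothetical reproduction of the swallowed-exception behaviour; names are illustrative only.
final class SwallowedCompareDemo {

    /** Compares via Comparable and reports a boxed-width mismatch as 0, as the old code is described to do. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int legacyCompare(Object stat, Object filterValue) {
        try {
            // Long.compareTo(Object) casts its argument to Long, so an Integer argument throws.
            return ((Comparable) stat).compareTo(filterValue);
        } catch (ClassCastException e) {
            return 0; // downstream this is indistinguishable from "cannot prune"
        }
    }

    public static void main(String[] args) {
        Long chunkMax = 10L;
        System.out.println(legacyCompare(chunkMax, 100L)); // Long vs Long: a real ordering (negative)
        System.out.println(legacyCompare(chunkMax, 100));  // Long vs Integer: exception swallowed, 0
    }
}
```

With a chunk maximum of 10 and a filter such as `x > 100`, the second call is the case that should have allowed pruning but instead fell back to scanning the chunk.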
Fixes #159.
`ScanIterator`'s zone-map comparator caught `ClassCastException` and returned `0`, which `canPruneChunk` treats as *cannot prune*. Stats decode integers as `Long` and floats as `Float`/`Double`, so a filter value boxed at a different width (`Integer` on an I64 column, `Float` on F32) threw internally and silently disabled pruning: a valid, selective predicate degraded to a full scan with no signal.

## Fix: key the comparison off the column type, not the boxed operand
- floating column -> `Double.compare`
- unsigned int column -> `Long.compareUnsigned` (`U64` stats/values store raw bits, so a value `>= 2^63` is a negative `Long`; signed compare would keep/drop the wrong chunks, which is a correctness bug, not just perf). `U8`/`U16`/`U32` zero-extend to a positive `Long`, unaffected.
- signed int column -> `Long.compare`

Keying off the column also avoids ever routing an integer column through double-compare, which would lose precision past 2^53 and mis-prune (review point #1).
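As a rough illustration of the dispatch above, the following is a minimal sketch and not the merged implementation. The class `ColumnKeyedCompare` and the enum `NumericKind` are hypothetical, and `IllegalArgumentException` stands in for `VortexException`. It also assumes both operands can be widened through `Number`, which truncates a fractional filter value on an integer column and sign-extends a negative `Integer`; the real code may handle those cases differently.

```java
// Sketch only: names are hypothetical and IllegalArgumentException stands in for VortexException.
final class ColumnKeyedCompare {

    /** Hypothetical three-way classification of a numeric column. */
    enum NumericKind { FLOATING, UNSIGNED_INT, SIGNED_INT }

    /** Compares a decoded stat with a filter value using the column's kind, not the boxed widths. */
    static int compare(NumericKind kind, Object stat, Object filterValue) {
        if (!(stat instanceof Number) || !(filterValue instanceof Number)) {
            // An incomparable operand is reported instead of being turned into a silent 0.
            throw new IllegalArgumentException("filter value is not comparable to a numeric column");
        }
        Number a = (Number) stat;
        Number b = (Number) filterValue;
        switch (kind) {
            case FLOATING:
                return Double.compare(a.doubleValue(), b.doubleValue());
            case UNSIGNED_INT:
                // Raw 64-bit patterns: values >= 2^63 are negative as signed longs.
                return Long.compareUnsigned(a.longValue(), b.longValue());
            case SIGNED_INT:
                return Long.compare(a.longValue(), b.longValue());
            default:
                throw new AssertionError(kind);
        }
    }
}
```

Because the branch is chosen from the column, an `Integer` filter on an I64 column and a `Long` filter take the same path, and an integer column is never compared as `double`.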
`Eq`/`Neq` had their own inline comparator with the same swallow; they now route through the shared one.

A filter value genuinely incomparable to its column (e.g. a `String` against a numeric column) now raises `VortexException` during the scan instead of silently disabling pruning. Callers relying on the old silent full scan will see an exception. Called out in `CHANGELOG.md`.

## API
Adds `DType.isUnsigned()` (exhaustive switch over the sealed set) to classify the column.

## Tests
- `ZoneMapPruningTest` (27): BoxedWidth (`Integer` == `Long`, all six operators), Unsigned (`U64 >= 2^63` keep/prune correctness), FloatWidths (F32 stat vs `Double`/`Float` filters), IntegerColumnFloatFilter (`Double` filter on I64 compares in the integer domain past 2^53), TypeMismatch (`String` throws). All groups parameterized.
- `DTypeIsUnsignedTest`.

Full reader + writer suites green (incl. checkstyle/javadoc).
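The two numeric edge cases that the Unsigned and IntegerColumnFloatFilter groups target can be checked with JDK arithmetic alone. The snippet below is an illustrative sketch, not an excerpt from `ZoneMapPruningTest`; the class and method names are hypothetical.

```java
// Illustrative only: shows why signed and double comparisons mis-order the edge values.
final class PruningEdgeCases {

    static final long TWO_POW_53 = 1L << 53;

    /** True when a double comparison cannot distinguish 2^53 + 1 from 2^53. */
    static boolean doubleCompareCollapses() {
        // 2^53 + 1 is not representable as a double and rounds to 2^53.
        return Double.compare((double) (TWO_POW_53 + 1), (double) TWO_POW_53) == 0;
    }

    /** True when a signed comparison orders the U64 value 2^63 below 1 while the unsigned one orders it above. */
    static boolean signedCompareMisordersU64() {
        long twoPow63 = Long.parseUnsignedLong("9223372036854775808"); // raw bits equal Long.MIN_VALUE
        return Long.compare(twoPow63, 1L) < 0 && Long.compareUnsigned(twoPow63, 1L) > 0;
    }

    public static void main(String[] args) {
        System.out.println(doubleCompareCollapses());
        System.out.println(signedCompareMisordersU64());
        // In the integer domain the two values past 2^53 stay distinct.
        System.out.println(Long.compare(TWO_POW_53 + 1, TWO_POW_53) > 0);
    }
}
```

Either collapse would let a chunk be kept or dropped on the wrong side of a boundary, which is why the comparison is chosen per column type rather than defaulting to `double` or signed `long`.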
🤖 Generated with Claude Code