1672 commits
cc5af43
feat(cudf): Add CudfEnforceSingleRow GPU operator (#16920)
perlitz Apr 11, 2026
65f2759
refactor(S3): Consolidate S3Config ownership in S3FileSystem::Impl (#…
czentgr Apr 11, 2026
2664d06
fix(cudf): Use `enqueueForDevice` for cudf buffered input data source…
mhaseeb123 Apr 11, 2026
1a3d930
fix: Print verbose errors in cudf deps updater (#16332)
mhaseeb123 Apr 11, 2026
bceaa5e
fix(cuDF): Fix debug build failure (#17011)
pramodsatya Apr 11, 2026
5a729d8
fix(cudf): Fix count aggregation condition to run all count on GPU (#…
karthikeyann Apr 11, 2026
dc5e3e5
feat: support paimon append & primary with rawConvertible=true read (…
konjac-h Apr 11, 2026
579bf17
refactor: Simplify IndexLookupJoin stat recording (#17133)
xiaoxmeng Apr 11, 2026
0d7d1fc
feat: Add isPointLookup() to EncodedKeyBounds (#17138)
xiaoxmeng Apr 13, 2026
948e7be
fix(cudf): Fix stream and mr in SubfieldFiltersToAst (#17135)
jinchengchenghh Apr 13, 2026
bb4283b
feat(joins): Add nullAsValue flag for IS NOT DISTINCT FROM join key s…
mbasmanova Apr 13, 2026
ccedd30
feat: Introduce Avro format (#17084)
Apr 13, 2026
890f5b5
fix: Fix processedStrides_ counting per stride instead of per stripe …
Apr 13, 2026
f8332b8
fix(connector): Fix execution operators to use query-scoped Connector…
jagill Apr 13, 2026
6e5dc5d
feat(cudf): Add CudfMarkDistinct GPU operator (#16974)
perlitz Apr 13, 2026
2fa7690
feat: Wire SST into Hive connector with end-to-end tests (#17050)
Apr 13, 2026
1a85554
fix(cudf): Fix CudfEnforceSingleRow hang by moving input_ in getOutpu…
shrshi Apr 14, 2026
922ed65
build: Add s2geometry to CI setup scripts (#17131)
Apr 14, 2026
ff18c7b
fix(type): Capture Field not found message template in RowType (#17094)
HeidiHan0000 Apr 14, 2026
6efa7db
fix: Capture failure message templates in exec, expression, and type …
HeidiHan0000 Apr 14, 2026
6409cae
fix: Capture failure message templates in dwio, experimental, and fun…
HeidiHan0000 Apr 14, 2026
e6afa63
fix(aggregate): Allow DistinctAggregations to handle literal argument…
kagamiori Apr 14, 2026
9160fed
fix: Generate majority positive frame offset values in Window Fuzzer …
pratikpugalia Apr 14, 2026
c1f8a8f
fix: Change default TpchConnectorSplit weight (#17159)
arhimondr Apr 14, 2026
5f23942
misc: Make the `DirectBufferedInput` ctor used for clone protected (#…
rui-mo Apr 14, 2026
1ec53db
Sort function documentation in alphabetical order in array.rst and ag…
abhinavmuk04 Apr 14, 2026
1b20c7d
fix(ci): Strip ANSI color codes before extracting build errors (#17152)
kgpai Apr 14, 2026
0582e90
fix(cudf): Synchronize stream before freeing Arrow host buffers in to…
bdice Apr 14, 2026
52b8fe2
fix(ci): Pass github_token to claude-code-action for fork PR support …
kgpai Apr 14, 2026
921f619
refactor(cudf): Unify cuDF operators with a common base class archite…
coreylammie Apr 14, 2026
b1fa393
fix: Allow overriding ARM_BUILD_TARGET in centos-multi.dockerfile (#1…
bdice Apr 14, 2026
62a7636
feat: Add ConfigProperty, ConfigProvider, and QueryConfigProvider (#1…
mbasmanova Apr 15, 2026
5fddc34
fix: Use consistent kebab-case for sort-writer finish time slice conf…
mbasmanova Apr 15, 2026
1686564
fix: Disable FileHandleCache in TableEvolutionFuzzer to prevent fd ex…
bdice Apr 15, 2026
7281717
refactor: Remove redundant hive. prefix from Hive connector config ke…
mbasmanova Apr 15, 2026
b29919b
fix: Replace static inline registration with traits-based pattern in …
mbasmanova Apr 15, 2026
f1e43a9
feat(cudf): Support all unary functions in AST (#17111)
jinchengchenghh Apr 15, 2026
17488dc
test: Add map_concat expression benchmark (#17103)
Yuhta Apr 15, 2026
7eaea31
fix: Populate null placeholders in VariantToVector<ROW> children
natashasehgal Apr 15, 2026
7331a50
fix: SwitchExpr::computePropagatesNulls skips then-clauses due to wro…
pratikpugalia Apr 15, 2026
439c3ad
fix(build): Build spark-server stage in docker compose (#17186)
czentgr Apr 15, 2026
63a2a2c
fbcode/velox/serializers/PrestoSerializerSerializationUtils.cpp (#16703)
Apr 16, 2026
7c8d926
refactor(cudf): Register Spark and Presto functions separately (#16960)
jinchengchenghh Apr 16, 2026
329370d
fix: Ensure make_set_digest returns setdigest type (#17181)
jkhaliqi Apr 16, 2026
45a03a6
test(exec): Guard operator blocking stats collection with TSAN atomic…
hdikeman Apr 16, 2026
09c4fb9
feat: Add configProvider() to Connector and HiveConfigProvider (#17197)
mbasmanova Apr 16, 2026
169f6fe
refactor(cudf): Refactor CudfHiveDataSource to use split reader (#17171)
mhaseeb123 Apr 16, 2026
fcb63d3
feat: S2 presto functions (#15511)
Apr 16, 2026
239ebe6
fix(cudf): Remove deprecated call to filtered_join (#17160)
bdice Apr 16, 2026
5ecdd82
fix(build): Fix `-Werror=type-limits` warnings in `DecimalUtil.cpp` (…
bdice Apr 16, 2026
6ca5d45
feat: Add NULLIF special form (#17213)
mbasmanova Apr 16, 2026
0f6c308
fix(cudf): Defer CudfHashJoinProbe filter initialization to avoid mem…
shrshi Apr 16, 2026
4ca0a64
perf: Replace sort-based map_concat dedup with hash-based mapConcat (…
Yuhta Apr 16, 2026
3a86cb2
feat(iceberg): Add Iceberg V3 deletion vector and equality delete sup…
apurva-meta Apr 16, 2026
d13ddb0
build: Split flaky retry into separate GitHub Actions step for Linux …
pratikpugalia Apr 16, 2026
d7bfea4
feat(tdigest): Add winsorizedMean() method to TDigest (#17172)
talgalili Apr 16, 2026
12de622
feat(tdigest): Add winsorized_mean() scalar function on TDigest (#17173)
talgalili Apr 16, 2026
dacc5b4
feat(tdigest): Add approx_winsorized_mean aggregate function (#17174)
talgalili Apr 16, 2026
47fea3f
docs(tdigest): Add documentation for winsorized_mean and approx_winso…
talgalili Apr 16, 2026
560a4ed
refactor(tdigest): Extract shared TDigest helpers and fix fuzzer regi…
talgalili Apr 16, 2026
3da0390
fix(parquet): Null pointer dereference in TimestampColumnReader (#17191)
pratikpugalia Apr 17, 2026
e48f91a
docs(ci): Add README documenting all CI workflows (#17024)
kgpai Apr 17, 2026
5f1bbeb
feat(iceberg): Add Iceberg V3 deletion vector writer (#17220)
apurva-meta Apr 17, 2026
ba5d327
feat(cudf): Bypass cudf folder in clang-tidy check (#17170)
jinchengchenghh Apr 17, 2026
2911d14
Gate TableScan::getOutput estimateFlatSize behind enableOperatorBatch…
Apr 17, 2026
ec95b0d
feat(cudf): Add cudaGetLastError on debug mode in CudfOperator.h (#17…
jinchengchenghh Apr 18, 2026
6bb2e0b
feat: Exclude index source from grouped execution leaf validation (#1…
zacw7 Apr 18, 2026
f6bee22
feat(iceberg): Add DWRF file format support for Iceberg data sink (#1…
apurva-meta Apr 18, 2026
14d5317
feat(function): Add ip_version and ip_prefix_masklen Presto functions…
tc25898 Apr 18, 2026
6a49643
fix: Remove duplicate TDigestAccumulator causing ODR violation and SI…
amitkdutta Apr 20, 2026
cd335e8
refactor(reader): Rename memoryPool_ to pool_ in SelectiveColumnReade…
Apr 20, 2026
f666b03
fbcode/velox/expression/ComplexViewTypes.h (#16987)
Apr 20, 2026
ab3eea6
fix(build): Fix absl required by s2geometry not resolved (#17225)
zhztheplayer Apr 20, 2026
bd52780
fbcode/velox/serializers/PrestoSerializerEstimationUtils.cpp (#17240)
Apr 20, 2026
a9e2686
fix(cudf): Fix AST/Function mode switching during Expression Evaluati…
simoneves Apr 20, 2026
f1c6510
fix: Integer overflow in range filter boundary conversion (#17216)
peterenescu Apr 20, 2026
5e918cf
refactor(cudf): Migrate CudfMarkDistinct to CudfOperatorBase (#17206)
perlitz Apr 20, 2026
58b6fcc
fix(fuzzer): Handle task completion timeout in MemoryArbitrationFuzze…
kgpai Apr 20, 2026
3eb73e9
perf(expr): Add EvalCtx constructor with pre-computed inputFlatNoNull…
Apr 21, 2026
dc9e9f3
feat(rpc): Add AIMD congestion control and batch dispatch chunking fo…
zhichenxu-meta Apr 21, 2026
0187d5b
fix(cudf): Guard hash join build debug logging against empty inputs (…
pramodsatya Apr 21, 2026
9ab9879
feat(Cudf): Experimental direct GPU-to-GPU exchange (#16037)
dan13bauer Apr 21, 2026
7ed90a5
feat(cudf): GPU Decimal (Part 2 of 3) (#16750)
simoneves Apr 21, 2026
b481be0
feat(cudf): Add cudf to velox conversion benchmark (#17283)
jinchengchenghh Apr 22, 2026
999dd99
feat(cudf): Add startswith expression support (#17205)
firestarman Apr 22, 2026
ed87bc9
fix: Replace folly::grow_capacity_by with std::vector::reserve in Pre…
srsuryadev Apr 22, 2026
39843f2
Add pipeline-level driver timing stats. (#17237)
spershin Apr 22, 2026
24cee7c
feat(serializers): Add min_shuffle_compression_page_size_bytes to ski…
hdikeman Apr 22, 2026
954629e
Split Enums.h into EnumDeclare.h and EnumDefine.h (#17239)
srsuryadev Apr 23, 2026
5c79c29
feat: Add extraction pushdown in common reader framework (#17198)
Yuhta Apr 23, 2026
158139b
docs: Add blog post introducing Axiom (#17316)
mbasmanova Apr 23, 2026
3196b16
feat: Add expression.min_rows_for_peeling config to skip peeling for …
Apr 23, 2026
8ae7d8f
refactor: Migrate folly symbolizer references from folly/experimental…
8Keep Apr 23, 2026
a81dd18
feat(cudf): Add endswith expression support (#17313)
firestarman Apr 23, 2026
0615ad8
fix(cudf): Add back cudf_interop test (#17302)
jinchengchenghh Apr 23, 2026
e57e5fa
build: Update CI failure analysis comment in place across re-runs (#1…
pratikpugalia Apr 23, 2026
6efe23e
build: Allow meta-codesync bot to trigger CI failure analysis (#17322)
pratikpugalia Apr 23, 2026
c2e0241
perf: Optimize FlatMapVector map_filter with bitmap uniform detection…
peterenescu Apr 23, 2026
8d6a8b1
feat: Fold DATE, ROW, and struct constants in DuckDB parser (#17318)
mbasmanova Apr 23, 2026
8475135
fix: Ambiguous QueryConfig constructor call in QueryConfigTest (#17321)
Apr 24, 2026
fbe88a2
refactor(cudf): Refactor CudfHashAggregation into CudfGroupby, CudfRe…
devavret Apr 24, 2026
b60bde6
feat(udf): Add map_values UDFs (#17320)
abhinavmuk04 Apr 24, 2026
95c221d
Bypass buffer for oversized writes in BufferedWriterSink (#17295)
aaupov Apr 24, 2026
7f78744
fix(build): Replace undefined target velox_enums (#17317)
czentgr Apr 24, 2026
7dcf49c
feat: Add partition-based cluster index with LookupRequest API (#17329)
xiaoxmeng Apr 25, 2026
ce9ff5a
perf: Avoid re-hashing overflow rows in parallelJoinBuild (#17330)
duxiao1212 Apr 25, 2026
2cb0c55
feat(cudf): Add TPC-DS benchmark with reusable plan loader and CuDF s…
karthikeyann Apr 25, 2026
0364fa4
misc: Refactor extract common used getTable function (#17332)
duxiao1212 Apr 26, 2026
ece1dfd
fix(build): Add -fsized-deallocation for s2geometry on Clang (#17323)
Apr 26, 2026
cd3c5d5
fix: Fix spin loop in HashBuild non-last drivers with hash table cach…
shrinidhijoshi Apr 26, 2026
c871d6b
feat: Add raw_vector::shrink_to_fit() (#17349)
xiaoxmeng Apr 27, 2026
bd716d1
fix: Remove misleading comments for parallel hash table build (#17340)
duxiao1212 Apr 27, 2026
777a81f
refactor(reader): Rename pass string buffer config (#17348)
Apr 27, 2026
404e8de
feat: Optimize map_intersect performance with early-exit, reserve, an…
abhinavmuk04 Apr 28, 2026
f13e473
docs: Documentation of "Cudf Hive Connector Configuration (Experiment…
karthikeyann Apr 28, 2026
83d4a1f
feat: Optimize map_except UDF performance (#16811)
abhinavmuk04 Apr 28, 2026
2f0af71
fix: Handle empty index splits in IndexLookupJoin (#17296)
zacw7 Apr 28, 2026
5335309
docs: Add Nimble cluster index tech blog post (#17356)
xiaoxmeng Apr 28, 2026
3b31fc3
misc: Fix typos in TpchBenchmark.rst (#17362)
amitkdutta Apr 29, 2026
ac7f5c5
feat(cudf): Support join filters with non-AST expressions spanning bo…
shrshi Apr 29, 2026
7f64dbb
fix(build): GCC 11.2 build regression (#17208)
zhztheplayer Apr 29, 2026
e29ffc6
refactor: Use char literals for newline/tab in MemoryManager::toStrin…
lingbin Apr 29, 2026
3657f94
fix: Bump website dependencies to fix GitHub security vulnerabilities…
Yuhta Apr 29, 2026
b55874d
fix(function): Remove key-value pair limit from Spark map()
Apr 29, 2026
e72a1a5
Revert D101026565 (#17382)
Apr 30, 2026
932d087
refactor: Extract FileDataSink base class from HiveDataSink (#17345)
konjac-h May 1, 2026
ce7746d
fix: Use Presto-compatible timestamp format in PrestoTypes::valueToSt…
maniloya May 1, 2026
f3c9bf0
fix(exec): Avoid LOG(ERROR) noise from Task::terminate() on cancellat…
May 1, 2026
1d13450
feat(dwio): Expose row selection from projectColumns (#17379)
May 1, 2026
c3aee85
feat(cudf): Add unit tests for CUDF string functions (#16825)
simoneves May 2, 2026
e665d55
feat(cudf): Add GPU "and" and "or" (#16913)
simoneves May 2, 2026
3d33d04
docs: Add FlatMapVector blog post (#17386)
peterenescu May 2, 2026
6b21225
feat: Add separate data and metadata IoStatistics to ReaderOptions (#…
xiaoxmeng May 3, 2026
9bf04fe
feat: Add runtime stats for index lookup join (#17286)
zacw7 May 3, 2026
6e26e22
fix: Remove fmt/format.h from ConfigProperty.h (#17311)
srsuryadev May 4, 2026
0ca6c37
fix(filter): Avoid int64 overflow in combineRangesAndNegatedValues (#…
hdikeman May 4, 2026
c60d3bc
fix: HashAggregation with preGroupedKeys drops distinct rows on batch…
hdikeman May 4, 2026
76a49e6
fix: Handle NullIfTypedExpr in expression optimizer (#17378)
peterenescu May 5, 2026
935c255
fix(iceberg): Augment scanSpec for equality-delete columns not in pro…
apurva-meta May 5, 2026
a811dbd
build: Surface and fix MONO=OFF link gaps (#17387)
pratikpugalia May 5, 2026
2ecb5e2
perf: Use std::string_view instead of std::string reference (#17233)
MatzeB May 5, 2026
138fa8b
perf: `reserve()` vector upfront to avoid unnecessary reallocation (#…
MatzeB May 5, 2026
7cd7a44
perf: `reserve()` vectors upfront to avoid unnecessary reallocation (…
MatzeB May 5, 2026
9037bf6
perf: `reserve()` sets upfront to avoid later re-allocation/re-hashin…
MatzeB May 5, 2026
5dacefa
perf: avoid temporary std::string construction (#17324)
MatzeB May 5, 2026
0323b27
perf: Avoid double lookup when inserting not-preset-yet element into …
MatzeB May 5, 2026
505a21e
refactor: Enforce not-null IoStatistics in ReaderOptions (#17399)
xiaoxmeng May 5, 2026
fa53acd
refactor: Centralize xxhash inline include into XxHashInline.h (#17229)
hdikeman May 5, 2026
7ab4399
fix: Add generic CRC32 software fallback in SimdUtil (#17230)
hdikeman May 5, 2026
d84633c
refactor: Use 1LL/1ULL for 64-bit integer literal portability (#17261)
hdikeman May 5, 2026
16fd8de
fix(exec): Avoid lock contention on OutputBuffer read path (#17405)
hdikeman May 5, 2026
bc31d0d
feat(cudf): Support contains string predicates (#17325)
firestarman May 5, 2026
7558929
fix(cudf): Extend the CuDF function registry to allow for functions w…
mattgara May 5, 2026
6b7a9bb
misc(type): Add std::hash<velox::Type> consistent with operator==
May 5, 2026
7b55920
perf(type): Speed up date extraction using Neri-Schneider algorithm (…
Licht-T May 6, 2026
baea2cc
build: Surface per-instance fuzzer state on cancel/timeout (#17414)
pratikpugalia May 6, 2026
e01adf0
feat: Add contiguous allocation to AsyncDataCache and rename IO stats…
xiaoxmeng May 6, 2026
9137930
feat(): [velox] Add map_subset_key_in_range UDF (#17357)
abhinavmuk04 May 6, 2026
54bac93
build(cudf): Migrate RMM usage to CCCL MR design (#17178)
bdice May 6, 2026
f5a10e8
feat(iceberg): Support Iceberg V3 Default Column Values (initial-defa…
agrawalreetika May 6, 2026
c35450c
refactor: Move selective Nimble reader config from QueryConfig to Fil…
srsuryadev May 7, 2026
edc33c3
build: Bump Presto Java image to 0.297 (#17430)
ReemaAlzaid May 7, 2026
153ac40
feat(cudf): Implement null-aware left semi project join with filter (…
shrshi May 7, 2026
e79b942
fix(writer): Wire WriterOptions::memoryBudget into WriterContext::get…
srsuryadev May 7, 2026
2df0ed0
test: Add extraction delta update interaction and text reader tests (…
Yuhta May 7, 2026
0ccb463
fix(build): Remove unnecessary benchmark link lib (#17429)
czentgr May 7, 2026
d186854
fix(ci): Drop unsupported -v flag from Presto launcher (#17449)
kgpai May 7, 2026
90e8a39
feat(cudf): Add stddev_samp aggregation support (#17234)
shrshi May 8, 2026
72e8108
feat(cudf): Support 3-arg LIKE expression (#17417)
firestarman May 8, 2026
adf62d8
fix(build): Build cudf test utils with VELOX_BUILD_TEST_UTILS (#17426)
czentgr May 8, 2026
7a7f7df
fix: Back out Bump Presto Java image to 0.297 (#17455)
May 8, 2026
2145e80
feat: Add per-phase index lookup stats (#17416)
zacw7 May 8, 2026
c1ac8ac
feat: Add MetadataInput for IO-coalesced metadata loading with option…
xiaoxmeng May 8, 2026
54c5031
perf: skip lazy vector loading in FilterProject when output has no un…
littlesamo May 8, 2026
475ca02
build: Drop unnecessary velox_core_expressions dep (#17451)
pratikpugalia May 8, 2026
d3c4575
fix(cache): drop stale checkpoint when checkpointing is disabled (#17…
pratikpugalia May 8, 2026
e958eb1
feat(velox): Support implicit integer-to-decimal type coercion (#17447)
peterenescu May 9, 2026
6d076b5
feat: Add row lineage metadata columns to Iceberg reader (#16716)
Joe-Abraham May 9, 2026
613c5d7
refactor: Move FileHandle and FileProperties to velox/common/caching …
xiaoxmeng May 9, 2026
2bbc11e
fix(type): Remove folly/dynamic.h from StringView.h, Timestamp.h, and…
srsuryadev May 9, 2026
ffb67c4
fix: Fix BufferPool::release to take BufferPtr&& with uniqueness chec…
xiaoxmeng May 9, 2026
7ebe238
fix: Use singleton instance for opaque type when parsing types (#17403)
May 9, 2026
9788831
fix(type): Move inline serialize/deserialize methods from Type.h to T…
srsuryadev May 9, 2026
47c35e5
refactor: Add IO stats setters to ReaderOptions and use pool-only con…
xiaoxmeng May 10, 2026
101d383
fix(type): Remove folly/dynamic.h from Type.h by extracting TypeSerde…
srsuryadev May 10, 2026
00b0e79
feat(dwio): Add session properties for Nimble reader string optimizat…
May 10, 2026
1d29eef
fix(build): Add missing includes (#17475)
ot May 11, 2026
da5ea4c
test(wave): Use default numeric indices for AtomicTest.cu (#17479)
Yuhta May 11, 2026
6b5e58e
feat: Support reading Iceberg TIME (TIME_MICROS) columns (#17478)
apurva-meta May 12, 2026
1fdeeb0
feat: Add `TimestampUtcType` as built-in logical type (#17091)
rui-mo May 12, 2026
869e131
fix: Fix aliased column name lookup in HiveIndexSource init (#17484)
zacw7 May 12, 2026
362ef8f
fix: Type deserialization of the `NullIfTypedExpr` (#17360)
rui-mo May 12, 2026
fa744f4
fix(cudf): Return null for decimal division by zero (#17481)
shrshi May 12, 2026
91aac8b
feat: Add valueAt() and containsKey() to EnumTypeBase (#17491)
HeidiHan0000 May 12, 2026
0611bbe
feat(parquet): Add support for Parquet column encoded DELTA_LENGTH_BY…
minhancao May 12, 2026
edc328d
fix(cudf): Preserve null literals in AST expressions (#17365)
mattgara May 12, 2026
9bf511b
feat(cudf): Add CudfGroupId operator for GPU-accelerated GROUPING SET…
shrshi May 12, 2026
377515e
feat: Add partition column support to HiveIndexSource (#17485)
zacw7 May 12, 2026
57eaf8f
feat: Add NimbleWriterOptionsAdapter for Iceberg writes (#17493)
apurva-meta May 13, 2026
bb9995d
feat(cudf): Add GPU NestedLoopJoin with inner, left, right, full, and…
perlitz May 13, 2026
2e33a29
feat: Add per-batch scan stats callback to TableScan (#17494)
May 13, 2026
fd63266
fix(build): SpatialJoinFuzzer error: ‘x’ is used uninitialized (#16626)
PingLiuPing May 13, 2026
540ee69
perf(parquet): Direct decompress for snappy/zstd bypassing intermedia…
jaylisde May 13, 2026
1efd356
feat: Add extraction stream skipping in DWRF reader (#17337)
Yuhta May 13, 2026
d60815e
fix(abfs): Avoid extra copy in preadv (#17370)
zhli1142015 May 13, 2026
19bf0ed
fix: Fix the memory pool hierarchy of writer node (#16240)
wecharyu May 13, 2026
70f9740
fix: Netlify CI by rewriting yarn.lock URLs to public registry (#17505)
May 13, 2026
55f7bae
test: Add end-to-end extraction table scan tests (#17339)
Yuhta May 13, 2026
d7922b8
feat: Add non-index condition support to HiveIndexSource (#17486)
zacw7 May 14, 2026
a9263b7
feat: Add KeyDecoder for Velox serializer keys (#17510)
azhavnerchik-meta May 14, 2026
baba102
perf: Reuse vector in DistinctAggregations (#17480)
kagamiori May 14, 2026
ab5738a
fix: Call base initialize method in Expand operator (#17482)
philo-he May 14, 2026
5b3748b
fix(macos): Install protobuf@21 from source instead of Homebrew (#16349)
jkhaliqi May 14, 2026
e3a18e6
feat: Add VectorFunctionListener registry for observing VectorFunctio…
May 14, 2026
59f88d5
fix: Deletion vector offsets must comply with spec (#17511)
mhaseeb123 May 14, 2026
485f4a6
feat: V3 DeleteFile typed fields + dataSequenceNumber on HiveIcebergS…
apurva-meta May 14, 2026
ddb1702
build: Tighten OSS pr-review skill (#17488)
pratikpugalia May 14, 2026
d0a5325
Re-sync with internal repository
facebook-github-bot May 14, 2026
888bab8
feat(wave): Add durable kernel caching infrastructure to wave/common …
oerling May 14, 2026
6800d5b
feat: Add FileMetadata return to Writer::close() and introduce Writer…
mohsaka May 14, 2026
78ebafc
fix(spark): Remove TIMESTAMP_NTZ type (#17512)
rui-mo May 14, 2026
6af81f0
perf(hashtable): Add adaptive prefetch to hashRows normalizedKey path…
jaylisde May 14, 2026
805db6b
fix: Validate reduce_agg initial state (#17398)
pramodsatya May 14, 2026
fd130f4
feat: Add iceberg data file statistics (#17388)
mohsaka May 15, 2026
5680a32
Refactor ReaderOptions IoStatistics from raw pointers to shared_ptr (…
xiaoxmeng May 15, 2026
53b6a8b
fix(tracer): Fix TableWrite trace replayer to register connector and …
May 15, 2026
02f22e8
feat: Add partition key propagation to ScanBatchEvent callback (#17513)
May 15, 2026
2cea959
feat: Add support for TZDIR environment variable (#15871)
boneanxs May 15, 2026
a983d9b
perf(simd): Add arch-aware boolean mask helpers (#17257)
mpurbay-arm May 15, 2026
8801a43
Add reallocateBytes to MemoryAllocator to avoid unnecessary memcpy (#…
May 15, 2026
ea683f0
fix(cudf): Fix CudfSplitReader ReaderOptions init (#17532)
jinchengchenghh May 15, 2026
982637f
feat: Support dialect-specific type coercion (#17519)
mbasmanova May 15, 2026
2a5e3f9
fix: Drop HashTable cache entry on builder failure (#17527)
shrinidhijoshi May 15, 2026
1718279
build(docker): Pin tzdata across all velox-dev images (#17535)
kgpai May 15, 2026
951677f
refactor(cudf): Alphabetize objects in Velox-cuDF CMake files (#17517)
shrshi May 16, 2026
d41a228
refactor: Enforce callers to provide metadataIoStats to TabletReader …
duxiao1212 May 16, 2026
fb0826a
docs: Add PR review scripts and style guide (#17524)
mbasmanova May 17, 2026
472d319
fix: Pass request type to SelectiveDecimalColumnReader (#17463)
beliefer May 17, 2026
81dff12
refactor(encoding): Move encoding selection files to `selection/` sub…
srsuryadev May 17, 2026
103b8c8
fix(cudf): Show stats for adapter operators not in plan tree (#17541)
jaylisde May 18, 2026
bec8806
fix(parquet): Include file column name in schema-mismatch error (#165…
1fanwang May 18, 2026
6af674c
feat(fuzzer): Add WindowNode-based alternate plan to TopNRowNumberFuz…
claude May 18, 2026
81bd75b
perf: `reserve()` desired map size upfront (#17383)
DenisYaroshevskiy May 18, 2026
9ba9079
fix: Mark EnforceSingleRowNode as requiring single-thread execution (…
mbasmanova May 18, 2026
324c2b9
feat(spark): Add bitmap_construct_agg aggregate function (#17487)
May 18, 2026
abfdf48
feat: Add runtime stat measuring number of passthrough rows for aband…
zhztheplayer May 18, 2026
c5653f3
fix(cudf): Consume probe input on empty build in NLJ to prevent excha…
shrshi May 18, 2026
fdac4fb
feat: Add expression-level tracing support to Velox (#17369) (#17369)
patrickstuedi May 18, 2026
ef33d20
feat: Add flux as valid index source (#17542)
azhavnerchik-meta May 18, 2026
c7d0e59
Merge upstream main into velox-cudf
bdice May 19, 2026
21 changes: 17 additions & 4 deletions .clang-tidy
@@ -1,20 +1,26 @@
Checks: >
*,
-abseil-*,
-altera-struct-pack-align,
-altera-unroll-loops,
-android-*,
-boost-use-ranges,
-bugprone-easily-swappable-parameters,
-cert-err58-cpp,
-clang-analyzer-osx-*,
-cppcoreguidelines-avoid-c-arrays,
-cppcoreguidelines-avoid-const-or-ref-data-members,
-cppcoreguidelines-avoid-do-while,
-cppcoreguidelines-avoid-goto,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
-cppcoreguidelines-non-private-member-variables-in-classes,
-cppcoreguidelines-owning-memory,
-cppcoreguidelines-pro-bounds-array-to-pointer-decay,
-cppcoreguidelines-pro-bounds-constant-array-index,
-cppcoreguidelines-pro-bounds-pointer-arithmetic,
-cppcoreguidelines-pro-type-reinterpret-cast,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-fuchsia-*,
-google-*,
@@ -25,25 +31,32 @@ Checks: >
-hicpp-special-member-functions,
-hicpp-use-equals-default,
-hicpp-vararg,
-llvm-header-guard,
-llvm-include-order,
-llvm-qualified-auto,
-llvmlibc-*,
-misc-include-cleaner,
-misc-no-recursion,
-misc-non-private-member-variables-in-classes,
-misc-unused-parameters,
-modernize-avoid-c-arrays,
-modernize-deprecated-headers,
-modernize-use-designated-initializers,
-modernize-use-nodiscard,
-modernize-use-trailing-return-type,
-mpi-*,
-objc-*,
-openmp-*,
-performance-avoid-endl,
-performance-enum-size,
-readability-avoid-const-params-in-decls,
-readability-convert-member-functions-to-static,
-readability-function-cognitive-complexity,
-readability-identifier-length,
-readability-implicit-bool-conversion,
-readability-magic-numbers,
-readability-math-missing-parentheses,
-readability-qualified-auto,
-zircon-*,

HeaderFilterRegex: '.*'
235 changes: 235 additions & 0 deletions .claude/CLAUDE.md
@@ -0,0 +1,235 @@
# CLAUDE.md

Guidance for Claude Code when working in the Velox repository.

## Branch Hygiene

Before creating a new feature branch, always:

1. `git checkout main`
2. `git fetch upstream && git rebase upstream/main`
3. Delete stale feature branches.
4. Push main to origin if behind upstream.
5. `git checkout -b <new-branch>`

## PR Review

When asked to review a PR (via `/pr-review`), always use the /pr-review skill.

### Review scripts

Use `scripts/review/fetch.py` and `scripts/review/post.py` for PR reviews
instead of raw `gh api` calls.

```bash
# Fetch PR metadata, diff, comments, and reviews in one shot.
python3 scripts/review/fetch.py <owner/repo> <pr-number>
python3 scripts/review/fetch.py <github-pr-url>

# Post a review from a file.
python3 scripts/review/post.py <owner/repo> <pr-number> <event> <body-file>
python3 scripts/review/post.py <github-pr-url> <event> <body-file>
# Events: APPROVE, REQUEST_CHANGES, COMMENT
```

Always draft the review body in `/tmp/` and get approval before calling
`post.py`.

### Review style

See [scripts/review/REVIEW_GUIDE.md](../scripts/review/REVIEW_GUIDE.md).

## Queries

When asked a question about the PR or codebase (via `/query`), use the /query skill.

## Overview

Velox is an open source C++ library for composable data processing and
query execution. Licensed under Apache 2.0. Requires C++20, GCC 11+ or
Clang 15+.

## Build

```bash
make debug # debug build
make release # optimized build
```

## Testing

```bash
make unittest # run all tests
cd _build/debug && ctest -j 8 # run all tests in parallel
ctest -R ExprTest # run tests matching a pattern
```

Test files live in `tests/` subdirectories alongside source.

### Grouped tests

Four test suites use `velox_add_grouped_tests` to reduce link times on Linux CI
by batching source files into shared binaries:
- `velox/exec/tests` (`velox_exec_test`, `velox_exec_util_test`)
- `velox/functions/prestosql/aggregates/tests`
- `velox/common/caching/tests`
- `velox/serializers/tests`

All other test suites use individual binaries on all platforms.

On macOS, grouping is off by default (`VELOX_ENABLE_GROUPED_TESTS=OFF`) and each
test file gets its own binary (e.g., `ValuesTest.cpp` → `velox_exec_test_ValuesTest`).
On Linux CI, grouping is on (`velox_exec_test_group0` through `_group7`).
Override with `-DVELOX_ENABLE_GROUPED_TESTS=ON/OFF`.

### Common test workflows

```bash
# Run all test binaries whose ctest name matches a regex.
# On Linux this matches velox_exec_test_group0 … _group7.
# On macOS this matches velox_exec_test_ValuesTest,
# velox_exec_test_HashJoinTest, etc.
cd _build/debug && ctest -R velox_exec

# Run a specific test file (macOS — individual binary)
_build/debug/velox/exec/tests/velox_exec_test_ValuesTest --gtest_filter="ValuesTest.*"

# Run a specific test case (Linux — grouped binary)
_build/debug/velox/exec/tests/velox_exec_test_group3 --gtest_filter="ValuesTest.empty"
```

**Re-running a CI failure locally:** CI reports a failure in
`velox_exec_test_group3` with `ValuesTest.empty`. On Linux, run the grouped
binary directly. On macOS, the grouped binary does not exist — use the
per-file binary instead: `velox_exec_test_ValuesTest --gtest_filter="ValuesTest.empty"`.

**Adding a new test to a grouped suite:** Add the source file to the `SOURCES`
list in the relevant `velox_add_grouped_tests()` call in `CMakeLists.txt`. It
is automatically assigned to a group on Linux and gets its own binary on macOS.

**Creating a new test suite:** Use `velox_add_grouped_tests` for suites with
many test files (10+) that link against large libraries like velox core — each
individual binary pays the full link cost, so grouping them into shared binaries
significantly reduces total CI build time. For suites with only a few test files
or lightweight dependencies, use `add_executable` / `add_test`.

## Formatting

```bash
make format # format all changed files
```

## Coding Style

Read [CODING_STYLE.md](../CODING_STYLE.md) for the complete guide. Key rules
are summarized below.

### Comments

- Use `///` for public API documentation (classes, public methods, public members).
- Use `//` for private/protected members and comments inside code blocks.
- Start comments with active verbs, not "This class…" or "This method…".
  - ❌ `/// This class builds query plans.`
  - ✅ `/// Builds query plans.`
- Comments should be full English sentences starting with a capital letter and ending with a period.
- Comment every class, every non-trivial method, every member variable.
- Do not restate the variable name. Either explain the semantic meaning or omit the comment.
  - ❌ `// A simple counter.` above `size_t count_{0};`
- Avoid redundant comments that repeat what the code already says. Comments should explain *why*, not *what*.
- Use `// TODO: Description.` for future work. Do not include the author's username.
- Do not duplicate comments between `.h` and `.cpp`. Document the function in the header; the implementation should not repeat the same comment. Duplicated comments diverge over time.
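
A minimal sketch of these comment rules applied to one small class. `RowCounter` and its members are hypothetical and exist only to illustrate `///` on the public API, `//` on private members, active-verb phrasing, and comments that explain intent rather than restating names. It is written as a single header-style snippet for brevity; real code would keep non-trivial bodies in the `.cpp`.

```cpp
#include <cstddef>

/// Counts rows that pass a filter across multiple input batches.
class RowCounter {
 public:
  /// Records the number of rows that passed the filter in one batch.
  void add(std::size_t numPassed) {
    numRows_ += numPassed;
    ++numBatches_;
  }

  /// Returns the total number of rows that passed the filter so far.
  std::size_t numRows() const {
    return numRows_;
  }

  /// Returns the average number of passing rows per batch, or 0 if no batch
  /// has been recorded yet.
  double averageRowsPerBatch() const {
    // Avoid dividing by zero before the first batch arrives.
    if (numBatches_ == 0) {
      return 0.0;
    }
    return static_cast<double>(numRows_) / static_cast<double>(numBatches_);
  }

 private:
  // Sum of passing rows over all recorded batches.
  std::size_t numRows_{0};
  // Number of add() calls; the divisor for averageRowsPerBatch().
  std::size_t numBatches_{0};
};
```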

### Naming Conventions

- **PascalCase** for types and file names.
- **camelCase** for functions, member and local variables.
- **camelCase_** for private and protected member variables.
- **snake_case** for namespace names and build targets.
- **UPPER_SNAKE_CASE** for macros.
- **kPascalCase** for static constants and enumerators.
- Do not abbreviate. Use full, descriptive names. Well-established abbreviations (`id`, `url`, `sql`, `expr`) are acceptable.
- Prefer `numXxx` over `xxxCount` (e.g. `numRows`, `numKeys`).
- Never name a file or class `*Utils`, `*Helpers`, or `*Common`. These generic
names attract unrelated functions over time and lose cohesion. Name files and
classes after the concept they represent. Use a class with static methods to
group related operations, and shorten method names since the class name
provides context.
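
A minimal sketch of the naming rules, assuming a hypothetical pair of string helpers that might otherwise end up in a `StringUtils.h`. Grouping them as static methods on a class named after the concept (`Padding`, which would live in `Padding.h`) lets the method names stay short. Every name here is illustrative, not an existing Velox API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

namespace example_text { // snake_case namespace name.

/// Upper bound applied to the requested width in Padding::left().
constexpr std::size_t kMaxWidth{1'024}; // kPascalCase static constant.

/// Pads strings to a fixed width.
class Padding {
 public:
  /// Returns 'input' left-padded with 'fill' to 'width' characters. Inputs
  /// already at least that long are returned unchanged.
  static std::string left(std::string_view input, std::size_t width, char fill) {
    const std::size_t targetWidth{std::min(width, kMaxWidth)};
    if (input.size() >= targetWidth) {
      return std::string(input);
    }
    // Parentheses, not braces: braces would pick the initializer_list
    // constructor and build a two-character string.
    std::string result(targetWidth - input.size(), fill);
    result.append(input);
    return result;
  }

  /// Returns the number of leading 'fill' characters in 'input'.
  static std::size_t numLeading(std::string_view input, char fill) {
    std::size_t numChars{0};
    while (numChars < input.size() && input[numChars] == fill) {
      ++numChars;
    }
    return numChars;
  }
};

} // namespace example_text
```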

### Asserts and CHECKs

- Use `VELOX_CHECK_*` for internal errors, `VELOX_USER_CHECK_*` for user errors.
- Prefer two-argument forms: `VELOX_CHECK_LT(idx, size)` over `VELOX_CHECK(idx < size)`.
- Use `VELOX_FAIL()` / `VELOX_USER_FAIL()` to throw unconditionally.
- Use `VELOX_UNREACHABLE()` for impossible branches, `VELOX_NYI()` for unimplemented paths.
- Put runtime information (names, values, types) at the **end** of error messages, after the static description.
  - ❌ `VELOX_USER_FAIL("Column '{}' is ambiguous", name);`
  - ✅ `VELOX_USER_FAIL("Column is ambiguous: {}", name);`

### Variables

- Prefer value types, then `std::optional`, then `std::unique_ptr`.
- Prefer `std::string_view` over `const std::string&` for function parameters.
- Use uniform initialization: `size_t size{0}` over `size_t size = 0`.
- Declare variables in the smallest scope, as close to usage as possible.
- Use digit separators (`'`) for numeric literals with 4 or more digits: `10'000`, not `10000`.
- Use trailing commas in multi-line initializer lists, enum definitions, and
function-call argument lists that span multiple lines. This produces cleaner
diffs when items are added or reordered.
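Several of the variable rules can be combined in one short sketch. `ColumnKind`, `kMaxBatchRows`, `numBatches`, and `parseColumnKind` are hypothetical and exist only for this example. It uses a `std::string_view` parameter, a `std::optional` return value instead of a sentinel or pointer, brace initialization, a digit-separated literal, and trailing commas in the multi-line enum and initializer list.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

/// Kinds of columns used in this sketch.
enum class ColumnKind {
  kInteger,
  kVarchar,
  kBoolean,
};

/// Maximum number of rows per batch (hypothetical value).
constexpr size_t kMaxBatchRows{10'000};

/// Returns how many batches of at most kMaxBatchRows are needed for 'numRows'.
size_t numBatches(size_t numRows) {
  return (numRows + kMaxBatchRows - 1) / kMaxBatchRows;
}

/// Returns the kind named by 'name', or std::nullopt if the name is unknown.
std::optional<ColumnKind> parseColumnKind(std::string_view name) {
  // Declared in the smallest scope that needs it.
  static const std::vector<std::pair<std::string_view, ColumnKind>> kNames{
      {"integer", ColumnKind::kInteger},
      {"varchar", ColumnKind::kVarchar},
      {"boolean", ColumnKind::kBoolean},
  };
  for (const auto& [candidate, kind] : kNames) {
    if (candidate == name) {
      return kind;
    }
  }
  return std::nullopt;
}
```

Because the parameter is a `std::string_view`, callers can pass a string literal or a `std::string` without an extra copy.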

### API Design

- Keep the public API surface small.
- Prefer free functions in `.cpp` (anonymous namespace) over private/static class methods.
- Define free functions close to where they are used, not grouped together at the top or bottom of the file.
- Keep method implementations in `.cpp` except for trivial one-liners.
- Avoid default arguments when all callers can pass values explicitly.
- Never use `friend`, `FRIEND_TEST`, or any friend declarations. If a test needs access to private members, redesign the API or test through public methods instead.
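The following sketch applies the API design rules to a hypothetical `PlanPrinter` (not a Velox class). The header exposes one method whose indentation width is passed explicitly rather than defaulted. The helper lives in an anonymous namespace in the `.cpp` file, defined just before its only caller, instead of being a private static method.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ---- PlanPrinter.h (hypothetical) ----

/// Renders operator names as an indented plan, one operator per line.
class PlanPrinter {
 public:
  /// Returns the plan text, indenting the operator at position N by
  /// N * 'indentWidth' spaces.
  static std::string print(
      const std::vector<std::string>& operatorNames,
      size_t indentWidth);
};

// ---- PlanPrinter.cpp (hypothetical) ----

namespace {

// Internal helper: invisible outside this file, so it is not API surface.
std::string indent(size_t level, size_t indentWidth) {
  // Parentheses, not braces: braces would select the initializer_list<char>
  // constructor instead of the (count, char) constructor.
  return std::string(level * indentWidth, ' ');
}

} // namespace

std::string PlanPrinter::print(
    const std::vector<std::string>& operatorNames,
    size_t indentWidth) {
  std::string result;
  for (size_t i = 0; i < operatorNames.size(); ++i) {
    result += indent(i, indentWidth);
    result += operatorNames[i];
    result += '\n';
  }
  return result;
}
```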

### Tests

- Place new tests next to related existing tests, not at the end of the file. Group tests by topic (e.g., place `tryCast` next to `types`, `notBetween` next to `ifClause` which uses `between`).

Use gMock container matchers (`testing::ElementsAre`, etc.) for verifying collections:

```cpp
// ❌ Avoid - multiple individual assertions
EXPECT_EQ(result.size(), 3);
EXPECT_EQ(result[0], "a");
EXPECT_EQ(result[1], "b");
EXPECT_EQ(result[2], "c");

// ✅ Prefer - single matcher assertion
EXPECT_THAT(result, testing::ElementsAre("a", "b", "c"));
```

Common matchers:
- `ElementsAre(...)` - exact ordered match
- `UnorderedElementsAre(...)` - exact unordered match
- `Contains(...)` - at least one element matches
- `IsEmpty()` - collection is empty
- `SizeIs(n)` - collection has n elements

Requires `#include <gmock/gmock.h>`.

## Common Mistakes

These are frequently violated rules. Check every new or modified line against
this list before finishing.

- **Bug fixes without a failing test first.** Write the test first, confirm it fails, then fix. A test that passes with and without the fix proves nothing.
- **`///` vs `//` wrong comment style.** `///` is only for public API in headers. Everything else uses `//`.
- **One-letter and abbreviated variable names.** Use full, descriptive names. Only loop indices (`i`, `j`) are acceptable.
- **Undocumented APIs in headers.** Every class, method, and member variable in a `.h` file must have a comment.
- **Non-trivial implementations in headers.** If a method body has more than one statement, it belongs in the `.cpp` file.
- **`goto` statements.** Never use `goto`. Use early returns, helper functions, or duplicated code paths.
- **Fitting tests to buggy code.** Never update test expectations to match buggy output without verifying correctness first.
- **Generic file and class names.** Never name a file or class `*Utils`, `*Helpers`, or `*Common`.
- **Verify causation before asserting it.** Do not attribute failures to a commit based on its message alone. Verify empirically.
- **Silently simplifying an approved plan.** If a step is harder than expected, say so and get approval before reducing scope.
- **Working around infrastructure bugs.** Do not silently work around bugs in shared infrastructure. Report and discuss.
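A common reason to reach for `goto` is leaving nested loops once a match is found. The sketch below moves the search into a helper function and uses an early return instead. `findCell` is a hypothetical function written only for this illustration.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

/// Returns the (row, column) of the first cell equal to 'target', or
/// std::nullopt if no cell matches.
std::optional<std::pair<size_t, size_t>> findCell(
    const std::vector<std::vector<int>>& grid,
    int target) {
  for (size_t row = 0; row < grid.size(); ++row) {
    for (size_t column = 0; column < grid[row].size(); ++column) {
      if (grid[row][column] == target) {
        // The early return leaves both loops; no 'goto found;' needed.
        return std::pair{row, column};
      }
    }
  }
  return std::nullopt;
}
```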

## Design Documents

Design documents (including proposals) live in `docs/designs/`. When creating new
designs, place them there with a descriptive filename (e.g.,
`column-extraction-pushdown.md`).