Releases: rapidsai/rapidsmpf
v26.04.00
What's Changed
🚨 Breaking Changes
- Enable Explicit Context Shutdown by @madsbk in #794
- [Refactor] AllowOverbooking enum by @madsbk in #797
- String utilities by @madsbk in #803
- Move to_device coroutine to TableChunk by @wence- in #832
- Overhaul/Cleanup of Errors by @madsbk in #831
- Removing cudax components by @nirandaperera in #856
- Remove multi-message support in shuffle chunks by @madsbk in #863
- Rename Node → Actor by @madsbk in #866
- Statistics cleanup by @madsbk in #871
- Reimplement allreduce using a recursive doubling scheme by @wence- in #872
- Introduce Statistics::record_copy and consolidate copy/spill statistics by @madsbk in #875
- Remove logger from progress thread by @wence- in #889
- Stream ordered timings by @madsbk in #882
- Split the streaming context and communicator apart by @wence- in #895
🐛 Bug Fixes
- Make C++ SPDX headers consistent by @pentschev in #798
- Pass objects safely from Python coroutines to C++ coroutines by @pentschev in #796
- Fix option rename: memory_reserve_timeout_ms => memory_reserve_timeout by @madsbk in #815
- Revert microseconds representation to "us" by @pentschev in #823
- read_parquet Node bug by @nirandaperera in #824
- Add missing wait to StreamingMemoryReserveOrWait.CheckPriority by @pentschev in #836
- Raise missing std::invalid_argument by @pentschev in #837
- Avoid using "p2p" shuffle in dask-cudf test by @rjzamora in #877
- Promote nested closures in run_actor_network to top-level by @wence- in #878
- Add ucxx as a runtime dependency by @Matt711 in #876
- Fix deprecated commandline args for mpirun in ctest launch by @wence- in #886
- Update RMM imports for cuda_stream deprecation by @TomAugspurger in #888
📖 Documentation
- Add a glossary to the documentation by @TomAugspurger in #759
- Reorganize the docs by @madsbk in #870
- fix links in docs by @madsbk in #879
🚀 New Features
- Add AllReduce collective by @pentschev in #683
- Introduce standard ChannelMetadata definition by @rjzamora in #819
- Add Slurm support to rrun with PMIx-based coordination by @pentschev in #775
- Add more bootstrap utils to Python API by @pentschev in #868
- Add "contiguous" partition-assignment variation by @rjzamora in #891
🛠️ Improvements
- Option: dask_spill_to_pinned_memory by @madsbk in #800
- Python bindings for MemoryReserveOrWait by @madsbk in #787
- One more C++ string utils by @madsbk in #813
- Update versions in pre-commit config by @wence- in #812
- Config options: add typed insert_if_absent<T> by @madsbk in #816
- Report "means" in bench_comm by @pentschev in #820
- Add mechanism for sending metadata through channels by @wence- in #811
- Update partition logic to accommodate cudf changes by @PointKernel in #805
- Split ndsh validation into run and validate by @TomAugspurger in #801
- Add from_options() factory methods by @madsbk in #825
- Introduce the coroutine reserve_memory() and use it in the ndsh workflows by @madsbk in #827
- Handle date types in ndsh queries by @TomAugspurger in #799
- Support PackedData from host bytes in Python by @rjzamora in #828
- Increase tag space for OpID by @wence- in #826
- Drop Python 3.10 support by @gforsyth in #829
- Enable setting/ updating cuda stream in Buffer by @nirandaperera in #814
- tighten wheel size limits, expand CI-skipping logic, other small build changes by @jameslamb in #830
- Use TopologyDiscovery from cuCascade by @pentschev in #774
- Migrate bloom filter and related utilities into rapidsmpf proper by @wence- in #834
- Rename columns to column_names in parquet reader options by @Matt711 in #835
- remove pip.conf migration code in CI scripts, update CI-skipping rules by @jameslamb in #839
- New make_table_chunks_available_or_wait() by @madsbk in #838
- Add python bindings for streaming bloom filter by @wence- in #841
- Removing packed_columns from TableChunk by @nirandaperera in #840
- More string parsing utils by @madsbk in #845
- Use verify-hardcoded-version pre-commit hook by @KyleFromNVIDIA in #846
- Add unbounded per-context file read cache by @madsbk in #842
- Minor performance fixes by @wence- in #849
- Use GHA id-token for sccache-dist auth token by @trxcllnt in #852
- Streaming Q4 implementation by @TomAugspurger in #710
- Update clang-format's include order by @madsbk in #861
- Rename columns to column_names in q04 by @Matt711 in #860
- Expose local_partitions of shuffler to Python by @wence- in #862
- Add safe_cast utility for overflow-safe integer conversions by @madsbk in #864
- Style Inconsistency Fixes by @madsbk in #867
- refactor: build wheels and conda packages using Python limited API by @gforsyth in #850
- Statistics: decouple formatters from stats by @madsbk in #880
- refactor(limited api): add explicit wheel.py-api to pyproject.toml by @gforsyth in #887
- Statistics: add JSON export by @madsbk in #883
- Use read_parquet_metadata to get column types by @Matt711 in #865
- Let the communicator own a progress thread by @wence- in #884
- Simplify python setup of communicators and progress thread in integrations by @wence- in #892
- Store logger as a shared pointer by @wence- in #894
- Update Cython lower bound pin to 3.2.2 by @vyasr in #893
- Remove pytest upper bound pin by @vyasr in #898
- Add support for Python 3.14 by @gforsyth in #881
- Install rrun by @madsbk in #900
- Add configurable pool properties to PinnedMemoryResource by @nirandaperera in #851
- Add ability to obtain the shape of a TableChunk independently of where the packed data resides by @wence- in #905
- Add devcontainers output to dependencies.yaml that excludes keys that pull in ray-default by @trxcllnt in #906
- Rename rrun environment variables prefix to RRUN_ by @pentschev in #908
- test_gather_shuffle_statistics(): only check for existence by @madsbk in #912
- Clarify and ensure op_id reuse after local extraction is safe for AllGather and AllReduce by @wence- in #909
- Remove ack phase in shuffler communication event loop by @wence- in #910
- build wheels with CUDA 13.0.x, test wheels against mix of CTK versions by @jameslamb in #919
- docs-build: use Python ...
v26.02.00
What's Changed
🚨 Breaking Changes
- TableChunk.copy(): allow wiggle room by @madsbk in #672
- Reorganize buffer/memory code by @madsbk in #673
- Reorganize ScopedMemoryRecord by @madsbk in #676
- Don't build pip devcontainer by @wence- in #670
- Expose async collectives to Python by @wence- in #685
- Memory type refactor by @madsbk in #701
- Host buffer by @madsbk in #696
- PinnedMemoryResource refactor by @madsbk in #714
- PinnedMemoryResource: use unlimited release_threshold by @madsbk in #735
- Introducing MemoryType::PINNED_HOST by @madsbk in #731
- Python bindings for PinnedMemoryResource by @madsbk in #742
- Refactor a Context's coro::thread_pool by @madsbk in #776
- Context now owns a MemoryReserveOrWait instance by @madsbk in #783
- Backport #794: Enable Explicit Context Shutdown by @pentschev in #809
🐛 Bug Fixes
- Disable sccache for clang-tidy builds by @pentschev in #695
- Add rapids-generate-version by @bdice in #699
- Fix data races by @madsbk in #694
- Add missing memory type parameters docs by @pentschev in #703
- Fix read parquet multi rank by @wence- in #708
- Fix retrieving number of ranks when running benchmarks with rrun by @pentschev in #709
- Install OpenMPI explicitly in Python test matrix by @pentschev in #720
- Add missing nogil for shuffler insertion by @rjzamora in #722
- Pin Ray<2.52 by @pentschev in #730
- Re-enabling linking all tests by @nirandaperera in #741
- Force BUILD_TESTING flag for CTest by @bdice in #739
- q09: fix cudf::transform() call. by @madsbk in #746
- Fix rrun's --bind-to options by @pentschev in #749
- Fix finding of preexisting rapidsmpf build (for environments like devcontainers) by @vyasr in #754
- [MINOR] Fix build.sh by @nirandaperera in #756
- Fix non_device_size calculation in unpack_and_concat by @TomAugspurger in #765
- Trigger CI by @TomAugspurger in #768
- Remove mode argument from to_pylibcudf call by @pentschev in #770
📖 Documentation
🚀 New Features
- Add missing rrun support to C++ benchmarks by @pentschev in #679
- Add missing rrun support to Python benchmark by @pentschev in #680
- Support rrun binding to resources by @pentschev in #704
🛠️ Improvements
- [MINOR] Increase CI timeout to 10m by @nirandaperera in #665
- Use strict priority in CI conda tests by @bdice in #684
- Use rapidsai/sccache fork in conda-cpp-linters job by @trxcllnt in #697
- Python bindings for Context.spillable_messages() by @madsbk in #690
- bench-shuffle: discard output by @madsbk in #689
- Adding Fanout node by @nirandaperera in #636
- Use strict priority in CI conda tests by @bdice in #700
- TPCH-derived Q9 by @wence- in #663
- HostMemoryResource: Enable Transparent Huge Pages (THP) by @madsbk in #712
- SpillableMessages: support message copy by @madsbk in #713
- Remove alpha specs from non-RAPIDS dependencies by @bdice in #715
- SPILL_TARGET_MEMORY_TYPES by @madsbk in #718
- Extend bench_memory_resources by @madsbk in #719
- Enable merge barriers by @KyleFromNVIDIA in #721
- Formalize Buffer Memory Types and Storage Mapping by @madsbk in #729
- Add static library for cudf_tests by @pentschev in #733
- Enable clang-tidy checks in build.sh by @pentschev in #728
- Include spill amounts in NVTX payloads by @TomAugspurger in #717
- Add devcontainer fallback for C++ test location by @bdice in #734
- Reorder bench_shuffle barriers by @pentschev in #744
- libcoro: use the newest commit by @madsbk in #747
- Make unbounded fanout messages spillable by @nirandaperera in #711
- Implement Q1 and Q3 and refactor to provide utilities by @wence- in #738
- Compatibility with CCCL 3.2 by @bdice in #755
- CI: run bench_memory_resources smoke tests by @madsbk in #748
- ndsh: add --no-pinned-host-memory by @madsbk in #752
- Cudf pack/ chunked_pack benchmark by @nirandaperera in #745
- Empty commit to trigger a build by @bdice in #763
- [MINOR] Fix build warnings by @nirandaperera in #762
- Use SPDX license identifiers in pyproject.toml, bump build dependency floors by @jameslamb in #766
- Add CUDA 13.1 support by @bdice in #750
- build and test against CUDA 13.1.0 by @jameslamb in #767
- Allow CMake v4+ by @trxcllnt in #669
- Test pip devcontainers in CI by @trxcllnt in #727
- Host buffer with a type-erased owner by @nirandaperera in #764
- Memory reserve or wait by @madsbk in #688
- System info utils by @madsbk in #743
- Cython coroutine handling by @madsbk in #769
- Add streaming benchmark validator by @TomAugspurger in #760
- Empty commit to trigger a build by @jameslamb in #782
- Basic streaming read_parquet benchmark by @wence- in #753
- Multi-GPU implementation of Q21 by @wence- in #758
- Use main shared-workflows branch by @jameslamb in #784
- Add device PCI Bus ID information to check_resource_binding by @pentschev in #785
- wheel builds: react to changes in pip's handling of build constraints by @mmccarty in #790
- fix(build): build package on merge to release/* branch by @gforsyth in #821
New Contributors
Full Changelog: v26.02.00a...v26.02.00
v25.12.00
What's Changed
🚨 Breaking Changes
- Use a CUDA stream pool and remove many stream parameters from the API by @madsbk in #531
- Add optional owner slot to TableChunks by @wence- in #536
- Clean up ShufflerAsync by @madsbk in #557
- Deallocation should be noexcept by @bdice in #569
- Table chunk is_spillable() by @madsbk in #559
- Upgrade libcoro dependency by @wence- in #570
- Refactor: Message by @madsbk in #598
- TableChunk copying by @madsbk in #604
- Promote sequence IDs to messages by @wence- in #602
- Message with content description by @madsbk in #623
- Refactor: Context.create_channel() by @madsbk in #631
- Context: use shared pointer for buffer resource by @madsbk in #646
- [MINOR] Adding a Range arg to BufferResource::reservce_and_fail by @nirandaperera in #648
- Implement linearisation by round-robin assignment by @wence- in #654
- Handle lifetime issues in channel callbacks in presence of exceptions by @wence- in #661
🐛 Bug Fixes
- Enable timeout with stacks for ctests by @pentschev in #538
- tests: use initialized input by @madsbk in #541
- Attempt to address deadlocks in streaming shuffle tests by @wence- in #548
- [MINOR] Fix python build by @nirandaperera in #562
- Use a semaphore in async shuffler to manage waiters by @wence- in #564
- Replace path to user's home with wildcard by @pentschev in #579
- Update to Ray 2.49 by @pentschev in #585
- Release the CuptiMonitor mutex before joining thread by @pentschev in #588
- Fix hang when a consumer throws by @madsbk in #577
- Table.from_table_view_of_arbitrary: Fix missing stream argument. by @madsbk in #607
- Fix a few SPDX-related issues by @KyleFromNVIDIA in #614
- Ensure messages are dropped by refcount through channel sends by @wence- in #638
- AllGather unregisters spill function in destructor by @madsbk in #643
- Pin Cython pre-3.2.0 and PyTest pre-9 by @jakirkham in #647
- refactored update-version.sh to handle new branching strategy by @rockhowse in #645
📖 Documentation
- Fix wording in background doc by @TomAugspurger in #568
- Capitalize GPU by @bdice in #603
- Docs for streaming engine by @quasiben in #571
- Fix doxygen warnings and amend docstrings by @quasiben in #606
- Use current system architecture in conda environment creation command by @bdice in #634
- Fix images by @quasiben in #677
🚀 New Features
- Make UCXX progress mode configurable by @pentschev in #487
- Add an arbitrary payload chunk by @wence- in #620
- Add RapidsMPF launcher rrun by @pentschev in #616
- Separate shuffler's communication into new interface by @pentschev in #437
- Support for dynamic system topology discovery by @pentschev in #624
🛠️ Improvements
- Update RAPIDS_BRANCH, codify changes in update-version.sh by @KyleFromNVIDIA in #529
- Make CI scripts more consistent and trap exit error codes by @pentschev in #542
- Fix hang in boostrap_dask_cluster with new clients by @TomAugspurger in #540
- Compute sanitizer suppressions by @madsbk in #544
- Fixing forward merge conflict by @nirandaperera in #558
- Separate synced host data and buffers by @nirandaperera in #543
- ShutdownAtExit: accept a vector of channels by @madsbk in #563
- Add an async interface to the AllGather implementation by @wence- in #551
- Fix rmm.librmm cimport by @bdice in #575
- Use new remove feature from UCXX tagProbe by @pentschev in #581
- Enable sccache-dist connection pool by @trxcllnt in #586
- Lock-free pausable thread loop by @nirandaperera in #583
- Adding pinned host buffer impl by @nirandaperera in #549
- Use main in RAPIDS_BRANCH by @bdice in #591
- Use main shared-workflows branch by @bdice in #595
- Coroutine utility function: coro_results() by @madsbk in #597
- Enable taking Python Streams from Context by @nirandaperera in #596
- [MINOR] Remove rmm/host/pinned_memory_resource by @nirandaperera in #605
- Use SPDX for all copyright headers by @KyleFromNVIDIA in #608
- Memory Reservation python bindings by @madsbk in #610
- Message copy by @madsbk in #609
- Introduce a streaming read_parquet node by @wence- in #574
- Content description of an object by @madsbk in #613
- Minor fixes by @wence- in #622
- Use a large stack size when running lineariser test by @madsbk in #625
- Minor type stub fixes by @wence- in #626
- Make CUDA stream pool size configurable by @TomAugspurger in #633
- Migrate to new CCCL memory resource interface by @bdice in #635
- Collection for spillable messages by @madsbk in #630
- Update some Python docstrings by @madsbk in #639
- More python bindings by @madsbk in #651
- Some test renaming by @madsbk in #660
- Spilling of in-flight messages by @madsbk in #640
- Add statistics to BufferResource wrapper by @TomAugspurger in #659
- Update RMM includes from <rmm/mr/device/*> to <rmm/mr/*> by @bdice in #664
- Improved serialization of Options by @pentschev in #642
- [MINOR] Adding a RAPIDSMPF_DETAIL mode by @nirandaperera in #612
- Use sccache-dist build cluster for conda and wheel builds by @trxcllnt in #628
- Install OpenMPI explicitly in CI matrices by @pentschev in #723
New Contributors
- @quasiben made their first contribution in #571
- @rockhowse made their first contribution in #645
Full Changelog: v25.12.00a...v25.12.00