Implement P2300 bulk adapter for HPX executors#7240
Can one of the admins verify this patch?
Up to standards ✅🟢 Issues
Force-pushed from 5cdcfbd to 7b3fc5f.
@shivansh023023 Just FYI, before you invest more time into our old sender/receiver implementation: #7123 will soon remove almost all of our code related to this. Please focus your efforts on whatever stays after this has been merged.
Thank you for the heads-up regarding #7123. I appreciate the guidance to avoid spending effort on code that will soon be deprecated. I will pause my work on the current P2300 bulk adapter and follow the progress of #7123 closely. Once that is merged, I’ll re-evaluate how to implement the bulk functionality within the new framework.

In the meantime, I’ll focus on finishing the review items for PR #7070 (local convenience header) and cleaning up any remaining issues on my other active PRs that are unaffected by the sender/receiver removal.
Force-pushed from 7b3fc5f to 70c4501.
Hi @hkaiser Thank you for the feedback. I have refactored the PR to align with the latest stdexec changes:
- Purged Legacy Code: Removed all !defined(HPX_HAVE_STDEXEC) blocks from executor_scheduler_bulk.hpp.
- Header Optimization: Removed the inclusions of executor_scheduler.hpp and executor_scheduler_bulk.hpp from the parallel_executor and sequenced_executor headers.
- Forward Declarations: Implemented forward declarations for the returned scheduler types in the executor headers to prevent bloat.
- Fixed Includes: Removed the reference to the deprecated get_scheduler.hpp header.

The headers are now much cleaner. Let me know if there are any other areas you'd like me to slim down!
…nder/receiver concepts, NVCC guard, dedup get_scheduler, fix post() UB, use get_scheduler in tests
Force-pushed from bbfc7db to 1f4aeee.
Hi @hkaiser Thank you for the architectural guidance! I have implemented your suggestions to clean up the header dependencies and tag_invoke logic:
- Forwarding Header: I created executor_scheduler_fwd.hpp and moved the declarations there, which allowed me to strip the redundant forward declarations from the individual executor headers.
- Tag-Invoke Overloads: I removed the default template arguments and duplicate overloads in parallel_executor and sequenced_executor, replacing them with the concrete friend functions you suggested.
- NVCC Fix: I ensured the NVCC/CUDACC guards for the destructors were correctly restored.
- Standard Test Patterns: I updated the unit tests to use the standard ex::get_scheduler(exec) call instead of manual constructor calls.

The structure feels much more consistent with the rest of the HPX core modules now. Thanks!
Hi @charan-003 Thanks for the catch on the stdexec compliance and the safety issues! I’ve updated the implementation as follows:
- Sender/Receiver Concepts: I added sender_concept to executor_sender and receiver_concept to executor_bulk_receiver. These are now correctly picked up by the stdexec concept checks.
- UB Fix in post: I refactored the lambda capture in executor_operation_state::start(). It now captures the state by reference and only moves the receiver inside the task body. This ensures that if hpx::parallel::execution::post throws, the receiver is still valid for the set_error call in the catch block.

I really appreciate the eye for detail on that potential move-before-post UB; it is definitely a safer implementation now!
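The capture change described above can be sketched without HPX. This is a minimal illustration under stated assumptions, not the PR's code: `toy_executor`, `log_receiver`, `toy_operation_state`, and `run_once` are hypothetical names, and `toy_executor::post` merely stands in for a post call that may throw before the task ever runs.

```cpp
#include <cassert>  // used by the checks in the note below
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an executor: post() either runs the task
// immediately or throws before the task is invoked.
struct toy_executor {
    bool fail = false;
    void post(std::function<void()> task) const {
        if (fail) throw std::runtime_error("post failed");
        task();
    }
};

// Hypothetical receiver that records which completion was signalled.
struct log_receiver {
    std::string* log;
    void set_value() && { *log = "value"; }
    void set_error(std::exception_ptr) && { *log = "error"; }
};

template <typename Receiver>
struct toy_operation_state {
    toy_executor exec;
    Receiver receiver;

    void start() noexcept {
        try {
            // Capture the operation state (this), not a moved receiver.
            // The receiver is moved only when the task body actually runs.
            exec.post([this] { std::move(receiver).set_value(); });
        } catch (...) {
            // post() threw before the task ran, so the receiver has not
            // been moved from and can still take the error.
            std::move(receiver).set_error(std::current_exception());
        }
    }
};

// Runs one operation and reports which completion the receiver saw.
std::string run_once(bool fail) {
    std::string log;
    toy_operation_state<log_receiver> op{toy_executor{fail}, log_receiver{&log}};
    op.start();
    return log;
}
```

With this shape, `run_once(false)` yields `"value"` and `run_once(true)` yields `"error"`. Had the lambda captured `std::move(receiver)` instead, the catch block would be left holding a moved-from receiver.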
Force-pushed from d0b7f07 to 7ab5571.
@hkaiser
- C++20 Concepts: I’ve replaced the legacy std::enable_if_t in the executor_scheduler constructor with a clean C++20 requires clause and included the required header.
- Purged Redundant Guards: Removed the #if defined(HPX_HAVE_STDEXEC) guards in executor_scheduler_bulk.hpp, as we now rely strictly on the modern Sender/Receiver path.
- Header Optimization: I have completely removed the implementation headers (executor_scheduler.hpp and executor_scheduler_bulk.hpp) from parallel_executor.hpp and sequenced_executor.hpp. These files now rely solely on the lightweight executor_scheduler_fwd.hpp header, significantly reducing the include bloat for the core executors.

The code is now much cleaner and follows the forwarding pattern we discussed. It should be ready for a final review!
@shivansh023023 Thanks for the updates. I'd like to wait for #7257 to be finalized before merging anything else related to s&r. Please be patient as this may take a moment.
Sure, sir.
Hey @hkaiser, all the CI checks are passing and I've implemented all your feedback.
Could you please look into the conflicts? Those will have to be resolved before moving on. @charan-003 Could you please have a look here to see how that interplays with #6655? |
This PR adds an executor_scheduler adapter for legacy executors, with bulk implemented via bulk_sync_execute. My #6655 adds the P2079R7 parallel_scheduler with policy-aware bulk in thread_pool_scheduler_bulk. These are different scheduler types, so there is no architectural conflict. This PR helps older HPX executors (parallel_executor, sequenced_executor) use P2300 bulk in sender pipelines; my #6655 covers the new scheduler side. Same user goal (bulk support), different backend paths by scheduler. I just need to update my PR so it follows the same style (P2300).
Force-pushed from 01f7224 to 371a9ee.
…nder/receiver concepts, NVCC guard, dedup get_scheduler, fix post() UB, use get_scheduler in tests Signed-off-by: Shivansh <singhshivansh023@gmail.com>
Force-pushed from 371a9ee to 6a61c4e.
- Integrated hpx::execution::experimental::bulk with internal bulk_sync_execute
- Implemented robust tag_invoke(connect_t) with full const-correctness
- Fixed receiver reference collapsing using std::decay_t
- Added executor_algorithm_bulk unit tests for parallel and sequenced execution

Signed-off-by: Shivansh <singhshivansh023@gmail.com>
…nder/receiver concepts, NVCC guard, dedup get_scheduler, fix post() UB, use get_scheduler in tests Signed-off-by: Shivansh <singhshivansh023@gmail.com>
Signed-off-by: Shivansh <singhshivansh023@gmail.com>
…rm_completion_signatures Signed-off-by: Shivansh <singhshivansh023@gmail.com>
…cutor includes Signed-off-by: Shivansh <singhshivansh023@gmail.com>
The requires clause in both the sequential and parallel tag_fallback_invoke overloads of hpx::unique_copy_t incorrectly constrained the predicate against the output iterator's value type:

    is_invocable_v<Pred, iter_value_t<InIter>, iter_value_t<OutIter>>     // wrong
    is_invocable_v<Pred, iter_value_t<FwdIter1>, iter_value_t<FwdIter2>>  // wrong

unique_copy compares consecutive elements of the *input* range only ([alg.unique]). The output iterator is write-only; its value type is irrelevant to the predicate. The correct form is:

    is_invocable_v<Pred, iter_value_t<InIter>, iter_value_t<InIter>>

This is already the form used in hpx::ranges::unique_copy (via is_indirect_callable_v with projected<Proj, InIter> for both arguments), making this fix also restore consistency between the two API surfaces.

Add test_unique_copy_constraint<IteratorTag>() to unique_copy_tests.hpp covering all execution policies (seq, par, par_unseq, seq(task), par(task)) with a strongly-typed binary predicate, plus seven edge cases: empty range, single element, all-identical, no-duplicates, and two-element identical/distinct variants. All verified against std::unique_copy.

Signed-off-by: Aneek22112007 <das.aneek007@gmail.com>
Signed-off-by: Shivansh <singhshivansh023@gmail.com>
The existing godbolt-minimal preset (added in TheHPXProject#7063) is missing several flags required for a correct Compiler Explorer integration:

1. Add HPX_WITH_DISTRIBUTED_RUNTIME=OFF: HPX_WITH_NETWORKING=OFF alone does not disable the distributed runtime -- AGAS, actions, and the parcelset infrastructure are still compiled, wasting ~30% build time and producing libraries that CE will never use.
2. Add HPX_WITH_CXX_STANDARD=20: pins the C++ standard explicitly so builds are deterministic across GCC/Clang versions. Uses HPX's own cache variable (not CMAKE_CXX_STANDARD, which HPX rejects unless HPX_USE_CMAKE_CXX_STANDARD is also set).
3. Add HPX_WITH_TOOLS=OFF: CE only needs libraries and headers; building HPX tools adds unnecessary build time.
4. Improve description: document the produced static libraries (libhpx_wrap.a, libhpx_init.a, libhpx.a, libhpx_core.a) and the required -Wl,-wrap=main linker flag for non-CMake consumers.
5. Reorder cache variables to group related flags logically: standard -> linking -> runtime -> dependencies -> build targets.

Signed-off-by: Shivansh <singhshivansh023@gmail.com>
Signed-off-by: Shivansh <singhshivansh023@gmail.com>
Force-pushed from 6a61c4e to 993005b.
These changes are unrelated to this PR. Please remove.
Same here, these changes are unrelated.
Unrelated changes, please remove.
These changes are unrelated as well.
…tions Signed-off-by: Shivansh Singh <singhshivansh023@gmail.com>
Signed-off-by: Shivansh Singh <singhshivansh023@gmail.com>
… modifiers Signed-off-by: Shivansh Singh <singhshivansh023@gmail.com>
OK sir, I will fix my other PRs until then.
P2300 `bulk` Integration for HPX Executors

Description

This PR implements the P2300 `bulk` sender adapter specifically for HPX's legacy executors, including the `parallel_executor` and `sequenced_executor`. By bridging the `hpx::execution::experimental::bulk` algorithm with HPX's internal `bulk_sync_execute` mechanism, this change ensures that data-parallel workloads are properly load-balanced and optimized using HPX's high-performance partitioners.

Key Technical Improvements:
- … `bulk` sender to the underlying HPX execution engine for native parallel performance.
- `tag_invoke` Overloads: Implemented a full suite of `connect_t` overloads (supporting `&&`, `&`, and `const&`) to ensure compatibility with consumer algorithms like `sync_wait`.
- … `std::decay_t` within the `bulk_receiver` to prevent reference collapsing and ensure the safety of deferred functional object execution.
- … `executors` module rather than the core `execution` module.

Proposed Changes
- `libs/core/executors/include/hpx/executors/executor_scheduler_bulk.hpp`: Implements `executor_bulk_sender` and `executor_bulk_receiver`.
- `libs/core/executors/include/hpx/executors/executor_scheduler.hpp`: Exposes `get_completion_scheduler_t` for ADL discovery.
- `libs/core/executors/include/hpx/executors/parallel_executor.hpp` & `sequenced_executor.hpp`: Integrated the new bulk headers.
- `libs/core/executors/tests/unit/executor_algorithm_bulk.cpp`: Comprehensive validation for sequential and parallel policies.

Background context

This work is part of the ongoing effort to modernize HPX's execution model to align with the C++26 Sender/Receiver (P2300) standard. Implementing `bulk` is a "Big Impact" milestone because it allows modern asynchronous pipelines to tap into the mature, multi-threaded performance of the HPX runtime.

Checklist
- … `executor_algorithm_bulk_test` target.
- … `inspect` tool.
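The connect overload set and the decayed receiver storage mentioned in the description can be illustrated in isolation. This is only a sketch under stated assumptions: it uses plain hidden-friend functions rather than `tag_invoke`, and `toy_sender`, `toy_op_state`, `value_receiver`, `toy_connect`, and `connect_and_start` are hypothetical names that do not appear in the PR.

```cpp
#include <cassert>  // used by the check in the note below
#include <type_traits>
#include <utility>

// Operation state that owns its receiver by value.
template <typename Receiver>
struct toy_op_state {
    Receiver receiver;  // always a value type, never a reference
    int payload;
    void start() { std::move(receiver).set_value(payload); }
};

struct toy_sender {
    int payload;

    // Three explicit overloads mirror the &&, & and const& set named in
    // the description. std::decay_t<R> strips the reference that R picks
    // up when the receiver is passed as an lvalue.
    template <typename R>
    friend toy_op_state<std::decay_t<R>> toy_connect(toy_sender&& s, R&& r) {
        return {std::forward<R>(r), s.payload};
    }
    template <typename R>
    friend toy_op_state<std::decay_t<R>> toy_connect(toy_sender& s, R&& r) {
        return {std::forward<R>(r), s.payload};
    }
    template <typename R>
    friend toy_op_state<std::decay_t<R>> toy_connect(toy_sender const& s, R&& r) {
        return {std::forward<R>(r), s.payload};
    }
};

// Hypothetical receiver that writes the delivered value through a pointer.
struct value_receiver {
    int* out;
    void set_value(int v) && { *out = v; }
};

// Connects a const sender to an lvalue receiver and starts the operation.
int connect_and_start() {
    int result = 0;
    value_receiver r{&result};   // lvalue: R deduces to value_receiver&
    toy_sender const s{42};      // const lvalue: selects the const& overload
    auto op = toy_connect(s, r);
    // The operation state stores a copy of the receiver, not a reference.
    static_assert(std::is_same_v<decltype(op), toy_op_state<value_receiver>>);
    op.start();
    return result;
}
```

`connect_and_start()` returns 42. Without `std::decay_t`, the operation state would be instantiated with `value_receiver&` and would hold a reference that could dangle once the caller's receiver goes out of scope.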