Address some performance regressions in shuffle#1018
Open
wence- wants to merge 7 commits into
Open
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
92e2244 to
fbebd32
Compare
This allows benchmarking just the data movement part of the shuffle without the unspilling and concatenation of the results. Additionally, remove the unnecessary stream sync, the contract is the downstream data is available in stream-ordered fashion, so do that.
8e200d0 to
8be0963
Compare
8be0963 to
a583ce0
Compare
0b25269 to
a3fff0b
Compare
a3fff0b to
1c763bb
Compare
Contributor
Author
|
The waking is not safe yet. |
We can wake if the MPE has finished polling for new metadata, and don't need to wait for it to be completely idle.
Seems from benchmarking this won't be worth it
1c763bb to
011ed99
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
After #927 we lost about 10% performance in the shuffle benchmarks when using cuda async memory.
Recover, from my benchmarking, most of this with a number of updates:
In addition, to allow benchmarking just the communication part of the shuffle, add a "discard output without even concatenating it" mode to the shuffle benchmark.