Parallelize tuning across GPUs #2179
base: develop
Conversation
- Redesign for better maintainability and readability
- Unclutter output with a proper quiet flag and a progress bar
- Distribute work on multi-GPU systems
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
mlir/utils/performance/tuningRunner.py:1
- Off-by-one error: the loop creates `numThreads - 1` compilation threads, but should create `numThreads` threads. This will result in one fewer compilation worker than intended.
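For illustration, a minimal sketch of the pattern being flagged; the function and variable names are hypothetical and not taken from the PR:

```python
import threading

def start_compile_workers(num_threads, work_queue, worker_fn):
    """Spawn compilation worker threads (hypothetical helper)."""
    workers = []
    # A loop written as `range(num_threads - 1)` would start one worker too
    # few; iterating over the full range gives the intended worker count.
    for _ in range(num_threads):
        t = threading.Thread(target=worker_fn, args=(work_queue,))
        t.start()
        workers.append(t)
    return workers
```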
umangyadav left a comment:
Can we change the Jenkinsfile to not use MITuna after this PR is merged, to test this out?
if len(device_ids) > 0:
    return device_ids
This looks like it only picks values from HIP_VISIBLE_DEVICES first, and only if nothing is found there does it try to pick values from ROCR_VISIBLE_DEVICES. Can you add a comment about it so that it reads as intentional rather than ROCR_VISIBLE_DEVICES being ignored?
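For reference, a rough sketch of the precedence being described here (HIP_VISIBLE_DEVICES consulted first, ROCR_VISIBLE_DEVICES only as a fallback); the function name and integer-only parsing are illustrative assumptions, not the PR's code:

```python
import os

def get_visible_device_ids():
    # HIP_VISIBLE_DEVICES takes precedence; ROCR_VISIBLE_DEVICES is only
    # consulted when the former is unset or empty (intentional precedence).
    for var in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES"):
        value = os.environ.get(var, "")
        # Assumes comma-separated integer indices for simplicity.
        device_ids = [int(x) for x in value.split(",") if x.strip()]
        if len(device_ids) > 0:
            return device_ids
    return []
```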
I'm moving away from using the env vars because they are inconsistent and unreliable. I landed on retrieving the GPU IDs using rocm-smi instead, because it ignores any mapping and returns the physical IDs.
Created a tracking issue: https://github.com/ROCm/rocMLIR-internal/issues/2206
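As an illustration of that approach, a minimal sketch that shells out to rocm-smi and derives physical GPU indices from the JSON keys. The `--showid --json` invocation and the `cardN` key layout are assumptions here; the PR's actual flags and parsing may differ.

```python
import json
import subprocess

def get_gpu_ids():
    """Return physical GPU indices reported by rocm-smi (assumed output format)."""
    # Assumption: `rocm-smi --showid --json` prints a JSON object with one
    # "cardN" entry per device; we recover N as the physical index.
    out = subprocess.run(["rocm-smi", "--showid", "--json"],
                         capture_output=True, check=True, text=True)
    data = json.loads(out.stdout)
    return sorted(int(key.removeprefix("card"))
                  for key in data if key.startswith("card"))
```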
- Update docstrings
- Use rocm-smi instead of env vars for GPU selection
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
class TuningError(Exception):
    """Raised when tuning or verification fails."""
    pass
class TuningError is the same as the Exception class (except for its name). Does it make sense to have it as a separate class? If it's raised when tuning or verification fails, maybe we should add a config attribute so it's easier to see for which config the TuningError occurred. Otherwise we only learn that an exception happened, but not on which config (in an easily readable way), unless we attach that context manually later. Having a dedicated attribute would make it intuitive to attach the config when raising a TuningError. Since we will use more GPUs, it might also be useful to have an optional attribute recording which GPU was used.
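One way to act on this suggestion, sketched with hypothetical attribute names (not the PR's code):

```python
class TuningError(Exception):
    """Raised when tuning or verification fails.

    Carries optional context so the failing config (and the GPU it ran on)
    shows up directly in the error message.
    """

    def __init__(self, message, config=None, gpu_id=None):
        self.config = config
        self.gpu_id = gpu_id
        details = []
        if config is not None:
            details.append(f"config={config}")
        if gpu_id is not None:
            details.append(f"gpu={gpu_id}")
        suffix = f" ({', '.join(details)})" if details else ""
        super().__init__(message + suffix)
```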
if not tune_mlir_kernels(configs, conf_class, paths, options):
    print("Tuning aborted", file=sys.stderr)
try:
    return not tune_configs(configs, conf_class, paths, options)
tune_configs returns a bool in this version of the code, and the caller expects 0 to mean success, so that is why the boolean is inverted? To signal success we flip the boolean to False (if it was True). That might be confusing. Can we handle this differently, so that we don't directly flip the bool value to produce the int return value? Maybe it would be more understandable to do something like:

    tuning_success = tune_configs(...)
    return 0 if tuning_success else 1

Not sure if my suggestion makes sense.
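A hedged sketch of that alternative, mapping the boolean result onto an explicit exit code; the wrapper name and the broad exception handling are assumptions for illustration:

```python
import sys

def exit_code_from_tuning(tune_fn, *args):
    """Translate a boolean tuning result into a process exit code."""
    try:
        tuning_success = tune_fn(*args)
    except Exception as err:  # e.g. TuningError in the PR
        print(f"Tuning aborted: {err}", file=sys.stderr)
        return 1
    # Explicit mapping instead of a negated boolean: 0 = success, 1 = failure.
    return 0 if tuning_success else 1
```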
Motivation
Improve tuning time by parallelizing tuning on multi-GPU systems.
Technical Details
tuningRunner.py
- Use `rocm-smi` to retrieve a list of available GPUs. Utilize all of them for tuning, or a subset of them if `--gpus` is specified.
- … with `ROCR_VISIBLE_DEVICES` set accordingly and `--num-compile-threads` set to `ceil(num_cpus / num_gpus) - 1` (see the sketch after this list).
- … if `--retune` is specified.
- … `--quiet`.
- … `--compact-print`, and correct the semantics of `--quiet` to suppress non-error output (used in CI).

rocmlir-tuning-driver.cpp
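For illustration, a rough sketch of how the per-GPU compile-thread count described above could be computed; the function name is hypothetical and this is not the PR's exact code:

```python
import math
import os

def compile_threads_per_gpu(num_gpus):
    """ceil(num_cpus / num_gpus) - 1, clamped to at least one thread."""
    num_cpus = os.cpu_count() or 1
    return max(1, math.ceil(num_cpus / num_gpus) - 1)
```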
Resolves https://github.com/ROCm/rocMLIR-internal/issues/2018
Test Plan
Test Result
Submission Checklist