Skip ramp_up/ramp_down during batch processing #115
Conversation
The diff in pipeline_commands looks a bit confusing due to the try-finally indentation. The other commit is a fix to allow loading images from Google Artifact Registry.
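(For context, a minimal runnable sketch of the kind of restructuring that causes this: wrapping an existing loop in try-finally reindents every enclosed line, so the diff marks unchanged logic as changed. The function names here are placeholders, not the code in this PR.)

```python
def process(job):
    print("processing", job)  # stand-in for the real per-job work

def ramp_down_cluster():
    print("ramping down")  # stand-in for the real cleanup

# Before: the loop body sits at one indentation level.
def run_batch_before(jobs):
    for job in jobs:
        process(job)

# After: the identical loop body moves one level deeper inside try,
# so every line shows up as modified in the diff even though only
# the cleanup guarantee in finally is new.
def run_batch_after(jobs):
    try:
        for job in jobs:
            process(job)
    finally:
        ramp_down_cluster()

run_batch_after(["job-a", "job-b"])
```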
@gemini-cli /review
🤖 Hi @ranlu, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

🤖 I'm sorry @ranlu, but I was unable to process your request. Please see the logs for more details.
@gemini-cli /review

🤖 Hi @ranlu, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This pull request aims to optimize batch processing by skipping cluster ramp-up/down between jobs. The changes in cloud/google/common.py and dags/google_api_helper.py seem correct and beneficial.
However, the refactoring in slackbot/pipeline_commands.py introduces a critical bug that will cause batch jobs to fail. I've left a comment with a detailed explanation and a suggested fix.
🔍 General Feedback
- The optimization to avoid sleeping during cluster ramp-up if the target size is already met is a good addition (see the sketch after this list).
- Please address the critical issue in slackbot/pipeline_commands.py to ensure the batch processing works as expected.
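(A minimal sketch of what the ramp-up optimization described above could look like; get_cluster_size, resize_cluster, and the wait time are assumptions for illustration, not the actual helpers in cloud/google/common.py or dags/google_api_helper.py.)

```python
import time

def get_cluster_size():
    return 10  # placeholder: the real code would query the instance group

def resize_cluster(target):
    print("resizing cluster to", target)  # placeholder resize request

def ramp_up(target_size, wait_seconds=60):
    # Skip the resize-and-sleep cycle entirely when the cluster is
    # already at or above the requested size, e.g. between batch jobs
    # that reuse the same cluster.
    if get_cluster_size() >= target_size:
        return
    resize_cluster(target_size)
    time.sleep(wait_seconds)  # give new instances time to come up

ramp_up(10)  # returns immediately: the size already matches the target
```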
The commented code in slackbot/pipeline_commands.py:

```python
if i > 0:
    if 'NAME' in param:
        del param['NAME']
    for k in p:
        param[k] = p[k]
    supply_default_param(param)
    replyto(msg, "*Sanity check: batch job {} out of {}*".format(i+1, len(json_obj)))
    state = "unknown"
    current_task = guess_run_type(param)
    if current_task == "seg_run":
        set_variable('param', param, serialize_json=True)
        state = run_dag("sanity_check", wait_for_completion=True).state
    elif current_task == "inf_run":
        set_variable('inference_param', param, serialize_json=True)
        state = run_dag("chunkflow_generator", wait_for_completion=True).state
    elif current_task == "syn_run":
        set_variable("synaptor_param.json", param, serialize_json=True)
        state = run_dag("synaptor_sanity_check", wait_for_completion=True).state

    if state != "success":
        replyto(msg, "*Sanity check failed, abort!*")
        break
```
The logic for guessing the task type and running the sanity check should be executed for all batch jobs.
Suggested change:

```python
if i > 0:
    if 'NAME' in param:
        del param['NAME']
    for k in p:
        param[k] = p[k]
supply_default_param(param)
replyto(msg, "*Sanity check: batch job {} out of {}*".format(i+1, len(json_obj)))
state = "unknown"
current_task = guess_run_type(param)
if current_task == "seg_run":
    set_variable('param', param, serialize_json=True)
    state = run_dag("sanity_check", wait_for_completion=True).state
elif current_task == "inf_run":
    set_variable('inference_param', param, serialize_json=True)
    state = run_dag("chunkflow_generator", wait_for_completion=True).state
elif current_task == "syn_run":
    set_variable("synaptor_param.json", param, serialize_json=True)
    state = run_dag("synaptor_sanity_check", wait_for_completion=True).state
if state != "success":
    replyto(msg, "*Sanity check failed, abort!*")
    break
```
No. current_task still gets initialized on line 194, and guess_run_type(param) was also called inside the if i > 0 block on the main branch (line 230)...
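(A runnable toy reconstruction of the control flow described in this reply, under the assumption that the surrounding code matches this shape; the names and bodies are stand-ins, not the actual slackbot code.)

```python
def guess_run_type(param):
    return param.get("kind", "seg_run")  # stand-in for the real heuristic

def process_batch(json_obj, param):
    current_task = guess_run_type(param)  # initialized up front (cf. line 194)
    for i, p in enumerate(json_obj):
        if i > 0:
            # Later jobs patch the first job's parameters in place, so
            # the run type is re-guessed inside this block (cf. line 230
            # on the main branch), not skipped as the review suggests.
            param.pop("NAME", None)
            param.update(p)
            current_task = guess_run_type(param)
        print("job", i, "->", current_task)

process_batch([{"kind": "seg_run"}, {"kind": "inf_run"}],
              {"kind": "seg_run", "NAME": "first-job"})
```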
| set_variable("batch_keep_cluster", "true") | ||
| else: | ||
| set_variable("batch_keep_cluster", "false") | ||
| state = run_dag("chunkflow_worker", wait_for_completion=True).state |
@nkemnitz, I feel it is probably more natural to implement this as an extension of …
@nkemnitz, can you make a separate PR for the Google Artifact Registry fix? Thanks. In the future, I think you can separate the try-finally change into its own commit, making the commits easier to review.
Force-pushed from 31e48af to 5901a1f