Feature/data 2175 kill timeout spark #45
Conversation
siklosid
left a comment
LGTM, it would be nice to add more context in the PR description, e.g. some documentation about the best practice for handling timeouts in Airflow and how you arrived at this implementation.
It would also be nice to handle the case where we cancel/clear the job in the Airflow UI, so that this also kills the running Spark job (see the sketch below).
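A minimal sketch of this suggestion, assuming Airflow's standard `on_kill` hook on `BaseOperator` (the operator name, attribute names, and the `kill_spark_job` stub are hypothetical, not this PR's code; `on_kill` itself is the hook Airflow calls when a running task is externally stopped):

```python
from airflow.models import BaseOperator


class EmrSparkOperator(BaseOperator):  # hypothetical operator name
    def kill_spark_job(self, instance_id: str, application_id: str) -> None:
        """Sends `yarn application -kill <application_id>` to the EMR
        master via SSM; see the send_command sketch further down."""

    def on_kill(self) -> None:
        # Airflow invokes on_kill() when the task instance is stopped
        # externally (cleared or marked failed in the UI, SIGTERM, ...),
        # so the running Spark job can be terminated here as well.
        if self._application_id:
            self.kill_spark_job(self._emr_master_instance_id, self._application_id)
```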
```python
job_args=_parse_args(self._template_parameters),
spark_args=_parse_spark_args(self._task.spark_args),
spark_conf_args=_parse_spark_args(self._task.spark_conf_args, '=', 'conf '),
spark_app_name=self._task.spark_conf_args.get("spark.app.name", ""),
```
```python
    Parameters={"commands": [kill_command]}
)
raise AirflowException(
    f"Spark job exceeded the execution timeout of {self._execution_timeout} seconds and was terminated."
)
```
Can this only happen on a timeout? What if we cancel the job in the Airflow UI?
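For context, a minimal sketch of how the kill command in the hunk above could be sent to the EMR master node via SSM Run Command. The function signature and variable names are taken from the hunks in this diff; everything else, including the error message wording, is an assumption:

```python
import boto3
from airflow.exceptions import AirflowException


def kill_spark_job(emr_master_instance_id: str, application_id: str) -> None:
    # Ask YARN on the EMR master to kill the application, using SSM's
    # stock AWS-RunShellScript document to run a shell command remotely.
    kill_command = f"yarn application -kill {application_id}"
    boto3.client("ssm").send_command(
        InstanceIds=[emr_master_instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [kill_command]},
    )
    raise AirflowException(
        "Spark job exceeded the execution timeout and was terminated."
    )
```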
```python
)
if application_id:
    self.kill_spark_job(emr_master_instance_id, application_id)
raise AirflowException("Task timed out and the Spark job was terminated.")
```
We are already raising an exception in kill_spark_job, so I'm not even sure the code will ever get here.
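A hypothetical restructuring that addresses this point: since kill_spark_job ends in a raise, the trailing raise only ever runs when no application id was resolved, and its message can say so. Names are reused from the hunk above; none of this is the PR's actual code:

```python
from airflow.exceptions import AirflowException


def handle_timeout(task, emr_master_instance_id: str, application_id: str) -> None:
    if application_id:
        # kill_spark_job raises AirflowException itself, so execution
        # never continues past this call when an application id exists.
        task.kill_spark_job(emr_master_instance_id, application_id)
    raise AirflowException("Task timed out before a Spark application id was found.")
```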
siklosid
left a comment
LGTM. @claudiazi Do you know why there were so many code style changes? It made the review a bit harder, because it was not clear what was an actual change and what was not.
claudiazi
@siklosid sorry!! I blacked the code 🙈

siklosid
@claudiazi I think black is part of the pre-commit hook. Does this mean it was not blacked before? Or was it blacked with different settings?

claudiazi
@siklosid seems that the pre-commit was not working at all. 🤔
Feature:
- `kill_spark_job` handles the termination of Spark jobs by sending the yarn kill command via SSM.
- `on_kill` terminates the Spark job by calling the `kill_spark_job` function when the Airflow task is marked as failed, or is manually killed via the Airflow UI or CLI.

Tested in datastg:
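Putting the pieces together, a minimal sketch of the timeout flow the description outlines: poll until the job finishes and, once the deadline passes, kill the Spark job and fail the task. The callback parameters and the poll_interval default are assumptions for illustration, not this PR's code:

```python
import time

from airflow.exceptions import AirflowException


def wait_with_timeout(job_finished, on_timeout, execution_timeout: float,
                      poll_interval: float = 30.0) -> None:
    # Poll the job status until it finishes; past the deadline, run the
    # kill callback (e.g. kill_spark_job, which itself raises) and fail.
    deadline = time.monotonic() + execution_timeout
    while not job_finished():
        if time.monotonic() > deadline:
            on_timeout()
            raise AirflowException("Spark job exceeded the execution timeout.")
        time.sleep(poll_interval)
```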