Re-enable start_from_trigger feature with rendering of template fields #55068

Open
dabla wants to merge 61 commits into apache:main from dabla:fix/render-templated-fields-start-from-trigger-in-trigger-subprocess

@dabla
Contributor

@dabla dabla commented Aug 29, 2025

This PR tries to fix the rendering of templated fields in start_from_trigger args without breaking architectural assumptions. The initial PR 53071 violated those assumptions and was reverted in PR 55037.

This PR could be simplified, without needing to parse the DAG, if render_template_fields were reusable in the triggerer. As of now it is part of BaseOperator and thus not reusable in BaseTrigger. As discussed with @ashb, @amoghrajesh is apparently working on this (i.e. making template renderers reusable).

Unlike the previous attempt, the templates are not rendered in the TriggerRunnerSupervisor while the trigger is being created in the database. Instead, the TriggerRunnerSupervisor only retrieves the serialized DAG from the DagModel and passes it to the RunTrigger workload as an extra parameter (e.g. named dag_data). The TriggerRunner then uses it in the subprocess, where it has access to XComs, and thus shouldn't raise: ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner'.
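
As a rough illustration of the hand-off described above, the sketch below shows a workload object that carries the serialized DAG as an optional extra field across a process boundary. It is not Airflow's RunTrigger model: the class name RunTriggerSketch, every field other than dag_data, and the JSON wire format are assumptions made only for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class RunTriggerSketch:
    """Hypothetical stand-in for a trigger workload; not the real Airflow class."""

    id: int
    classpath: str
    encrypted_kwargs: str
    # Serialized DAG fetched by the supervisor (which has DB access).
    # None means the trigger needs no template rendering.
    dag_data: Optional[dict[str, Any]] = None

    def to_wire(self) -> bytes:
        """Encode the workload so it can be sent to the subprocess."""
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_wire(cls, payload: bytes) -> "RunTriggerSketch":
        """Decode a workload inside the subprocess."""
        return cls(**json.loads(payload.decode("utf-8")))


# Supervisor side: attach the serialized DAG, then ship the workload.
workload = RunTriggerSketch(
    id=1,
    classpath="example.triggers.MyTrigger",  # hypothetical classpath
    encrypted_kwargs="...",
    dag_data={"dag_id": "example_dag", "tasks": []},
)
received = RunTriggerSketch.from_wire(workload.to_wire())
```

The point of the design is that only the process with database access looks up the serialized DAG, while the subprocess receives plain data and never imports DAG code.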

So, if template rendering were reusable outside BaseOperator, we wouldn't need to parse the DAG from the DagBag to retrieve the task (e.g. BaseOperator) in order to render the templates. That would make the solution simpler and also lighter (i.e. more performant) for the triggerer subprocess.

After further reflection, also in light of the work I'm doing for AIP-88, the current solution will still need the task (e.g. Operator) for the trigger when yielding multiple events. The same goes for rendering templates: you need the context to render them, and to get the context you need at least a RuntimeTaskInstance, which in turn requires a task (e.g. BaseOperator). So even if template rendering were reusable across operator and triggerer, you would still need the context, and the current solution wouldn't change that much.

As loading DAG code is prohibited in the triggerer, we load the serialized DAG version from the database. As a consequence, callables aren't supported as kwargs when start_from_trigger is enabled. Thus, when start_from_trigger is enabled on an operator, we automatically validate the kwargs and check that they don't contain callables; otherwise an AirflowException is raised stating which kwargs contain a callable.

[screenshot]
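
The kwargs validation described above could look roughly like the following sketch. It is an assumption about the shape of the check, not the code in this PR: the helper names and the CallableKwargError class (standing in for AirflowException so the example runs without Airflow installed) are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


class CallableKwargError(ValueError):
    """Hypothetical stand-in for the AirflowException mentioned above."""


def find_callable_kwargs(kwargs: Mapping[str, Any], prefix: str = "") -> list[str]:
    """Return the paths of all callable values, descending into dicts and lists."""
    found: list[str] = []
    for key, value in kwargs.items():
        path = f"{prefix}{key}"
        if callable(value):
            found.append(path)
        elif isinstance(value, Mapping):
            found.extend(find_callable_kwargs(value, prefix=f"{path}."))
        elif isinstance(value, (list, tuple)):
            for index, item in enumerate(value):
                if callable(item):
                    found.append(f"{path}[{index}]")
    return found


def validate_start_from_trigger_kwargs(kwargs: Mapping[str, Any]) -> None:
    """Raise if any kwarg is callable, since callables cannot be serialized."""
    offending = find_callable_kwargs(kwargs)
    if offending:
        raise CallableKwargError(
            "start_from_trigger does not support callables in kwargs: "
            + ", ".join(offending)
        )


# Plain values pass; a function anywhere in the kwargs is reported by name.
validate_start_from_trigger_kwargs({"url": "https://example.com", "retries": 3})
offenders = find_callable_kwargs({"a": 1, "b": len, "c": {"d": print}, "e": [1, max]})
```

Reporting every offending key at once, rather than failing on the first, lets a DAG author fix all of them in one pass.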

Also the test dag from @kaxil is working:

[screenshots]

This PR only fixes start_from_trigger with rendered templates for non-expanded tasks. Allowing start_from_trigger with expanded tasks will need another PR, but at least we can already support start_from_trigger on non-expanded tasks.


@boring-cyborg boring-cyborg bot added area:Executors-core LocalExecutor & SequentialExecutor area:Triggerer labels Aug 29, 2025
@dabla dabla marked this pull request as draft August 29, 2025 12:30
@ashb
Member

ashb commented Sep 3, 2025

https://github.com/apache/airflow/blob/3.0.0rc3/airflow-core/newsfragments/aip-66.significant.rst

Dag bundles are not initialized in the triggerer. In practice, this means that triggers cannot come from a dag bundle. This is because the triggerer does not deal with changes in trigger code over time, as everything happens in the main process. Triggers can come from anywhere else on sys.path instead.

(Emphasis mine)

@ramitkataria
Contributor

This would also be very useful for async callbacks (currently used for Deadline Alerts) running in the triggerer! Once this is merged in, I could create a followup PR to replace the implementation in #55241

@dabla dabla requested a review from ashb September 9, 2025 12:47
@dabla
Contributor Author

dabla commented Sep 9, 2025

@ashb I've introduced a SerializedDagBag class, which acts as a simple cache in the same way as the DagBag, but only uses serialized DAGs from the DB instead of from the filesystem. The SerializedDagBag needs dag_id and dag_version_id to return the corresponding DAG.
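
For illustration only, a cache of this kind can be sketched as below. This is not the SerializedDagBag from the PR nor Airflow's DBDagBag: the class name VersionedDagCache, the get_dag method, and the injected loader callable (which stands in for the database query) are assumptions made for the example.

```python
from typing import Any, Callable, Optional


class VersionedDagCache:
    """Hypothetical cache of serialized DAGs keyed by (dag_id, dag_version_id)."""

    def __init__(self, loader: Callable[[str, str], Optional[Any]]) -> None:
        # The loader stands in for the DB lookup of a serialized DAG.
        self._loader = loader
        self._dags: dict[tuple[str, str], Any] = {}

    def get_dag(self, dag_id: str, dag_version_id: str) -> Optional[Any]:
        """Return the cached DAG, loading it once on first access."""
        key = (dag_id, dag_version_id)
        if key not in self._dags:
            dag = self._loader(dag_id, dag_version_id)
            if dag is None:
                # Do not cache misses: the version may be written later.
                return None
            self._dags[key] = dag
        return self._dags[key]


# Usage with a fake loader that records how often the "database" is hit.
calls: list[tuple[str, str]] = []


def fake_loader(dag_id: str, dag_version_id: str) -> dict[str, str]:
    calls.append((dag_id, dag_version_id))
    return {"dag_id": dag_id, "version": dag_version_id}


cache = VersionedDagCache(fake_loader)
first = cache.get_dag("example_dag", "v1")
second = cache.get_dag("example_dag", "v1")  # served from the cache
other = cache.get_dag("example_dag", "v2")   # new version, loaded again
```

Keying on the version id as well as the DAG id means a trigger created for an older DAG version keeps resolving against that version.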

The reason I still retrieve the serialized DAG from within update_triggers (which has DB access), instead of rendering the templates in the scheduler as you proposed, is that I can then still construct the RuntimeTaskInstance within update_triggers (which needs a task to be constructed). That way we can assign that instance to the trigger instead of the serialized TI from the workloads module. This will also be needed to run the multiple yielded trigger events for AIP-88, as those need the template context to do the pagination from within the triggerer while yielding the events. I know this is a very complicated explanation; let me know if you want some clarification.

I still need to write a unit test for SerializedDagBag though.

@ashb
Member

ashb commented Sep 9, 2025

SerializedDagBag already exists in the form of SchedulerDagBag I think -- rather than a new one, it might be better to rename that one if it otherwise fits your need

@dabla
Contributor Author

dabla commented Sep 9, 2025

SerializedDagBag already exists in the form of SchedulerDagBag I think -- rather than a new one, it might be better to rename that one if it otherwise fits your need

You're right, it's called DBDagBag, it serves the same purpose, so will replace it with that one, thx @ashb for pointing this out 👍

@dabla
Contributor Author

dabla commented Sep 9, 2025

SerializedDagBag already exists in the form of SchedulerDagBag I think -- rather than a new one, it might be better to rename that one if it otherwise fits your need

I ditched it and won't be using it in this PR as I need the SerializedDagModel, not the SerializedDAG, but I'm already using it for AIP-88.

@dabla dabla changed the title from "Second attempt to fix rendering of template fields with start from trigger" to "Fix rendering of template fields with start from trigger" Sep 9, 2025
@dabla dabla marked this pull request as ready for review September 10, 2025 07:24
@dabla dabla marked this pull request as draft September 10, 2025 16:51
@dabla dabla marked this pull request as ready for review September 12, 2025 16:30
@dabla
Contributor Author

dabla commented Sep 12, 2025

So, a small explanation of what I did in this PR.

  • @kaxil First of all, I had to re-implement a simplified version of the defer_task method in TaskInstance. It was called by the schedule_tis method of the DagRun but had been commented out because start_from_trigger wasn't working correctly.
  • @ashb proposed a common defer_task classmethod in the Trigger that would be reusable across the execution API and the schedule_tis method of the DagRun, but that didn't work. The TaskInstance is updated differently there, and in schedule_tis the modifications weren't picked up by the scheduler, so the task instance got stuck. I even tried refreshing the ti on the session, flushing and even committing it, but to no avail. So at the moment we have a little duplication there. I've added a TODO to keep that in mind, as my main focus for now was to get the rendering of template fields working in triggerers.
  • I also had to exclude the base Trigger module from check-sdk-imports, as I had to import the Templater from the sdk there (it was previously in airflow.template.templater).
  • @kaxil Your test dag is also working with start_from_trigger, see the screenshot at the top of the PR.

@dabla dabla marked this pull request as draft September 20, 2025 11:16
@dabla dabla force-pushed the fix/render-templated-fields-start-from-trigger-in-trigger-subprocess branch from e218c17 to da7d880 Compare September 23, 2025 12:17
@dabla dabla marked this pull request as ready for review September 24, 2025 14:05
@dabla dabla marked this pull request as draft September 25, 2025 12:53
…er-in-trigger-subprocess

# Conflicts:
#	airflow-core/src/airflow/executors/workloads.py
#	airflow-core/src/airflow/jobs/triggerer_job_runner.py
@dabla dabla force-pushed the fix/render-templated-fields-start-from-trigger-in-trigger-subprocess branch from 1d5e587 to 85ac14d Compare March 3, 2026 14:00
@dabla dabla force-pushed the fix/render-templated-fields-start-from-trigger-in-trigger-subprocess branch from 8fb8cd7 to c2c98b7 Compare March 4, 2026 07:41
dabla added 2 commits March 4, 2026 08:48
…er-in-trigger-subprocess

# Conflicts:
#	airflow-core/src/airflow/models/dagrun.py
Contributor

@jscheffl jscheffl left a comment

Would be great having this!

I note that @dabla is continuously rebasing... I assume this is ready. As nobody is jumping on this, here is a first approval. Unfortunately I understand only 80% of the change, but I would ask others who have doubts to either speak up or support with an approval.

(Meaning: please get another reviewer prior to merge.)

@dabla dabla marked this pull request as draft March 12, 2026 16:10
@dabla dabla force-pushed the fix/render-templated-fields-start-from-trigger-in-trigger-subprocess branch from 9fd292d to 12502c3 Compare March 12, 2026 17:16
@dabla dabla force-pushed the fix/render-templated-fields-start-from-trigger-in-trigger-subprocess branch from 12502c3 to 5331992 Compare March 12, 2026 17:17
@eladkal
Contributor

eladkal commented Mar 13, 2026

I assume this is ready.

The PR is still in draft, so if the code is OK and all tests pass, it would be good, @dabla, to mark it as ready for review.

@dabla
Contributor Author

dabla commented Mar 17, 2026

I assume this is ready.

The PR is still in draft, so if the code is OK and all tests pass, it would be good, @dabla, to mark it as ready for review.

Will try to fix ci/cd errors today.

try:
    rendered_content = self.render_template(value, context, jinja_env)
except Exception:
    # TODO: Mask the value. Depends on https://github.com/apache/airflow/issues/45438
Contributor

Is this TODO to be fixed prior to merge? Or can you add a follow-up issue so this isn't forgotten?
Is this security critical, i.e. if it is not implemented, might secrets leak to the logs here? That would be a blocker in my view.

Contributor Author

That code is duplicated from BaseOperator: as BaseOperator is from the Task SDK, I had to duplicate it for the trigger. But I saw template_rendering is now also available in _shared, so I will have a look.
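
To make the reviewer's concern concrete, here is a minimal sketch of one way to avoid leaking a field's value when rendering fails: log only the field name and the exception type, then re-raise. It is an assumption about what "masking" could look like, not the approach tracked in the linked issue. The function render_field is hypothetical, and string.Template is used as a stdlib stand-in for Jinja so the example runs on its own.

```python
import logging
from string import Template

log = logging.getLogger("trigger_template_sketch")  # hypothetical logger name


def render_field(name: str, value: str, context: dict[str, str]) -> str:
    """Render one templated field; never write the raw value to the logs."""
    try:
        # string.Template stands in for the Jinja rendering call.
        return Template(value).substitute(context)
    except Exception as exc:
        # The raw value may embed a secret, so only the field name and
        # the exception type are logged before re-raising.
        log.error(
            "Failed to render templated field %r (%s); value redacted",
            name,
            type(exc).__name__,
        )
        raise


rendered = render_field("sql", "SELECT * FROM t WHERE ds = '$ds'", {"ds": "2025-08-29"})
```

The trade-off is less debugging detail in the log, which may be acceptable as a stopgap until values can be masked selectively.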