Chart: Implement deployPerBundle feature for DAG processor deployments#61039
Chart: Implement deployPerBundle feature for DAG processor deployments#61039uplsh580 wants to merge 2 commits intoapache:mainfrom
deployPerBundle feature for DAG processor deployments#61039Conversation
2531585 to
1241a7a
Compare
9930581 to
1241a7a
Compare
|
Great work! |
1241a7a to
8f898af
Compare
8f898af to
5195d85
Compare
372cbf5 to
59b62a1
Compare
jscheffl
left a comment
There was a problem hiding this comment.
I like the feature very much - mainly that different dag processors can be spawned - but am not sure if I really like the parallel structure of parameters for the deployment. I feel like it would be better in the bundle definition directly. But would like to have other opinions as well.
As we are currently discussing a structural change to a version 2.0 of the helm chart it might be we are hesitant to add this short term as a feature as if we in general re-structure (potentially) then it might be also very good suited in a 2.0 version.
chart/values.yaml
Outdated
|
|
||
| # Per-bundle deployment option | ||
| # When enabled, creates a separate deployment for each bundle in `dagBundleConfigList` | ||
| deployPerBundle: |
There was a problem hiding this comment.
I am not sure and would like opinions of other maintainers whether this structure is well suited. It is fully arallel to the definition of the bundles itself. Technically this is possilbe but then two lists need to be maintained with a risk of inconsistency.
Is there a reason that you did not add the structure below the bundle list definition?
There was a problem hiding this comment.
Thank you for the feedback. I agree that maintaining two separate lists could lead to configuration drift and unnecessary complexity.
To address this, how about nesting the deployment configuration directly within the dagBundleConfigList? This way, each bundle's definition and its deployment strategy are coupled together, ensuring better consistency.
Here is an example of the proposed structure:
deployPerBundle: true # default is `false`
dagBundleConfigList:
- name: bundle1
classpath: ...
kwargs: ...
deploymentOverride: # Bundle-specific deployment settings
enabled: true
replicas: 3
resources: ...
- name: bundle2
classpath: ...
kwargs: {}
# If 'deployment' is omitted, it could fall back to default values
chart/values.yaml
Outdated
| enabled: false | ||
| # Command args template for per-bundle deployments | ||
| # `{{ bundleName }}` will be replaced with the actual bundle name | ||
| args: ["bash", "-c", "exec airflow dag-processor --bundle-name {{ bundleName }}"] |
There was a problem hiding this comment.
Is there a reason that the args are populated? Which reasons would be there to override this? Is it needed to expose this as (yet another) parameter?
There was a problem hiding this comment.
Thank you for pointing that out. To be honest, I also had concerns that exposing args might add unnecessary complexity and potentially confuse users.
The reason I included it was primarily for consistency with the existing single DAG processor configuration, which already allows overriding args in the current Helm Chart. I wanted to ensure that the bundle-specific processors followed the same pattern as the standard one, just in case users needed that same level of flexibility.
Line 2383 in fbf687c
To simplify this, I’d like to propose automating the argument injection instead.
When deployPerBundle is enabled, the template can take the base args from the global dagProcessor (e.g., ["bash", "-c", "exec airflow dag-processor"]) and automatically append the --bundle-name {{ bundleName }} flag for each specific deployment.
However, I am a bit cautious about this approach because I am not fully aware of all the different ways users might be overriding the default args. If a user has a very complex custom command, simply appending a flag at the end might not work in every scenario.
04172b7 to
9e6ca24
Compare
That’s a fair point. While I’d personally love to see this feature implemented, I agree that structural consistency in version 2.0 is just as important. I’m happy to go with whatever the maintainers feel is best for the long-term roadmap, whether it's added now or as part of the 2.0 restructuring. |
9e6ca24 to
7670886
Compare
|
@jscheffl |
7670886 to
e5136d4
Compare







Related Issue
Issue: #61037
Description
This PR implements per-bundle DAG Processor deployments feature, allowing users to create separate Kubernetes deployments for each DAG bundle defined in
dagBundleConfigList. This enables independent resource isolation, scaling, and configuration per bundle.Motivation
Currently, when multiple DAG bundles are configured, all bundles are processed by a single DAG processor deployment. This limits the ability to:
Changes
Configuration (
values.yaml)Added new
dagProcessor.deployPerBundlesection:Features
deployPerBundle.enabledistrue, creates a separate Deployment for each bundle indagBundleConfigList{{ bundleName }}placeholder that gets replaced with actual bundle namebundleOverridesmap:deployPerBundle.enabledisfalse, maintains existing single deployment behaviorImplementation Details
dag-processor.deploymentdag-processor.poddisruptionbudgetbundleOverrides[bundleName].enabledFiles Changed
chart/templates/dag-processor/dag-processor-deployment.yaml: Added per-bundle deployment logicchart/templates/dag-processor/dag-processor-poddisruptionbudget.yaml: Added per-bundle PDB logicchart/values.yaml: AddeddeployPerBundleconfiguration sectionchart/values.schema.json: Added schema validation fordeployPerBundlehelm-tests/tests/helm_tests/airflow_core/test_dag_processor_per_bundle.py: Added comprehensive test casesUsage Example
This will create:
{release-name}-dag-processor-bundle1deployment with 3 replicas and production resources{release-name}-dag-processor-bundle2deployment with 1 replica and standard resourcesWas generative AI tooling used to co-author this PR?
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.