Chart: Implement deployPerBundle feature for DAG processor deployments #61039

Open
uplsh580 wants to merge 2 commits into apache:main from uplsh580:chart/deployPerBundle

@uplsh580
Contributor

@uplsh580 uplsh580 commented Jan 25, 2026

Related Issue

Issue: #61037

Description

This PR implements a per-bundle DAG processor deployment feature, which lets users create a separate Kubernetes Deployment for each DAG bundle defined in dagBundleConfigList. This enables independent resource isolation, scaling, and configuration per bundle.

Motivation

Currently, when multiple DAG bundles are configured, all bundles are processed by a single DAG processor deployment. This limits the ability to:

  • Scale bundles independently based on their workload
  • Apply bundle-specific resource configurations
  • Isolate bundle processing failures
  • Use bundle-specific node selectors, affinities, or tolerations

Changes

Configuration (values.yaml)

Added new dagProcessor.deployPerBundle section:

dagProcessor:
  deployPerBundle:
    enabled: false  # Enable per-bundle deployments
    args: ["bash", "-c", "exec airflow dag-processor --bundle-name {{ bundleName }}"]
    bundleOverrides: {}  # Per-bundle configuration overrides

Features

  1. Per-bundle Deployments: When deployPerBundle.enabled is true, creates a separate Deployment for each bundle in dagBundleConfigList
  2. Bundle-specific Args: Supports templated args with a {{ bundleName }} placeholder that is replaced with the actual bundle name
  3. Bundle Overrides: Allows per-bundle configuration overrides via bundleOverrides map:
    • Resources (CPU, memory)
    • Replicas
    • Node selectors, affinities, tolerations
    • Environment variables
    • Pod disruption budgets
    • And other deployment settings
  4. Per-bundle PodDisruptionBudget: Creates separate PDBs for each bundle when enabled
  5. Backward Compatibility: When deployPerBundle.enabled is false, maintains existing single deployment behavior
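
The behavior in the list above can be sketched in Python. This is only an illustration of the described semantics, not the chart's Helm template: `expand_per_bundle` and `deep_merge` are hypothetical helpers, the single-deployment name `{release-name}-dag-processor` is an assumption, and the merge rules (nested dicts merged, everything else replaced) are a guess at how bundleOverrides is applied.

```python
import copy

PLACEHOLDER = "{{ bundleName }}"


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def expand_per_bundle(release_name, dag_processor):
    """List the deployments a release would get (hypothetical helper)."""
    per_bundle = dag_processor.get("deployPerBundle", {})
    # Shared settings: everything except the keys that drive the expansion itself.
    skip = {"enabled", "deployPerBundle", "dagBundleConfigList"}
    shared = {k: v for k, v in dag_processor.items() if k not in skip}

    if not per_bundle.get("enabled", False):
        # Backward-compatible path: one deployment for all bundles (name assumed).
        return [{"name": f"{release_name}-dag-processor", **shared}]

    deployments = []
    overrides = per_bundle.get("bundleOverrides", {})
    for bundle in dag_processor.get("dagBundleConfigList", []):
        bundle_name = bundle["name"]
        override = dict(overrides.get(bundle_name, {}))
        # bundleOverrides[bundleName].enabled = false skips that bundle.
        if override.pop("enabled", True) is False:
            continue
        settings = deep_merge(shared, override)
        # Substitute the placeholder in every element of the templated args.
        settings["args"] = [
            arg.replace(PLACEHOLDER, bundle_name) for arg in per_bundle.get("args", [])
        ]
        deployments.append(
            {"name": f"{release_name}-dag-processor-{bundle_name}", **settings}
        )
    return deployments
```

Calling `expand_per_bundle("my-release", values["dagProcessor"])` on a parsed values file would return one entry per enabled bundle, or a single entry when deployPerBundle.enabled is false.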

Implementation Details

  • Refactored deployment logic into a reusable helper template dag-processor.deployment
  • Refactored PDB logic into a reusable helper template dag-processor.poddisruptionbudget
  • Supports per-bundle enable/disable via bundleOverrides[bundleName].enabled

Files Changed

  • chart/templates/dag-processor/dag-processor-deployment.yaml: Added per-bundle deployment logic
  • chart/templates/dag-processor/dag-processor-poddisruptionbudget.yaml: Added per-bundle PDB logic
  • chart/values.yaml: Added deployPerBundle configuration section
  • chart/values.schema.json: Added schema validation for deployPerBundle
  • helm-tests/tests/helm_tests/airflow_core/test_dag_processor_per_bundle.py: Added comprehensive test cases

Usage Example

dagProcessor:
  enabled: true

  dagBundleConfigList:
    - name: bundle1
      classpath: "airflow.providers.git.bundles.git.GitDagBundle"
      kwargs:
        git_conn_id: "GITHUB__SAMPLE"
        subdir: "dags"
        tracking_ref: "main"
        refresh_interval: 60
    - name: bundle2
      classpath: "airflow.providers.git.bundles.git.GitDagBundle"
      kwargs:
        git_conn_id: "GITHUB__SAMPLE2"
        subdir: "dags"
        tracking_ref: "main"
        refresh_interval: 60

  deployPerBundle:
    enabled: true
    bundleOverrides:
      bundle1:
        replicas: 3
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        podDisruptionBudget:
          config:
            minAvailable: 2
            maxUnavailable: ~
      bundle2:
        replicas: 1
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
        podDisruptionBudget:
          config:
            minAvailable: 1
            maxUnavailable: ~

This will create:

  • {release-name}-dag-processor-bundle1 deployment with 3 replicas and the bundle1 resource requests and limits shown above
  • {release-name}-dag-processor-bundle2 deployment with 1 replica and the bundle2 resource requests shown above
  • Separate PodDisruptionBudgets for each bundle (if enabled)
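
A Kubernetes PodDisruptionBudget accepts only one of minAvailable and maxUnavailable. The `maxUnavailable: ~` entries in the example presumably null out a default so that only minAvailable is rendered; that reading is an assumption. The check can be sketched as follows, with `pdb_budget_fields` being a hypothetical helper rather than anything in the chart:

```python
def pdb_budget_fields(config):
    """Return the single budget field that is set; None (YAML ~) counts as unset."""
    fields = ("minAvailable", "maxUnavailable")
    chosen = {key: config[key] for key in fields if config.get(key) is not None}
    if len(chosen) != 1:
        raise ValueError(f"set exactly one of {fields}, got {sorted(chosen)}")
    return chosen


# bundle1's podDisruptionBudget.config from the usage example, parsed from YAML.
bundle1_budget = pdb_budget_fields({"minAvailable": 2, "maxUnavailable": None})
```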

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
  • cursor

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Jan 25, 2026
@uplsh580
Contributor Author

uplsh580 commented Jan 25, 2026

Validation - deployPerBundle

[Four screenshots omitted: deployPerBundle result, bundle1, bundle2, poddisruptionbudgets]

@uplsh580
Contributor Author

uplsh580 commented Jan 25, 2026

Validation - as-is (single deployment)

[Three screenshots omitted: single deployment result, dag processor log, poddisruptionbudgets]

@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch 3 times, most recently from 2531585 to 1241a7a Compare January 25, 2026 16:32
@uplsh580 uplsh580 marked this pull request as ready for review January 26, 2026 00:14
@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch from 9930581 to 1241a7a Compare January 26, 2026 00:14
@jscheffl jscheffl added this to the Airflow Helm Chart 1.20.0 milestone Jan 26, 2026
@choo121600
Member

Great work!
It looks like there’s a merge conflict. Could you please take a look?

@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch from 1241a7a to 8f898af Compare February 27, 2026 12:23
@uplsh580 uplsh580 requested a review from bugraoz93 as a code owner February 27, 2026 12:23
@uplsh580 uplsh580 marked this pull request as draft February 27, 2026 12:40
@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch from 8f898af to 5195d85 Compare February 27, 2026 12:43
@uplsh580 uplsh580 marked this pull request as ready for review February 27, 2026 12:43
@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch 2 times, most recently from 372cbf5 to 59b62a1 Compare February 27, 2026 18:57
Contributor

@jscheffl jscheffl left a comment

I like the feature very much, mainly that different DAG processors can be spawned, but I am not sure I like the parallel structure of parameters for the deployment. I feel it would be better placed in the bundle definition directly, but I would like to hear other opinions as well.

As we are currently discussing a structural change for version 2.0 of the Helm chart, we may be hesitant to add this as a feature in the short term: if we (potentially) restructure in general, it might also be very well suited to a 2.0 version.


# Per-bundle deployment option
# When enabled, creates a separate deployment for each bundle in `dagBundleConfigList`
deployPerBundle:
Contributor

I am not sure whether this structure is well suited and would like the opinions of other maintainers. It is fully parallel to the definition of the bundles themselves. Technically this is possible, but then two lists need to be maintained, with a risk of inconsistency.

Is there a reason that you did not add the structure below the bundle list definition?
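
The inconsistency risk can be illustrated with a short Python sketch: with two parallel structures, an override keyed by a name that matches no bundle would go unnoticed unless something checks for it. `find_orphan_overrides` is a hypothetical helper, not part of the chart or its tests.

```python
def find_orphan_overrides(bundle_config_list, bundle_overrides):
    """Return bundleOverrides keys that match no entry in dagBundleConfigList."""
    known_names = {bundle["name"] for bundle in bundle_config_list}
    return sorted(set(bundle_overrides) - known_names)


bundles = [{"name": "bundle1"}, {"name": "bundle2"}]
# A typo in the second key: the override never applies to bundle2.
overrides = {"bundle1": {"replicas": 3}, "bundel2": {"replicas": 1}}
orphans = find_orphan_overrides(bundles, overrides)
```

Here `orphans` contains the misspelled key, which is the kind of drift a nested structure avoids by construction.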

Contributor Author

Thank you for the feedback. I agree that maintaining two separate lists could lead to configuration drift and unnecessary complexity.

To address this, how about nesting the deployment configuration directly within the dagBundleConfigList? This way, each bundle's definition and its deployment strategy are coupled together, ensuring better consistency.

Here is an example of the proposed structure:

deployPerBundle: true  # default is `false`

dagBundleConfigList:
  - name: bundle1
    classpath: ...
    kwargs: ...
    deploymentOverride:           # Bundle-specific deployment settings
      enabled: true
      replicas: 3
      resources: ...
  - name: bundle2
    classpath: ...
    kwargs: {}
    # If 'deploymentOverride' is omitted, it could fall back to default values
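
One way to read the fallback in this proposal, sketched in Python. `DEFAULTS` and `settings_for` are hypothetical, the default values are illustrative rather than the chart's, and the merge is deliberately shallow:

```python
DEFAULTS = {"enabled": True, "replicas": 1}  # illustrative defaults, not the chart's


def settings_for(bundle, defaults=DEFAULTS):
    """Merge a bundle's optional deploymentOverride over the defaults (shallow)."""
    override = bundle.get("deploymentOverride") or {}
    return {**defaults, **override}


with_override = settings_for(
    {"name": "bundle1", "deploymentOverride": {"enabled": True, "replicas": 3}}
)
without_override = settings_for({"name": "bundle2"})
```

Because the override lives inside the bundle entry, there is no second list whose keys can drift from the bundle names.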

enabled: false
# Command args template for per-bundle deployments
# `{{ bundleName }}` will be replaced with the actual bundle name
args: ["bash", "-c", "exec airflow dag-processor --bundle-name {{ bundleName }}"]
Contributor

Is there a reason that the args are populated? What reasons would there be to override them? Does this need to be exposed as (yet another) parameter?

Contributor Author

Thank you for pointing that out. To be honest, I also had concerns that exposing args might add unnecessary complexity and potentially confuse users.

The reason I included it was primarily for consistency with the existing single DAG processor configuration, which already allows overriding args in the current Helm Chart. I wanted to ensure that the bundle-specific processors followed the same pattern as the standard one, just in case users needed that same level of flexibility.

args: ["bash", "-c", "exec airflow dag-processor"]

To simplify this, I’d like to propose automating the argument injection instead.
When deployPerBundle is enabled, the template can take the base args from the global dagProcessor (e.g., ["bash", "-c", "exec airflow dag-processor"]) and automatically append the --bundle-name {{ bundleName }} flag for each specific deployment.
However, I am a bit cautious about this approach because I am not fully aware of all the different ways users might be overriding the default args. If a user has a very complex custom command, simply appending a flag at the end might not work in every scenario.
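
A minimal Python sketch of the append approach and of the caveat. `append_bundle_flag` is a hypothetical helper; it assumes the last element of args is the shell command string passed to `bash -c`, so the flag has to go inside that string rather than become a new list element.

```python
def append_bundle_flag(base_args, bundle_name):
    """Append --bundle-name to the last element of args (the bash -c command string)."""
    if not base_args:
        raise ValueError("base_args must not be empty")
    return [*base_args[:-1], f"{base_args[-1]} --bundle-name {bundle_name}"]


# Default args: the flag lands on the airflow command as intended.
default_result = append_bundle_flag(
    ["bash", "-c", "exec airflow dag-processor"], "bundle1"
)

# Custom args with a pipe (an invented example): the flag ends up on `tee`.
piped_result = append_bundle_flag(
    ["bash", "-c", "airflow dag-processor | tee /tmp/dp.log"], "bundle1"
)
```

The second call shows why blind appending is fragile: the resulting command string ends with `tee /tmp/dp.log --bundle-name bundle1`, so the DAG processor never receives the flag.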

@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch 2 times, most recently from 04172b7 to 9e6ca24 Compare March 2, 2026 23:36
@uplsh580
Contributor Author

uplsh580 commented Mar 2, 2026

As we are currently discussing a structural change for version 2.0 of the Helm chart, we may be hesitant to add this as a feature in the short term: if we (potentially) restructure in general, it might also be very well suited to a 2.0 version.

That’s a fair point. While I’d personally love to see this feature implemented, I agree that structural consistency in version 2.0 is just as important. I’m happy to go with whatever the maintainers feel is best for the long-term roadmap, whether it's added now or as part of the 2.0 restructuring.

@uplsh580 uplsh580 force-pushed the chart/deployPerBundle branch from 9e6ca24 to 7670886 Compare March 2, 2026 23:42
@uplsh580
Contributor Author

uplsh580 commented Mar 2, 2026

@jscheffl
I’ve updated the code based on your comments. I’d appreciate any further feedback.
