
API - Improve public get_dag_runs endpoint #62025

@pierrejeambrun

Description


On large installations, the `get_dag_runs` endpoint, which the UI uses to list all DAG runs, takes a long time to return a response.

The critical part of the code is the database query itself. The slowdown is most likely caused by the joins producing a row explosion. We need to optimize the query: double-check the eager-loading options and verify that no option causes a row explosion (for example, a `joinedload` that should be a `selectinload`). We can also add `load_only` to limit the number of columns selected in each join.

```python
query = select(DagRun).options(*eager_load_dag_run_for_validation())

if dag_id != "~":
    get_latest_version_of_dag(dag_bag, dag_id, session)  # Check if the DAG exists.
    # Note: .options() with no arguments adds no loader options.
    query = query.filter(DagRun.dag_id == dag_id).options()

# Add join with DagVersion if dag_version filter is active
if dag_version.value:
    query = query.join(DagVersion, DagRun.created_dag_version_id == DagVersion.id)

dag_run_select, total_entries = paginated_select(
    statement=query,
    filters=[
        run_after,
        logical_date,
        start_date_range,
        end_date_range,
        update_at_range,
        duration_range,
        conf_contains,
        state,
        run_type,
        dag_version,
        readable_dag_runs_filter,
        run_id_pattern,
        triggering_user_name_pattern,
        dag_id_pattern,
        partition_key_pattern,
    ],
    order_by=order_by,
    offset=offset,
    limit=limit,
    session=session,
)
dag_runs = session.scalars(dag_run_select)
```
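To make the row-explosion concern concrete, here is a minimal sketch using only the stdlib `sqlite3` module. It compares the rows the database returns when two child collections are joined in a single query (the SQL shape that stacking two `joinedload()` options produces, via `LEFT OUTER JOIN`) with one `IN (...)` query per collection (the pattern `selectinload()` emits). The tables `task_instance` and `asset_event` are hypothetical stand-ins; this does not claim which relationships `eager_load_dag_run_for_validation()` actually loads.

```python
import sqlite3


def build_demo_db(n_runs: int, tis_per_run: int, events_per_run: int) -> sqlite3.Connection:
    """In-memory DB with one parent table and two hypothetical child collections."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (id INTEGER PRIMARY KEY);
        CREATE TABLE task_instance (id INTEGER PRIMARY KEY, run_id INTEGER);
        CREATE TABLE asset_event (id INTEGER PRIMARY KEY, run_id INTEGER);
        """
    )
    for run_id in range(n_runs):
        conn.execute("INSERT INTO dag_run (id) VALUES (?)", (run_id,))
        conn.executemany("INSERT INTO task_instance (run_id) VALUES (?)", [(run_id,)] * tis_per_run)
        conn.executemany("INSERT INTO asset_event (run_id) VALUES (?)", [(run_id,)] * events_per_run)
    return conn


def joined_row_count(conn: sqlite3.Connection) -> int:
    # One query joining both collections: each run yields tis_per_run * events_per_run rows.
    sql = """
        SELECT r.id, ti.id, ev.id
        FROM dag_run r
        LEFT OUTER JOIN task_instance ti ON ti.run_id = r.id
        LEFT OUTER JOIN asset_event ev ON ev.run_id = r.id
    """
    return len(conn.execute(sql).fetchall())


def selectin_row_count(conn: sqlite3.Connection) -> int:
    # One query for the parents, then one IN (...) query per collection.
    run_ids = [row[0] for row in conn.execute("SELECT id FROM dag_run")]
    placeholders = ",".join("?" * len(run_ids))
    tis = conn.execute(
        f"SELECT id, run_id FROM task_instance WHERE run_id IN ({placeholders})", run_ids
    ).fetchall()
    events = conn.execute(
        f"SELECT id, run_id FROM asset_event WHERE run_id IN ({placeholders})", run_ids
    ).fetchall()
    return len(run_ids) + len(tis) + len(events)


if __name__ == "__main__":
    conn = build_demo_db(n_runs=2, tis_per_run=10, events_per_run=10)
    print(joined_row_count(conn))    # 200 rows: 2 runs x 10 TIs x 10 events
    print(selectin_row_count(conn))  # 42 rows: 2 runs + 20 TIs + 20 events
```

The joined count grows multiplicatively with each extra collection, while the per-collection queries grow additively. SQLAlchemy deduplicates joined-eager results in Python, but the database still has to produce and transfer the multiplied rows, which is where the time goes on large installations.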

We should also investigate whether an additional index would help.
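One way to evaluate an index candidate is to compare query plans before and after creating it. The sketch below uses the stdlib `sqlite3` module only so it runs anywhere; the table layout, the listing query, and the index `ix_dag_run_dag_id_run_after` are hypothetical and do not reflect Airflow's actual schema or its existing indexes. On PostgreSQL or MySQL, the equivalent check is running `EXPLAIN` (or `EXPLAIN ANALYZE`) on the SQL that `get_dag_runs` actually emits.

```python
import sqlite3

# Hypothetical paginated listing query: filter on one DAG, newest runs first.
QUERY = (
    "SELECT id, dag_id, run_id, run_after, state FROM dag_run "
    "WHERE dag_id = ? ORDER BY run_after DESC LIMIT 50"
)
INDEX_NAME = "ix_dag_run_dag_id_run_after"  # hypothetical candidate index


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> list[str]:
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def compare_plans() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_run ("
        "id INTEGER PRIMARY KEY, dag_id TEXT, run_id TEXT, run_after TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO dag_run (dag_id, run_id, run_after, state) VALUES (?, ?, ?, ?)",
        [(f"dag_{i % 20}", f"run_{i}", f"2026-01-{i % 28 + 1:02d}", "success") for i in range(1000)],
    )
    before = query_plan(conn, QUERY, ("dag_3",))
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON dag_run (dag_id, run_after)")
    after = query_plan(conn, QUERY, ("dag_3",))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)  # expect a full table scan plus a sort step
    print("after: ", after)   # expect a search using the candidate index
```

A composite index whose leading column matches the equality filter and whose second column matches the sort key lets the database both narrow the rows and read them in order; whether that pays off for the real endpoint depends on which filters and `order_by` values are most common.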

Screen.Recording.2026-02-16.at.16.22.49.mov

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
