[backend] Pipeline task stuck in Pending with no user-visible error when pod is Unschedulable (e.g., insufficient resources)

### Environment

*  How did you deploy Kubeflow Pipelines (KFP)?
**The issue appears to apply regardless of how KFP is deployed.** I observed it on Kubernetes, using the Kubeflow Pipelines instance in Red Hat OpenShift AI (RHOAI).

*  KFP version: 
2.16.0
*  KFP SDK version: 
2.16.1


### Steps to reproduce
When a pipeline task's pod cannot be scheduled by Kubernetes — for example because the
requested CPU or memory exceeds what any node can satisfy — the run appears to hang
indefinitely in the KFP UI with no error or explanation visible to the user. The run
stays in `RUNNING` state and only a cluster administrator who can inspect raw pod events
will discover the actual cause (`Unschedulable: Insufficient cpu`). 

1. Author a pipeline with a task whose explicit resource limits exceed cluster
   capacity, for example:

   ```python
   from kfp import dsl

   @dsl.component
   def heavy_task() -> str:
       return "done"

   @dsl.pipeline
   def my_pipeline():
       task = heavy_task()
       task.set_cpu_limit("128")        # exceeds any available node
       task.set_memory_limit("512Gi")   # exceeds any available node
   ```

2. Compile and submit the pipeline to a KFP deployment running on a cluster that cannot
   satisfy those resource requests.

3. Observe the run in the Pipelines UI.

#### Actual Behavior

- The run transitions to `RUNNING` and stays there indefinitely.
- No error message or warning is shown in the UI.
- The task node shows no failure reason.
- The root cause (pod unschedulable due to insufficient resources) is only discoverable
  by running `kubectl describe pod <pod-name>` and reading the Kubernetes event:
  `Warning  FailedScheduling  ... 0/N nodes are available: N Insufficient cpu`.
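The same information is also recorded in the pod's status, so it can be read programmatically. The following is a minimal sketch, assuming a pod object in the JSON shape returned by `kubectl get pod <pod-name> -o json`. The helper name `unschedulable_message` and the sample pod are hypothetical and are not part of KFP.

```python
import json
from typing import Optional


def unschedulable_message(pod: dict) -> Optional[str]:
    """Return the scheduler's message if the pod is Pending and Unschedulable, else None."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return None
    for cond in status.get("conditions", []):
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return cond.get("message", "")
    return None


# Hypothetical sample, trimmed to the fields the helper reads.
sample_pod = json.loads("""
{
  "status": {
    "phase": "Pending",
    "conditions": [
      {
        "type": "PodScheduled",
        "status": "False",
        "reason": "Unschedulable",
        "message": "0/3 nodes are available: 3 Insufficient cpu."
      }
    ]
  }
}
""")

print(unschedulable_message(sample_pod))
```

A user without permission to read pods in the run's namespace cannot do even this, which is why the reason needs to reach the KFP UI.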


#### Why This Matters

Users who set resource limits (via `set_cpu_limit`, `set_memory_limit`, or
`kubernetes.set_resources_v2`) have no indication that their pipeline will never make
progress until they (or an admin) inspect the cluster directly. This is especially
problematic because:

- Non-admin users typically cannot access pod-level events.
- The UI gives no hint to retry with smaller limits or to contact an administrator.
- The run continues consuming a pipeline run slot indefinitely.

### Expected result

If a pod remains in `Pending` state and its `PodScheduled` condition is `False` with
reason `Unschedulable` after several retries, KFP should surface this as a meaningful error to the user by:

- **Failing the task** with an error message that includes the Kubernetes event reason
  (e.g., _"Pod could not be scheduled: Insufficient cpu. Adjust resource limits or use
  a node with sufficient capacity."_)
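One possible shape for this check is sketched below. It is an illustration of the requested behavior, not KFP's implementation: the function name `scheduling_failure` and the five-minute grace period are assumptions (the request above only says "several retries"), and the pod is assumed to be a dict in the shape of `kubectl get pod -o json` output.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical grace period standing in for "several retries".
UNSCHEDULABLE_GRACE = timedelta(minutes=5)


def scheduling_failure(
    pod: dict, now: datetime, grace: timedelta = UNSCHEDULABLE_GRACE
) -> Optional[str]:
    """Return a user-facing error if the pod has been Unschedulable longer than `grace`."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return None
    for cond in status.get("conditions", []):
        if cond.get("type") != "PodScheduled":
            continue
        if cond.get("status") != "False" or cond.get("reason") != "Unschedulable":
            return None
        # Kubernetes timestamps are RFC 3339 with a trailing "Z".
        since = datetime.fromisoformat(cond["lastTransitionTime"].replace("Z", "+00:00"))
        if now - since < grace:
            return None  # Too early to fail; capacity may still become available.
        detail = cond.get("message", "Unschedulable").rstrip(".")
        return (
            f"Pod could not be scheduled: {detail}. "
            "Adjust resource limits or use a node with sufficient capacity."
        )
    return None


# Hypothetical pod that has been unschedulable since midnight UTC.
pending_pod = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "lastTransitionTime": "2024-01-01T00:00:00Z",
                "message": "0/3 nodes are available: 3 Insufficient cpu.",
            }
        ],
    }
}

print(scheduling_failure(pending_pod, now=datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc)))
```

The grace period is there so that a brief `Unschedulable` phase, for example while a cluster autoscaler adds a node, does not fail the task prematurely.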



### Materials and Reference




---


Impacted by this bug? Give it a 👍.

[backend] Pipeline task stuck in Pending with no user-visible error when pod is Unschedulable (e.g., insufficient resources) #13401
