[Bug] Spark engine might not stop when the kyuubi session is closed. #7290

@ruanwenjun

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

Hi team,

I’m using Kyuubi to submit Spark jobs to Kubernetes and have recently started running some resource-related tests.

I noticed that when I kill my JDBC client, the Kyuubi session is closed as expected, but the Spark driver pods may still remain running.

After looking into the code, my understanding of the current workflow is roughly:

JDBC driver -> Kyuubi session -> Spark engine

More specifically:

When a JDBC connection is created, the JDBC driver connects to the Kyuubi server and creates a Kyuubi session.

The Kyuubi session then launches a Spark engine and waits for it to be ready.

Once the engine is launched, it registers itself in ZooKeeper.

The Kyuubi session discovers the engine address from ZooKeeper and establishes a Thrift connection to it.

When the JDBC client is closed, the Kyuubi Session Manager closes the corresponding Kyuubi session and the engine session.

However, the Spark engine itself is not stopped.

The issue I’m seeing is that if the Thrift client has not been created yet (for example, if the engine has not finished starting up), the Kyuubi engine will never be killed, even though the JDBC client and Kyuubi session are already closed.

Before #4241, the engine was closed directly when the session was closed.
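
To make the race concrete, here is a minimal Java sketch of the ordering I believe is involved. It is not Kyuubi's actual code: `Session`, `Engine`, `onEngineReady`, and the `stopLateEngine` flag are hypothetical names I made up for illustration, and `stop()` stands in for whatever teardown would be appropriate. It only shows that if `close()` runs before the engine client exists, nothing stops the engine unless the "engine ready" path also checks whether the session is already closed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class SessionCloseRaceSketch {

    /** Hypothetical stand-in for a launched engine (e.g. a Spark driver pod). */
    static final class Engine {
        final AtomicBoolean stopped = new AtomicBoolean(false);

        void stop() {
            stopped.set(true);
        }
    }

    /** Hypothetical session; NOT Kyuubi's real session implementation. */
    static final class Session {
        private final AtomicReference<Engine> client = new AtomicReference<>();
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private final boolean stopLateEngine;

        Session(boolean stopLateEngine) {
            this.stopLateEngine = stopLateEngine;
        }

        /** Called once the engine has started and a client to it can be created. */
        void onEngineReady(Engine engine) {
            client.set(engine);
            // Guard: if close() already ran, nobody else will stop this engine.
            if (stopLateEngine && closed.get()) {
                Engine late = client.getAndSet(null);
                if (late != null) {
                    late.stop();
                }
            }
        }

        /** Called when the JDBC client goes away or the session times out. */
        void close() {
            closed.set(true);
            // Only an engine that is already connected can be stopped here.
            Engine current = client.getAndSet(null);
            if (current != null) {
                current.stop();
            }
        }
    }

    /** Replays the order seen in this issue: close first, engine ready later. */
    static boolean engineStoppedAfterEarlyClose(boolean stopLateEngine) {
        Session session = new Session(stopLateEngine);
        Engine engine = new Engine();
        session.close();
        session.onEngineReady(engine);
        return engine.stopped.get();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + engineStoppedAfterEarlyClose(false));
        System.out.println("with guard: " + engineStoppedAfterEarlyClose(true));
    }
}
```

Running `main` prints `without guard: false` and `with guard: true`. Because `close()` sets the flag before it swaps out the client, and `onEngineReady` stores the client before it reads the flag, one of the two paths stops the engine whichever of them runs first.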

Affects Version(s)

1.9

Kyuubi Server Log Output

Here is the server log. It shows that the Thrift connection to the engine is created after the session has already been closed.

2025-12-29 16:07:16.910 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Pending        containers=[]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:07:17.270 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:07:17.352 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:19:41.326 INFO KyuubiSessionManager: Closing session 375d3aed-6fac-4a10-95ae-e47ab7882496 that has been idle for more than 600000 ms
2025-12-29 16:19:41.326 INFO KyuubiSessionManager: hive's KyuubiSessionImpl with SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496] is closed, current opening sessions 24
org.apache.kyuubi.KyuubiSQLException: Invalid SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
2025-12-29 16:19:43.813 INFO KyuubiTBinaryFrontendService: Received request of closing SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
org.apache.kyuubi.KyuubiSQLException: Invalid SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
2025-12-29 16:19:43.817 INFO KyuubiTBinaryFrontendService: Finished closing SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
2025-12-29 16:35:00.893 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Pending        containers=[]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:35:00.893 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:35:00.954 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Pending        containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557       appState=PENDING        appError=''
2025-12-29 16:35:00.954 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:35:01.291 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Pending        containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557       appState=PENDING        appError=''
2025-12-29 16:35:01.291 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:35:02.006 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:35:02.006 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Pending        containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557       appState=PENDING        appError=''
2025-12-29 16:35:03.026 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Pending        containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557       appState=PENDING        appError=''
2025-12-29 16:35:03.027 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Pending   containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})]       appId=spark-3cda6199f3164f488cd234a020c8b557    appState=PENDING        appError=''
2025-12-29 16:35:04.052 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=cls-nctmeua9-100031385429-context-default       namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver   podState=Running        containers=[aggregation->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={})]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=RUNNING   appError=''
2025-12-29 16:35:04.052 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496       context=null    namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver        podState=Running   containers=[aggregation->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={})]   appId=spark-3cda6199f3164f488cd234a020c8b557    appState=RUNNING        appError=''
2025-12-29 16:35:42.627 INFO ZookeeperDiscoveryClient: Get service instance:xx:38295 engine id:spark-3cda6199f3164f488cd234a020c8b557 and version:1.9.1.1 under /kyuubi-CONNECTION_SPARK_SQL/hive/375d3aed-6fac-4a10-95ae-e47ab7882496
2025-12-29 16:35:43.539 INFO KyuubiSessionImpl: [hive:xx] SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496] - Connected to engine [xx:38295]/[spark-3cda6199f3164f488cd234a020c8b557] with SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]]
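
To quantify the gap in the log above, here is a small Java snippet. The two timestamps are copied from the "Closing session" line and the "Connected to engine" line. The class and method names (`LogGap`, `gap`) are only for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {

    // Timestamp layout used by the server log lines above.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Returns the elapsed time between two log timestamps. */
    static Duration gap(String from, String to) {
        return Duration.between(
                LocalDateTime.parse(from, FMT),
                LocalDateTime.parse(to, FMT));
    }

    public static void main(String[] args) {
        // Session closed by the session manager vs. engine connection established.
        Duration d = gap("2025-12-29 16:19:41.326", "2025-12-29 16:35:43.539");
        System.out.println(d);
    }
}
```

This prints `PT16M2.213S`: the engine connection was established about 16 minutes after the session had been closed.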

Kyuubi Engine Log Output

Kyuubi Server Configurations

Kyuubi Engine Configurations

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
