Description
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the bug
Hi team,
I’m using Kyuubi to submit Spark jobs to Kubernetes and have recently started running some resource-related tests.
I noticed that when I kill my JDBC client, the Kyuubi session is closed as expected, but the Spark driver pod may keep running.
After looking into the code, my understanding of the current workflow is roughly:
JDBC driver -> Kyuubi session -> Spark engine
More specifically:
When a JDBC connection is created, the JDBC driver connects to the Kyuubi server and creates a Kyuubi session.
The Kyuubi session then launches a Spark engine and waits for it to be ready.
Once the engine is launched, it registers itself in ZooKeeper.
The Kyuubi session discovers the engine address from ZooKeeper and establishes a Thrift connection to it.
When the JDBC client is closed, the Kyuubi Session Manager closes the corresponding Kyuubi session and the engine session.
However, the Spark engine itself is not stopped.
The issue I’m seeing is that if the Thrift client has not been created yet (for example, because the engine has not finished starting up), the Spark engine is never killed, even though the JDBC client and the Kyuubi session are already closed.
Before #4241, the engine was closed directly when the session was closed.
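To make the ordering problem concrete, here is a minimal toy model of the race in Python. It is only a sketch and not Kyuubi code: `Engine`, `Session`, `on_engine_ready`, and the `kill_orphans` switch are hypothetical names invented for illustration. It also assumes that closing the engine session lets a connection-scoped engine shut itself down, which is my understanding of the normal path.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Stand-in for the Spark driver pod (hypothetical, not a Kyuubi class)."""
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class Session:
    """Toy model of a Kyuubi session; all names here are illustrative."""
    kill_orphans: bool = False  # hypothetical switch for a guarded variant
    engine: Engine = field(default_factory=Engine)  # submitted when the session opens
    client_connected: bool = False
    closed: bool = False

    def on_engine_ready(self) -> None:
        """The engine address was discovered and a Thrift client can be created."""
        if self.closed and self.kill_orphans:
            # The session is already gone, so nobody will ever use this engine.
            self.engine.stop()
            return
        self.client_connected = True

    def close(self) -> None:
        """The JDBC client went away or the session idled out."""
        self.closed = True
        if self.client_connected:
            # Normal path: closing the engine session lets the engine exit
            # (assumption: connection-scoped engine).
            self.engine.stop()
        elif self.kill_orphans:
            # No Thrift client yet: terminate the pending engine directly.
            self.engine.stop()


def simulate(kill_orphans: bool) -> bool:
    """Return True if the engine is still running after the race."""
    session = Session(kill_orphans=kill_orphans)
    session.close()            # session closes while the pod is still Pending
    session.on_engine_ready()  # the engine registers itself afterwards
    return session.engine.running


if __name__ == "__main__":
    print("engine leaked without guard:", simulate(kill_orphans=False))
    print("engine leaked with guard:", simulate(kill_orphans=True))
```

In this model the unguarded path leaves the engine running once the session is closed before the engine is ready, which matches the behavior described above. The guarded path shows one possible direction for a fix: remember that the session was closed and stop the engine either at close time or as soon as it becomes reachable.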
Affects Version(s)
1.9
Kyuubi Server Log Output
Here is the server log. It shows that the Thrift connection is created after the session has already been closed.
2025-12-29 16:07:16.910 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:07:17.270 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:07:17.352 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:19:41.326 INFO KyuubiSessionManager: Closing session 375d3aed-6fac-4a10-95ae-e47ab7882496 that has been idle for more than 600000 ms
2025-12-29 16:19:41.326 INFO KyuubiSessionManager: hive's KyuubiSessionImpl with SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496] is closed, current opening sessions 24
org.apache.kyuubi.KyuubiSQLException: Invalid SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
2025-12-29 16:19:43.813 INFO KyuubiTBinaryFrontendService: Received request of closing SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
org.apache.kyuubi.KyuubiSQLException: Invalid SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
2025-12-29 16:19:43.817 INFO KyuubiTBinaryFrontendService: Finished closing SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]
2025-12-29 16:35:00.893 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:00.893 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:00.954 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:00.954 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:01.291 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:01.291 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:02.006 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:02.006 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:03.026 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:03.027 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Pending containers=[aggregation->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=PENDING appError=''
2025-12-29 16:35:04.052 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=cls-nctmeua9-100031385429-context-default namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Running containers=[aggregation->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=RUNNING appError=''
2025-12-29 16:35:04.052 INFO KubernetesApplicationAuditLogger: label=375d3aed-6fac-4a10-95ae-e47ab7882496 context=null namespace=spark pod=kyuubi-stress-test-cql-7-375d3aed-6fac-4a10-95ae-e47ab7882496-driver podState=Running containers=[aggregation->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=ContainerStateRunning(startedAt=2025-12-29T08:35:03Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={})] appId=spark-3cda6199f3164f488cd234a020c8b557 appState=RUNNING appError=''
2025-12-29 16:35:42.627 INFO ZookeeperDiscoveryClient: Get service instance:xx:38295 engine id:spark-3cda6199f3164f488cd234a020c8b557 and version:1.9.1.1 under /kyuubi-CONNECTION_SPARK_SQL/hive/375d3aed-6fac-4a10-95ae-e47ab7882496
2025-12-29 16:35:43.539 INFO KyuubiSessionImpl: [hive:xx] SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496] - Connected to engine [xx:38295]/[spark-3cda6199f3164f488cd234a020c8b557] with SessionHandle [375d3aed-6fac-4a10-95ae-e47ab7882496]]
Kyuubi Engine Log Output
Kyuubi Server Configurations
Kyuubi Engine Configurations
Additional context
No response
Are you willing to submit PR?
- Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- No. I cannot submit a PR at this time.