Executing the raw_pyspark notebook results in the SparkContext shutting down at step 8 (Trips by Hour of Day), when the hourly Spark DataFrame is created.
Notebook Error
Py4JJavaError: An error occurred while calling o160.showString.
: org.apache.spark.SparkException: Job 18 cancelled because SparkContext was shut down
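For context, a minimal sketch of what the failing aggregation likely looks like is included below; the trips_df DataFrame and the tpep_pickup_datetime column name are assumptions based on the NYC taxi trip data schema, not taken from the notebook.
from pyspark.sql import functions as F

# Hypothetical reconstruction of step 8 ("Trips by Hour of Day").
# trips_df is assumed to be the trip DataFrame loaded earlier in the notebook,
# and tpep_pickup_datetime is the assumed pickup timestamp column.
hourly_df = (
    trips_df
    .withColumn("pickup_hour", F.hour("tpep_pickup_datetime"))
    .groupBy("pickup_hour")
    .count()
    .orderBy("pickup_hour")
)

# The show() call triggers the job that is cancelled when the executor dies
# (the o160.showString reference in the traceback above).
hourly_df.show(24)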
Inspecting the pod logs shows that the crash is caused by the Spark executor container running out of memory and being OOMKilled by Kubernetes.
Command
kubectl logs jupyterhub-<pod_id>-adm
Truncated Error
The API gave the following container statuses:
container name: spark-kubernetes-executor
container image: quay.io/okdp/spark-py:spark-3.5.6-python-3.11-scala-2.12-java-17
container state: terminated
container started at: 2026-04-24T09:53:41Z
container finished at: 2026-04-24T09:54:01Z
exit code: 137
termination reason: OOMKilled
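The same termination reason can also be read straight from the executor pod's status with a jsonpath query (the pod name is a placeholder, and this assumes the failed executor pod has not yet been cleaned up):
kubectl get pod <executor_pod_name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'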
Proposed Fix
Increasing the Spark executor and driver memory settings, as well as spark.kubernetes.memoryOverheadFactor, allows the notebook to run successfully.
Initial Values:
spark = (
    SparkSession.builder
    .appName("NYC Tripdata — PySpark")
    .config("spark.executor.memory", "2000M")
    .config("spark.executor.cores", "2")
    ...
New Values:
spark = (
    SparkSession.builder
    .appName("NYC Tripdata — PySpark")
    .config("spark.executor.memory", "4G")
    .config("spark.driver.memory", "2G")
    .config("spark.kubernetes.memoryOverheadFactor", "0.3")
    ...
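With these values each executor pod requests noticeably more memory than before: on Kubernetes, Spark adds roughly max(memoryOverheadFactor x executor memory, 384 MiB) of non-heap overhead on top of the executor heap. A rough back-of-the-envelope estimate of the new per-executor request:
# Rough estimate of the per-executor pod memory request under the new settings.
executor_memory_mib = 4 * 1024                         # spark.executor.memory = 4G
overhead_factor = 0.3                                  # spark.kubernetes.memoryOverheadFactor
overhead_mib = max(overhead_factor * executor_memory_mib, 384)
pod_request_mib = executor_memory_mib + overhead_mib
print(f"~{pod_request_mib:.0f} MiB per executor pod")  # roughly 5.2 GiB
Raising the overhead factor matters here because the OOMKilled executors were exceeding the container memory limit, which includes off-heap usage (Python workers, native buffers) that the JVM heap setting alone does not cover.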