PySpark notebook results in SparkContext shutdown due to OOM error #6

Description

@jsanc525

Executing the raw_pyspark notebook results in the SparkContext shutting down at step "8. Trips by Hour of Day", when creating the hourly Spark DataFrame.

Notebook Error

Py4JJavaError: An error occurred while calling o160.showString.
: org.apache.spark.SparkException: Job 18 cancelled because SparkContext was shut down

Inspecting the container logs shows that the crash is caused by the executor container being OOM-killed (exit code 137), i.e. the pod exceeded its Kubernetes memory limit.

Command

kubectl logs jupyterhub-<pod_id>-adm

Truncated Error

The API gave the following container statuses:


         container name: spark-kubernetes-executor
         container image: quay.io/okdp/spark-py:spark-3.5.6-python-3.11-scala-2.12-java-17
         container state: terminated
         container started at: 2026-04-24T09:53:41Z
         container finished at: 2026-04-24T09:54:01Z
         exit code: 137
         termination reason: OOMKilled

Proposed Fix

Increasing the Spark executor and driver memory settings, as well as spark.kubernetes.memoryOverheadFactor, allows the notebook to run successfully.

Initial Values:

spark = (
    SparkSession.builder
    .appName("NYC Tripdata — PySpark")
    .config("spark.executor.memory", "2000M")
    .config("spark.executor.cores", "2")
    ...

New Values:

spark = (
    SparkSession.builder
    .appName("NYC Tripdata — PySpark")
    .config("spark.executor.memory", "4G")
    .config("spark.driver.memory", "2G")
    .config("spark.kubernetes.memoryOverheadFactor", "0.3")
    ...
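For context, Spark on Kubernetes sets the executor pod's memory limit to the executor memory plus a non-heap overhead of max(overheadFactor × executor memory, 384 MiB). A minimal sketch of that arithmetic, assuming the 0.1 default factor applied to the initial run (PySpark jobs can default to a higher factor):

# Back-of-the-envelope executor pod memory limits, before and after the fix.
# Assumes Spark's overhead formula: max(factor * executor_memory, 384 MiB).

OVERHEAD_MIN_MIB = 384  # Spark's minimum non-heap overhead

def pod_memory_limit_mib(executor_memory_mib: int, overhead_factor: float) -> int:
    overhead = max(int(executor_memory_mib * overhead_factor), OVERHEAD_MIN_MIB)
    return executor_memory_mib + overhead

print(pod_memory_limit_mib(2000, 0.1))  # 2384 MiB -- the limit that was OOMKilled
print(pod_memory_limit_mib(4096, 0.3))  # 5324 MiB -- the limit with the new settings

The larger overhead factor is what gives the Python worker processes and other off-heap allocations headroom inside the pod limit, on top of the bigger JVM heap.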
