spark-local-setup

A repository with ready-to-run Spark jobs and notebooks.

This setup lets you run PySpark notebooks and jobs directly inside VS Code, connected to a local Spark cluster running in Docker. You get an isolated environment without having to install Python, PySpark, or Jupyter on your machine.

Prerequisites

Make sure you have the following installed:

- Docker
- Docker Compose
- VS Code
- Dev Containers extension for VS Code (only needed for Option 1)

Project Structure

```
project-root/
├── .devcontainer/
│   └── devcontainer.json
├── docker-compose.yml
├── requirements.txt
├── jobs/
│   └── example_job.py
└── notebooks/
    └── example_notebook.ipynb
```

- jobs/: Python scripts using PySpark
- notebooks/: Jupyter notebooks
- requirements.txt: Python dependencies (shared by all jobs and notebooks)

Option 1: Using VS Code Dev Containers

Open the project in VS Code
```
code .
```
Reopen in Dev Container

Press F1 (fn + F1 on some keyboards) or Ctrl+Shift+P / Cmd+Shift+P to open the Command Palette, then run “Dev Containers: Rebuild and Reopen in Container”.

VS Code builds the container from docker-compose.yml and devcontainer.json.

Run PySpark jobs
```
python /home/jovyan/jobs/example_job.py
```
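
The contents of example_job.py are not reproduced in this README. As a rough sketch of the shape a job in jobs/ might take, the helpers below create a session and run a tiny aggregation. `SPARK_MASTER_URL` is a hypothetical environment variable used only for illustration (this repo may configure the master differently), and the `local[*]` fallback is an assumption so the sketch also works without a cluster.

```python
import os


def resolve_master(default="local[*]"):
    """Return the Spark master URL.

    SPARK_MASTER_URL is a hypothetical variable for this sketch; if it is
    unset, fall back to an in-process local master.
    """
    return os.environ.get("SPARK_MASTER_URL", default)


def build_session(app_name="example_job"):
    """Create (or reuse) a SparkSession. PySpark is imported lazily so the
    module can still be imported where PySpark is not installed."""
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(resolve_master())
        .getOrCreate()
    )


def run(spark):
    """Build a three-row DataFrame and return the row count per key as a dict."""
    df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
    rows = df.groupBy("key").count().collect()
    return {row["key"]: row["count"] for row in rows}
```

Inside the container, a script would typically call `spark = build_session()`, then `print(run(spark))`, and finish with `spark.stop()`.
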
Run notebooks
- Open any .ipynb file in notebooks/.
- The pre-configured kernel has PySpark ready — run cells directly.
- Spark logs appear in the VS Code output panel and in the Spark web UI.

Option 2: Using Docker Compose Directly

Build and start containers
```
docker-compose up -d
```
This will start:
- Spark master + Spark history server
- Jupyter notebook/lab
- Mounts for jobs, notebooks, and logs

Access Jupyter Lab
- Open your browser: http://localhost:8888
- Use the token specified in docker-compose.yml (e.g., yourtoken123).
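
Jupyter also accepts the token as a `token` query parameter, which saves pasting it into the login form. A minimal sketch of building that link follows; `jupyter_url` is a hypothetical helper, and the `/lab` path assumes JupyterLab is the front end being served.

```python
from urllib.parse import urlencode


def jupyter_url(token, host="localhost", port=8888, path="/lab"):
    """Build a Jupyter URL that carries the access token as a query parameter."""
    query = urlencode({"token": token})
    return f"http://{host}:{port}{path}?{query}"


# Placeholder token from the README; use the real value from docker-compose.yml.
print(jupyter_url("yourtoken123"))
```
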

Run PySpark scripts inside the container

```
docker exec -it pyspark bash
python /home/jovyan/jobs/example_job.py
```

Spark UI
- Spark Master UI: http://localhost:8080
- Spark Application UI: http://localhost:4040
- History Server: http://localhost:18080
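
To see which of these UIs are currently up, a small standard-library check such as the sketch below can help. The port numbers come from the list above; the helper names are illustrative. The Application UI on port 4040 normally only responds while a Spark application is running, so a `False` there is expected when nothing is active.

```python
import socket

# Ports taken from the list above.
SPARK_UIS = {
    "Spark Master UI": 8080,
    "Spark Application UI": 4040,
    "History Server": 18080,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_uis(host="localhost"):
    """Map each UI name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in SPARK_UIS.items()}


if __name__ == "__main__":
    for name, is_up in check_uis().items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```
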

Adding Dependencies

Edit requirements.txt with any packages you need, e.g.:

```
pandas
numpy
matplotlib
findspark
```

If using Dev Container:

Press F1 (fn + F1 on some keyboards) or Ctrl+Shift+P / Cmd+Shift+P, then run “Dev Containers: Rebuild and Reopen in Container”.

If using Docker Compose directly (the container name, jupyter here, must match the one defined in docker-compose.yml):

```
docker exec -it jupyter pip install -r /home/jovyan/requirements.txt
```
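
After a rebuild or a pip install, it can be useful to confirm that everything listed in requirements.txt is actually present in the container. The sketch below is one way to do that with the standard library; the function names are illustrative, and the parser only handles simple entries like the ones shown above (names with optional version specifiers), not URLs or editable installs.

```python
import re
from importlib import metadata


def parse_requirements(text):
    """Extract package names from requirements.txt content,
    skipping blank lines, comments, and pip option lines such as '-r other.txt'."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Inside the container this could be run as `missing_packages(parse_requirements(open("/home/jovyan/requirements.txt").read()))`; an empty list means every listed package is installed.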
