```bash
git clone https://github.com/huy-dataguy/Spark-on-YARN.git
cd Spark-on-YARN
```

⏳ Note: the first build may take a few minutes, as no cached layers exist yet.
Build the images (needed on the first run, or after changing a Dockerfile):

```bash
docker build -t base -f docker/base.dockerfile .
docker compose -f docker/compose.yaml build
```

Run the containers:

```bash
docker compose -f docker/compose.yaml up -d
```

Then go inside the master container's CLI.
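The exact command for entering the master container is not shown here; assuming the container is named `master` (check `docker ps` for the real name in this compose setup), a typical way in is:

```bash
# Open an interactive shell in the master container
# ("master" is an assumed container name; verify with `docker ps`)
docker exec -it master bash
```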
💡 Start the HDFS and YARN services:

```bash
start-dfs.sh
start-yarn.sh
```
Create a folder on HDFS to store the Spark logs:

```bash
hdfs dfs -mkdir /spark-logs
```

Run Spark on YARN:

```bash
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10
```

If it succeeds, you will see the result: Pi is roughly 3.14159.
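The `/spark-logs` directory created above is typically wired up through Spark's event-log settings. A sketch of the relevant `spark-defaults.conf` entries — these keys are standard Spark settings, but the exact values baked into this repo's image may differ:

```properties
# Assumed spark-defaults.conf entries; verify against the image's actual config
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs
```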
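SparkPi estimates π by Monte Carlo sampling: it scatters random points in the unit square and counts how many land inside the quarter circle, a count Spark spreads across YARN executors. A plain-Python sketch of the same idea (no Spark required), just to show what the example job computes:

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi, mirroring what SparkPi distributes across executors."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Point falls inside the quarter circle of radius 1
        if x * x + y * y <= 1.0:
            inside += 1
    # Area ratio (quarter circle / square) is pi/4
    return 4.0 * inside / num_samples

print(estimate_pi(1_000_000))
```

The argument `10` passed to SparkPi plays a similar role: it sets the number of partitions (and thus samples) used for the estimate.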

You can access the following web interfaces to monitor and manage your Hadoop cluster:

- YARN ResourceManager UI → http://localhost:9004
  Provides an overview of cluster resource usage, running applications, and job details.
- NameNode UI → http://localhost:9870
  Displays HDFS file system details, block distribution, and overall health status.
- Spark Web UI → http://localhost:4040
  Provides an interface to monitor running Spark jobs, stages, and tasks.
  Note: because Spark runs in YARN client mode, the Spark UI will automatically redirect to the master node's web UI.
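The same cluster information is also exposed over standard REST/JMX endpoints, which is handy for scripting health checks. Assuming the same port mappings as the UIs above:

```bash
# YARN ResourceManager REST API: cluster info and running applications
curl -s http://localhost:9004/ws/v1/cluster/info
curl -s http://localhost:9004/ws/v1/cluster/apps

# NameNode JMX metrics (HDFS capacity, live datanodes, etc.)
curl -s http://localhost:9870/jmx
```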
📧 Email: quochuy.working@gmail.com
💬 Feel free to contribute and improve this project! 🚀
