Project author: PierreKieffer

Project description:
Docker multi-node Hadoop cluster with Spark 2.4.1 on YARN
Language: Shell
Repository: git://github.com/PierreKieffer/docker-spark-yarn-cluster.git
Created: 2019-04-15T15:37:37Z
Project community: https://github.com/PierreKieffer/docker-spark-yarn-cluster

License: Apache License 2.0



Docker Hadoop YARN cluster for Spark 2.4.1



Provides a multi-node Hadoop cluster running in Docker, with Spark 2.4.1 on YARN.

Usage

Build

  make build
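
Optionally, you can check that the images exist once the build finishes. The exact image names depend on the build scripts in this repository, so the grep pattern below is only an assumption:

  # list locally built images; adjust the pattern to the names `docker images` actually reports
  docker images | grep -i -E 'spark|hadoop'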

Run

  make start
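
To confirm the cluster containers are up, you can list them with Docker. Only cluster-master is named in this README (see the prompt under Connect to Master Node); the worker container names depend on the start script:

  # show running containers with their status
  docker ps --format 'table {{.Names}}\t{{.Status}}'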

Stop

  make stop

Connect to Master Node

  make connect

  ---- MASTER NODE ----
  root@cluster-master:/#
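
If you prefer plain Docker over the Makefile target, the following is roughly what make connect is expected to do, assuming it wraps docker exec (the container name cluster-master is taken from the prompt above):

  # open an interactive shell on the master node container
  docker exec -it cluster-master bash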

Run Spark applications on the cluster

Once connected to the master node:

spark-shell

  spark-shell --master yarn --deploy-mode client
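
As a quick sanity check that the shell really runs jobs on YARN, you can pipe a one-line Scala job into spark-shell and let it print the result (500500.0 for the sum of 1 to 1000):

  # run a tiny distributed job non-interactively and print its result
  echo 'println(spark.sparkContext.parallelize(1 to 1000).sum())' \
    | spark-shell --master yarn --deploy-mode client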

spark-submit

  spark-submit --master yarn --deploy-mode [client or cluster] --num-executors 2 --executor-memory 4G --executor-cores 4 --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.1.jar
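
To submit your own application instead of the bundled example, one approach is to copy the jar from the Docker host into the master container and submit it from there. The jar path and main class below are hypothetical placeholders:

  # copy a locally built jar into the master container (hypothetical names)
  docker cp target/my-app.jar cluster-master:/root/my-app.jar
  # submit it to YARN from inside the container
  docker exec -it cluster-master \
    spark-submit --master yarn --deploy-mode cluster \
    --num-executors 2 --executor-memory 2G --executor-cores 2 \
    --class org.example.MyApp /root/my-app.jar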

Web UI

  • Get the master node IP:

      make master-ip

      ---- MASTER NODE IP ----
      Master node ip : 172.20.0.4

  • Hadoop cluster (YARN ResourceManager) Web UI: master-node-ip:8088
  • Spark Web UI: master-node-ip:8080
  • HDFS (NameNode) Web UI: master-node-ip:50070
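
Note that the container IP is only reachable directly from a Linux Docker host; on Docker Desktop (macOS/Windows) the ports would need to be published instead. From a Linux host you can also read the IP straight from Docker and probe the UIs, for example:

  # read the master container IP from Docker (container name taken from the prompt above)
  MASTER_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' cluster-master)
  # each request should report HTTP 200 (or a redirect) once the services are up
  curl -s -o /dev/null -w "YARN  UI %{http_code}\n" "http://$MASTER_IP:8088"
  curl -s -o /dev/null -w "Spark UI %{http_code}\n" "http://$MASTER_IP:8080"
  curl -s -o /dev/null -w "HDFS  UI %{http_code}\n" "http://$MASTER_IP:50070"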