Project author: yennanliu

Project description:
Run a simple Spark word count job via 1) Scala, sbt, and Spark, or 2) Docker
Language: Scala
Repository: git://github.com/yennanliu/spark-scala-word-count.git
Created: 2019-10-01T00:59:07Z
Project community: https://github.com/yennanliu/spark-scala-word-count



SPARK-SCALA-WORD-COUNT

A minimal demo of how to write, compile, package, and run a Spark word count job in Scala with the sbt build tool.
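
For orientation, a word count job of this kind might look like the sketch below. It is not taken from the repository: the object name `WordCountSketch`, the `tokenize` helper, and the in-memory input lines are all hypothetical, and it assumes the Spark 2.x RDD API on Scala 2.11.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch, not the repository's actual source.
object WordCountSketch {
  // Lower-case a line and split it on runs of non-word characters.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Fall back to local mode when no master is configured (e.g. under `sbt run`);
    // a master supplied by spark-submit is left untouched.
    val conf = new SparkConf()
      .setAppName("WordCountSketch")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)
    try {
      // Assumed in-memory input; a real job would more likely read sc.textFile(path).
      val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))
      val counts = lines
        .flatMap(line => tokenize(line)) // one record per word
        .map(word => (word, 1))          // pair each word with a count of 1
        .reduceByKey(_ + _)              // sum the counts per word
      // Bring the (small) result to the driver and print it, most frequent first.
      counts.collect()
        .sortBy { case (word, n) => (-n, word) }
        .foreach { case (word, n) => println(s"$word\t$n") }
    } finally {
      sc.stop()
    }
  }
}
```

Using `setIfMissing` rather than hard-coding the master is one way to let the same code run under both `sbt run` and `spark-submit`; the repository may handle this differently.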

Quick Start

  # STEP 0) clone the repository and enter it
  $ git clone https://github.com/yennanliu/spark-scala-word-count.git && cd spark-scala-word-count
  # STEP 1) download the dependencies and compile
  $ sbt clean compile
  # STEP 2) run the Spark word count via `sbt run`
  $ sbt run
  # STEP 3) build an assembly jar from the Spark Scala sources
  $ sbt assembly
  # STEP 4) run the Spark word count via `spark-submit` (jar path is relative to the repository root)
  $ spark-submit target/scala-2.11/spark-scala-word-count-assembly-1.0.jar
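
The jar path in STEP 4 implies a few build settings: the project name `spark-scala-word-count`, version `1.0`, a Scala 2.11.x compiler, and the sbt-assembly plugin (which provides the `assembly` task). A `build.sbt` consistent with that path could look roughly like the sketch below. The exact Scala patch version, the Spark version, the plugin version, and the merge strategy are assumptions, not values read from the repository.

```scala
// build.sbt (sketch; check the repository for the real values)
name := "spark-scala-word-count"   // inferred from the jar file name
version := "1.0"                   // inferred from the jar file name
scalaVersion := "2.11.12"          // assumed; any 2.11.x produces target/scala-2.11/

// Assumed Spark version; Spark 2.4.x publishes Scala 2.11 artifacts.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8"

// Bundling Spark into the fat jar usually needs a merge strategy for
// duplicate files (sbt-assembly 0.14.x syntax, assumed here).
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

// project/plugins.sbt would additionally need (version assumed):
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

With these settings sbt-assembly's default output name is `<name>-assembly-<version>.jar`, which matches the file passed to `spark-submit` above.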

Quick Start (Docker)

  # STEP 0) clone the repository
  $ git clone https://github.com/yennanliu/spark-scala-word-count.git
  # STEP 1) enter the project directory
  $ cd spark-scala-word-count
  # STEP 2) build the Docker image
  $ docker build . -t spark_env
  # STEP 3) ONE COMMAND: start the container, then run sbt clean compile, sbt run, sbt assembly, and spark-submit in one go
  $ docker run --mount type=bind,source="$(pwd)"/.,target=/spark-word-count \
      -i -t spark_env \
      /bin/bash -c "cd ../spark-word-count && sbt clean compile && sbt run && sbt assembly && spark-submit /spark-word-count/target/scala-2.11/spark-scala-word-count-assembly-1.0.jar"
  # STEP 3') STEP BY STEP: enter the container -> sbt clean compile -> sbt run -> sbt assembly -> spark-submit
  # start the container with an interactive shell
  $ docker run --mount type=bind,source="$(pwd)"/.,target=/spark-word-count \
      -i -t spark_env \
      /bin/bash
  # inside the container's bash shell
  root@942744030b57:~# cd ../spark-word-count && sbt clean compile && sbt run && sbt assembly
  root@942744030b57:~# spark-submit /spark-word-count/target/scala-2.11/spark-scala-word-count-assembly-1.0.jar

Todo

  • Automatically publish the built jar to S3/GitHub/Slack
  • Automatically run the Spark jar in the cloud