Author: 5thempire

Description: PySpark 2.4.3 Docker environment for development and testing
Language: Python
Repository: git://github.com/5thempire/pyspark.git
Created: 2019-07-22T21:51:20Z
Project page: https://github.com/5thempire/pyspark

License: MIT License

PySpark

PySpark is the Python API for Apache Spark, a unified analytics engine. For documentation, see the Spark and PySpark docs.

This image is meant to be a platform for development and testing.

PySpark version 2.4.3

How to use this image

Start a pyspark instance

  docker run -ti 5thempire/pyspark:latest spark-submit /opt/pyspark/pi.py

…via docker-compose

Example docker-compose.yml for pyspark

  version: '3.4'
  services:
    spark:
      image: 5thempire/pyspark:latest
      container_name: spark
      stdin_open: true
      tty: true
      volumes:
        - ./code:/opt/pyspark
      ports:
        - "8080:8080"
        - "8888:8888"

How to use the Makefile

The Makefile automates the typical tasks in the project.

To set it up, you should run

  make setup

For a pi sample, you should run

  make pi-sample

To explore pdb, run the following

  make pi-debug
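
What the pi-debug target runs is not shown in this README, so the following is only a sketch of one common way to pause driver-side Python code in pdb. The function name `summarize` and its `debug` flag are hypothetical and are not taken from the project.

```python
import pdb


def summarize(values, debug=False):
    """Return the mean of `values`, optionally pausing in pdb first."""
    total = sum(values)
    if debug:
        # Hypothetical switch: pauses here so `values` and `total` can be
        # inspected; type `c` at the (Pdb) prompt to continue.
        pdb.set_trace()
    return total / len(values)
```

pdb needs an interactive terminal, which is what the `-ti` flags and the `stdin_open` / `tty` settings shown earlier provide. A breakpoint placed inside a function passed to an RDD operation would typically run in a worker process without an attached terminal, so this approach suits driver code best.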

Samples

All the samples are based upon pi.py.
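
The contents of pi.py are not reproduced in this README. As a rough idea of what such a sample usually looks like, here is a minimal sketch assuming it follows the well-known Monte Carlo pi estimate used in Spark's own examples. The function names, the application name, and the default sizes are assumptions made for illustration.

```python
import random
from operator import add


def inside(_):
    """Sample one point in the square [-1, 1] x [-1, 1].

    Returns 1 if the point lies inside the unit circle, else 0.
    The argument is ignored; it only exists so the function fits map().
    """
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def pi_from_hits(hits, n):
    """The circle-to-square area ratio is pi/4, so pi is about 4 * hits / n."""
    return 4.0 * hits / n


def estimate_pi_spark(partitions=2, samples_per_partition=100000):
    """Distribute the sampling over Spark partitions and sum the hits."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PiSketch").getOrCreate()
    try:
        n = partitions * samples_per_partition
        hits = (
            spark.sparkContext
            .parallelize(range(n), partitions)  # one task per partition
            .map(inside)                        # 1 or 0 per sample
            .reduce(add)                        # total hits
        )
        return pi_from_hits(hits, n)
    finally:
        spark.stop()
```

To try it inside the container, one could append a line such as `print(estimate_pi_spark())` to the file, place it under the mounted `./code` directory, and pass its path to spark-submit in the same way as pi.py above.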