Project author: dwp

Project description:
Docker python/3.6-alpine image with pyspark and pytest.
Primary language: Python
Repository: git://github.com/dwp/docker-python-pyspark-pytest.git
Created: 2020-04-30T11:49:23Z
Project community: https://github.com/dwp/docker-python-pyspark-pytest

License: MIT License

docker-python-pyspark-pytest

Docker python/3.6-alpine image with pyspark and pytest.

To run your own project’s unit tests within this container:

  docker run -v $(pwd):/some-container-dir -it dwpdigital/python3-pyspark-pytest /bin/sh
  cd /some-container-dir
  pytest tests
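
The same steps can also be run non-interactively (for example from a CI job). A minimal sketch, assuming your tests live in a top-level tests directory and that the image accepts a command to run directly:

  docker run --rm -v "$(pwd)":/some-container-dir -w /some-container-dir dwpdigital/python3-pyspark-pytest pytest tests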

Note that if your container is running in an environment with no or limited
Internet connectivity, you should configure PySpark to use the included local
Ivy and Maven repositories by setting the PYSPARK_SUBMIT_ARGS environment variable
before creating your Spark session, e.g. in tests/conftest.py:

  import os

  import pytest
  from pyspark.sql import SparkSession

  @pytest.fixture(scope="session")
  def spark():
      # Resolve the hadoop-aws package from the image's local Ivy/Maven repositories
      os.environ["PYSPARK_SUBMIT_ARGS"] = '--packages "org.apache.hadoop:hadoop-aws:2.7.3" --conf spark.jars.ivySettings=/root/ivysettings.xml pyspark-shell'
      os.environ["PYSPARK_PYTHON"] = "python3"
      os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"
      spark = (
          SparkSession.builder.master("local")
          .appName("test")
          .enableHiveSupport()
          .getOrCreate()
      )
      return spark
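
Tests can then request the Spark session by naming the fixture as an argument. A minimal sketch, using a hypothetical tests/test_example.py:

  # tests/test_example.py (hypothetical example)
  def test_creates_dataframe(spark):
      # pytest injects the `spark` fixture defined in conftest.py
      df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
      assert df.count() == 2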