Author: riyadparvez

Description: PySpark notebooks
Language: Jupyter Notebook
Repository: git://github.com/riyadparvez/pyspark-datascience.git
Created: 2018-05-02T02:49:24Z
Project page: https://github.com/riyadparvez/pyspark-datascience

License: MIT License


pyspark-notebooks

PySpark Jupyter notebooks

Installation

We provide a pre-built Docker image for easy experimentation. The image is based on the official Jupyter pyspark-notebook image, with some additional packages installed.

To pull the image:
docker pull riyadparvez/pyspark-notebooks

To run a container:
docker run --rm -p 8888:8888 -p 8080:8080 -p 4040:4040 -v /path/to/pyspark-notebooks:/home/jovyan/work --name pyspark-notebook riyadparvez/pyspark-notebooks start-notebook.sh --NotebookApp.token=''

See the documentation of the official Jupyter Docker image for more usage options.
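
As a convenience, the run command above can also be assembled from Python, which makes it easy to substitute your own checkout directory for /path/to/pyspark-notebooks. This is a minimal sketch, not part of the project: the helper name docker_run_args is hypothetical, and the comments about what each port serves are assumptions based on the usual Jupyter and Spark defaults rather than anything the project states.

```python
import shlex
from pathlib import Path

IMAGE = "riyadparvez/pyspark-notebooks"


def docker_run_args(notebook_dir, ports=(8888, 8080, 4040)):
    """Build the argument list for the `docker run` command shown above.

    Hypothetical helper. Assumed port roles (common defaults, not confirmed
    by the project): 8888 Jupyter, 8080 Spark master UI, 4040 Spark app UI.
    """
    # Resolve to an absolute path so the bind mount does not depend on
    # the directory the command is launched from.
    host_dir = Path(notebook_dir).expanduser().resolve()

    args = ["docker", "run", "--rm"]
    for port in ports:
        args += ["-p", f"{port}:{port}"]
    args += [
        "-v", f"{host_dir}:/home/jovyan/work",
        "--name", "pyspark-notebook",
        IMAGE,
        "start-notebook.sh",
        # Same as --NotebookApp.token='' in the shell command above.
        "--NotebookApp.token=",
    ]
    return args


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(docker_run_args(".")))
```

To actually start the container, the returned list can be passed to subprocess.run(args, check=True). Note that an empty token turns off token authentication for the notebook server, so this setup is best kept to a local machine.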

Notebooks

Most of the notebooks are WIP.
Complete notebooks are:

Datasets

Most of the notebooks use data from Kaggle competitions or from the University of California, Irvine (UCI) Machine Learning Repository. For UCI datasets, the notebooks download the data automatically. Kaggle datasets have to be downloaded manually, since there is no good automated way to fetch them.
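
The automatic download for UCI data could look roughly like the following. This is a sketch under assumptions, not the code used in the notebooks: the helper name fetch_if_missing and the data/ layout are hypothetical, and the URL in the usage comment is a placeholder to be replaced with the real dataset location.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the path.

    Hypothetical helper: re-running a notebook then reuses the cached file
    instead of downloading it again.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # already downloaded, skip the network call
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


# Example usage (placeholder URL, substitute the dataset's actual address):
# path = fetch_if_missing("https://example.org/some-dataset.csv", "data/some-dataset.csv")
# df = spark.read.csv(str(path), header=True, inferSchema=True)
```

For Kaggle data, the same directory layout can be kept by placing the manually downloaded files where the notebook expects them.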