Project author: Arkaprabha-B

Project description:
Implementation of GraphFrames using PySpark in Eclipse IDE
Repository: git://github.com/Arkaprabha-B/PySpark-GraphFrames.git
Created: 2019-06-26T16:13:21Z
Project page: https://github.com/Arkaprabha-B/PySpark-GraphFrames


PySpark-GraphFrames

Implementation of GraphFrames using PySpark in Eclipse Oxygen IDE

1. Installation Steps:

a. Apache Spark version 2.4.0: download Apache Spark from the Spark website and extract it thrice using "7-Zip".
The final folder structure will be: D:/Spark/spark-2.4.0-bin-hadoop2.7

b. Download the Scala 2.11.8 Windows .msi installer from the Scala website.

c. Download Java 8, Python 3.7.2, PyDev 6.4.3 and winutils-master (GitHub URL: https://github.com/steveloughran/winutils)

  * PyDev is a plugin that enables Eclipse to be used as a Python IDE.
  * winutils-master provides the Hadoop binaries for Windows, including winutils.exe.
  * Final Hadoop folder structure: D:\hadoop-2.7.1
  * Folder contents: bin, README.md, winutils (copy winutils.exe here after extracting with "7-Zip")

d. Go to the extracted PyDev 6.4.3 folder -> plugins, copy all plugins, and paste them into Eclipse -> plugins.
Likewise, copy all features from the PyDev 6.4.3 features folder and paste them into the Eclipse features folder.

e. Eclipse Oxygen IDE environment variables setup:

  * HADOOP_HOME: path of the folder where winutils.exe is present
  * IP_HOME: IP address of the system
  * SPARK_CONF: path of the conf folder inside the final extracted Spark folder
  * SPARK_HOME: path of the Spark folder (the final extracted folder)

f. Add the GraphFrames 0.7.0 jar to the jars folder inside spark-2.4.0-bin-hadoop2.7
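
Once the jar is in place, a short script can confirm that PySpark picks up GraphFrames. The following is a minimal sketch, not code from this project: the vertex and edge data, the app name, and the helper names sample_graph_data and build_graph are made up for illustration. It also assumes that the graphframes Python module is importable, which, depending on the setup, may require adding the jar to the Python path or installing the Python wrapper separately.

```python
def sample_graph_data():
    """Return illustrative (vertices, edges) rows; the data is made up."""
    vertices = [("a", "Alice"), ("b", "Bob"), ("c", "Carol")]
    edges = [
        ("a", "b", "follows"),
        ("b", "c", "follows"),
        ("c", "a", "follows"),
    ]
    return vertices, edges


def build_graph(spark):
    """Build a GraphFrame from the sample data using an existing SparkSession."""
    # Imported lazily so this file still loads when graphframes is missing.
    from graphframes import GraphFrame

    vertices, edges = sample_graph_data()
    # GraphFrames expects an "id" column on vertices and "src"/"dst" on edges.
    v = spark.createDataFrame(vertices, ["id", "name"])
    e = spark.createDataFrame(edges, ["src", "dst", "relationship"])
    return GraphFrame(v, e)


def main():
    """Start a local session, build the sample graph, and show in-degrees."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("graphframes-smoke-test").getOrCreate()
    try:
        graph = build_graph(spark)
        graph.inDegrees.show()
    finally:
        spark.stop()
```

Calling main() from a PyDev run configuration should print one in-degree row per vertex if Spark, the jar, and the Python module are all visible; an ImportError for graphframes points at the Python path rather than at the jar itself.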

2. Set Up System Variables:

a. HADOOP_HOME

b. JAVA_HOME

c. SCALA_HOME

d. PYSPARK_PYTHON

e. SPARK_HOME

f. Add %SPARK_HOME%\bin to Path (by editing the system variable Path)

g. PYSPARK_DRIVER_PYTHON = C:\Users\user\Anaconda3\envs\Scripts\jupyter.exe

h. PYSPARK_DRIVER_PYTHON_OPTS = notebook

* g. and h. are needed only when using Jupyter Notebook through Anaconda Navigator.
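
The system variables above can be checked from Python before launching Spark. This is a rough sketch under stated assumptions: the function names missing_vars, spark_bin_on_path, and report are hypothetical, the variable list simply mirrors items a. to e., and the Path check assumes Windows-style entries separated by semicolons.

```python
import ntpath
import os

# Variables from items a. to e.; g. and h. are optional (Jupyter only).
REQUIRED_VARS = [
    "HADOOP_HOME",
    "JAVA_HOME",
    "SCALA_HOME",
    "PYSPARK_PYTHON",
    "SPARK_HOME",
]


def missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


def spark_bin_on_path(env):
    """Check whether the bin folder under SPARK_HOME is listed in Path."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return False
    wanted = ntpath.normcase(ntpath.normpath(ntpath.join(spark_home, "bin")))
    # Windows names the variable "Path"; plain dicts may use "PATH" instead.
    raw = env.get("Path") or env.get("PATH") or ""
    entries = [ntpath.normcase(ntpath.normpath(p)) for p in raw.split(";") if p]
    return wanted in entries


def report(env=None):
    """Print any missing variables and whether Spark's bin folder is on Path."""
    env = os.environ if env is None else env
    for name in missing_vars(env):
        print("missing:", name)
    print("Spark bin folder on Path:", spark_bin_on_path(env))
```

Running report() in a fresh terminal (or a restarted Eclipse) is useful because changes to system variables are only seen by processes started after the change.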