Project author: nitinvinayak

Project description:
CS F320: Foundations of Data Science Assignment
Primary language: Scilab
Project URL: git://github.com/nitinvinayak/ML-using-PySpark.git
Created: 2019-11-30T07:12:48Z
Project community: https://github.com/nitinvinayak/ML-using-PySpark

License:



ML-using-PySpark

CS F320: Foundations of Data Science Assignment

Dataset

KNN

https://www.kaggle.com/nitinvinayak/13-dimension-10-million-big-data-high-dimension

Bisecting K-Means

https://www.kaggle.com/nitinvinayak/shuttle

KNN and Bisecting K-Means implementations in PySpark

Python 3.8 and Spark 3.0 are used
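
For readers new to KNN, the core step can be sketched in plain Python 3.8 (standard library only). This is an illustrative sketch, not the code from this repository: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with majority voting are all assumptions, and the Spark-specific part (distributing the distance computation across a cluster) is not shown.

```python
import heapq
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    (Hypothetical helper for illustration only.)
    """
    # Keep only the k closest rows instead of sorting the whole training set.
    nearest = heapq.nsmallest(
        k, train, key=lambda row: math.dist(row[0], query)
    )
    # Count the labels of those k rows and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration (not taken from the Kaggle datasets).
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.1, 4.9), "b"),
    ((4.8, 5.2), "b"),
]

print(knn_predict(train, (0.3, 0.3), k=3))  # prints "a"
print(knn_predict(train, (4.5, 4.5), k=3))  # prints "b"
```

Using `heapq.nsmallest` avoids a full sort, which matters once the training set grows large, as with the 10-million-row dataset linked above.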

For Spark installation help:

https://stackoverflow.com/questions/54377365/apache-spark-on-cluster-of-only-2-computers
https://towardsdatascience.com/how-to-use-pyspark-on-your-computer-9c7180075617
https://medium.com/@josemarcialportilla/installing-scala-and-spark-on-ubuntu-5665ee4b62b1
https://medium.com/@josemarcialportilla/installing-scala-and-spark-on-windows-249632e6b83b