Project author: rss161030

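Project description:
I implemented various ETL processes, such as loading data from MySQL into HDFS with Sqoop, transforming the data with Spark and Scala, running analytics with Spark and Scala, and loading the results back into HDFS.
Primary language:
Project URL: git://github.com/rss161030/ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala.git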


ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala

The project uses a retail database with the following tables:
1. Customers
2. Orders
3. Order_Items
4. Products
5. Categories
6. Departments

I implemented various ETL processes: loading the data from MySQL into HDFS with Sqoop, transforming the data with Spark and Scala,
running analytics with Spark and Scala, and loading the results back into HDFS.
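
As a rough illustration of the transform, analytics, and load steps, here is a minimal Spark/Scala sketch. It is not the project's actual code. The object name RetailEtlSketch, the function revenuePerOrder, the column names (order_id, order_status, order_item_order_id, order_item_subtotal), and the default output path are all assumptions made for this example, and small in-memory DataFrames stand in for the data that Sqoop would have imported into HDFS.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, round, sum}

// Hypothetical sketch: names and columns are assumptions, not taken from the project.
object RetailEtlSketch {

  // Transform + analytics: keep completed orders, join them to their items,
  // and sum the item subtotals to get revenue per order.
  def revenuePerOrder(orders: DataFrame, orderItems: DataFrame): DataFrame =
    orders
      .filter(col("order_status") === "COMPLETE")
      .join(orderItems, col("order_id") === col("order_item_order_id"))
      .groupBy(col("order_id"))
      .agg(round(sum(col("order_item_subtotal")), 2).as("order_revenue"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RetailEtlSketch")
      .master("local[*]") // local mode for the sketch; a cluster run would set this elsewhere
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-ins for the Orders and Order_Items data.
    val orders = Seq((1, "COMPLETE"), (2, "PENDING"), (3, "COMPLETE"))
      .toDF("order_id", "order_status")
    val orderItems = Seq((1, 1, 100.0), (2, 1, 50.5), (3, 2, 20.0), (4, 3, 10.0))
      .toDF("order_item_id", "order_item_order_id", "order_item_subtotal")

    val result = revenuePerOrder(orders, orderItems)
    result.show()

    // Load: write the result back out. The default path is hypothetical;
    // pass an HDFS path as the first argument to write to HDFS instead.
    val outputPath = args.headOption.getOrElse("/tmp/retail_etl_sketch/order_revenue")
    result.write.mode("overwrite").parquet(outputPath)

    spark.stop()
  }
}
```

Keeping the transformation in a function that takes and returns DataFrames means the same logic can be applied to data read from HDFS or to small test inputs like the ones above.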

I have added a document called ‘Project Requirements’, which specifies the problem statements for this project.