There is a retail database with the following tables:
1. Customers
2. Orders
3. Order_Items
4. Products
5. Categories
6. Departments
I implemented several ETL processes: loading the data from MySQL into HDFS with Sqoop, transforming it with Spark and Scala,
running analytics with Spark and Scala, and writing the results back to HDFS.
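
As a rough illustration of the transform and load-back steps, here is a minimal sketch in Spark and Scala. It is not the project's actual code: the column names (`order_id`, `order_status`, `order_item_order_id`, `order_item_subtotal`), the `COMPLETE` status value, the `RevenuePerOrder` object, and the output path argument are all assumptions made for the example. The in-memory rows stand in for the data that Sqoop would have imported into HDFS.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, round, sum}

object RevenuePerOrder {

  // Transform step: keep completed orders, join them to their items,
  // and total the item subtotals per order.
  // All column names here are assumed, not taken from the project.
  def revenuePerOrder(orders: DataFrame, orderItems: DataFrame): DataFrame =
    orders
      .filter(col("order_status") === "COMPLETE")
      .join(orderItems, col("order_id") === col("order_item_order_id"))
      .groupBy("order_id")
      .agg(round(sum("order_item_subtotal"), 2).as("revenue"))

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying the sketch on one machine.
    val spark = SparkSession.builder()
      .appName("retail-etl-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data; a real run would read the Sqoop output from HDFS instead.
    val orders = Seq((1, "COMPLETE"), (2, "PENDING"), (3, "COMPLETE"))
      .toDF("order_id", "order_status")
    val orderItems = Seq((1, 1, 100.0), (2, 1, 50.5), (3, 2, 20.0), (4, 3, 10.0))
      .toDF("order_item_id", "order_item_order_id", "order_item_subtotal")

    val result = revenuePerOrder(orders, orderItems)
    result.orderBy("order_id").show()

    // Load step: write the result to the path given as the first argument
    // (hypothetical, e.g. an HDFS directory), if one was supplied.
    args.headOption.foreach { outputPath =>
      result.write.mode("overwrite").parquet(outputPath)
    }

    spark.stop()
  }
}
```

Keeping the transformation in a function that takes and returns DataFrames means the same logic can be run against small in-memory data or against the tables read from HDFS.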
I have added a document called ‘Project Requirements’ that specifies the problem statements for this project.