Project author: anthonywong611

Project description:
Build a data pipeline on AWS that runs batch processing on a Spark cluster provisioned by Amazon EMR. The ETL workflow is orchestrated with managed Airflow (MWAA): it extracts data from S3, transforms the data with Spark, and loads the transformed data back to S3.
Primary language: Python
Project URL: git://github.com/anthonywong611/Batch-ETL-with-AWS-EMR-and-MWAA.git
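
To make the described flow concrete, here is a minimal sketch of how the Spark transform could be expressed as an EMR step that runs `spark-submit` through `command-runner.jar`. It is not taken from the repository: the function name `build_spark_step`, the bucket `example-bucket`, the script path, and the `--input`/`--output` script arguments are all hypothetical placeholders.

```python
from typing import Any, Dict


def build_spark_step(script_uri: str, input_uri: str, output_uri: str) -> Dict[str, Any]:
    """Return an EMR step definition that runs a PySpark script via spark-submit.

    The --input/--output flags are assumed to be parsed by the (hypothetical)
    PySpark script itself; they are not spark-submit options.
    """
    return {
        "Name": "transform-data",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                "--input", input_uri,
                "--output", output_uri,
            ],
        },
    }


# Hypothetical S3 locations: raw data in, transformed data out.
step = build_spark_step(
    script_uri="s3://example-bucket/scripts/transform.py",
    input_uri="s3://example-bucket/raw/",
    output_uri="s3://example-bucket/transformed/",
)
```

A dictionary of this shape can be placed in the `Steps` list passed to boto3's EMR `add_job_flow_steps`, or in the `steps` argument of Airflow's `EmrAddStepsOperator`; which of these the project actually uses is not stated in the description.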