Project author: cdubiel08
Project description:
A project exploring the extract, transform, load (ETL) process using Python, MongoDB, and Flask. The data sets cover cryptocurrency pricing and COVID case counts.
Primary language: Jupyter Notebook
Project URL: git://github.com/cdubiel08/ETL-Project-Group-9.git
ETL-Project-Group-9
Team Members:
Chad Dubiel, David Martinez, Katy Fuentes
Scope of Research:
Correlation between cryptocurrency pricing and COVID case counts.
Github Repo:
https://github.com/cdubiel08/ETL-Project-Group-9
Data Sources:
Source:
Other:
- What useful investigation could be done with the final database?
Compare the output to markets, commodities, or the US dollar.
- Will the final database be relational or non-relational, and why?
Relational, because the information will be interconnected based on a timeframe.
Considerations:
Dates are not a reliable join key on their own; each record needs a unique ID to serve as its primary key.
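One way to act on this consideration is to normalize every date to a single format and combine it with the source name to form a deterministic ID. The sketch below is only an illustration: the date formats, the `source`/`symbol` parts, and the function names are assumptions, not taken from the project notebooks.

```python
from datetime import datetime

# Assumed input formats; the real source files may use different ones.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y")

def normalize_date(raw):
    """Return an ISO date string (YYYY-MM-DD) for any known input format."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {raw!r}")

def make_record_id(source, raw_date, symbol=""):
    """Build a deterministic ID from source, optional symbol, and date."""
    parts = [source, symbol, normalize_date(raw_date)]
    # Skip empty parts so a record without a symbol has no double colon.
    return ":".join(p for p in parts if p)
```

For example, `make_record_id("crypto", "3/15/2020", "BTC")` yields `crypto:BTC:2020-03-15`, so two coins priced on the same day no longer collide on the date alone.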
Data Analysis
- Pandas - for data formatting, date cleaning, and reducing columns
- MongoDB - better at tolerating null values, which would otherwise cause a data column to be skipped; any COVID/crypto overlaps are captured
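The idea of skipping nulls while still capturing overlaps can be sketched in plain Python: merge rows from both sources into one document per date and leave out any field whose value is missing. The field names (`date`, `btc_close`, `new_cases`) and the sample values are made up for illustration; a document store such as MongoDB accepts documents whose fields differ from one record to the next.

```python
import math

def is_missing(value):
    """True for None, empty strings, and float NaN (how pandas marks gaps)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)

def drop_missing(record):
    """Return a copy of the record without missing fields."""
    return {k: v for k, v in record.items() if not is_missing(v)}

def merge_by_date(crypto_rows, covid_rows):
    """Combine rows from both sources into one document per date."""
    docs = {}
    for row in list(crypto_rows) + list(covid_rows):
        doc = docs.setdefault(row["date"], {"date": row["date"]})
        doc.update(drop_missing(row))
    # Sort by ISO date string so the output order is stable.
    return [docs[d] for d in sorted(docs)]

# Made-up sample rows, one gap in each source.
crypto = [{"date": "2020-03-15", "btc_close": 100.0},
          {"date": "2020-03-16", "btc_close": float("nan")}]
covid = [{"date": "2020-03-15", "new_cases": None},
         {"date": "2020-03-16", "new_cases": 7}]
merged = merge_by_date(crypto, covid)
```

Each date keeps whatever fields were present, and dates where both sources have values end up with both fields in the same document.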
Steps
Data Sources:
- At least 2 (or more) sources
- If possible, try to incorporate a web API as one of your data sources.
ETL Process:
- Within Jupyter, build out the ETL process to extract your data from their sources, apply some level of transformation, and load the resulting data to a database (relational or non-relational)
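The three steps can be sketched as three small functions. This is a minimal outline under stated assumptions, not the project's notebook code: the CSV layout and column names are hypothetical, and `load` only assumes an object with an `insert_many` method, which a pymongo collection provides.

```python
import csv
import io

def extract(csv_text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows, keep):
    """Transform: keep only the wanted columns and cast numbers to float."""
    cleaned = []
    for row in rows:
        doc = {}
        for col in keep:
            value = (row.get(col) or "").strip()
            if value == "":
                continue  # leave empty cells out of the document
            try:
                doc[col] = float(value)
            except ValueError:
                doc[col] = value  # not numeric, keep as text
        cleaned.append(doc)
    return cleaned

def load(docs, collection):
    """Load: insert the documents and return how many were written."""
    if docs:  # skip the call when there is nothing to insert
        collection.insert_many(docs)
    return len(docs)
```

With pymongo installed, a collection such as `MongoClient()["etl_db"]["daily"]` could be passed as `collection`; the database and collection names here are placeholders.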
Flask API:
- Build a Flask application that has a route that will execute a query to your database and return the results in JSON format.
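A route of this kind might look like the sketch below. The route path `/api/v1/records` and the function names are assumptions for illustration. The query helper is kept separate from the Flask app so it can be exercised without a running server, and the `{"_id": 0}` projection leaves out MongoDB's `ObjectId` field, which Flask's default JSON encoder cannot serialize.

```python
def find_documents(collection, query=None):
    """Run a query and return a list of JSON-safe dictionaries."""
    # The projection {"_id": 0} asks MongoDB to omit the ObjectId field.
    return list(collection.find(query or {}, {"_id": 0}))

def create_app(collection):
    """Build a Flask app with one route that returns query results as JSON."""
    # Imported here so find_documents can be used without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/v1/records")
    def records():
        return jsonify(find_documents(collection))

    return app
```

To try it locally, one could call `create_app(collection).run(debug=True)` with a pymongo collection and open the route in a browser.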
Final Report:
- Write up a short report that details your 3 ETL steps.
- More details on a later slide.
Github Repo:
- Store all of your project files in a well-organized project repository
- Each member of your team will submit a link to your project repo to BCS by the end of class Tuesday
Write Up Process Summary:
- What data sources you chose and why?
- Detailing the process of the extraction, transformation, and loading steps
- Explain why you have performed the types of transformation you did
- Why you chose the type of final database
- Schema of the tables/collections in the final database
- Hypothetical use case(s) for your database