Movie_ETL

Using the ETL process to clean and merge data.

Goal

📽️ Extract the movie data from the Wikipedia and Kaggle source files, transform the datasets by cleaning and merging them, then load the cleaned dataset into a SQL database.
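The extract step can be sketched as below. This is a minimal sketch, assuming the Wikipedia data is stored as a JSON array of objects and the Kaggle data as a CSV file with a header row; the sample contents and the function names `extract_wiki` and `extract_kaggle` are hypothetical. In practice the strings would come from reading the actual files.

```python
import csv
import io
import json


def extract_wiki(json_text: str) -> list:
    """Parse Wikipedia movie records stored as a JSON array of objects (assumed format)."""
    return json.loads(json_text)


def extract_kaggle(csv_text: str) -> list:
    """Parse Kaggle movie metadata stored as CSV with a header row (assumed format)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Tiny in-memory stand-ins for the real files (hypothetical contents).
wiki_json = '[{"title": "Example Film", "imdb_link": "https://www.imdb.com/title/tt0000001/"}]'
kaggle_csv = "imdb_id,title,budget\ntt0000001,Example Film,1000000\n"

wiki_movies = extract_wiki(wiki_json)
kaggle_movies = extract_kaggle(kaggle_csv)
print(len(wiki_movies), kaggle_movies[0]["imdb_id"])
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as `budget` still need converting during the transform step.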

ETL Process

Two examples of how the Wikipedia movie data was cleaned are the identification of alternate titles for the films and the standardization of the column names.
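A sketch of those two cleaning steps follows. It assumes each Wikipedia record is a dictionary; the alternate-title keys in `ALT_TITLE_KEYS` and the mapping in `COLUMN_RENAMES` are hypothetical examples, not the exact lists used in this project. Alternate titles are moved into a single `alt_titles` field, and inconsistent column names are renamed to one standard name.

```python
# Hypothetical keys that hold alternate titles in the raw Wikipedia records.
ALT_TITLE_KEYS = ["Also known as", "French", "Japanese", "Spanish"]

# Hypothetical mapping from inconsistent raw column names to one standard name.
COLUMN_RENAMES = {
    "Directed by": "Director",
    "Produced by": "Producer(s)",
    "Release date": "Release Date",
    "Running time": "Running Time",
}


def clean_movie(movie: dict) -> dict:
    """Collect alternate titles into one field and standardize column names."""
    movie = dict(movie)  # work on a copy so the raw record is untouched
    alt_titles = {}
    for key in ALT_TITLE_KEYS:
        if key in movie:
            alt_titles[key] = movie.pop(key)
    if alt_titles:
        movie["alt_titles"] = alt_titles
    for old, new in COLUMN_RENAMES.items():
        if old in movie:
            movie[new] = movie.pop(old)
    return movie


raw = {"title": "Example Film", "Directed by": "A. Director", "French": "Film Exemple"}
cleaned = clean_movie(raw)
print(cleaned)
```

Working on a copy keeps the raw records available for comparison if a cleaning rule turns out to be wrong.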

Another way the data was condensed was to filter out TV programs using an if statement.
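One way such a filter might look is sketched below. It assumes TV programs can be recognized by series-only fields; the marker keys `No. of episodes` and `No. of seasons` are hypothetical, and the actual condition used on the scraped data may differ.

```python
def is_movie(record: dict) -> bool:
    """Heuristic: treat a record as a film unless it carries TV-series fields."""
    # Hypothetical TV-only keys; the real markers depend on the scraped data.
    tv_markers = ("No. of episodes", "No. of seasons")
    if any(marker in record for marker in tv_markers):
        return False
    return True


records = [
    {"title": "Example Film", "Directed by": "A. Director"},
    {"title": "Example Series", "No. of episodes": 10},
]
movies_only = [r for r in records if is_movie(r)]
print([r["title"] for r in movies_only])  # ['Example Film']
```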

Tables in Database

The “movies” table contains 6,052 rows based on the merged Kaggle and Wikipedia data.
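The merge behind the “movies” table can be sketched as an inner join on a shared key. This sketch assumes the join key is the IMDb ID, taken directly from a Kaggle `imdb_id` column and parsed out of a Wikipedia `imdb_link` URL; those column names and the function names are hypothetical. Every column is suffixed by its source so overlapping names such as `title` stay distinct and can be reconciled afterwards.

```python
from __future__ import annotations

import re
from typing import Optional

# IMDb title IDs look like "tt" followed by 7 or 8 digits.
IMDB_ID = re.compile(r"tt\d{7,8}")


def imdb_id_from_link(link: Optional[str]) -> Optional[str]:
    """Extract an IMDb title ID from a URL, or return None if absent."""
    match = IMDB_ID.search(link or "")
    return match.group(0) if match else None


def merge_movies(wiki: list[dict], kaggle: list[dict]) -> list[dict]:
    """Inner-join Wikipedia and Kaggle records on their IMDb ID."""
    kaggle_by_id = {row["imdb_id"]: row for row in kaggle}
    merged = []
    for w in wiki:
        key = imdb_id_from_link(w.get("imdb_link"))
        if key in kaggle_by_id:
            # Suffix every column by source so shared names (e.g. title) stay distinct.
            row = {f"{k}_wiki": v for k, v in w.items()}
            row.update({f"{k}_kaggle": v for k, v in kaggle_by_id[key].items()})
            row["imdb_id"] = key
            merged.append(row)
    return merged


wiki = [
    {"title": "Example Film", "imdb_link": "https://www.imdb.com/title/tt0000001/"},
    {"title": "No Match", "imdb_link": "https://www.imdb.com/title/tt9999999/"},
]
kaggle = [{"imdb_id": "tt0000001", "title": "Example Film", "budget": "1000000"}]
merged = merge_movies(wiki, kaggle)
print(len(merged))  # 1
```

An inner join keeps only movies present in both sources, which is one reason a merged table can end up smaller than either input.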

The “ratings” table contains 26,024,289 rows of data.
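With more than 26 million rows, the ratings data is easier to load in batches than in one insert. The sketch below uses Python's built-in `sqlite3` as a stand-in for the project's SQL database, and assumes a four-column schema (`userId`, `movieId`, `rating`, `timestamp`); both the schema and the chunk size are assumptions, not the project's confirmed setup.

```python
from __future__ import annotations

import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def chunks(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_ratings(conn: sqlite3.Connection, rows: Iterable[tuple], chunk_size: int = 1_000_000) -> int:
    """Insert rating rows in batches, committing after each batch; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings "
        "(userId INTEGER, movieId INTEGER, rating REAL, timestamp INTEGER)"
    )
    total = 0
    for batch in chunks(rows, chunk_size):
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
sample = [(1, 10, 4.0, 1260759144), (1, 20, 3.5, 1260759179), (2, 10, 5.0, 1260759200)]
loaded = load_ratings(conn, sample, chunk_size=2)
print(loaded)  # 3
```

Because `chunks` accepts any iterable, the rows can be streamed from the source file without holding all of them in memory at once.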