Project author: AkashSDas

Project description:
Uses two regression models (one built from scratch, the other built with the sklearn module) to predict Canada's GDP in the upcoming years.
Language: Jupyter Notebook
Repository: git://github.com/AkashSDas/predict-gdp-of-canada.git
Created: 2020-05-09T18:22:43Z
Project community: https://github.com/AkashSDas/predict-gdp-of-canada

License: MIT License


predict-gdp-of-canada

About

This project contains two Jupyter notebooks: from-scratch.ipynb and using-sklearn.ipynb.

In from-scratch.ipynb, a linear regression model is built from scratch, using numpy for the mathematical operations. The model is then trained on the data to predict Canada's GDP, taking the year for which the GDP is wanted as input.

Also in from-scratch.ipynb, Gradient Descent and the Normal Equation (an option here because the dataset has fewer than 10,000 rows) are used to find the best parameters, and the model is then evaluated on the test data.
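As a rough illustration of the two fitting approaches named above, here is a minimal numpy sketch. It is not the notebook's code: the data is synthetic (a noisy linear trend standing in for the GDP series), and the function names, learning rate, and iteration count are assumptions chosen for the example.

```python
import numpy as np

# Synthetic stand-in data (NOT the World Bank GDP figures): a noisy linear trend.
rng = np.random.default_rng(0)
years = np.arange(1960, 2020, dtype=float)
gdp = 25.0 * (years - 1960) + 40.0 + rng.normal(0.0, 10.0, size=years.shape)

# Standardise the feature so gradient descent converges quickly,
# then prepend a column of ones for the intercept term.
x = (years - years.mean()) / years.std()
X = np.column_stack([np.ones_like(x), x])
y = gdp


def gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Minimise the mean squared error with batch gradient descent."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        gradient = (2.0 / m) * X.T @ (X @ theta - y)
        theta -= lr * gradient
    return theta


def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y in closed form."""
    return np.linalg.solve(X.T @ X, X.T @ y)


theta_gd = gradient_descent(X, y)
theta_ne = normal_equation(X, y)

# Both methods minimise the same cost, so the parameters should agree.
print("gradient descent:", theta_gd)
print("normal equation: ", theta_ne)
```

The normal equation needs no learning rate or iteration count, but it solves a linear system whose cost grows with the number of features, which is why it is usually reserved for small problems like this one.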

In using-sklearn.ipynb, the sklearn module is used, and machine learning techniques such as cross-validation, learning-curve analysis, and parameter tuning are applied to train the model, which is then evaluated on the test data.
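The three techniques mentioned above could look roughly like the following in scikit-learn. This is a sketch under assumptions, not the notebook's code: the data is synthetic, and Ridge regression with an `alpha` grid is a stand-in chosen only because parameter tuning needs a hyperparameter to search over; the README does not say which estimator or grid the notebook uses.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import (
    GridSearchCV,
    cross_val_score,
    learning_curve,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the World Bank GDP figures).
rng = np.random.default_rng(0)
X = np.arange(1960, 2020, dtype=float).reshape(-1, 1)
y = 25.0 * (X.ravel() - 1960) + 40.0 + rng.normal(0.0, 10.0, size=len(X))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed model: scale the year, then fit a Ridge regressor.
model = make_pipeline(StandardScaler(), Ridge())

# Cross-validation: R^2 on each of 5 folds of the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Learning curve: training and validation scores at growing training sizes.
train_sizes, train_scores, val_scores = learning_curve(
    model,
    X_train,
    y_train,
    cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="r2",
)

# Parameter tuning: grid search over the (assumed) regularisation strength.
search = GridSearchCV(
    model, {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="r2"
)
search.fit(X_train, y_train)

# Final evaluation on the held-out test data.
test_r2 = search.score(X_test, y_test)
print("CV scores:", cv_scores)
print("Best params:", search.best_params_)
print("Test R^2:", test_r2)
```

Plotting the mean of `train_scores` and `val_scores` against `train_sizes` gives the learning curve; a persistent gap between the two would suggest overfitting.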

Technologies Used

Python is used as the programming language.

Numpy is used for mathematical operations and data manipulation.

Pandas is used for the analysis and manipulation of data.

Matplotlib and Seaborn are used for data visualisation, which helped in the analysis of the data.

Scikit-learn is used for data preprocessing and for creating and evaluating the machine learning model, thus forming a pipeline.

Pipenv provides the virtual environment used for the project. Jupyter Notebook is used for the entire data science and machine learning life cycle.

Results of the Project

The results of from-scratch.ipynb and using-sklearn.ipynb are the same, i.e. the regression model built with the sklearn module and the one built using only numpy give the same results.
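The agreement described above is expected, since both approaches solve the same least-squares problem. A minimal sketch of such a check, on synthetic data rather than the project's dataset and with variable names chosen for illustration, might be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT the World Bank GDP figures).
rng = np.random.default_rng(1)
years = np.arange(1960, 2020, dtype=float)
gdp = 25.0 * (years - 1960) + 40.0 + rng.normal(0.0, 10.0, size=years.shape)

# numpy only: least-squares fit on [1, year].
X = np.column_stack([np.ones_like(years), years])
theta, *_ = np.linalg.lstsq(X, gdp, rcond=None)
pred_numpy = X @ theta

# scikit-learn: ordinary least squares on the same data.
reg = LinearRegression().fit(years.reshape(-1, 1), gdp)
pred_sklearn = reg.predict(years.reshape(-1, 1))

# The two sets of predictions should match up to floating-point error.
predictions_match = np.allclose(pred_numpy, pred_sklearn)
print("Predictions match:", predictions_match)
```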

Figures produced by the notebooks:

  • Line Plot
  • Correlation Matrix
  • Cross Validation Score
  • Learning Curve
  • Fitted Line
  • Metrics Scores
  • Actual VS Prediction

Installation

It is highly recommended to use a virtual environment for this project to avoid dependency issues.

This project uses pipenv.

The file 'Predict-GDP-of-Canada'/requirements.txt lists all the dependencies for this project.

  • First, start by cloning the repository
  1. git clone https://github.com/AkashSDas/Predict-GDP-of-Canada
  • Start by installing pipenv if you don’t have it
  1. pip install pipenv
  • Once installed, access the venv folder inside the project folder
  1. cd 'Predict-GDP-of-Canada'/venv/
  • Create the virtual environment
  1. pipenv install

The project's Pipfile must be present in order to replicate the project's virtual environment.

This will install all the dependencies and create a Pipfile.lock (this should not be altered).

  • Enable the virtual environment
  1. pipenv shell
  • The dataset, Jupyter notebooks and model are in the 'Predict-GDP-of-Canada'/venv/src folder.
  1. cd src/
  • To start/view the Jupyter notebooks
  1. jupyter notebook

This will open a webpage in the browser; from there you can click on the notebook you want to view (from-scratch.ipynb or using-sklearn.ipynb).

Data Source

The data used here comes from World Bank national accounts data and OECD National Accounts data files.

License

This project is licensed under the MIT License - see the MIT LICENSE file for details.