Project author: williamd4112

Project description:
A Python implementation of linear regression algorithms (Maximum Likelihood, Maximum a Posteriori, Bayesian).
Primary language: Python
Repository: git://github.com/williamd4112/simple-linear-regression.git
Created: 2017-03-11T17:28:21Z
Project page: https://github.com/williamd4112/simple-linear-regression

License: MIT License



Introduction

A Python/TensorFlow implementation of three kinds of linear regression algorithms: Maximum Likelihood (ML), Maximum a Posteriori (MAP), and Bayesian. The project predicts the height map of southern Taiwan and compares the behavior of the three approaches. (Implementation details are described in doc/report.pdf.)
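For orientation, here is a minimal NumPy sketch of the three estimators in their textbook closed forms, given a design matrix `phi` of basis-function values. It is not the repository's TensorFlow code: the scripts train ML and MAP iteratively (epochs, batch size, learning rate), whereas this sketch solves them directly. Reading `s0` as a prior variance and expressing the MAP penalty as a single coefficient `lam` are assumptions, and `fit_ml`, `fit_map`, `fit_bayes` and `predict_bayes` are hypothetical names.

```python
import numpy as np


def fit_ml(phi, t):
    """Maximum likelihood weights: the least-squares solution pinv(Phi) @ t."""
    return np.linalg.pinv(phi) @ t


def fit_map(phi, t, lam):
    """MAP weights under a zero-mean isotropic Gaussian prior (ridge regression).

    lam is the regularization coefficient (alpha / beta in Bishop's notation).
    """
    d = phi.shape[1]
    return np.linalg.solve(lam * np.eye(d) + phi.T @ phi, phi.T @ t)


def fit_bayes(phi, t, m0, s0, beta):
    """Posterior N(m_N, S_N) over w for the prior N(m0 * 1, s0 * I).

    ASSUMPTION: s0 is read as the prior variance; beta is the noise precision.
    """
    d = phi.shape[1]
    s0_inv = np.eye(d) / s0
    s_n = np.linalg.inv(s0_inv + beta * phi.T @ phi)
    m_n = s_n @ (s0_inv @ np.full(d, m0) + beta * phi.T @ t)
    return m_n, s_n


def predict_bayes(phi_new, m_n, s_n, beta):
    """Predictive mean and variance (1/beta + phi^T S_N phi) for each row."""
    mean = phi_new @ m_n
    var = 1.0 / beta + np.sum((phi_new @ s_n) * phi_new, axis=1)
    return mean, var


# Toy data: t = 2 + 3x plus Gaussian noise with standard deviation 0.1.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
phi = np.column_stack([np.ones_like(x), x])  # bias column + linear feature
t = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)

w_ml = fit_ml(phi, t)
w_map = fit_map(phi, t, lam=0.1)  # shrinks the weights toward zero
m_n, s_n = fit_bayes(phi, t, m0=0.0, s0=2.0, beta=100.0)
pred_mean, pred_var = predict_bayes(phi[:5], m_n, s_n, beta=100.0)
```

With fixed priors, the MAP weights and the Bayesian posterior mean approach the ML solution as the amount of data grows; the Bayesian version additionally gives a predictive variance for every point.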

Results

Minimum mean-square error (MSE) of the three approaches

ML MAP Bayesian
64.525 48.585709 39.591061

Visualization

Maximum Likelihood

[Figures: 3D and 2D views (ml-3d, ml-2d)]

Maximum a Posteriori

[Figures: 3D and 2D views (map-3d, map-2d)]

Bayesian

[Figures: 3D and 2D views (bayes-3d, bayes-2d)]

Dependencies

  • NumPy
  • TensorFlow
  • SciPy (k-means clustering)
  • scikit-learn (k-fold cross-validation)

To run the pre-trained models

  1. ./test_bayes.sh {X} model/bayes/bayes.npy model/bayes/bayes-mean.npy model/bayes/bayes-sigma.npy {Y}
  2. ./test_ml.sh {X} model/ml/ml.npy model/ml/ml-mean.npy model/ml/ml-sigma.npy {Y}
  3. ./test_map.sh {X} model/map/map.npy model/map/map-mean.npy model/map/map-sigma.npy {Y}

Prediction results are saved to {Y} (the output path).
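Each test script takes a weight file plus "mean" and "sigma" files. The sketch below shows one plausible way such files could be combined into predictions, under a loudly stated assumption: the mean file holds Gaussian basis-function centres (one row per centre), the sigma file holds their width(s), and the weight vector starts with a bias term. If the files actually hold something else (for example input normalization statistics), the design-matrix step would change. `gaussian_design_matrix` and `predict_from_files` are hypothetical names, and the CSV layout (two input columns per row) is also an assumption.

```python
import os
import tempfile

import numpy as np


def gaussian_design_matrix(x, centres, sigma):
    """Phi[n, j] = exp(-||x_n - mu_j||^2 / (2 sigma^2)), plus a leading bias column.

    ASSUMPTION: centres come from *-mean.npy and sigma from *-sigma.npy.
    sigma may be a scalar or one width per centre (broadcast over columns).
    """
    sq_dist = np.sum((x[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    phi = np.exp(-sq_dist / (2.0 * np.asarray(sigma) ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])


def predict_from_files(x_path, w_path, mean_path, sigma_path, y_path):
    """Load inputs and model files, predict, and write one value per line to y_path."""
    x = np.loadtxt(x_path, delimiter=",", ndmin=2)
    w = np.load(w_path)
    centres = np.load(mean_path)
    sigma = np.load(sigma_path)
    y = gaussian_design_matrix(x, centres, sigma) @ w
    np.savetxt(y_path, y, delimiter=",")
    return y


# Demo with a one-centre model written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    names = ["X.csv", "w.npy", "mean.npy", "sigma.npy", "Y.csv"]
    paths = {name: os.path.join(tmp, name) for name in names}
    np.savetxt(paths["X.csv"], [[0.0, 0.0], [10.0, 10.0]], delimiter=",")
    np.save(paths["w.npy"], np.array([1.0, 2.0]))       # bias 1, basis weight 2
    np.save(paths["mean.npy"], np.array([[0.0, 0.0]]))  # single centre at origin
    np.save(paths["sigma.npy"], np.array(1.0))
    y_demo = predict_from_files(paths["X.csv"], paths["w.npy"], paths["mean.npy"],
                                paths["sigma.npy"], paths["Y.csv"])
    y_saved = np.loadtxt(paths["Y.csv"])
```

At the centre the basis function equals 1, so the prediction is the bias plus the weight (3.0); far away it decays to the bias alone (1.0).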

To score the predictions

  1. python score.py {predictions}.csv {ground truth}.csv
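Since the Results table reports mean-square error, score.py presumably computes the MSE between the two files. A minimal sketch of such a scorer, assuming each CSV holds one numeric value per line with no header; `mse` and `score_files` are hypothetical names, not the repository's code.

```python
import sys

import numpy as np


def mse(pred, truth):
    """Mean squared error between two equally sized sets of values."""
    pred = np.asarray(pred, dtype=float).ravel()
    truth = np.asarray(truth, dtype=float).ravel()
    if pred.shape != truth.shape:
        raise ValueError(f"size mismatch: {pred.shape} vs {truth.shape}")
    return float(np.mean((pred - truth) ** 2))


def score_files(pred_path, truth_path):
    """ASSUMPTION: each CSV holds one numeric value per line and no header."""
    return mse(np.loadtxt(pred_path, delimiter=","),
               np.loadtxt(truth_path, delimiter=","))


if __name__ == "__main__" and len(sys.argv) == 3:
    print(score_files(sys.argv[1], sys.argv[2]))
```

Saved as, say, score_sketch.py (a hypothetical filename), it is invoked the same way: `python score_sketch.py predictions.csv ground_truth.csv`.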

To train the models

  1. ./train_bayes.sh {Fraction of training data}
  2. ./train_ml.sh {Fraction of training data}
  3. ./train_map.sh {Fraction of training data}
  4. NOTE: all hyperparameters in the scripts are already set to their optimal values.
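The cross-validation scripts tune ML and MAP by epoch count, batch size, learning rate and (for MAP) alpha, which suggests they are trained iteratively. The following is a NumPy sketch of that kind of mini-batch gradient-descent loop, not the repository's TensorFlow graph. Treating alpha as an L2 weight-decay coefficient on a mean-squared-error loss, and implementing the "fraction of training data" argument as a random row subset, are assumptions; `train_minibatch` is a hypothetical name.

```python
import numpy as np


def train_minibatch(phi, t, epochs, batch_size, learning_rate, alpha=0.0,
                    fraction=1.0, seed=0):
    """Mini-batch gradient descent on MSE / 2 + alpha / 2 * ||w||^2.

    alpha = 0 gives the maximum-likelihood fit; alpha > 0 gives a MAP fit under
    a zero-mean Gaussian prior. ASSUMPTION: 'fraction' keeps a random subset of
    the training rows, mirroring the scripts' "fraction of training data".
    """
    rng = np.random.default_rng(seed)
    n_used = max(1, int(round(fraction * len(t))))
    keep = rng.permutation(len(t))[:n_used]
    phi, t = phi[keep], t[keep]
    w = np.zeros(phi.shape[1])
    for _ in range(epochs):
        order = rng.permutation(n_used)  # reshuffle the rows every epoch
        for start in range(0, n_used, batch_size):
            batch = order[start:start + batch_size]
            residual = phi[batch] @ w - t[batch]
            # Gradient of the batch-mean squared error / 2 plus the L2 penalty.
            grad = phi[batch].T @ residual / len(batch) + alpha * w
            w -= learning_rate * grad
    return w


# Toy data: t = 2 + 3x plus Gaussian noise with standard deviation 0.1.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
phi = np.column_stack([np.ones_like(x), x])
t = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)

w_ml_sgd = train_minibatch(phi, t, epochs=200, batch_size=16,
                           learning_rate=0.1, alpha=0.0, fraction=0.8)
w_map_sgd = train_minibatch(phi, t, epochs=200, batch_size=16,
                            learning_rate=0.1, alpha=1.0, fraction=0.8)
```

A larger alpha pulls the MAP weights further toward zero, which is why it is worth cross-validating alongside the optimizer settings.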

To train with cross-validation

  1. ./train_bayes_cross_validation.sh "{list of m0}" "{list of s0}" "{list of beta}" "{list of d}"
  2. ./train_ml_cross_validation.sh "{list of epoch}" "{list of batch size}" "{list of learning rate}" "{list of d}"
  3. ./train_map_cross_validation.sh "{list of epoch}" "{list of batch size}" "{list of learning rate}" "{list of d}" "{list of alpha}"
  4. NOTE: the meaning of parameter 'd' depends on the preprocessing method (pre) defined in the script:
     • pre=grid: d is the grid cell size
     • pre=kmeans: d is the number of clusters
  5. e.g. ./train_bayes_cross_validation.sh "0.0" "2.0" "25.0 12.5" "1024 2048"
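To illustrate how 'd' could shape the features and how a k-fold search fits around them, here is a sketch under explicit assumptions: inputs are 2-D points, basis functions are Gaussians with a shared width `sigma` centred either on grid-cell midpoints (cell size d) or on k-means centroids (d clusters), and each beta candidate is scored by validation MSE of the Bayesian posterior mean. It uses SciPy's `kmeans2` and scikit-learn's `KFold`, matching the listed dependencies, but every function name here (`grid_centres`, `kmeans_centres`, `cross_validate`, and so on) is hypothetical, not the repository's code.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from sklearn.model_selection import KFold


def grid_centres(x, cell):
    """pre=grid: midpoints of square cells of side 'cell' covering the 2-D data."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    counts = np.maximum(1, np.ceil((hi - lo) / cell).astype(int))
    axes = [low + (np.arange(k) + 0.5) * cell for low, k in zip(lo, counts)]
    gx, gy = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel()])


def kmeans_centres(x, k):
    """pre=kmeans: k centroids, initialized from randomly chosen data points."""
    centres, _labels = kmeans2(x, k, minit="points")
    return centres


def design(x, centres, sigma):
    """Gaussian basis features with a shared width, plus a bias column."""
    sq = np.sum((x[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    return np.hstack([np.ones((len(x), 1)), np.exp(-sq / (2.0 * sigma ** 2))])


def posterior_mean(phi, t, m0, s0, beta):
    """Solve (I/s0 + beta Phi^T Phi) m_N = m0/s0 + beta Phi^T t (s0 as variance)."""
    d = phi.shape[1]
    a = np.eye(d) / s0 + beta * phi.T @ phi
    b = np.full(d, m0) / s0 + beta * phi.T @ t
    return np.linalg.solve(a, b)


def cross_validate(x, t, centres, sigma, m0, s0, betas, n_splits=5):
    """Return {beta: mean validation MSE over the k folds}."""
    phi = design(x, centres, sigma)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for beta in betas:
        errors = []
        for train_idx, val_idx in folds.split(phi):
            w = posterior_mean(phi[train_idx], t[train_idx], m0, s0, beta)
            errors.append(np.mean((phi[val_idx] @ w - t[val_idx]) ** 2))
        scores[beta] = float(np.mean(errors))
    return scores


# Synthetic 2-D "terrain" for the demo.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 1.0, size=(200, 2))
t_demo = (np.sin(2 * np.pi * x_demo[:, 0]) + np.cos(2 * np.pi * x_demo[:, 1])
          + rng.normal(scale=0.1, size=200))
centres = grid_centres(x_demo, cell=0.25)
cv_scores = cross_validate(x_demo, t_demo, centres, sigma=0.25,
                           m0=0.0, s0=2.0, betas=[25.0, 12.5])
```

The beta values mirror the README's example command; the candidate with the lowest score in `cv_scores` would be the one kept.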

Cross-validation results are saved to log/{model description}.

To test a model

  1. ./test_bayes.sh {input data X} {model path} {model mean path} {model sigma path} {output path}
  2. ./test_ml.sh {input data X} {model path} {model mean path} {model sigma path} {output path}
  3. ./test_map.sh {input data X} {model path} {model mean path} {model sigma path} {output path}
  4. e.g. ./test_bayes.sh X_test.csv model/bayes-m0-0.0-s0-2.0-beta-25.0-grid-0.015.npy model/bayes-m0-0.0-s0-2.0-beta-25.0-grid-0.015-mean.npy model/bayes-m0-0.0-s0-2.0-beta-25.0-grid-0.015-sigma.npy