Author: curiousily

Description: Tutorial on experiment tracking and reproducibility for Machine Learning projects with DVC
Language: Python
Repository: git://github.com/curiousily/Reproducible-ML-with-DVC.git
Created: 2020-05-18T19:34:34Z
Project page: https://github.com/curiousily/Reproducible-ML-with-DVC

License: MIT License

Setup

Read the complete tutorial here

    git clone git@github.com:curiousily/Reproducible-ML-with-DVC.git
    cd Reproducible-ML-with-DVC
    pipenv install --dev
    git checkout pre-dvc

DVC

Initialize DVC

    dvc init

Add remote storage (a local directory in this case):

    dvc remote add -d localremote /tmp/dvc-storage

Disable analytics (optional):

    dvc config core.analytics false

Experiment with Linear Regression

Build dataset

    dvc run -f assets/data.dvc \
      -d studentpredictor/create_dataset.py \
      -o assets/data \
      python studentpredictor/create_dataset.py

Create features

    dvc run -f assets/features.dvc \
      -d studentpredictor/create_features.py \
      -d assets/data \
      -o assets/features \
      python studentpredictor/create_features.py

Train model

    dvc run -f assets/models.dvc \
      -d studentpredictor/train_model.py \
      -d assets/features \
      -o assets/models \
      python studentpredictor/train_model.py

Evaluate the model and save the metrics (RMSE and R^2)

    dvc run -f assets/evaluate.dvc \
      -d studentpredictor/evaluate_model.py \
      -d assets/features \
      -d assets/models \
      -M assets/metrics.json \
      python studentpredictor/evaluate_model.py
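
The evaluation stage is expected to leave a JSON file at assets/metrics.json for DVC to track. The repository's evaluate_model.py is not reproduced here, so the following is only a minimal sketch of how RMSE and R^2 could be computed and written out. The function names and the JSON keys (rmse, r_squared) are assumptions made for illustration, not the project's actual code.

```python
import json
import math
from pathlib import Path


def regression_metrics(y_true, y_pred):
    """Return RMSE and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares and total sum of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # R^2 is undefined when all targets are identical; store null in that case.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else None
    return {"rmse": rmse, "r_squared": r_squared}


def save_metrics(metrics, path):
    """Write the metrics dict as JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))
```

Under these assumptions, an evaluation script would end with something like save_metrics(regression_metrics(y_true, y_pred), "assets/metrics.json"), which produces the file that the -M flag registers as a metrics file.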

Check the metrics for your current model (-T reports metrics for every Git tag):

    dvc metrics show -T

Experiment with Random Forest

Check out the Random Forest experiment:

    git checkout rf-experiment

Reproduce the pipeline with the Random Forest model:

    dvc repro assets/evaluate.dvc
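
To see why reproducing only assets/evaluate.dvc is enough, it helps to picture the stages as a dependency graph. The toy sketch below is not DVC's implementation. It mirrors the -d and -o flags from the commands in this README and assumes, purely for illustration, that the rf-experiment checkout changes only train_model.py (the README does not say which files differ). It also simplifies by treating every stage downstream of a rerun stage as stale, whereas DVC compares recorded checksums of the actual outputs.

```python
# Stage graph copied from the -d (deps) and -o / -M (outs) flags in this README.
STAGES = {
    "assets/data.dvc": {
        "deps": ["studentpredictor/create_dataset.py"],
        "outs": ["assets/data"],
    },
    "assets/features.dvc": {
        "deps": ["studentpredictor/create_features.py", "assets/data"],
        "outs": ["assets/features"],
    },
    "assets/models.dvc": {
        "deps": ["studentpredictor/train_model.py", "assets/features"],
        "outs": ["assets/models"],
    },
    "assets/evaluate.dvc": {
        "deps": [
            "studentpredictor/evaluate_model.py",
            "assets/features",
            "assets/models",
        ],
        "outs": ["assets/metrics.json"],
    },
}


def stages_to_rerun(target, changed_paths, stages=STAGES):
    """List the stale stages needed for `target`, in execution order."""
    # Map each output path to the stage that produces it.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}
    stale = {}  # stage name -> bool, memoized so shared deps are visited once
    order = []

    def visit(name):
        if name in stale:
            return stale[name]
        is_stale = False
        for dep in stages[name]["deps"]:
            if dep in producer:
                # The dependency is another stage's output: resolve it first.
                if visit(producer[dep]):
                    is_stale = True
            elif dep in changed_paths:
                # A source file that differs from what was recorded.
                is_stale = True
        stale[name] = is_stale
        if is_stale:
            order.append(name)
        return is_stale

    visit(target)
    return order


if __name__ == "__main__":
    changed = {"studentpredictor/train_model.py"}  # hypothetical change set
    print(stages_to_rerun("assets/evaluate.dvc", changed))
```

With that hypothetical change set, only the training and evaluation stages are listed, while the dataset and feature stages are left alone.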

Compare the metrics for the Random Forest model with those for Linear Regression:

    dvc metrics show -T

License

MIT