Project author: nagdevAmruthnath

Project description:
EnsembleML is an R package for ensemble training and deployment of ML models.
Language: R
Repository: git://github.com/nagdevAmruthnath/EnsembleML.git
Created: 2019-10-31T17:52:35Z
Project page: https://github.com/nagdevAmruthnath/EnsembleML


EnsembleML

EnsembleML is an R package for creating features in the time domain and the frequency domain, building multiple regression and classification models, and combining those models into an ensemble. You can save and reload models created with this package and also deploy them as an API from within the same package.

Installation of the package

The package is currently available only on GitHub and is not expected to be published on CRAN any time soon. Use devtools to install it from GitHub as follows.

    devtools::install_github("nagdevAmruthnath/EnsembleML")
    library(EnsembleML)

Feature creation

Features can be created in both the time domain and the frequency domain. Use featureCreationTS() for time series and featureCreationF() for frequency-domain data. The standard features include mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, se, iqr, nZero, nUnique, lowerBound, upperBound, and quantiles.

    data = rnorm(50)
    featureCreationTS(data)
    #      TS_mean    TS_sd TS_median TS_trimmed    TS_mad    TS_min   TS_max TS_range   TS_skew TS_kurtosis
    # X1 0.2107398 1.025315 0.1822303  0.2097342 0.8026293 -2.194434 3.161181 5.355616 0.1251634   0.6376311
    #        TS_se   TS_iqr TS_nZero TS_nUnique TS_lowerBound TS_upperBound    TS_X1.    TS_X5.   TS_X25.
    # X1 0.1450015 1.041068        0         50     -1.850106      2.314167 -2.116863 -1.388573 -0.288504
    #      TS_X50.   TS_X75.  TS_X95.  TS_X99.
    # X1 0.1822303 0.7525642 1.661972 2.814687
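
To make the feature names concrete, here is a minimal base-R sketch of how several of them could be computed. It is not the package's implementation: `basic_features()` is a hypothetical helper, and the bounds are assumed to be Tukey-style fences (first quartile minus 1.5 × IQR, third quartile plus 1.5 × IQR), which is consistent with the TS_lowerBound and TS_upperBound values printed above.

```r
# Hypothetical helper, not part of EnsembleML: a few of the listed
# statistics computed with base R and the stats package only.
basic_features <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  data.frame(
    mean       = mean(x),
    sd         = stats::sd(x),
    median     = stats::median(x),
    min        = min(x),
    max        = max(x),
    range      = max(x) - min(x),
    se         = stats::sd(x) / sqrt(length(x)),  # standard error of the mean
    iqr        = iqr,
    nZero      = sum(x == 0),                     # number of exact zeros
    nUnique    = length(unique(x)),               # number of distinct values
    lowerBound = q[1] - 1.5 * iqr,                # assumed Tukey lower fence
    upperBound = q[2] + 1.5 * iqr                 # assumed Tukey upper fence
  )
}

set.seed(1)  # fixed seed so the random input is reproducible
feats <- basic_features(rnorm(50))
print(feats)
```

The result is a one-row data frame, which mirrors the one-row-per-signal shape of the featureCreationTS() output.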

Summary of the data

The numSummary() function generates a numerical summary of an entire data set. The example below uses the iris data set, which the rest of this documentation also uses.

    data(iris)
    numSummary(iris)
    #                n mean    sd max min range nunique nzeros  iqr lowerbound upperbound noutlier kurtosis
    # Sepal.Length 150 5.84 0.828 7.9 4.3   3.6      35      0 1.30       3.15       8.35        0   -0.606
    # Sepal.Width  150 3.06 0.436 4.4 2.0   2.4      23      0 0.50       2.05       4.05        4    0.139
    # Petal.Length 150 3.76 1.765 6.9 1.0   5.9      43      0 3.55      -3.72      10.42        0   -1.417
    # Petal.Width  150 1.20 0.762 2.5 0.1   2.4      22      0 1.50      -1.95       4.05        0   -1.358
    #              skewness mode miss miss%   1%   5% 25%  50% 75%  95%  99%
    # Sepal.Length    0.309  5.0    0     0 4.40 4.60 5.1 5.80 6.4 7.25 7.70
    # Sepal.Width     0.313  3.0    0     0 2.20 2.34 2.8 3.00 3.3 3.80 4.15
    # Petal.Length   -0.269  1.4    0     0 1.15 1.30 1.6 4.35 5.1 6.10 6.70
    # Petal.Width    -0.101  0.2    0     0 0.10 0.20 0.3 1.30 1.8 2.30 2.50
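
As a rough illustration of the noutlier and miss columns, the sketch below counts outliers and missing values per numeric column of iris in base R. It assumes that an outlier is a value outside the 1.5 × IQR fences, which is how the lowerbound and upperbound columns above appear to be derived; `count_outliers()` is a hypothetical helper and not a function of the package.

```r
data(iris)

# Keep only the numeric columns (drops the Species factor).
numeric_cols <- iris[sapply(iris, is.numeric)]

# Hypothetical helper: number of values outside the 1.5 * IQR fences.
count_outliers <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

outlier_counts <- sapply(numeric_cols, count_outliers)
missing_counts <- sapply(numeric_cols, function(x) sum(is.na(x)))

print(rbind(noutlier = outlier_counts, miss = missing_counts))
```

Under that assumption the counts agree with the summary above: four outliers in Sepal.Width and none in the other columns.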

Training multiple models

During prototyping we often end up training multiple models by hand, which is both time-consuming and inefficient. The multipleModels() function trains multiple models at once, as shown below. All models are trained through caret; the available model names are listed at https://topepo.github.io/caret/available-models.html

    mm = multipleModels(train = iris, test = iris, y = "Species", models = c("C5.0", "parRF"))
    # $summary
    #       Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull AccuracyPValue McnemarPValue
    # C5.0     0.960  0.94         0.915         0.985        0.333       2.53e-60           NaN
    # parRF    0.973  0.96         0.933         0.993        0.333       8.88e-64           NaN
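
The general idea of fitting several models on the same data and tabulating one metric per model can be sketched without the package. The example below is only an illustration: it swaps caret's C5.0 and parRF for rpart and MASS::lda (both ship with standard R installations) and reports plain accuracy. The `fitters` list is a hypothetical structure, not how multipleModels() works internally.

```r
data(iris)

# Each entry fits one model and returns a function that predicts class labels.
fitters <- list(
  rpart = function(d) {
    fit <- rpart::rpart(Species ~ ., data = d, method = "class")
    function(newdata) predict(fit, newdata, type = "class")
  },
  lda = function(d) {
    fit <- MASS::lda(Species ~ ., data = d)
    function(newdata) predict(fit, newdata)$class
  }
)

# Train every model on the same data, then score each one.
predictors <- lapply(fitters, function(fit_fun) fit_fun(iris))
accuracy <- sapply(predictors, function(p) mean(p(iris) == iris$Species))

print(round(accuracy, 3))
```

As in the example above, training and test data are the same here, so the accuracies are resubstitution estimates rather than out-of-sample performance.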

The benchmark for training multiple models on the iris data set is as follows:

    microbenchmark::microbenchmark(multipleModels(train = iris, test = iris, y = "Species", models = c("C5.0", "parRF")), times = 5)
    # Unit: seconds
    #                                                                                   expr  min   lq mean
    #  multipleModels(train = iris, test = iris, y = "Species", models = c("C5.0", "parRF")) 22.6 22.6 22.9
    #  median   uq  max neval
    #    22.7 22.7 23.8     5

Training an ensemble

Ensemble training combines the predictions of multiple models and feeds them to another model. The ensembleTrain() function does this: pass it the multiple-model result mm from the previous step, as follows.

    em = ensembleTrain(mm, train = iris, test = iris, y = "Species", emsembleModelTrain = "C5.0")
    # $summary
    # Confusion Matrix and Statistics
    #
    #             Reference
    # Prediction   setosa versicolor virginica
    #   setosa         50          0         0
    #   versicolor      0         47         1
    #   virginica       0          3        49
    #
    # Overall Statistics
    #
    #                Accuracy : 0.973
    #                  95% CI : (0.933, 0.993)
    #     No Information Rate : 0.333
    #     P-Value [Acc > NIR] : <2e-16
    #
    #                   Kappa : 0.96
    #
    #  Mcnemar's Test P-Value : NA
    #
    # Statistics by Class:
    #
    #                      Class: setosa Class: versicolor Class: virginica
    # Sensitivity                  1.000             0.940            0.980
    # Specificity                  1.000             0.990            0.970
    # Pos Pred Value               1.000             0.979            0.942
    # Neg Pred Value               1.000             0.971            0.990
    # Prevalence                   0.333             0.333            0.333
    # Detection Rate               0.333             0.313            0.327
    # Detection Prevalence         0.333             0.320            0.347
    # Balanced Accuracy            1.000             0.965            0.975
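
Feeding base-model predictions into a second model is usually called stacking. The sketch below shows the idea with rpart and MASS::lda as stand-in base models and another rpart tree as the meta-learner. These choices are assumptions made for illustration and say nothing about how ensembleTrain() is implemented.

```r
data(iris)

# Base models (stand-ins for the caret models used in the text).
base_rpart <- rpart::rpart(Species ~ ., data = iris, method = "class")
base_lda   <- MASS::lda(Species ~ ., data = iris)

# Level-one data: one column of predicted labels per base model, plus the target.
stack_data <- data.frame(
  rpart   = predict(base_rpart, iris, type = "class"),
  lda     = predict(base_lda, iris)$class,
  Species = iris$Species
)

# Meta-learner trained on the base-model predictions only.
meta <- rpart::rpart(Species ~ ., data = stack_data, method = "class")
stacked_pred <- predict(meta, stack_data, type = "class")

# Confusion table of stacked predictions against the true labels.
print(table(Prediction = stacked_pred, Reference = iris$Species))
```

In practice the meta-learner is usually trained on held-out or cross-validated base predictions, because in-sample predictions make the base models look more reliable than they are.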

Predicting from ensemble

The predictEnsemble() function predicts from a trained ensemble model.

    predictEnsemble(em, iris)
    #   prediction
    # 1     setosa
    # 2     setosa
    # 3     setosa
    # 4     setosa
    # 5     setosa
    # 6     setosa
    # 7     setosa
    # 8     setosa
    # .
    # .
    # .

Saving and reading the model

Ensemble models can be saved to disk and read back into memory as follows.

    # em is the ensemble returned by ensembleTrain() above
    saveRDS(em, "/home/savedEnsembleModel.RDS")
    em = readRDS("/home/savedEnsembleModel.RDS")
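
A quick way to check that a model survives the round trip is to compare predictions before and after. The sketch below uses a small lm() fit and a temporary file as stand-ins for the ensemble and the /home path; both are assumptions made so that the example runs anywhere.

```r
data(iris)

# Stand-in model; any R object can be serialized the same way.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Write to a temporary file instead of a fixed path.
path <- tempfile(fileext = ".RDS")
saveRDS(fit, path)

# readRDS() returns the object, so assign the result to a name.
restored <- readRDS(path)

# The restored model should predict exactly like the original.
same_predictions <- isTRUE(all.equal(predict(fit, iris), predict(restored, iris)))
print(same_predictions)
```

Note that readRDS() does not restore the object under its old name; the value has to be assigned, unlike load().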

Deploying models as API

Trained models can be deployed as an API using the same package. First save the model to disk (see "Saving and reading the model"), then start the API server as follows. The path to the saved model is passed in each request.

    library(dplyr)
    createAPI(host = '192.168.1.1', port = 8890)
    # Serving the jug at http://192.168.1.1:8890
    # [1] "Model was successfully loaded"
    # HTTP | /predict - POST - 200

Let's call the endpoint with curl:

    curl -X POST \
      http://192.168.1.1:8890/predict \
      -H 'Host: 192.168.1.1:8890' \
      -H 'content-type: multipart/form-data' \
      -F 'jsondata={"model":["/home/savedEnsembleModel.RDS"],"test":[{"Sepal.Length":5.1,"Sepal.Width":3.5,"Petal.Length":1.4,"Petal.Width":0.2,"Species":"setosa"}]}'
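
The jsondata field has the shape that jsonlite produces by default for a list holding a character string and a data frame, so a payload can be built from R rather than typed by hand. The sketch below assumes the jsonlite package is installed; whether createAPI() parses requests with jsonlite is also an assumption.

```r
library(jsonlite)

data(iris)

# A list with the saved model path and the rows to score.
# Data frames are serialized row-wise; rownames = FALSE omits row names.
payload <- toJSON(
  list(model = "/home/savedEnsembleModel.RDS", test = iris[1, ]),
  rownames = FALSE
)
cat(payload, "\n")

# Parsing the JSON gives back the model path and a one-row data frame.
parsed <- fromJSON(payload)
str(parsed)
```

The resulting string can be passed as the value of jsondata in the curl command above, and more rows can be sent by selecting more than one row of the data frame.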

Issues and Tracking

If you have any issues related to the project, please post an issue and I will try to address it.