Author: Sacry

Description: Yet another ml from scratch.

Language: Python

Repository: git://github.com/Sacry/mla_sani.git

Created: 2018-06-12T06:54:43Z

Project page: https://github.com/Sacry/mla_sani

License: MIT License

Machine Learning Algorithms - Simple and Naive Implementation

As a software engineer without a strong math background, I had great difficulty understanding ML algorithms. It might take only a few minutes to get a rough idea of what an algorithm looks like, but it really takes some time to understand the finer details. As a way of learning, I decided to implement some basic ML algorithms from scratch, and this project is the result.

The API is modeled closely on scikit-learn (and Keras). There is no parameter checking or optimization of any kind. I tried to focus on the simplest, most naive form of each algorithm and keep the code as "dense" as possible.

The traditional ML algorithms are implemented so that they can be used much like scikit-learn:

```python
import numpy as np

from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2

# Equivalent scikit-learn imports, for comparison:
# from sklearn.linear_model import LinearRegression
# from sklearn.model_selection import train_test_split
# from sklearn.metrics import mean_absolute_error
# from sklearn.preprocessing import StandardScaler

from mla_sani.supervised.linear_model import LinearRegression
from mla_sani.model_selection import train_test_split
from mla_sani.preprocessing import StandardScaler
from mla_sani.metrics import mean_absolute_error

# Load the data and split it into train/test sets.
data = load_boston()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Standardize features using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Fit the model and evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(mean_absolute_error(y_test, y_pred))
```
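As an aside, the fit/predict estimator pattern mimicked above boils down to very little code. Here is a minimal, hypothetical sketch (not mla_sani's actual implementation) of an ordinary-least-squares regressor with the same interface, using only NumPy:

```python
import numpy as np

class TinyLinearRegression:
    """Minimal OLS regressor with a scikit-learn-style fit/predict API.

    Illustrative sketch only; mla_sani's LinearRegression may differ.
    """

    def fit(self, X, y):
        # Append a bias column of ones, then solve the least-squares
        # problem w = argmin ||Xb @ w - y||^2.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        self.w_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        # Returning self enables chaining: Model().fit(X, y).predict(X)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return Xb @ self.w_

# Usage: recover a known linear relationship y = 2*x0 - 3*x1 + 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 5
model = TinyLinearRegression().fit(X, y)
print(np.round(model.w_, 3))  # approximately [ 2. -3.  5.]
```

Returning `self` from `fit` is what makes one-liners like `StandardScaler().fit(X_train)` above possible.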

while the DL algorithms are implemented so that they look like a simplified Keras:

```python
import numpy as np

from sklearn.datasets import load_digits

from mla_sani.model_selection import train_test_split
from mla_sani.metrics import confusion_matrix
from mla_sani.nn.layers import Input, Conv2D, Activation, Dropout, Flatten, Dense
from mla_sani.nn.models import Sequential
from mla_sani.nn.optimizers import Adam
from mla_sani.losses import CategoricalCrossEntropy

# Load the 8x8 digit images and reshape to (n_samples, height, width, channels).
data = load_digits()
X, y = data.data.reshape(-1, 8, 8, 1), data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Build a small CNN, Keras-style.
cnn = Sequential()
cnn.add(Input(X.shape[1:]))
cnn.add(Conv2D(16, (3, 3), padding='same'))
cnn.add(Activation('relu'))
cnn.add(Dropout(rate=0.1))
cnn.add(Flatten())
cnn.add(Dense(10))
cnn.add(Activation('softmax'))

# Compile, train, and evaluate.
cnn.compile(optimizer=Adam(), loss=CategoricalCrossEntropy(labels=np.unique(y)))
cnn.fit(X_train, y_train, epochs=30, batch_size=128)
y_pred = cnn.predict(X_test).argmax(axis=1)
print(confusion_matrix(y_test, y_pred))
```
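For readers curious about the final layer and loss used above, softmax plus categorical cross-entropy can be written in a few lines of NumPy. This is a hypothetical sketch of the standard formulas; mla_sani's own `CategoricalCrossEntropy` may differ in its details:

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical
    # stability; this leaves the result unchanged mathematically.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    # Mean negative log-likelihood of the true class under the
    # predicted distribution; eps guards against log(0).
    return -np.mean(np.sum(y_true_onehot * np.log(y_prob + eps), axis=1))

# Usage: two samples, three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
probs = softmax(logits)
onehot = np.array([[1, 0, 0],
                   [0, 1, 0]])
print(probs.sum(axis=1))  # each row sums to 1
loss = categorical_cross_entropy(onehot, probs)
print(loss)
```

The `argmax(axis=1)` call in the CNN example above simply inverts the softmax step: it picks the class with the highest predicted probability.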

Hopefully, this project can help engineers who are not that strong at math, but know coding well, to grasp these algorithms as quickly as possible.