Author: Sacry

Description: Yet another ml from scratch.

Language: Python

Repository: git://github.com/Sacry/mla_sani.git

Created: 2018-06-12T06:54:43Z

Project page: https://github.com/Sacry/mla_sani

License: MIT License

Machine Learning Algorithms - Simple and Naive Implementation

As a software engineer without a strong math background, I had great difficulty understanding ML algorithms. It might take only a few minutes to get a rough idea of what an algorithm looks like, but it really takes some time to understand the finer details. As a way of learning, I decided to implement some basic ML algorithms from scratch, and this project is the result.

The API is modeled closely on scikit-learn (and Keras). There is no parameter checking or optimization of any kind. I tried to focus on the simplest, most naive form of each algorithm and keep the code as "dense" as possible.

The traditional ML algorithms are implemented so that they can be used much like scikit-learn:

```python
import numpy as np

from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2

# Equivalent scikit-learn imports, for comparison:
# from sklearn.linear_model import LinearRegression
# from sklearn.model_selection import train_test_split
# from sklearn.metrics import mean_absolute_error
# from sklearn.preprocessing import StandardScaler

from mla_sani.supervised.linear_model import LinearRegression
from mla_sani.model_selection import train_test_split
from mla_sani.preprocessing import StandardScaler
from mla_sani.metrics import mean_absolute_error

# Load the data and split it into train/test sets.
data = load_boston()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Standardize features using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Fit the model and evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(mean_absolute_error(y_test, y_pred))
```
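As an aside, the fit/predict estimator pattern mimicked above boils down to very little code. Here is a minimal, hypothetical sketch (not mla_sani's actual implementation) of an ordinary-least-squares regressor with the same interface, using only NumPy:

```python
import numpy as np

class TinyLinearRegression:
    """Minimal OLS regressor with a scikit-learn-style fit/predict API.

    Illustrative sketch only; mla_sani's LinearRegression may differ.
    """

    def fit(self, X, y):
        # Append a bias column of ones, then solve the least-squares
        # problem w = argmin ||Xb @ w - y||^2.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        self.w_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        # Returning self enables chaining: Model().fit(X, y).predict(X)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return Xb @ self.w_

# Usage: recover a known linear relationship y = 2*x0 - 3*x1 + 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 5
model = TinyLinearRegression().fit(X, y)
print(np.round(model.w_, 3))  # approximately [ 2. -3.  5.]
```

Returning `self` from `fit` is what makes one-liners like `StandardScaler().fit(X_train)` above possible.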

while the DL algorithms are implemented so that they look like a simplified Keras:

```python
import numpy as np

from sklearn.datasets import load_digits

from mla_sani.model_selection import train_test_split
from mla_sani.metrics import confusion_matrix
from mla_sani.nn.layers import Input, Conv2D, Activation, Dropout, Flatten, Dense
from mla_sani.nn.models import Sequential
from mla_sani.nn.optimizers import Adam
from mla_sani.losses import CategoricalCrossEntropy

# Load the 8x8 digit images and reshape to (n_samples, height, width, channels).
data = load_digits()
X, y = data.data.reshape(-1, 8, 8, 1), data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Build a small CNN, Keras-style.
cnn = Sequential()
cnn.add(Input(X.shape[1:]))
cnn.add(Conv2D(16, (3, 3), padding='same'))
cnn.add(Activation('relu'))
cnn.add(Dropout(rate=0.1))
cnn.add(Flatten())
cnn.add(Dense(10))
cnn.add(Activation('softmax'))

# Compile, train, and evaluate.
cnn.compile(optimizer=Adam(), loss=CategoricalCrossEntropy(labels=np.unique(y)))
cnn.fit(X_train, y_train, epochs=30, batch_size=128)
y_pred = cnn.predict(X_test).argmax(axis=1)
print(confusion_matrix(y_test, y_pred))
```
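For readers curious about the final layer and loss used above, softmax plus categorical cross-entropy can be written in a few lines of NumPy. This is a hypothetical sketch of the standard formulas; mla_sani's own `CategoricalCrossEntropy` may differ in its details:

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical
    # stability; this leaves the result unchanged mathematically.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    # Mean negative log-likelihood of the true class under the
    # predicted distribution; eps guards against log(0).
    return -np.mean(np.sum(y_true_onehot * np.log(y_prob + eps), axis=1))

# Usage: two samples, three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
probs = softmax(logits)
onehot = np.array([[1, 0, 0],
                   [0, 1, 0]])
print(probs.sum(axis=1))  # each row sums to 1
loss = categorical_cross_entropy(onehot, probs)
print(loss)
```

The `argmax(axis=1)` call in the CNN example above simply inverts the softmax step: it picks the class with the highest predicted probability.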

Hopefully, this project can help engineers who are not that strong at math, but know coding well, to grasp these algorithms as quickly as possible.