Georgia Tech - OMSCS - CS7641 - Machine Learning Repository
https://github.com/ezerilli/CS7641-Machine_Learning
The following steps set up the working environment for CS7641 - Machine Learning
in the OMSCS program. 👨🏻💻📚
Installing the conda environment gives you a ready-to-use setup for running the Python scripts without having to
worry about package names and versions. Alternatively, you can install each of the packages listed in
requirements.yml on your own with pip or conda.
Start by installing Conda for your operating system following the instructions here.
Now install the environment described in requirements.yml:
conda env create -f requirements.yml
To activate the environment run:
conda activate CS7641
Once inside the environment, run a Python file with:
python my_file.py
To deactivate the environment run:
conda deactivate
During the semester I may need to add new packages to the environment. To update it, run:
conda env update -f requirements.yml
This assignment aims to explore 5 Supervised Learning algorithms (k-Nearest Neighbors, Support Vector Machines,
Decision Trees, AdaBoost and Neural Networks), performing model complexity analysis and producing learning curves
while comparing their performance on two interesting datasets: the Wisconsin Diagnostic Breast Cancer (WDBC)
dataset and the Handwritten Digits Image Classification dataset (the famous MNIST).
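As a minimal sketch of the kind of model complexity analysis involved (assuming scikit-learn, which the conda environment presumably provides; the parameter range is illustrative, not the assignment's actual grid), a validation curve for k-NN on WDBC can be produced like this:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

# WDBC ships with scikit-learn, so no download is needed
X, y = load_breast_cancer(return_X_y=True)

# Model complexity analysis: vary k and record train/validation accuracy
param_range = np.arange(1, 10, 2)
train_scores, val_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors", param_range=param_range, cv=5)

print(val_scores.mean(axis=1))  # mean cross-validated accuracy per value of k
```

Plotting the mean train and validation scores against `param_range` gives the validation curve; `learning_curve` from the same module works analogously for learning curves.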
The assignment consists of two parts:
experiment 1, producing validation curves, learning curves and performances on the test set, for each of the
algorithms, on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
experiment 2, producing validation curves, learning curves and performances on the test set, for each of the
algorithms, on the Handwritten Digits Image Classification (MNIST) dataset.
To run the experiments:
cd Supervised_Learning
python run_experiments.py
Figures will show up progressively. Performing all the experiments and hyperparameter optimizations takes a while,
but all figures have already been saved into the images directory. Theory, results and experiments are discussed in
the report (not provided here due to Georgia Tech’s Honor Code).
This assignment aims to explore some algorithms in Randomized Optimization, namely Randomized Hill Climbing (RHC),
Simulated Annealing (SA), Genetic Algorithms (GA) and Mutual-Information Maximizing Input Clustering (MIMIC), while
comparing their performance on three interesting discrete optimization problems: the Travelling Salesman Problem,
Flip Flop and 4-Peaks. Moreover, RHC, SA and GA will later be compared to Gradient Descent and Backpropagation on a
(nowadays) fundamental optimization problem: training complex Neural Networks.
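To illustrate the idea behind these methods (a pure-Python sketch, not the assignment code; the function names are mine), here is Randomized Hill Climbing on the Flip Flop problem, whose fitness counts consecutive bits that differ:

```python
import random

def flip_flop(state):
    # Fitness: number of consecutive pairs of bits that differ
    return sum(a != b for a, b in zip(state, state[1:]))

def random_hill_climb(n=20, iters=2000, seed=0):
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n)]
    best = flip_flop(state)
    for _ in range(iters):
        # Flip one random bit; keep the neighbor if it is no worse
        i = rng.randrange(n)
        neighbor = state[:]
        neighbor[i] ^= 1
        fitness = flip_flop(neighbor)
        if fitness >= best:
            state, best = neighbor, fitness
    return state, best

state, best = random_hill_climb()
print(best)  # the global optimum is n - 1 (a perfectly alternating string)
```

Simulated Annealing differs only in also accepting worse neighbors with a temperature-dependent probability, which helps escape the plateaus this simple climber can get stuck on.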
The assignment consists of four parts:
experiment 1, producing curves for RHC, SA, GA and MIMIC on the Travelling Salesman Problem.
experiment 2, producing curves for RHC, SA, GA and MIMIC on the Flip Flop problem.
experiment 3, producing curves for RHC, SA, GA and MIMIC on the 4-Peaks problem.
experiment 4, comparing RHC, SA and GA to Gradient Descent and Backpropagation on Neural Network training.
To run the experiments:
cd Randomized_Optimization
python run_experiments.py
Figures will show up progressively. Performing all the experiments and hyperparameter optimizations takes a while,
but all figures have already been saved into the images directory. Theory, results and experiments are discussed in
the report (not provided here due to Georgia Tech’s Honor Code).
This assignment aims to explore some algorithms in Unsupervised Learning, namely Principal Components Analysis (PCA),
Kernel PCA (KPCA), Independent Components Analysis (ICA), Random Projections (RP), k-Means and
Gaussian Mixture Models (GMM), while comparing their performance on two interesting datasets: the
Wisconsin Diagnostic Breast Cancer (WDBC) and the Handwritten Digits Image Classification (the famous MNIST).
Moreover, their contribution to Neural Networks in the supervised setting will be assessed.
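As an illustrative sketch (again assuming scikit-learn; the choice of 2 components and 2 clusters is for illustration only) of chaining dimensionality reduction and clustering, PCA followed by k-Means on WDBC might look like:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_breast_cancer(return_X_y=True)

# Standardize features, project onto the top 2 principal components,
# then cluster the projected points with k-Means
X_std = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_std)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)

print(X_2d.shape, sorted(set(labels)))
```

The cluster labels found this way (or the reduced features themselves) can then be fed to a neural network to assess the contribution of the unsupervised step, as the assignment describes.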
The assignment consists of two parts:
experiment 1, producing curves for dimensionality reduction, clustering and neural networks with unsupervised techniques
on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
experiment 2, producing curves for dimensionality reduction, clustering and neural networks with unsupervised techniques
on the Handwritten Digits Image Classification (MNIST) dataset.
To run the experiments:
cd Unsupervised_Learning
python run_experiments.py
Figures will show up progressively. Performing all the experiments and hyperparameter optimizations takes a while,
but all figures have already been saved into the images directory. Theory, results and experiments are discussed in
the report (not provided here due to Georgia Tech’s Honor Code).
This assignment aims to explore some algorithms in Reinforcement Learning, namely Value Iteration (VI),
Policy Iteration (PI) and Q-Learning, while comparing their performance on two interesting MDPs: the
Frozen Lake environment from OpenAI gym and the Gambler’s Problem from Sutton and Barto.
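As a minimal pure-Python sketch of Value Iteration on the Gambler's Problem (illustrative, not the assignment code; the function name is mine), with head probability 0.4 as in Sutton and Barto:

```python
def gamblers_value_iteration(p_h=0.4, goal=100, theta=1e-9):
    # V[s] = probability of reaching the goal from capital s under the optimal policy
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # reaching the goal is the only rewarded outcome
    while True:
        delta = 0.0
        for s in range(1, goal):
            # Stakes are limited by current capital and by what is needed to reach the goal
            best = max(p_h * V[s + a] + (1 - p_h) * V[s - a]
                       for a in range(1, min(s, goal - s) + 1))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    return V

V = gamblers_value_iteration()
print(round(V[50], 4))  # betting everything at 50 wins with probability p_h = 0.4
```

Policy Iteration alternates full policy evaluation with greedy improvement instead of sweeping Bellman optimality backups, and Q-Learning estimates the same values model-free from sampled transitions.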
The assignment consists of two parts:
experiment 1, producing curves for VI, PI and Q-Learning on the Frozen Lake environment from OpenAI gym.
experiment 2, producing curves for VI, PI and Q-Learning on the Gambler’s Problem from Sutton and Barto.
To run the experiments:
cd Markov_Decision_Processes
python run_experiments.py
Figures will show up progressively. Performing all the experiments and hyperparameter optimizations takes a while,
but all figures have already been saved into the images directory. Theory, results and experiments are discussed in
the report (not provided here due to Georgia Tech’s Honor Code).