This repository contains scripts for implementing various expert-based learning architectures, such as mixture of experts and product of experts, and for running experiments with these architectures.
The implementations cover different Mixture of Experts (MoE) models, both existing and novel architectures and training methods. MoE is a gated modular deep neural network architecture, shown in Figure 1. It consists of simple individual neural network modules, called experts, and another simple neural network, called the gate. The gate allocates samples to the experts during training and selects the expert specialised for a sample during inference. The output of the MoE is some combination of the outputs of the individual experts. The experts and the gate are usually trained end-to-end. Since the experts ideally specialise in subsets of the samples, only a few experts need to be evaluated for each sample at inference time and updated at training time. This is called conditional computation.
Figure 1: Original MoE architecture with 3 experts and 1 gate. The output of the model is the gate-weighted sum of the outputs of the individual experts, i.e. the expected expert output under the gating distribution.
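To make the architecture concrete, here is a minimal PyTorch sketch of the dense MoE shown in Figure 1. The class name, layer sizes, and linear gate are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Dense MoE: output = gate-weighted sum of the expert outputs."""

    def __init__(self, in_dim, out_dim, num_experts=3, hidden_dim=32):
        super().__init__()
        # Simple individual expert networks (sizes are illustrative).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, out_dim))
            for _ in range(num_experts)
        )
        # Simple gate: one logit per expert.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)               # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        # Expected expert output under the gating distribution.
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # (B, D)

moe = MixtureOfExperts(in_dim=4, out_dim=2)
y = moe(torch.randn(8, 4))  # (8, 2) combined prediction
```

Training this end-to-end with a standard loss on the combined output lets the gate and the experts co-adapt, which is how the specialisation described above can emerge.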
The goal of the various MoE models is to search for good and clean task decompositions among the experts. Good task decompositions enable interpretability and transferability in gated modular neural networks.
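The conditional computation mentioned above is commonly realised with a sparse, top-k gate, so that only the selected experts are evaluated for each sample; inspecting which experts the gate selects is also a simple way to read off the learned task decomposition. Below is a hedged sketch under that assumption (the function and argument names are illustrative, not the repository's API), compatible with the experts and gate of the class above:

```python
import torch

def top_k_moe_forward(x, experts, gate, k=1):
    """Evaluate only the k experts the gate selects for each sample.

    `experts` is an iterable of modules and `gate` maps x to one
    logit per expert; both names are illustrative.
    """
    logits = gate(x)                            # (B, E) gating logits
    top_vals, top_idx = logits.topk(k, dim=-1)  # k experts per sample
    weights = torch.softmax(top_vals, dim=-1)   # renormalise over the top-k
    out = None
    for j, expert in enumerate(experts):
        routed = (top_idx == j).any(dim=-1)     # samples assigned to expert j
        if not routed.any():
            continue                            # expert j is never evaluated
        y = expert(x[routed])                   # run expert j on its samples only
        w = weights[routed][top_idx[routed] == j].unsqueeze(-1)
        if out is None:
            out = x.new_zeros(x.size(0), y.size(-1))
        out[routed] += w * y                    # accumulate weighted outputs
    return out
```

With k=1 this reduces to hard expert selection; the renormalised top-k weights keep the forward pass differentiable, so the selected experts and the gate can still be trained end-to-end.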
Python 3.9
PyTorch 1.10.1, optionally with CUDA 11.2
Packages listed in requirements.txt
To install the code locally, please follow the steps below:
Clone this repository and go to the cloned directory.
Set the PYTHON environment variable to point to your Python 3.9 executable:
export PYTHON=<path to python 3.9 executable>
Run the following command to set up the environment:
make
on Linux/Mac
Activate the environment by running:
source mnn/bin/activate
on Linux/Mac
Run the following script to start jupyter:
./bin/run_notebooks.sh
In JupyterLab, go to the notebooks folder, which contains all the relevant notebooks.
Start with toy_classification.ipynb.
Select the mnn kernel.
You should now be able to run the notebooks.
email: yamuna dot krishnamurthy at rhul.ac.uk
Yamuna Krishnamurthy and Chris Watkins, Interpretability in Gated Modular Neural Networks, in Explainable AI Approaches for Debugging and Diagnosis Workshop at Neural Information Processing Systems (NeurIPS), Dec 2021.