This repository contains scripts for implementing various expert-based learning architectures, such as mixture of experts and product of experts, and for running experiments with these architectures.
The implementations cover different Mixture of Experts (MoE) models, both existing and novel architectures and training methods. MoE is a gated modular deep neural network architecture, shown in Figure 1. It consists of simple individual neural network modules, called experts, and another simple neural network, called the gate. The gate allocates samples to the experts during training and selects the expert specialised for a sample during inference. The output of the MoE is some combination of the outputs of the individual experts. The experts and the gate are usually trained end-to-end. Since the experts ideally specialise in subsets of the samples, only a few experts need to be evaluated for each sample at inference time and updated at training time. This is called conditional computation.
Figure 1: Original MoE architecture with 3 experts and 1 gate. The output of the model is the gate-weighted sum of the outputs of the individual experts, i.e. the expected expert output under the gating distribution.
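To make the architecture concrete, here is a minimal PyTorch sketch of the dense MoE shown in Figure 1. The class name, layer sizes, and linear gate are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Dense MoE: output = gate-weighted sum of the expert outputs."""

    def __init__(self, in_dim, out_dim, num_experts=3, hidden_dim=32):
        super().__init__()
        # Simple individual expert networks (sizes are illustrative).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, out_dim))
            for _ in range(num_experts)
        )
        # Simple gate: one logit per expert.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)               # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        # Expected expert output under the gating distribution.
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # (B, D)

moe = MixtureOfExperts(in_dim=4, out_dim=2)
y = moe(torch.randn(8, 4))  # (8, 2) combined prediction
```

Training this end-to-end with a standard loss on the combined output lets the gate and the experts co-adapt, which is how the specialisation described above can emerge.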
The goal of the various MoE models is to search for good and clean task decompositions among the experts. Good task decompositions enable interpretability and transferability in gated modular neural networks.
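The conditional computation mentioned above is commonly realised with a sparse, top-k gate, so that only the selected experts are evaluated for each sample; inspecting which experts the gate selects is also a simple way to read off the learned task decomposition. Below is a hedged sketch under that assumption (the function and argument names are illustrative, not the repository's API), compatible with the experts and gate of the class above:

```python
import torch

def top_k_moe_forward(x, experts, gate, k=1):
    """Evaluate only the k experts the gate selects for each sample.

    `experts` is an iterable of modules and `gate` maps x to one
    logit per expert; both names are illustrative.
    """
    logits = gate(x)                            # (B, E) gating logits
    top_vals, top_idx = logits.topk(k, dim=-1)  # k experts per sample
    weights = torch.softmax(top_vals, dim=-1)   # renormalise over the top-k
    out = None
    for j, expert in enumerate(experts):
        routed = (top_idx == j).any(dim=-1)     # samples assigned to expert j
        if not routed.any():
            continue                            # expert j is never evaluated
        y = expert(x[routed])                   # run expert j on its samples only
        w = weights[routed][top_idx[routed] == j].unsqueeze(-1)
        if out is None:
            out = x.new_zeros(x.size(0), y.size(-1))
        out[routed] += w * y                    # accumulate weighted outputs
    return out
```

With k=1 this reduces to hard expert selection; the renormalised top-k weights keep the forward pass differentiable, so the selected experts and the gate can still be trained end-to-end.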
Python 3.9
PyTorch 1.10.1, optionally with CUDA 11.2
Packages listed in requirements.txt
To install the code locally, please follow the steps below:
Clone this repository and go to the cloned directory.
Set the PYTHON environment variable to point to your Python 3.9 executable:
export PYTHON=<path to python 3.9 executable>
Run the following command to set up the environment:
make
on Linux/Mac
Activate the environment by running:
source mnn/bin/activate
on Linux/Mac
Run the following script to start jupyter:
./bin/run_notebooks.sh
In JupyterLab, go to the notebooks folder, which contains all the relevant notebooks.
Start with toy_classification.ipynb.
Select the mnn kernel.
You should now be able to run the notebooks.
email: yamuna dot krishnamurthy at rhul.ac.uk
Yamuna Krishnamurthy and Chris Watkins, Interpretability in Gated Modular Neural Networks, in Explainable AI Approaches for Debugging and Diagnosis Workshop at Neural Information Processing Systems (NeurIPS), Dec 2021.