Implementation of various attention-based models in PyTorch
This repo contains implementations of several forms of attention, along with a no-attention baseline for comparison.
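The specific variants are not enumerated here, but as a rough illustration of the general idea, below is a minimal sketch of one common form, Bahdanau-style additive attention. The `AdditiveAttention` module and its argument names are assumptions for illustration, not code taken from this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention: scores each encoder state
    against the current decoder state and returns a context vector."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, decoder_state, encoder_states):
        # decoder_state: (batch, hidden); encoder_states: (batch, seq_len, hidden)
        scores = self.v(torch.tanh(
            self.W_enc(encoder_states) + self.W_dec(decoder_state).unsqueeze(1)
        )).squeeze(-1)                       # (batch, seq_len)
        weights = F.softmax(scores, dim=-1)  # attention distribution over inputs
        context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
        return context, weights
```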
Each of these sequence-to-sequence models is trained to sort a shuffled array of the numbers 1 to N. The code to generate this data is here.
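As a sketch of what such a generator might look like (the `make_sorting_batch` name and interface are assumptions, not the repo's actual API):

```python
import torch

def make_sorting_batch(batch_size, n, seed=None):
    """Hypothetical data generator: each input is a random permutation
    of 1..n; the target is the same sequence sorted."""
    g = torch.Generator()
    if seed is not None:
        g.manual_seed(seed)
    inputs = torch.stack([torch.randperm(n, generator=g) + 1
                          for _ in range(batch_size)])
    targets, _ = torch.sort(inputs, dim=1)  # sorted copy: always 1..n per row
    return inputs, targets

x, y = make_sorting_batch(batch_size=2, n=6, seed=0)
# x is e.g. tensor([[3, 1, 6, 2, 5, 4], ...]); each row of y is [1, 2, 3, 4, 5, 6]
```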
Attention-based models show a considerable improvement over the no-attention model.
All the models and the data loader are defined in code/.
Each model is defined in a separate file. The file containing a model also contains train and test functions, which are self-explanatory.
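The repo's own train and test functions are not reproduced here; the sketch below is a hypothetical minimal version of such a pair, assuming a model that maps an input batch to per-step logits over the values 1..n, and reusing the `make_sorting_batch` helper sketched above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, steps=1000, batch_size=32, n=10, lr=1e-3):
    """Minimal training loop: cross-entropy against the sorted targets."""
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for step in range(steps):
        x, y = make_sorting_batch(batch_size, n)
        logits = model(x)  # assumed shape: (batch, seq_len, n)
        # shift targets 1..n down to class indices 0..n-1
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1) - 1)
        opt.zero_grad()
        loss.backward()
        opt.step()

@torch.no_grad()
def test(model, batch_size=32, n=10):
    """Per-token accuracy on a fresh batch."""
    model.eval()
    x, y = make_sorting_batch(batch_size, n)
    pred = model(x).argmax(dim=-1) + 1  # back from class indices to values 1..n
    return (pred == y).float().mean().item()
```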
Output logs are stored under training_outputs/.
Attention weights can be visualized using the code in the notebook Visualizing attention.
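A minimal matplotlib sketch of the kind of heatmap such a notebook typically produces (the `plot_attention` helper is an assumption, not the notebook's actual code):

```python
import matplotlib.pyplot as plt

def plot_attention(weights, input_tokens, output_tokens):
    """Heatmap of attention weights, one row per decoding step.
    weights: (out_len, in_len) array, e.g. stacked outputs of an
    attention module like the AdditiveAttention sketch above."""
    fig, ax = plt.subplots()
    im = ax.imshow(weights, cmap="viridis", aspect="auto")
    ax.set_xticks(range(len(input_tokens)))
    ax.set_xticklabels([str(t) for t in input_tokens])
    ax.set_yticks(range(len(output_tokens)))
    ax.set_yticklabels([str(t) for t in output_tokens])
    ax.set_xlabel("input position")
    ax.set_ylabel("output step")
    fig.colorbar(im, ax=ax)
    plt.show()
```

For a well-trained sorting model, the heatmap tends to show each output step attending sharply to the input position holding the next-smallest number.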