Sequence-to-Sequence in TensorFlow
Sequence-to-Sequence (Seq2Seq) is a general end-to-end framework that maps sequences in a source domain to sequences in a target domain. A Seq2Seq model first reads the source sequence with an encoder to build vector-based ‘understanding’ representations, then passes them through a decoder to generate the target sequence; for this reason it is also referred to as the encoder-decoder architecture. Many NLP tasks have benefited from the Seq2Seq framework, including machine translation, text summarization and question answering. Seq2Seq models vary in terms of their exact architecture: a multi-layer bi-directional RNN (e.g. LSTM, GRU) for the encoder and a multi-layer uni-directional RNN with autoregressive decoding (e.g. greedy, beam search) for the decoder are natural choices for the vanilla Seq2Seq model. The attention mechanism was later introduced to allow the decoder to pay ‘attention’ to relevant encoder outputs directly, which brings significant improvement on top of the already successful vanilla Seq2Seq model. Furthermore, the ‘Transformer’, a novel architecture based on the self-attention mechanism, has been proposed and has outperformed both recurrent and convolutional models on various tasks. Although it is out of scope for this repo, I’d like to refer interested readers to this post for more details.
Figure 1: Encoder-Decoder architecture of Seq2Seq model
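Below is a minimal sketch of the encoder-decoder wiring described above, written with tf.keras; the vocabulary size, embedding dimension, and single-layer decoder are illustrative assumptions (the repo's actual model is built in seq2seq_run.py from the JSON config):

```python
import tensorflow as tf

# Illustrative toy dimensions; not taken from the repo's config.
vocab_size, embed_dim, unit_dim = 8000, 256, 512

# Encoder: embedding + single-layer bi-directional LSTM.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_embed = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_inputs)
enc_outputs, fw_h, fw_c, bw_h, bw_c = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(unit_dim, return_sequences=True, return_state=True)
)(enc_embed)

# Bridge: concatenate forward/backward states to initialize the decoder.
init_state = [tf.keras.layers.Concatenate()([fw_h, bw_h]),
              tf.keras.layers.Concatenate()([fw_c, bw_c])]

# Decoder: embedding + uni-directional LSTM + linear projection to
# vocabulary logits (teacher forcing at train time).
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_embed = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_outputs = tf.keras.layers.LSTM(2 * unit_dim, return_sequences=True)(
    dec_embed, initial_state=init_state)
logits = tf.keras.layers.Dense(vocab_size)(dec_outputs)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
```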
```bash
# run experiment in train mode
python seq2seq_run.py --mode train --config config/config_seq2seq_template.xxx.json
# run experiment in eval mode
python seq2seq_run.py --mode eval --config config/config_seq2seq_template.xxx.json
# encode source as CoVe vector
python seq2seq_run.py --mode encode --config config/config_seq2seq_template.xxx.json
# random search hyper-parameters
python hparam_search.py --base-config config/config_seq2seq_template.xxx.json --search-config config/config_search_template.xxx.json --num-group 10 --random-seed 100 --output-dir config/search
# visualize summary via tensorboard
tensorboard --logdir=output
```
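The JSON configs above control the model and training hyper-parameters. As a rough illustration only (the real schema is defined by config/config_seq2seq_template.xxx.json; the key names here are hypothetical), a config matching the settings reported in the tables below might look like:

```json
{
    "encoder": { "model_type": "bi_lstm", "num_layers": 1, "unit_dim": 512 },
    "decoder": { "model_type": "lstm", "num_layers": 2, "unit_dim": 512, "beam_size": 10 },
    "pretrained_embedding": false,
    "max_len": 300
}
```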
Figure 2: Vanilla Seq2Seq architecture
| IWSLT’15 EN-VI | Perplexity | BLEU Score |
|---|---|---|
| Dev | 25.09 | 9.47 |
| Test | 25.87 | 9.35 |
Table 1: Performance of the vanilla Seq2Seq model on the IWSLT’15 English-Vietnamese task with the following settings: (1) encoder: model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) decoder: model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300
| IWSLT’15 VI-EN | Perplexity | BLEU Score |
|---|---|---|
| Dev | 29.52 | 8.49 |
| Test | 33.16 | 7.88 |
Table 2: Performance of the vanilla Seq2Seq model on the IWSLT’15 Vietnamese-English task with the following settings: (1) encoder: model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) decoder: model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300
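Both tables above decode with beam size = 10. For readers unfamiliar with the procedure, here is a minimal, framework-agnostic beam-search sketch; `step_fn` is a hypothetical stand-in for one decoder step that returns log-probabilities over the vocabulary (setting beam_size=1 reduces it to greedy decoding):

```python
import numpy as np

def beam_search(step_fn, start_id, end_id, beam_size=10, max_len=300):
    """Minimal beam search; `step_fn(seq)` is assumed to return a
    [vocab_size] array of next-token log-probabilities for `seq`."""
    beams = [([start_id], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_id:      # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            log_probs = step_fn(seq)   # next-token distribution
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((seq + [int(tok)], score + log_probs[tok]))
        # keep the top `beam_size` hypotheses by cumulative log-prob
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_size]
        if all(seq[-1] == end_id for seq, _ in beams):
            break
    return beams[0][0]  # best-scoring hypothesis
```

Note this sketch omits length normalization, which practical beam-search implementations typically apply to avoid favoring short hypotheses.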
Figure 3: Attention-based Seq2Seq architecture
| IWSLT’15 EN-VI | Perplexity | BLEU Score |
|---|---|---|
| Dev | 12.56 | 22.41 |
| Test | 10.79 | 25.23 |
Table 3: Performance of the attention-based Seq2Seq model on the IWSLT’15 English-Vietnamese task with the following settings: (1) encoder: model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) decoder: model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300, att type = scaled multiplicative
| IWSLT’15 VI-EN | Perplexity | BLEU Score |
|---|---|---|
| Dev | 11.83 | 19.37 |
| Test | 10.42 | 21.40 |
Table 4: Performance of the attention-based Seq2Seq model on the IWSLT’15 Vietnamese-English task with the following settings: (1) encoder: model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) decoder: model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300, att type = scaled multiplicative
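Tables 3 and 4 use att type = scaled multiplicative. Below is a minimal sketch of one step of that attention, assuming a Luong-style multiplicative score scaled by the square root of the encoder dimension (the function name, shapes, and `W` are illustrative, not the repo's API):

```python
import tensorflow as tf

def scaled_multiplicative_attention(dec_state, enc_outputs, W):
    """One decoding step of scaled multiplicative attention.
    Assumed shapes:
      dec_state:   [batch, dec_dim]          current decoder state
      enc_outputs: [batch, src_len, enc_dim] encoder outputs
      W:           [dec_dim, enc_dim]        trainable projection
    """
    # score_j = (s W) . h_j / sqrt(enc_dim)
    projected = tf.matmul(dec_state, W)                       # [batch, enc_dim]
    scores = tf.einsum('be,ble->bl', projected, enc_outputs)  # [batch, src_len]
    scores /= tf.sqrt(tf.cast(tf.shape(enc_outputs)[-1], scores.dtype))
    weights = tf.nn.softmax(scores, axis=-1)                  # attention weights
    context = tf.einsum('bl,ble->be', weights, enc_outputs)   # weighted context
    return context, weights
```

The resulting context vector is then combined with the decoder state (e.g. concatenated and projected) before predicting the next token.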