Sequence-to-Sequence learning using PyTorch
This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models, along with the code needed to train them and run inference.
Using this code you can train:
And more…
```
git clone --recursive https://github.com/eladhoffer/seq2seq.pytorch
cd seq2seq.pytorch; python setup.py develop
```
Models currently available:
Datasets currently available:
All datasets can be tokenized using 3 available segmentation methods:
After choosing a tokenization method, a vocabulary will be generated and saved for future inference.
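As a rough illustration of that vocabulary step (this is not the repository's tokenizer API; the function name, file format, and special symbols below are assumptions), a word-level vocabulary can be built from a tokenized corpus and saved for later reuse:

```python
# Illustrative sketch only -- the repository ships its own segmentation methods
# (e.g. word-level or subword/BPE); names and file format here are assumptions.
import json
from collections import Counter

def build_vocab(corpus_path, vocab_path, max_size=32000):
    """Count whitespace-separated tokens and keep the most frequent ones."""
    counter = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            counter.update(line.strip().split())

    # Reserve indices for special symbols, then add the most common tokens.
    vocab = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}
    for token, _ in counter.most_common(max_size - len(vocab)):
        vocab[token] = len(vocab)

    # Save the vocabulary so the exact same mapping is reused at inference time.
    with open(vocab_path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False, indent=2)
    return vocab
```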
The models can be trained using several methods:
Example training scripts are available in the `scripts` folder, and inference examples are available in the `examples` folder.
* Example for training a Transformer model, using the learning-rate schedule from the original paper:

```
WARMUP="4000"
LR0="512**(-0.5)"
python main.py \
--save transformer \
--dataset ${DATASET} \
--dataset-dir ${DATASET_DIR} \
--results-dir ${OUTPUT_DIR} \
--model Transformer \
--model-config "{'num_layers': 6, 'hidden_size': 512, 'num_heads': 8, 'inner_linear': 2048}" \
--data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
--b 128 \
--max-length 100 \
--device-ids 0 \
--label-smoothing 0.1 \
--trainer Seq2SeqTrainer \
--optimization-config "[{'step_lambda':
\"lambda t: { \
'optimizer': 'Adam', \
'lr': ${LR0} * min(t ** -0.5, t * ${WARMUP} ** -1.5), \
'betas': (0.9, 0.98), 'eps':1e-9}\"
}]"
```
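The `step_lambda` above implements the warmup schedule from the Transformer paper, `lr = 512**-0.5 * min(t**-0.5, t * 4000**-1.5)`. A standalone sketch of the same formula (plain Python, independent of `main.py`):

```python
# Sketch of the warmup + inverse-square-root learning-rate schedule
# used in the Transformer example above (printed values are approximate).
def transformer_lr(step, d_model=512, warmup=4000):
    """lr = d_model**-0.5 * min(step**-0.5, step * warmup**-1.5)"""
    step = max(step, 1)  # avoid division by zero at step 0
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup ** -1.5)

# Learning rate rises linearly for `warmup` steps, then decays as 1/sqrt(step):
print(transformer_lr(1))      # ~1.7e-7
print(transformer_lr(4000))   # peak, ~7.0e-4
print(transformer_lr(40000))  # ~2.2e-4
```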
* Example for training an attentional LSTM-based model with 3 layers in both the encoder and decoder:
```
python main.py \
--save de_en_wmt17 \
--dataset ${DATASET} \
--dataset-dir ${DATASET_DIR} \
--results-dir ${OUTPUT_DIR} \
--model RecurrentAttentionSeq2Seq \
--model-config "{'hidden_size': 512, 'dropout': 0.2, \
'tie_embedding': True, 'transfer_hidden': False, \
'encoder': {'num_layers': 3, 'bidirectional': True, 'num_bidirectional': 1, 'context_transform': 512}, \
'decoder': {'num_layers': 3, 'concat_attention': True,\
'attention': {'mode': 'dot_prod', 'dropout': 0, 'output_transform': True, 'output_nonlinearity': 'relu'}}}" \
--data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
--b 128 \
--max-length 80 \
--device-ids 0 \
--trainer Seq2SeqTrainer \
--optimization-config "[{'epoch': 0, 'optimizer': 'Adam', 'lr': 1e-3},
{'epoch': 6, 'lr': 5e-4},
{'epoch': 8, 'lr':1e-4},
{'epoch': 10, 'lr': 5e-5},
{'epoch': 12, 'lr': 1e-5}]" \
```
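The decoder above uses dot-product attention with a transformed, ReLU-activated output (`'mode': 'dot_prod'`, `'output_transform': True`, `'output_nonlinearity': 'relu'`). Below is a minimal sketch of that computation; the class name, shapes, and exact details are assumptions rather than the repository's `RecurrentAttentionSeq2Seq` code:

```python
# Minimal sketch of dot-product attention with an output transform + ReLU,
# roughly what the decoder's attention config above describes.
# Shapes, names, and normalization details are assumptions, not the repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DotProductAttention(nn.Module):
    def __init__(self, hidden_size=512):
        super().__init__()
        # 'output_transform': True -> project [query; context] back to hidden_size
        self.output_transform = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, query, keys):
        # query: (batch, hidden), keys: (batch, src_len, hidden)
        scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)     # (batch, src_len)
        weights = F.softmax(scores, dim=1)                          # attention weights
        context = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)  # (batch, hidden)
        # 'concat_attention': True and 'output_nonlinearity': 'relu'
        output = F.relu(self.output_transform(torch.cat([query, context], dim=1)))
        return output, weights

# Example usage with dummy tensors
attn = DotProductAttention(hidden_size=512)
q = torch.randn(4, 512)
k = torch.randn(4, 10, 512)
out, w = attn(q, k)
print(out.shape, w.shape)  # torch.Size([4, 512]) torch.Size([4, 10])
```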