Project author: lioutasb

Project description:
Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020)

Primary language: Python
Repository URL: git://github.com/lioutasb/TaLKConvolutions.git
Created: 2020-04-06T15:27:02Z
Project community: https://github.com/lioutasb/TaLKConvolutions

License: MIT License

Time-aware Large Kernel (TaLK) Convolutions (Lioutas et al., 2020)

This repository contains the source code, pre-trained models, and instructions to reproduce the results of our paper Time-aware Large Kernel Convolutions (ICML 2020).

TaLK Convolutions is a sequence modeling method built on an adaptive convolution operation that learns to predict the size of a summation kernel instead of using a fixed-size learnable kernel matrix. It relies on a fast parallelized implementation of the summed-area table, also known as the integral image, to efficiently compute the output of the summation kernel. For each timestep of the input sequence we generate relative offsets, which adaptively expand the size of the summation kernel conditioned on the input. The method runs in O(n) time, making the sequence encoding process linear in the number of tokens.
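To make the mechanism concrete, below is a minimal, unoptimized PyTorch sketch of the same idea (module and parameter names such as NaiveTaLKConv, offset_proj and max_offset are illustrative assumptions, not the repository's API; the actual implementation is the CUDA kernel described under "Efficient CUDA Kernels"): each timestep predicts a left and a right offset, and the output is the average of the input over that window, computed from two lookups into a prefix sum, the one-dimensional analogue of the summed-area table.

  # Minimal sketch of a TaLK-style adaptive summation kernel (illustrative only).
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class NaiveTaLKConv(nn.Module):
      """Adaptive summation kernel via a 1-D integral image; not the CUDA op."""

      def __init__(self, dim, max_offset=8):
          super().__init__()
          self.max_offset = max_offset
          # Predict a relative left and right offset for every timestep.
          self.offset_proj = nn.Linear(dim, 2)

      def forward(self, x):
          # x: (batch, time, dim)
          B, T, D = x.shape

          # Prefix sum over time (1-D integral image), with a leading row of zeros.
          prefix = torch.cumsum(x, dim=1)
          prefix = F.pad(prefix, (0, 0, 1, 0))                              # (B, T+1, D)

          # Input-conditioned offsets in [0, max_offset] timesteps.
          offsets = torch.sigmoid(self.offset_proj(x)) * self.max_offset    # (B, T, 2)
          pos = torch.arange(T, device=x.device, dtype=x.dtype)
          left = (pos - offsets[..., 0]).clamp(min=0).round().long()        # window start
          right = (pos + offsets[..., 1]).clamp(max=T - 1).round().long()   # window end

          # Window sum from two prefix-sum lookups, normalized by window length.
          sum_right = prefix.gather(1, (right + 1).unsqueeze(-1).expand(-1, -1, D))
          sum_left = prefix.gather(1, left.unsqueeze(-1).expand(-1, -1, D))
          window_len = (right - left + 1).unsqueeze(-1).to(x.dtype)
          return (sum_right - sum_left) / window_len

  # Example: encode 2 sequences of length 10 with 16 features.
  out = NaiveTaLKConv(dim=16)(torch.randn(2, 10, 16))
  print(out.shape)  # torch.Size([2, 10, 16])

Because the prefix sum and the two lookups per timestep each cost O(n), the running time does not depend on the predicted window sizes. Note that the hard round() above gives no gradient to the offset predictor and is only there to keep the sketch short; the actual method keeps the offsets learnable end-to-end, and the repository's CUDA kernels compute the operation and its backward pass efficiently on the GPU.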

Video Presentation:

Time-aware Large Kernel Convolutions (ICML 2020)

Citation:

  @inproceedings{lioutas2020timeaware,
    author={Vasileios Lioutas and Yuhong Guo},
    title={Time-aware Large Kernel Convolutions},
    booktitle={Proceedings of the 37th International Conference on Machine Learning (ICML)},
    year={2020}
  }

Setup

Requirements

  • PyTorch version >= 1.3.1
  • fairseq version >= 0.10.1
  • Python version >= 3.6
  • CUDA >= 10.1
  • NVIDIA’s apex library (for mixed-precision training)
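
Before building the CUDA kernels, a quick check along the following lines (a minimal sketch, not part of the repository) can confirm that the installed versions meet the requirements above:

  # Print the installed versions to compare against the requirements above.
  import sys
  import torch
  import fairseq

  print("Python :", sys.version.split()[0])      # expect >= 3.6
  print("PyTorch:", torch.__version__)           # expect >= 1.3.1
  print("fairseq:", fairseq.__version__)         # expect >= 0.10.1
  print("CUDA   :", torch.version.cuda)          # expect >= 10.1
  print("GPU available:", torch.cuda.is_available())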

Clone this repository

  git clone https://github.com/lioutasb/TaLKConvolutions.git
  cd TaLKConvolutions

Efficient CUDA Kernels

In order to support the parallelization of TaLK Convolutions, we have developed our own CUDA primitives. To install the kernels, use the commands below. We have tested compiling the kernels with CUDA 10.1; if a future CUDA release does not work, please feel free to open an issue.

  cd talkconv/talkconv_module/
  python setup.py install
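
After the build finishes, importing the compiled extension is a quick sanity check. The module name below is a placeholder assumption; use whatever name the setup.py in talkconv/talkconv_module/ actually registers:

  # Sanity check: the GPU is visible and the compiled extension imports.
  import torch
  assert torch.cuda.is_available(), "The TaLK CUDA kernels require a CUDA-capable GPU."

  import talkconv_cuda  # placeholder name; check setup.py for the real extension name
  print("Loaded TaLK extension from:", talkconv_cuda.__file__)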

We welcome contributions from experienced CUDA developers on making the CUDA kernels more efficient.

Translation

Pre-trained models

Dataset                  Model            Prepared test set
IWSLT14 German-English   download (.pt)   IWSLT14 test: download (.zip)
WMT16 English-German     download (.pt)   newstest2014: download (.zip)
WMT14 English-French     download (.pt)   newstest2014: download (.zip)
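
The released models are ordinary fairseq checkpoints, so a downloaded .pt file can be inspected directly with PyTorch before use (a small sketch; the exact keys depend on the fairseq version):

  # Inspect a downloaded pre-trained checkpoint (e.g. the IWSLT14 De-En model).
  import torch

  ckpt = torch.load("model.pt", map_location="cpu")  # path to the downloaded .pt file
  print(list(ckpt.keys()))    # fairseq checkpoints typically contain 'model' and 'args'
  print(ckpt.get("args"))     # training configuration stored with the released model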

Preprocessing the training datasets

Please follow the instructions at https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md to preprocess the data.

IWSLT14 De-En

Training and evaluating TaLK Convolutions on a single GPU:

  # Training
  SAVE="checkpoints/talkconv_iwslt_deen"
  mkdir -p $SAVE
  CUDA_VISIBLE_DEVICES=0 \
  fairseq-train data-bin/iwslt14.tokenized.de-en \
  --user-dir talkconv/talkconv_fairseq \
  --arch talkconv_iwslt_de_en \
  --optimizer adam --fp16 --lr 0.0005 \
  --source-lang de --target-lang en --max-tokens 4000 \
  --min-lr '1e-09' --weight-decay 0.0001 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --lr-scheduler inverse_sqrt \
  --dropout 0.3 --attention-dropout 0.1 --weight-dropout 0.1 \
  --max-update 85000 --warmup-updates 4000 --warmup-init-lr '1e-07' \
  --adam-betas '(0.9, 0.98)' --left-pad-source "False" --max-epoch 52 --seed 1024 \
  --save-dir $SAVE

  # Checkpoint averaging
  python utils/average_checkpoints.py --inputs $SAVE \
  --num-epoch-checkpoints 10 --output "${SAVE}/model.pt"

  # Evaluation
  fairseq-generate data-bin/iwslt14.tokenized.de-en --user-dir talkconv/talkconv_fairseq \
  --path "${SAVE}/model.pt" \
  --batch-size 128 --beam 5 --remove-bpe --lenpen 1.6 --gen-subset test --quiet

WMT16 En-De

Training and evaluating TaLK Convolutions on WMT16 En-De using the cosine learning-rate scheduler on one machine with 8 NVIDIA GPUs:

  # Training
  SAVE="checkpoints/talkconv_wmt_ende_big"
  mkdir -p $SAVE
  python -m torch.distributed.launch --nproc_per_node 8 fairseq-train \
  data-bin/wmt16_en_de_bpe32k --fp16 --log-interval 100 --no-progress-bar --distributed-no-spawn \
  --user-dir talkconv/talkconv_fairseq \
  --max-update 30243 --share-all-embeddings --optimizer adam \
  --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --min-lr 1e-09 --update-freq 16 \
  --ddp-backend=no_c10d --max-tokens 3584 \
  --lr-scheduler cosine --warmup-init-lr 1e-7 --warmup-updates 10000 \
  --lr-shrink 1 --max-lr 0.001 --lr 1e-7 --min-lr 1e-9 --warmup-init-lr 1e-07 \
  --t-mult 1 --lr-period-updates 20000 \
  --arch talkconv_wmt_en_de_big \
  --save-dir $SAVE

  # Checkpoint averaging
  python utils/average_checkpoints.py --inputs $SAVE \
  --num-epoch-checkpoints 10 --output "${SAVE}/model.pt"

  # Evaluation on newstest2014
  CUDA_VISIBLE_DEVICES=0 \
  fairseq-generate data-bin/wmt16_en_de_bpe32k --user-dir talkconv/talkconv_fairseq \
  --path "${SAVE}/model.pt" \
  --batch-size 128 --beam 4 --remove-bpe --lenpen 0.35 --gen-subset test > wmt14_gen_ende.txt
  bash utils/compound_split_bleu.sh wmt14_gen_ende.txt

WMT14 En-Fr

Training and evaluating TaLK Convolutions on WMT14 En-Fr using the cosine learning-rate scheduler on one machine with 8 NVIDIA GPUs:

  # Training
  SAVE="checkpoints/talkconv_wmt_enfr_big"
  mkdir -p $SAVE
  python -m torch.distributed.launch --nproc_per_node 8 fairseq-train \
  data-bin/wmt14_en_fr --fp16 --log-interval 100 --no-progress-bar --distributed-no-spawn \
  --user-dir talkconv/talkconv_fairseq \
  --max-update 80000 --share-all-embeddings --optimizer adam \
  --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --min-lr 1e-09 --update-freq 32 \
  --ddp-backend=no_c10d --max-tokens 1800 \
  --lr-scheduler cosine --warmup-init-lr 1e-7 --warmup-updates 10000 \
  --lr-shrink 1 --max-lr 0.001 --lr 1e-7 --min-lr 1e-9 --warmup-init-lr 1e-07 \
  --t-mult 1 --lr-period-updates 70000 \
  --arch talkconv_wmt_en_fr_big \
  --save-dir $SAVE

  # Checkpoint averaging
  python utils/average_checkpoints.py --inputs $SAVE \
  --num-epoch-checkpoints 10 --output "${SAVE}/model.pt"

  # Evaluation
  CUDA_VISIBLE_DEVICES=0 \
  fairseq-generate data-bin/wmt14_en_fr --user-dir talkconv/talkconv_fairseq \
  --path "${SAVE}/model.pt" \
  --batch-size 128 --beam 6 --remove-bpe --lenpen 0.65 --gen-subset test --quiet

License

This project is MIT-licensed. The license applies to the pre-trained models as well.