Massive Exploration of Neural Machine Translation Architectures
Denny Britz∗†, Anna Goldie∗, Minh-Thang Luong, Quoc Le
{dennybritz,agoldie,thangluong,qvl}@google.com
Google Brain
Abstract
Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures.

