BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding
Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova
Google AI Language
{jacobdevlin,mingweichang,kentonl,kristout}@google.com
Abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks.
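
To make the "one additional output layer" idea concrete, the following is a minimal sketch of fine-tuning a pre-trained BERT encoder for sequence classification. It is not the authors' code (the paper's implementation is in TensorFlow); it assumes the PyTorch and Hugging Face transformers packages, and the model name and two-label task are illustrative.

    # Minimal sketch (not the paper's implementation): add a single output
    # layer on top of a pre-trained BERT encoder and fine-tune end to end.
    # Assumes torch and the Hugging Face "transformers" library.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")

    num_labels = 2  # e.g. a binary classification task (illustrative)
    classifier = torch.nn.Linear(encoder.config.hidden_size, num_labels)

    inputs = tokenizer("BERT is conceptually simple.", return_tensors="pt")
    outputs = encoder(**inputs)

    # Use the final hidden state of the [CLS] token as the sequence
    # representation and feed it to the new output layer.
    cls_repr = outputs.last_hidden_state[:, 0, :]
    logits = classifier(cls_repr)

    # During fine-tuning, the encoder parameters and the new output layer
    # are trained jointly on the downstream task's loss.
    loss = torch.nn.functional.cross_entropy(logits, torch.tensor([1]))
    loss.backward()

The key design point the abstract highlights is that no task-specific architecture is introduced beyond this single classification layer; the same pre-trained encoder is reused across tasks.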