BERT which stands for Bidirectional Encoder Representations from Transformations is the SOTA in Transfer Learning in NLP.