Language Model Pretraining
This repo explores the effect of language model pretraining on common NLP tasks. It also aims to provide
a simple interface for using pretrained language models in Keras.
The general idea is simple: pretrain LSTM layers on a language modelling task, then reuse the trained weights in a downstream NLP task. This is essentially the same approach as ULMFiT, but exposed as a regular Keras model.
There are two steps: first pretrain the LSTM encoder, then use it inside a Keras model.
python pretrain.py --train_file TRAIN_FILE --valid_file VAL_FILE --tokenizer [TOKENIZER]
Params:
  TRAIN_FILE - path to the training text file
  VAL_FILE   - path to the validation text file
  TOKENIZER  - tokenizer used to split the text into tokens (optional)
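For example, a hypothetical invocation (the file paths and tokenizer name are placeholders, adjust them to your setup):

python pretrain.py --train_file data/train.txt --valid_file data/valid.txt --tokenizer word

Once the encoder is pretrained, plug it into a downstream Keras model. The snippet below builds a simple binary classifier on top of the pretrained LSTM: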
import tensorflow as tf

from keras import layers
from keras.models import Model
import keras.backend as K
from keras.callbacks import Callback
# Model expects a sequence of tokens (words) as input
maxlen = 100  # maximum sequence length, set this to match your data
input_ = layers.Input(shape=(maxlen,), dtype=tf.string)
# pretrained_model_path -> Path where pretrained model is saved
pretrained_model = PretrainedLSTM(pretrained_model_path, input_, return_sequences=False)
encoder_output = pretrained_model.outputs[0]
final_output = layers.Dense(1, activation="sigmoid")(encoder_output)
model = Model(inputs=[input_], outputs=[final_output])
model.compile("adam", loss="binary_crossentropy", metrics=["acc"])
# This callback is needed to initialize the word-to-idx lookup tables
class TableInitializerCallback(Callback):
    """ Initialize Tables """

    def on_train_begin(self, logs=None):
        K.get_session().run(tf.tables_initializer())
callbacks = [TableInitializerCallback()]
# Finally fit
model.fit(x_train, y_train, epochs=10, callbacks=callbacks)
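If you run predictions in a fresh session without calling fit first, the lookup tables still need to be initialized once. A minimal sketch, assuming the same model as above and a tokenized test set x_test:

# Initialize the string-to-index lookup tables once per session
K.get_session().run(tf.tables_initializer())
preds = model.predict(x_test)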
Improvement: