Filipino-ULMFiT

This repository accompanies my paper:

Pagsusuri ng RNN-based transfer learning technique sa low-resource language (An Analysis of an RNN-Based Transfer Learning Technique for a Low-Resource Language)

It contains:

Instructions for downloading the pre-trained language model.
A Jupyter notebook showing how to use the pre-trained model on a text classification task with fastai v2. [notebook]

Contributions

Release a pre-trained AWD LSTM language model for Filipino, built with fastai v2.
Benchmark AWD LSTM on the Hate Speech Dataset. [reference]

Requirements

fastai v2 or later
NVIDIA GPU (all experiments were run on Colab with a Tesla T4)
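
To confirm the fastai requirement before running the notebook, a small check along these lines may help. This is a minimal sketch: the helper name `is_fastai_v2_or_later` is hypothetical and not part of this repository, and it inspects only the major version number. It does not check for a GPU.

```python
from importlib import metadata


def is_fastai_v2_or_later(version):
    """Return True if a version string such as '2.7.12' has major version >= 2."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 2


# Look up the installed fastai version without importing fastai itself.
try:
    installed = metadata.version("fastai")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("fastai is not installed")
elif is_fastai_v2_or_later(installed):
    print(f"fastai {installed} satisfies the requirement")
else:
    print(f"fastai {installed} is older than v2")
```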

Language Model

Total Epochs	Dataset Size	Train Set	Validation Set	Accuracy	Perplexity	Total Training Time	Dataset
20	160428	90%	10%	86.71%	2.028250	26 hours	WikiText-TL-39
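
Perplexity is conventionally the exponential of the mean cross-entropy loss, which is also how fastai's `Perplexity` metric computes it. Assuming the figure in the table above follows that convention, the implied loss can be recovered with a short sketch like this:

```python
import math

reported_perplexity = 2.028250  # value from the table above

# Perplexity = exp(cross-entropy loss), so loss = ln(perplexity).
implied_loss = math.log(reported_perplexity)
print(f"Implied cross-entropy loss: {implied_loss:.4f} nats per token")

# Round trip: exponentiating the loss gives back the reported perplexity.
assert math.isclose(math.exp(implied_loss), reported_perplexity)
```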

Download pre-trained language model

# Install gdown
pip install gdown
# Make directory
mkdir models
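# Download the pre-trained model archive (pretrained.zip)
# Note: newer gdown releases deprecate --id; if the flag is rejected, pass the file ID on its own
gdown --id 19jdv8-XEbDNiqlm_lPb1csbVZYkn3gfA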
# Unzip
unzip pretrained.zip -d models

You should now see two files inside the 'models' directory:
1. finetuned_weights_20.pth (pre-trained weights)
2. vocab.pkl (vocabulary)
Both files are used later when fine-tuning the language model. See the accompanying Jupyter notebook for usage.
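
Before opening the notebook, it can help to confirm that the archive unpacked as expected. The following is a minimal sketch, assuming the two file names listed above and a `models` directory relative to the current working directory. The helper name `missing_model_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_FILES = ("finetuned_weights_20.pth", "vocab.pkl")


def missing_model_files(models_dir="models"):
    """Return the expected files that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Both files are in place.")
```

If a file is reported missing, rerunning the download and unzip steps is the first thing to try.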

Acknowledgements

Big thanks to Blaise Cruz for answering my questions and for nudging me in the right direction.
