AWD-LSTM language model pre-trained on a Filipino text corpus using fastai v2. Setup instructions are included below.
This repository accompanies my paper:
| Total Epochs | Dataset Size | Train Set | Val Set | Accuracy | Perplexity | Total Training Time | Dataset |
|---|---|---|---|---|---|---|---|
| 20 | 160428 | 90% | 10% | 86.71% | 2.028250 | 26 hours | WikiText-TL-39 |
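For language models, perplexity is conventionally `exp(cross-entropy loss)`, so the validation loss implied by the table can be recovered by taking the natural log of the reported perplexity (a quick sanity check, assuming the usual convention was used here):

```python
import math

# Perplexity is conventionally exp(cross-entropy loss) for language
# models, so the implied validation loss is the natural log of the
# reported perplexity.
perplexity = 2.028250
implied_loss = math.log(perplexity)
print(f"implied cross-entropy loss ≈ {implied_loss:.4f}")
```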
```sh
# Install gdown
pip install gdown

# Make the models directory
mkdir models

# Download the pre-trained archive
gdown --id 19jdv8-XEbDNiqlm_lPb1csbVZYkn3gfA

# Unzip into models/
unzip pretrained.zip -d models
```
Finally, you should see two files inside the `models` directory:
1. `finetuned_weights_20.pth` (pre-trained weights)
2. `vocab.pkl` (vocabulary)
These files will be used later for language model fine-tuning; see the accompanying Jupyter notebook for usage.
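Before fine-tuning, the downloaded vocabulary can be sanity-checked with the standard library. A minimal sketch; the default path and the assumption that `vocab.pkl` is a plain pickled sequence of tokens are mine, not guaranteed by the repo:

```python
import pickle
from pathlib import Path

def load_vocab(path="models/vocab.pkl"):
    """Load the pickled vocabulary.

    Assumes the file is a plain pickled sequence of token strings,
    as typically saved alongside fastai language-model weights.
    """
    vocab_path = Path(path)
    if not vocab_path.exists():
        raise FileNotFoundError(
            f"{vocab_path} not found; run the download steps above first"
        )
    with vocab_path.open("rb") as f:
        return pickle.load(f)
```

Calling `load_vocab()` after the download steps should return the token list that the fine-tuning notebook pairs with `finetuned_weights_20.pth`.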
Big thanks to Blaise Cruz for answering my questions and for nudging me in the right direction.