Project author: danjohnvelasco

Project description:
Pre-trained AWD-LSTM language model trained on a Filipino text corpus using fastai v2. Instructions included.
Primary language: Jupyter Notebook
Project URL: git://github.com/danjohnvelasco/Filipino-ULMFiT.git
Created: 2020-09-26T03:54:11Z
Project community: https://github.com/danjohnvelasco/Filipino-ULMFiT



Filipino-ULMFiT

This is an accompanying repository to my paper:

Pagsusuri ng RNN-based transfer learning technique sa low-resource language

Contents

  • Instructions for downloading the pre-trained language model.
  • A Jupyter notebook showing how to use the pre-trained model on a text classification task with fastai v2. [notebook]

Contributions

  • Release a pre-trained AWD-LSTM language model in Filipino using fastai v2.
  • Benchmark the AWD-LSTM on the Hate Speech Dataset. [reference]

Requirements

  • fastai v2 or later
  • NVIDIA GPU (all experiments were run on Colab with a Tesla T4)

Language Model

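| Total Epochs | Dataset Size | Train Set | Val Set | Accuracy | Perplexity | Total Training Time | Dataset        |
|--------------|--------------|-----------|---------|----------|------------|---------------------|----------------|
| 20           | 160428       | 90%       | 10%     | 86.71%   | 2.028250   | 26H                 | WikiText-TL-39 |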
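
The perplexity reported above can be related back to a cross-entropy loss, since perplexity is conventionally the exponential of the mean cross-entropy (this is also how fastai's `Perplexity` metric is defined). The following is a minimal sketch, assuming the reported value was computed from a natural-log loss; the function names are illustrative and do not come from the repository.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(loss)


def loss_from_perplexity(perplexity: float) -> float:
    """Invert the relation above; perplexity must be strictly positive."""
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(perplexity)


# Value taken from the results reported for the language model.
reported_perplexity = 2.028250

# Assumption: the underlying loss uses the natural logarithm.
implied_loss = loss_from_perplexity(reported_perplexity)
print(f"implied cross-entropy loss: {implied_loss:.4f}")
```

Under that assumption, a perplexity of 2.028250 corresponds to a loss of about 0.707.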

Download pre-trained language model

    # Install gdown
    pip install gdown
    # Make directory
    mkdir models
    # Download data
    gdown --id 19jdv8-XEbDNiqlm_lPb1csbVZYkn3gfA
    # Unzip
    unzip pretrained.zip -d models

You should now see two files inside the 'models' directory:

  1. finetuned_weights_20.pth (pre-trained weights)
  2. vocab.pkl (vocab)

These files are used later in language model fine-tuning. See the accompanying Jupyter notebook for usage.
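
Before moving on to fine-tuning, it can help to confirm that the download and unzip steps produced both expected files. The following is a minimal sketch using only the Python standard library; the helper name `missing_model_files` is hypothetical and not part of the repository, and the default `models` path assumes the commands were run from the current working directory.

```python
from pathlib import Path

# File names listed in the download instructions.
EXPECTED_FILES = ("finetuned_weights_20.pth", "vocab.pkl")


def missing_model_files(models_dir="models"):
    """Return the expected file names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Both files found in 'models'.")
```

If the helper reports a missing file, re-running the download and unzip steps is the first thing to try.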

Acknowledgements

Big thanks to Blaise Cruz for answering my questions and for nudging me in the right direction.