Project author: toriving

Project description:
Named Entity Recognition Model for Naver NLP Challenge 2018: a BiLSTM-CRF-based Korean named entity tagger
Language: Python
Repository: git://github.com/toriving/naver-nlp-challenge-2018.git
Created: 2019-01-02T12:24:43Z
Project page: https://github.com/toriving/naver-nlp-challenge-2018

License: (not listed)



Naver NLP Challenge 2018

Named Entity Recognition Model for Naver NLP Challenge 2018
Presentation file

NER model for Naver NLP Challenge 2018

Team: State_Of_The_Art (Dongju Park)
1st place in the Naver NLP Challenge 2018 NER task

NER model architecture

(Figure: model architecture)

  • The code is based on the baseline model
  • Bidirectional LSTM + CRF
  • The embedding layer combines word, character (LSTM), and named entity tag embeddings
  • The training data is shuffled every epoch
  • Total data: 90,000 sentences (training: 80,000, development: 10,000)
  • Using RMSPropOptimizer with a low learning rate is important
  • The final model is an ensemble of N different submodels combined by hard voting
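In a BiLSTM + CRF tagger, the BiLSTM produces a score for every tag at every token (emission scores) and the CRF layer adds learned tag-to-tag transition scores; at inference time the best-scoring tag sequence is found with Viterbi decoding. The sketch below shows only that decoding step in pure Python, assuming both score tables are already computed. It is not taken from this repository, the function name `viterbi_decode` is hypothetical, and CRF training (the forward-algorithm log-likelihood) is not shown.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]: score of tag j at token t (e.g. a BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF).
    """
    num_tags = len(transitions)
    # score[j]: best total score of any path that ends in tag j at the current token
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score = []
        best_prev = []
        for j in range(num_tags):
            # Best previous tag i for reaching tag j at token t
            i_best = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[i_best] + transitions[i_best][j] + emissions[t][j])
            best_prev.append(i_best)
        score = new_score
        backpointers.append(best_prev)
    # Start from the best final tag and follow the backpointers to the first token
    last = max(range(num_tags), key=lambda j: score[j])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, score[last]


# Two tags; switching tags is heavily penalized, so the decoder stays in tag 0
print(viterbi_decode([[1, 0], [0, 1], [1, 0]], [[0, -5], [-5, 0]]))  # ([0, 0, 0], 2)
```

The transition scores are what distinguish this from picking the best tag per token independently: with all-zero transitions the same emissions decode to `[0, 1, 0]`.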

(Figure: ensemble)
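The README does not state whether the hard vote is taken per token or per entity. The sketch below assumes a per-token majority vote over the tag sequences predicted by N submodels; the function name `hard_vote` and the tag names are illustrative, not the repository's own.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Token-level majority vote over tag sequences from N submodels.

    predictions[m][t] is the tag that submodel m assigned to token t.
    Ties resolve to the tied tag seen first in submodel order, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one submodel prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all submodels must tag the same number of tokens")
    voted = []
    for t in range(length):
        votes = Counter(p[t] for p in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


submodel_tags = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O", "O"],
    ["B-ORG", "I-PER", "O"],
]
print(hard_vote(submodel_tags))  # ['B-PER', 'I-PER', 'O']
```

With 3 submodels, as used for the submission, a two-way tie cannot occur, but a token on which all three disagree still ties. A per-token vote can also yield an inconsistent sequence (for example an I- tag without a preceding B- tag), which an entity-level vote would avoid.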

Hyper-parameters

| Hyper-parameter | Value |
|---|---|
| epoch | 20 |
| batch_size | 128 |
| learning_rate | 0.001 |
| keep_prob | 0.65 |
| word_embedding_size | 128 |
| char_embedding_size | 128 |
| tag_embedding_size | 128 |
| lstm_units | 128 |
| char_lstm_units | 128 |
| sentence_length | 180 |
| word_length | 8 |
| num_ensemble | 3 |
  • The model converges between epochs 13 and 15
  • In this code, the default value of num_ensemble is 5, but I set it to 3 when submitting the model
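The training setup reshuffles the 80,000 training sentences every epoch. With batch_size 128 that gives 80000 / 128 = 625 mini-batches per epoch, so converging between epochs 13 and 15 corresponds to roughly 8,125 to 9,375 update steps. A minimal stdlib sketch of such a loop follows, assuming the examples fit in memory as a list; the helper name `epoch_batches` is hypothetical, not from the repository.

```python
import math
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 128   # batch_size from the hyper-parameter table
NUM_TRAIN = 80000  # size of the training split stated above


def epoch_batches(data: Sequence[T], batch_size: int, rng: random.Random) -> Iterator[List[T]]:
    """Yield mini-batches from a fresh shuffle of data; call once per epoch."""
    order = list(range(len(data)))
    rng.shuffle(order)  # a new order every time this is called
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


# 80000 / 128 = 625 exactly, so every batch in an epoch is full
batches_per_epoch = math.ceil(NUM_TRAIN / BATCH_SIZE)
print(batches_per_epoch)  # 625
```

Shuffling indices rather than the data itself leaves the original list untouched, which keeps the development split and any index-aligned features (characters, tags) easy to keep in sync.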

Usage

```shell script
$ python main.py
```

```shell script
$ python main.py \
    --mode <operation mode> \
    --dinput_dir <input data directory> \
    --output_dir <output data directory> \
    --necessary_file <necessary_file> \
    --epochs <num_epochs> \
    --batch_size <batch_size> \
    --learning_rate <learning_rate> \
    --keep_prob <keep probability (1 - dropout rate)> \
    --word_embedding_size <word, WordPos embedding size> \
    --char_embedding_size <char embedding size> \
    --tag_embedding_size <tag embedding size> \
    --lstm_units <hidden unit size> \
    --char_lstm_units <hidden unit size for char RNN> \
    --sentence_length <maximum words per sentence> \
    --word_length <maximum chars per word> \
    --num_ensemble <number of submodels>
```
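For readers wiring these options into their own scripts, here is a hypothetical `argparse` parser that mirrors the documented flags. It is a sketch, not the repository's `main.py`: numeric defaults follow the hyper-parameter table, except `num_ensemble`, whose code default is stated to be 5, and the string options default to `None` because their real defaults are not documented. The flag name `--dinput_dir` is copied exactly as documented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented main.py flags."""
    p = argparse.ArgumentParser(description="BiLSTM-CRF Korean NER (sketch)")
    # String options: real defaults are undocumented, so None is used here
    for flag, help_text in [
        ("--mode", "operation mode"),
        ("--dinput_dir", "input data directory (flag name as documented)"),
        ("--output_dir", "output data directory"),
        ("--necessary_file", "necessary file"),
    ]:
        p.add_argument(flag, type=str, default=None, help=help_text)
    # Numeric options: defaults taken from the hyper-parameter table
    p.add_argument("--epochs", type=int, default=20)
    p.add_argument("--batch_size", type=int, default=128)
    p.add_argument("--learning_rate", type=float, default=0.001)
    p.add_argument("--keep_prob", type=float, default=0.65, help="dropout keep probability")
    p.add_argument("--word_embedding_size", type=int, default=128)
    p.add_argument("--char_embedding_size", type=int, default=128)
    p.add_argument("--tag_embedding_size", type=int, default=128)
    p.add_argument("--lstm_units", type=int, default=128)
    p.add_argument("--char_lstm_units", type=int, default=128)
    p.add_argument("--sentence_length", type=int, default=180)
    p.add_argument("--word_length", type=int, default=8)
    # The README says the code default is 5; 3 was used for the submission
    p.add_argument("--num_ensemble", type=int, default=5)
    return p


args = build_parser().parse_args(["--num_ensemble", "3"])
print(args.num_ensemble, args.keep_prob)  # 3 0.65
```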

Result

(Figure: leaderboard)

1st place on NER task

Naver NLP Challenge
Changwon University Adaptive Intelligence Research Lab.
NER Leaderboard