Project author: v-mipeng

Project description: Lexicon-based Named Entity Recognition

Primary language: Python
Clone URL: git://github.com/v-mipeng/LexiconNER.git
Created: 2019-05-23T14:09:25Z
Project homepage: https://github.com/v-mipeng/LexiconNER

License: Apache License 2.0


LexiconNER

This is the implementation of "Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning", published at ACL 2019. The highlight of this work is that it performs NER using only entity dictionaries, without any labeled data.
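Since the approach needs no labeled data, the (noisy) positive labels come from matching dictionary entries against the raw text; everything unmatched is treated as unlabeled rather than negative. A minimal sketch of such longest-match distant labeling (the function and toy lexicon are illustrative, not the repository's actual code):

```python
def distant_label(tokens, lexicon):
    """Mark tokens as positive (1) if covered by a lexicon entry, else unlabeled (0).

    Tries longer spans first, so a multi-word entry such as ("Barack", "Obama")
    takes precedence over the single-word entry ("Obama",).
    """
    max_len = max(len(entry) for entry in lexicon)  # entries are token tuples
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                for j in range(i, i + n):
                    labels[j] = 1  # distantly supervised positive label
                i += n
                matched = True
                break
        if not matched:
            i += 1  # unlabeled token, not a confirmed negative
    return labels

# toy person lexicon
lexicon = {("Barack", "Obama"), ("Obama",)}
tokens = "Barack Obama met reporters".split()
print(distant_label(tokens, lexicon))  # [1, 1, 0, 0]
```

The key point this illustrates is that the zeros are *unlabeled*, not negatives; the PU risk estimator is what makes training on such labels sound.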

By the way, we recently publish our another work related to Chinese NER. It designs to augment Chinese NER with lexicons. The highlight of this work is that it has high computational efficiency and at the same time, achieves comparative or better performance over existing methods. You can access the source code of that work and a hyper-link of its associated paper at LexiconAugmentedNER.

Set up and run

Download glove.6B.100d.txt

Environment

pytorch 1.1.0
python 3.6.4
cuda 8.0

Instructions for running code

Phase one

Train
Print parameters
run python feature_pu_model.py -h

    optional arguments:
      -h, --help            show this help message and exit
      --lr LR               learning rate
      --beta BETA           beta of pu learning (default 0.0)
      --gamma GAMMA         gamma of pu learning (default 1.0)
      --drop_out DROP_OUT   dropout rate
      --m M                 class balance rate
      --flag FLAG           entity type (PER/LOC/ORG/MISC)
      --dataset DATASET     name of the dataset
      --batch_size BATCH_SIZE
                            batch size for training and testing
      --print_time PRINT_TIME
                            epochs for printing result
      --pert PERT           percentage of data used for training
      --type TYPE           pu learning type (bnpu/bpu/upu)
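The parser behind this help output might look roughly like the following; the defaults below are guesses except where the help text states them, and this is a sketch rather than the repository's actual `feature_pu_model.py`:

```python
import argparse

def build_parser():
    """Sketch of an argparse parser matching the printed help (defaults assumed)."""
    p = argparse.ArgumentParser(description="PU-learning NER training (sketch)")
    p.add_argument("--lr", type=float, default=1e-4, help="learning rate")
    p.add_argument("--beta", type=float, default=0.0, help="beta of pu learning (default 0.0)")
    p.add_argument("--gamma", type=float, default=1.0, help="gamma of pu learning (default 1.0)")
    p.add_argument("--drop_out", type=float, default=0.5, help="dropout rate")
    p.add_argument("--m", type=float, default=0.3, help="class balance rate")
    p.add_argument("--flag", default="PER", help="entity type (PER/LOC/ORG/MISC)")
    p.add_argument("--dataset", default="conll2003", help="name of the dataset")
    p.add_argument("--batch_size", type=int, default=100,
                   help="batch size for training and testing")
    p.add_argument("--print_time", type=int, default=1, help="epochs for printing result")
    p.add_argument("--pert", type=float, default=1.0,
                   help="percentage of data used for training")
    p.add_argument("--type", default="bnpu", help="pu learning type (bnpu/bpu/upu)")
    return p

args = build_parser().parse_args(["--dataset", "conll2003", "--flag", "PER"])
print(args.flag, args.type)
```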

e.g., train on the PER type of the conll2003 dataset:

    python feature_pu_model.py --dataset conll2003 --flag PER
Evaluating

    python feature_pu_model_evl.py --model saved_model/bnpu_conll2003_PER_lr_0.0001_prior_0.3_beta_0.0_gamma_1.0_percent_1.0 --flag PER --dataset conll2003 --output 1

Replace the model name with the one saved during training.

    python final_evl.py

This gets the final result over all entity types. Remember to change the filenames in the script to the output file names produced by evaluation.
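A single score over all entity types is typically a micro-average of the per-type counts; the following is a plausible sketch of such aggregation (our own illustration, not `final_evl.py`'s actual logic):

```python
def micro_f1(counts):
    """counts: {entity_type: (tp, fp, fn)} -> (precision, recall, f1), micro-averaged."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy per-type counts for the four CoNLL-2003 entity types
counts = {"PER": (90, 10, 5), "LOC": (80, 20, 10), "ORG": (70, 15, 20), "MISC": (30, 5, 10)}
print(micro_f1(counts))
```

Micro-averaging pools the counts before computing the ratios, so frequent types weigh more than rare ones; a macro-average (mean of per-type F1) is the other common choice.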

Phase two

Dictionary generation
run python ada_dict_generation.py -h

    optional arguments:
      -h, --help            show this help message and exit
      --beta BETA           beta of pu learning (default 0.0)
      --gamma GAMMA         gamma of pu learning (default 1.0)
      --drop_out DROP_OUT   dropout rate
      --m M                 class balance rate
      --flag FLAG           entity type (PER/LOC/ORG/MISC)
      --dataset DATASET     name of the dataset
      --lr LR               learning rate
      --batch_size BATCH_SIZE
                            batch size for training and testing
      --iter ITER           iteration time
      --unlabeled UNLABELED
                            use unlabeled data or not
      --pert PERT           percentage of data used for training
      --model MODEL         saved model name

e.g.,

    python ada_dict_generation.py --model saved_model/bnpu_conll2003_PER_lr_0.0001_prior_0.3_beta_0.0_gamma_1.0_percent_1.0 --flag PER --iter 1
Adaptive training
run python adaptive_pu_model.py -h

    optional arguments:
      -h, --help            show this help message and exit
      --beta BETA           beta of pu learning (default 0.0)
      --gamma GAMMA         gamma of pu learning (default 1.0)
      --drop_out DROP_OUT   dropout rate
      --m M                 class balance rate
      --p P                 estimated value of the prior
      --flag FLAG           entity type (PER/LOC/ORG/MISC)
      --dataset DATASET     name of the dataset
      --lr LR               learning rate
      --batch_size BATCH_SIZE
                            batch size for training and testing
      --output OUTPUT       write the test result; set 1 to write the result to
                            file
      --model MODEL         saved model name
      --iter ITER           iteration time

e.g.,

    python adaptive_pu_model.py --model saved_model/bnpu_conll2003_PER_lr_0.0001_prior_0.3_beta_0.0_gamma_1.0_percent_1.0 --flag PER --iter 1

Replace the saved model name and iteration number when doing adaptive learning. Within one iteration, the --iter number passed to dictionary generation and to adaptive training must be the same.
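The alternation can be scripted so the `--iter` values stay in sync automatically. A hedged sketch (script names are taken from this README; the loop structure and helper are our own assumption):

```python
def adaptive_schedule(model, flag, n_iters):
    """Build command lines for n_iters rounds of dictionary generation
    followed by adaptive training, with matching --iter in each round."""
    cmds = []
    for i in range(1, n_iters + 1):
        for script in ("ada_dict_generation.py", "adaptive_pu_model.py"):
            cmds.append(["python", script,
                         "--model", model,
                         "--flag", flag,
                         "--iter", str(i)])
    return cmds

# print the planned commands for two adaptive rounds (model path is a placeholder)
for cmd in adaptive_schedule("saved_model/your_model_name", "PER", 2):
    print(" ".join(cmd))
```

Each command could then be executed in order with `subprocess.run(cmd, check=True)`, so a failed round stops the loop instead of silently desynchronizing the iterations.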

Cite

Please cite our ACL 2019 paper:

    @article{peng2019distantly,
      title={Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning},
      author={Peng, Minlong and Xing, Xiaoyu and Zhang, Qi and Fu, Jinlan and Huang, Xuanjing},
      journal={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
      year={2019}
    }