Project author: avramandrei

Project description:
End-to-end integration of HuggingFace's models for sequence labeling.

Main language: Python
Project address: git://github.com/avramandrei/BERT-Sequence-Labeling.git
Created: 2020-03-18T12:29:27Z
Project community: https://github.com/avramandrei/BERT-Sequence-Labeling

License: MIT License

BERT-Sequence-Labeling

This repository integrates HuggingFace's models in an end-to-end pipeline for sequence labeling. Here is a complete list of the available models.

If you find this repository helpful, please give it a star. :blush:

Install

```
git clone https://github.com/avramandrei/BERT-Sequence-Labeling.git
cd BERT-Sequence-Labeling
pip3 install -r requirements.txt
```

Input Format

The files used for training, validation, and testing must be in a CoNLL-like format:

```
# sent_id = email-enronsent20_01-0048
# text = Please let us know if you have additional questions.
1 Please please INTJ UH _ 2 discourse 2:discourse _
2 let let VERB VB Mood=Imp|VerbForm=Fin 0 root 0:root _
3 us we PRON PRP Case=Acc|Number=Plur|Person=1|PronType=Prs 2 obj 2:obj|4:nsubj:xsubj _
4 know know VERB VB VerbForm=Inf 2 xcomp 2:xcomp _
5 if if SCONJ IN _ 7 mark 7:mark _
6 you you PRON PRP Case=Nom|Person=2|PronType=Prs 7 nsubj 7:nsubj _
7 have have VERB VBP Mood=Ind|Tense=Pres|VerbForm=Fin 4 advcl 4:advcl:if _
8 additional additional ADJ JJ Degree=Pos 9 amod 9:amod _
9 questions question NOUN NNS Number=Plur 7 obj 7:obj SpaceAfter=No
10 . . PUNCT . _ 2 punct 2:punct _
```
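Reading this column-based layout takes only a few lines of plain Python. Below is a minimal sketch of such a parser; the function name and default column indices are illustrative, not the repository's actual code:

```python
def read_conll(text, tokens_column=1, predict_column=3):
    """Parse CoNLL-like text into (tokens, labels) pairs, one per sentence.

    Comment lines (starting with '#') are skipped, and blank lines mark
    sentence boundaries; columns are whitespace-separated.
    """
    sentences, tokens, labels = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # blank line ends the current sentence
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        if line.startswith("#"):  # metadata comment
            continue
        cols = line.split()
        tokens.append(cols[tokens_column])
        labels.append(cols[predict_column])
    if tokens:  # final sentence without a trailing blank line
        sentences.append((tokens, labels))
    return sentences
```

For the example above, `tokens_column=1` selects the word forms and `predict_column=3` the UPOS tags.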

Training

To train a model, use the train.py script. This trains a model that predicts the labels in the column specified by the [predict_column] argument.

```
python3 train.py [path_train_file] [path_dev_file] [tokens_column] [predict_column] [lang_model_name]
```
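One detail any such pipeline has to handle is aligning word-level labels with the subword tokens produced by the language model's tokenizer. A common strategy is to label only the first subword of each word and mask the rest with -100, the index PyTorch's CrossEntropyLoss ignores by default. The sketch below illustrates that strategy; the function and the word-id convention follow HuggingFace's fast tokenizers, but this is not the repository's actual code:

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def align_labels(word_labels, word_ids):
    """Map word-level label ids onto subword positions.

    word_ids gives, for each subword position, the index of the word it
    came from (None for special tokens such as [CLS]/[SEP]). Only the
    first subword of each word keeps the label; continuations and
    special tokens are masked with IGNORE_INDEX.
    """
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:            # special token
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:      # first subword of a word
            aligned.append(word_labels[word_id])
        else:                          # continuation subword
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned
```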

Inference

To predict new values, use the predict.py script. This will create a new file by replacing the predicted column of the test file with the predicted values.

```
python3 predict.py [path_test_file] [model_path] [tokens_column] [predict_column] [lang_model_name]
```
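The rewrite that predict.py performs, replacing one column with the predicted values while leaving everything else intact, can be sketched as follows. The function name is illustrative, and for simplicity the sketch rejoins columns with single spaces rather than preserving the original tab layout:

```python
def replace_column(lines, predictions, predict_column):
    """Return new CoNLL-like lines with predict_column overwritten.

    Comment and blank lines pass through unchanged; each token line
    consumes the next prediction in order.
    """
    pred_iter = iter(predictions)
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep comments and blank lines as-is
            continue
        cols = line.split()
        cols[predict_column] = next(pred_iter)
        out.append(" ".join(cols))
    return out
```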

Results

English EWT

| model           | upos  | xpos  |
|-----------------|-------|-------|
| bert-base-cased | 95.92 | 95.27 |
| roberta-base    | 95.77 | 95.18 |

Cite

Please consider citing the following paper as a thank you to the authors:
```
@article{avram2020upb,
  title={UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction},
  author={Avram, Andrei-Marius and Cercel, Dumitru-Clementin and Chiru, Costin-Gabriel},
  journal={arXiv e-prints},
  pages={arXiv--2009},
  year={2020}
}
```