Project author: pku-nlp-forfun

Project description:
Chinese word segmentation, Part-of-speech tagging and Medical named entity recognition From scratch.
Primary language: Jupyter Notebook
Repository: git://github.com/pku-nlp-forfun/CWS_POS_NER.git
Created: 2019-05-18T04:31:49Z
Project page: https://github.com/pku-nlp-forfun/CWS_POS_NER



CWS/POS/NER


Our Final Paper 👉

Getting Started

Dependencies:

  • tensorflow

    # training, testing and evaluation
    python3 run.py

Generated files:

  • Evaluation.md - Markdown table of evaluation results
  • Result/ - prediction results
  • FinalResult/ - final prediction results
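The README only says that `Evaluation.md` is a Markdown table. As a minimal sketch, assuming one row per task with precision, recall and F1 columns, such a table could be rendered like this; the function name `to_markdown_table` and the column layout are hypothetical and are not taken from `evaluate.py`.

```python
from typing import Dict, Tuple


def to_markdown_table(results: Dict[str, Tuple[float, float, float]]) -> str:
    """Render {task: (precision, recall, f1)} as a Markdown table (hypothetical layout)."""
    lines = [
        "| Task | Precision | Recall | F1 |",
        "| --- | --- | --- | --- |",
    ]
    for task, (p, r, f) in results.items():
        # four decimal places per metric
        lines.append(f"| {task} | {p:.4f} | {r:.4f} | {f:.4f} |")
    return "\n".join(lines)
```

Writing the returned string to `Evaluation.md` with `open(..., "w", encoding="utf-8")` keeps any non-ASCII task names intact.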

Structure

    ├── Data          => dataset provided by the TA
    │   ├── devset
    │   ├── testset1
    │   └── trainset
    ├── Evaluation    => evaluation scripts provided by the TA
    │
    ├── CWS           => CWS model
    ├── POS           => POS tagging model
    ├── NER           => NER model
    │
    ├── constant.py   => global constants and variables
    │
    ├── dataset.py    => data preprocessing
    ├── model.py      => high-level model API for all our models
    ├── evaluate.py   => high-level evaluation API
    └── run.py        => the entire pipeline
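The README only says that `dataset.py` handles data preprocessing. A common preprocessing step for treating CWS as sequence tagging is converting each segmented sentence into one B/M/E/S tag per character (Begin, Middle, End, Single). The sketch below assumes the `_cws` files hold space-separated words, one sentence per line; the function names `words_to_bmes` and `bmes_to_words` are hypothetical, and this is not the repository's code.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")                      # single-character word
        else:
            tags.append("B")                      # first character of a word
            tags.extend(["M"] * (len(word) - 2))  # word-internal characters
            tags.append("E")                      # last character of a word
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Invert words_to_bmes: rebuild words from characters and their tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):                     # a word ends at E or S
            words.append(current)
            current = ""
    if current:                                   # tolerate a dangling B/M at the end
        words.append(current)
    return words
```

For example, `words_to_bmes("我 爱 北京 天安门".split())` gives `['S', 'S', 'B', 'E', 'B', 'M', 'E']`, and `bmes_to_words` reverses the mapping, which is how a tagger's per-character output can be turned back into words.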

Task Description

Data and scripts provided by the TA

Directory Structure

  • Data: (each has its own _cws, _pos and _ner files)
    • devset
    • testset1
    • trainset
    • final
      • test2.txt - raw article
  • Evaluation
    • pos_evaluate.py
    • ner_evaluate.py
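The TA's evaluation scripts are not reproduced here. Segmentation is commonly scored with word-level precision, recall and F1, where a predicted word counts as correct only if its character span exactly matches a gold word. A minimal sketch of that metric follows, assuming each sentence is given as a list of words; the names `word_spans` and `prf` are hypothetical, and `pos_evaluate.py` / `ner_evaluate.py` may compute their scores differently.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Turn a word list into a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact word-span matches."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = word_spans(g), word_spans(p)
        correct += len(g_spans & p_spans)  # words whose boundaries both match
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For gold `["北京", "天安门"]` and prediction `["北京", "天", "安门"]`, only `北京` matches, giving P = 1/3, R = 1/2 and F1 = 0.4. The same span-matching idea carries over to NER if entities are represented as (start, end, type) triples.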

Resources

Article

Paper

Sequence Tagging

Chinese Word Segmentation

Tools’ reference

CRF
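The resources list CRF without describing how it is used. In a linear-chain CRF tagger, prediction selects the tag sequence with the highest sum of per-position emission scores and tag-to-tag transition scores, which the Viterbi algorithm finds by dynamic programming. Below is a minimal pure-Python sketch, assuming log-space scores have already been computed (for example by a neural encoder or hand-crafted features); it illustrates the decoding step only and is not the project's model.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    Scores are assumed to be in log space, so they add along a path.
    """
    n_tags = len(transitions)
    score = list(emissions[0])           # best path score ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # best previous tag i when the current tag is j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]
```

Each step compares every pair of tags, so decoding costs O(T·K²) for T characters and K tags; with BMES tags K is only 4, and a strongly negative transition score (for example from B to S) is how a CRF rules out invalid tag sequences.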

Model Structure
