Project author: zhongyuchen

Project description:
2019 Language and Intelligence Challenge: Information Extraction
Language: Python
Repository: git://github.com/zhongyuchen/information-extraction.git
Created: 2019-06-30T04:16:12Z
Project page: https://github.com/zhongyuchen/information-extraction

License: Apache License 2.0


information-extraction

2019 Language and Intelligence Challenge: Information Extraction

Prerequisites

  • Install the required packages:
    pip install -r requirements.txt

Data

Download the data: initialize and update the information-extraction-data git submodule with git submodule init and git submodule update, then unzip the data files.

  • sample schema:
    {"object_type": "地点", "predicate": "祖籍", "subject_type": "人物"}
  • sample data, with postag and text as input and spo_list as output:
    {
      "postag": [
        {"word": "一直", "pos": "d"},
        {"word": "陪", "pos": "v"},
        {"word": "我", "pos": "r"},
        {"word": "到", "pos": "p"},
        {"word": "现在", "pos": "t"},
        {"word": "是", "pos": "v"},
        {"word": "歌手", "pos": "n"},
        {"word": "马健涛", "pos": "nr"},
        {"word": "原创", "pos": "v"},
        {"word": "的", "pos": "u"},
        {"word": "歌曲", "pos": "n"}
      ],
      "text": "一直陪我到现在是歌手马健涛原创的歌曲",
      "spo_list": [
        {"predicate": "歌手", "object_type": "人物", "subject_type": "歌曲", "object": "马健涛", "subject": "一直陪我到现在"}
      ]
    }
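As a rough illustration of how a sample like the one above could feed the classification step, the sketch below parses one record and turns its spo_list into a multi-hot predicate vector. The PREDICATES list and the multi_hot helper are hypothetical names introduced here; the real predicate inventory would come from the schema file, and the repository's own preprocessing may differ.

```python
import json

# Hypothetical predicate inventory for illustration only;
# the real list would be read from the schema file.
PREDICATES = ["祖籍", "歌手", "作者"]


def multi_hot(sample, predicates=PREDICATES):
    """Return a 0/1 vector marking which predicates occur in spo_list."""
    present = {spo["predicate"] for spo in sample.get("spo_list", [])}
    return [1 if p in present else 0 for p in predicates]


# One record, abbreviated to the fields used here (postag omitted).
line = (
    '{"text": "一直陪我到现在是歌手马健涛原创的歌曲", '
    '"spo_list": [{"predicate": "歌手", "object_type": "人物", '
    '"subject_type": "歌曲", "object": "马健涛", '
    '"subject": "一直陪我到现在"}]}'
)
sample = json.loads(line)
target = multi_hot(sample)  # one slot per entry in PREDICATES
```

A vector of this shape is the usual training target for a multi-label classifier with one sigmoid output per predicate.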

Baseline

Idea

  • Train a multi-label classification model to predict the predicates expressed in the text.
  • Train a sequence labeling model that takes the text and a predicate as input and outputs a labeling of the text.
  • Extract SPO triples from the sequence labeling result.
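The last step can be sketched as follows, assuming a character-level BIO scheme with the tags B-SUB, I-SUB, B-OBJ, I-OBJ and O. This tag set and the helper names extract_spans and extract_spo are assumptions made for the example, not necessarily what the project uses.

```python
def extract_spans(text, tags, label):
    """Collect substrings tagged B-<label> followed by any I-<label> tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == f"B-{label}":
            if start is not None:       # close a span that was still open
                spans.append(text[start:i])
            start = i
        elif tag == f"I-{label}" and start is not None:
            continue                    # still inside the current span
        else:
            if start is not None:       # span ended on the previous character
                spans.append(text[start:i])
            start = None
    if start is not None:               # span runs to the end of the text
        spans.append(text[start:])
    return spans


def extract_spo(text, tags, predicate):
    """Pair every subject span with every object span for one predicate."""
    subjects = extract_spans(text, tags, "SUB")
    objects = extract_spans(text, tags, "OBJ")
    return [
        {"subject": s, "predicate": predicate, "object": o}
        for s in subjects
        for o in objects
    ]


text = "一直陪我到现在是歌手马健涛原创的歌曲"
tags = ["B-SUB"] + ["I-SUB"] * 6 + ["O"] * 3 + ["B-OBJ", "I-OBJ", "I-OBJ"] + ["O"] * 5
triples = extract_spo(text, tags, "歌手")
```

Pairing all subjects with all objects is the simplest choice; it can over-generate when one sentence contains several subjects or objects for the same predicate.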

Implementation

Check report/PRML-final-project-doc-2019.pdf for details.

Multi-label Classification

  • CNN, BiRNN, BiLSTM, BiLSTM with max pooling and RCNN
  • BERT

Sequence Labeling

  • Encoder: BiLSTM and Transformer
  • Decoder: CRF
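For readers unfamiliar with CRF decoding, the sketch below shows Viterbi search over per-token emission scores and a tag-to-tag transition matrix in plain Python. It is a simplified illustration, not the project's implementation: it omits start and stop transitions, and the viterbi function name and the toy scores are assumptions.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag path.

    emissions: list of T rows, each with K scores (one per tag).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score ending in each tag so far
    backptr = []                        # best previous tag for each step and tag
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda i: score[i])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy example with two tags: moving from tag 0 to tag 1 is penalized.
emissions = [[1.0, 0.0], [0.0, 0.6]]
transitions = [[0.0, -2.0], [0.0, 0.0]]
best_path = viterbi(emissions, transitions)
```

In this toy case a per-token argmax would pick tag 1 at the second position, while the transition penalty makes the decoder keep tag 0; that is the kind of label consistency a CRF layer adds on top of a BiLSTM or Transformer encoder.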

Result

Multi-label Classification

[Figure: classification results]

Sequence Labeling

[Figure: labeling results]

fitlog usage

  • Initialize fitlog in the classification folder:
    cd classification/
    fitlog init
    fitlog log logs
  • Initialize fitlog in the labeling folder:
    cd labeling/
    fitlog init
    fitlog log logs

Author

Zhongyu Chen