Project author: rknaebel

Project description: End-to-end shallow discourse parser
Language: Python
Repository: git://github.com/rknaebel/discopy.git
Created: 2019-04-10T14:59:11Z
Community: https://github.com/rknaebel/discopy

License: MIT License

Shallow Discourse Parser

This project aims to provide an implementation of the standard Lin et al. architecture as well as recent advances in neural architectures.
It consists of a parser pipeline that stacks individual parser components, each adding further discourse information.
The current focus is on explicit relations, which are handled first in most pipelines.
The remaining sentence pairs without an explicit sense relation are then processed by the non-explicit component.
The implementation follows the CoNLL 2016 shared-task guidelines.
It accepts the PDTB2 CoNLL format as input for training and evaluation and mainly produces a line-based JSON document format.
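
To illustrate the stacking idea, here is a minimal conceptual sketch in Python; the class and function names are invented for illustration and do not reflect discopy's actual API:

```python
# Conceptual sketch of a stacking parser pipeline (invented names, not discopy's API):
# each component reads the document plus the relations found so far and adds its own.

class ExplicitComponent:
    def parse(self, doc, relations):
        # ... detect connectives, label explicit senses, attach arguments ...
        return relations + [{"Type": "Explicit"}]  # placeholder result

class NonExplicitComponent:
    def parse(self, doc, relations):
        # ... handle remaining sentence pairs without an explicit relation ...
        return relations + [{"Type": "Implicit"}]  # placeholder result

def run_pipeline(components, doc):
    relations = []
    for component in components:
        relations = component.parse(doc, relations)
    return relations

print(run_pipeline([ExplicitComponent(), NonExplicitComponent()], doc="..."))
```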
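
The line-based output can be consumed one JSON document per line. Below is a minimal reading sketch; the field names (`relations`, `Type`, `Sense`) are assumptions based on the CoNLL 2016 relation format and should be checked against actual output:

```python
import json

# Minimal sketch: read discopy's line-based JSON output, one document per line.
# Field names are assumptions based on the CoNLL 2016 relation format.
with open("output.json", encoding="utf-8") as fh:
    for line in fh:
        doc = json.loads(line)
        for relation in doc.get("relations", []):
            print(relation.get("Type"), relation.get("Sense"))
```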

The parser was presented at the CODI 2021 workshop. For more information, check out the paper
discopy: A Neural System for Shallow Discourse Parsing.

Setup

You can easily install discopy using pip:
```shell script
pip install git+https://github.com/rknaebel/discopy
```

Alternatively, clone the repository and install discopy through pip:
```shell script
pip install -e path/to/discopy
```

Usage

Discopy currently supports different modes and distinguishes between standard feature-based models and neural (transformer-based) models.
The example commands below are executed from within the repository folder.

Evaluation

```shell script
discopy-eval path/to/conll-gold path/to/prediction
```

Standard Architecture

Training

```shell script
discopy-train lin path/to/model path/to/conll
```

The training data format is JSON; the folder contains subfolders en.{train,dev,test}, each with the files relations.json and parses.json, as sketched below.
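
A small sketch that verifies this layout before training; the root path is a placeholder:

```python
from pathlib import Path

# Expected layout (from the description above):
#   path/to/conll/en.train/{relations.json, parses.json}
#   likewise for en.dev and en.test
conll_root = Path("path/to/conll")  # placeholder path
for part in ("en.train", "en.dev", "en.test"):
    for filename in ("relations.json", "parses.json"):
        path = conll_root / part / filename
        if not path.exists():
            print(f"missing: {path}")
```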

Prediction

```shell script
discopy-predict lin path/to/conll/en.part path/to/model/lin
```

To parse documents given as JSON input, use discopy-parse:

```shell script
discopy-parse lin path/to/model/lin -i path/to/some/documents.json
```

Plain text can be processed by chaining the tokenizer and parse-adding tools:

```shell script
discopy-tokenize -i path/to/textfile | discopy-add-parses -c | discopy-parse lin models/lin
```

Neural Architecture

Neural components are a little more complex and often require or allow more hyper-parameters, both while designing the component and throughout the training process.
The training CLI exposes only a single component-parameter choice; for individual adaptations, one has to write their own training script.
The `bert-model` parameter corresponds to the HuggingFace Transformers model names, e.g. bert-base-cased; a quick way to check a model name is sketched below.

Training

```shell script
discopy-nn-train [BERT-MODEL] [MODEL-PATH] [CONLL-PATH]
```

The training data format follows the one described above.
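
Since the BERT-MODEL argument is a plain HuggingFace model name, one way to sanity-check a name before a long training run is to load it with the transformers library (an illustrative sketch, independent of discopy):

```python
from transformers import AutoTokenizer

# Illustrative sketch, independent of discopy: verify that a BERT-MODEL name
# resolves to a HuggingFace model before training.
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
print(tokenizer.tokenize("Discourse parsing is fun."))
```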

Prediction

```shell script
discopy-nn-predict [BERT-MODEL] [MODEL-PATH] [CONLL-PATH]
```

To parse documents given as JSON input:

```shell script
discopy-nn-parse [BERT-MODEL] [MODEL-PATH] -i [JSON-INPUT]
```

Plain text can be piped in directly:

```shell script
cat path/to/textfile | discopy-nn-parse [BERT-MODEL] [MODEL-PATH]
```

or tokenized first:

```shell script
discopy-tokenize --tokenize-only -i path/to/textfile | discopy-nn-parse bert-base-cased models/pipeline-bert-2
```