Project author: SUDA-LA

Project description:
The implementation of the SemEval'19 paper "HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing"
Language: Python
Repository: git://github.com/SUDA-LA/ucca-parser.git
Created: 2019-01-07T08:56:39Z
Project page: https://github.com/SUDA-LA/ucca-parser

License: MIT License


UCCA Parser

An implementation of “HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing”.

This version of the implementation uses lexical features in the corpus, including POS tags, dependency labels and entity labels, just as described in the paper.

For other models or versions, see the other branches of the repository.

Requirements

  1. python >= 3.6.0
  2. pytorch == 1.0.0
  3. ucca == 1.0.127

Note that the code has not been tested with the newest version of the ucca module.
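
Because two of the requirements are exact pins, it can help to compare an environment against them before training. The following is a minimal sketch, not part of the repository: the helper names `version_tuple` and `find_mismatches` are hypothetical, and the caller is assumed to supply the installed versions as plain strings.

```python
import sys

# Pins copied from the requirements list above; PyTorch is imported as "torch".
PINNED = {"torch": "1.0.0", "ucca": "1.0.127"}
MIN_PYTHON = (3, 6, 0)


def version_tuple(text):
    """Turn '1.0.127' into (1, 0, 127); drops local suffixes such as '+cu90'."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def find_mismatches(installed, python_version=None):
    """Return readable problems; an empty list means every pin is satisfied.

    `installed` maps a package name to its version string (hypothetical input).
    """
    problems = []
    current = tuple(python_version or sys.version_info[:3])
    if current < MIN_PYTHON:
        problems.append("python %s < 3.6.0" % ".".join(map(str, current)))
    for name, wanted in PINNED.items():
        found = installed.get(name)
        if found is None:
            problems.append("%s is not installed" % name)
        elif version_tuple(found) != version_tuple(wanted):
            problems.append("%s %s != %s" % (name, found, wanted))
    return problems
```

In a real environment the PyTorch entry could be filled from `torch.__version__`; how the ucca version is obtained is left to the caller.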

Datasets

The datasets are all provided by SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA. The official website is https://competitions.codalab.org/competitions/19160.

Pre-trained embeddings: http://fasttext.cc
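
fastText distributes its word vectors in a text layout: a header line with the word count and the dimension, followed by one token per line with its values. The sketch below reads that layout in plain Python so the file can be inspected before it is used. It assumes the file passed as `--emb_path` uses this text layout, which the README does not state; `read_vec` is a hypothetical helper, not the repository's loader.

```python
import io


def read_vec(handle, limit=None):
    """Parse fastText's text layout from an open text file.

    The first line holds "<word count> <dimension>"; each later line holds a
    token followed by <dimension> space-separated floats.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        if limit is not None and len(vectors) >= limit:
            break
        # Split from the right so only the last `dim` fields are treated as numbers.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory stand-in for a real embedding file.
demo = io.StringIO("2 3\nthe 0.1 0.2 0.3 \nparser -1.0 0.0 2.5\n")
dim, vectors = read_vec(demo)
```

The `limit` argument is there because the published files are large; reading only the first few thousand entries is usually enough for a sanity check.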

Performance

Here are the results from a re-run on June 13, 2019, which are almost the same as those reported in the paper.

description | dev primary | dev remote | dev average | test wiki primary | test wiki remote | test wiki average | test 20K primary | test 20K remote | test 20K average
English-Topdown-Lexical | 79.7 | 52.2 | 79.2 | 77.9 | 48.0 | 77.4 | 74.0 | 23.4 | 73.0
German-Topdown-Lexical | 82.9 | 57.1 | 82.4 | / | / | / | 83.5 | 61.1 | 83.0

Usage

You can start training, evaluation and prediction with the subcommands registered in parser.cmds, or simply use the included shell scripts.

    $ python run.py -h
    usage: run.py [-h] {train,predict,evaluate} ...

    UCCA Parser.

    optional arguments:
      -h, --help            show this help message and exit

    Commands:
      {train,predict,evaluate}
        train               Train a model.
        predict             Use a trained model to make predictions.
        evaluate            Evaluate the specified model and dataset.
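
The help output above has the shape that Python's argparse produces for a parser with subcommands. The following is a minimal sketch of how such an entry point could be wired. It is an illustration only, not the code in parser.cmds: `build_parser` is a hypothetical name, and only the `--save_path` flag, which all three subcommands require, is registered.

```python
import argparse


def build_parser():
    """Build a top-level parser with train/predict/evaluate subcommands."""
    parser = argparse.ArgumentParser(prog="run.py", description="UCCA Parser.")
    subparsers = parser.add_subparsers(title="Commands", dest="command")

    # Help strings are copied from the output above.
    commands = [
        ("train", "Train a model."),
        ("predict", "Use a trained model to make predictions."),
        ("evaluate", "Evaluate the specified model and dataset."),
    ]
    for name, help_text in commands:
        sub = subparsers.add_parser(name, help=help_text)
        sub.add_argument("--save_path", required=True, help="model directory")
    return parser


# "exp/model" is a placeholder path.
args = build_parser().parse_args(["train", "--save_path", "exp/model"])
```

After parsing, `args.command` names the chosen subcommand, so a dispatcher can look up the matching handler.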

Note that the path to save the model is a directory. After training, the directory contains three files: “config.json”, “vocab.pt” and “parser.pt”.

Optional arguments of the subparsers are as follows:

    $ python run.py train -h
    usage: run.py train [-h] --train_path TRAIN_PATH --dev_path DEV_PATH
                        [--emb_path EMB_PATH] --save_path SAVE_PATH --config_path
                        CONFIG_PATH [--test_wiki_path TEST_WIKI_PATH]
                        [--test_20k_path TEST_20K_PATH] [--gpu GPU] [--seed SEED]
                        [--threads THREADS]

    optional arguments:
      -h, --help            show this help message and exit
      --train_path TRAIN_PATH
                            train data dir
      --dev_path DEV_PATH   dev data dir
      --emb_path EMB_PATH   pretrained embedding path
      --save_path SAVE_PATH
                            dic to save all file
      --config_path CONFIG_PATH
                            dic to save all file
      --test_wiki_path TEST_WIKI_PATH
                            wiki test data dir
      --test_20k_path TEST_20K_PATH
                            20k data dir
      --gpu GPU             gpu id
      --seed SEED           random seed
      --threads THREADS     thread num

    $ python run.py evaluate -h
    usage: run.py evaluate [-h] --gold_path GOLD_PATH --save_path SAVE_PATH
                           [--batch_size BATCH_SIZE] [--gpu GPU] [--seed SEED]
                           [--threads THREADS]

    optional arguments:
      -h, --help            show this help message and exit
      --gold_path GOLD_PATH
                            gold test data dir
      --save_path SAVE_PATH
                            path to save the model
      --batch_size BATCH_SIZE
                            batch size
      --gpu GPU             gpu id
      --seed SEED           random seed
      --threads THREADS     thread num

    $ python run.py predict -h
    usage: run.py predict [-h] --test_path TEST_PATH --save_path SAVE_PATH
                          --pred_path PRED_PATH [--batch_size BATCH_SIZE]
                          [--gpu GPU] [--seed SEED] [--threads THREADS]

    optional arguments:
      -h, --help            show this help message and exit
      --test_path TEST_PATH
                            test data dir
      --save_path SAVE_PATH
                            path to save the model
      --pred_path PRED_PATH
                            save predict passages
      --batch_size BATCH_SIZE
                            batch size
      --gpu GPU             gpu id
      --seed SEED           random seed
      --threads THREADS     thread num
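
To chain the three subcommands from a script, the command lines can be assembled from the flags listed above. This is a minimal sketch under stated assumptions: the helper names are hypothetical, every path used with them is a placeholder, and the check for the three model files relies only on the file names given in this README.

```python
import os

# File names as stated in this README; everything else here is illustrative.
MODEL_FILES = ("config.json", "vocab.pt", "parser.pt")


def _with_optional(cmd, **optional):
    """Append `--name value` for every optional flag that was actually given."""
    for name, value in optional.items():
        if value is not None:
            cmd += ["--" + name, str(value)]
    return cmd


def train_command(train_path, dev_path, save_path, config_path,
                  emb_path=None, gpu=None, seed=None):
    cmd = ["python", "run.py", "train",
           "--train_path", train_path, "--dev_path", dev_path,
           "--save_path", save_path, "--config_path", config_path]
    return _with_optional(cmd, emb_path=emb_path, gpu=gpu, seed=seed)


def evaluate_command(gold_path, save_path, batch_size=None, gpu=None):
    cmd = ["python", "run.py", "evaluate",
           "--gold_path", gold_path, "--save_path", save_path]
    return _with_optional(cmd, batch_size=batch_size, gpu=gpu)


def predict_command(test_path, save_path, pred_path, batch_size=None, gpu=None):
    cmd = ["python", "run.py", "predict", "--test_path", test_path,
           "--save_path", save_path, "--pred_path", pred_path]
    return _with_optional(cmd, batch_size=batch_size, gpu=gpu)


def missing_model_files(save_path):
    """List which of the three expected files are absent from the model directory."""
    return [name for name in MODEL_FILES
            if not os.path.isfile(os.path.join(save_path, name))]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`; calling `missing_model_files` between training and evaluation gives an early signal when training did not finish.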

Conversion

Conversion code is included in parser.convert. The function UCCA2tree converts a UCCA passage to a tree, and the function to_UCCA converts a tree back to a UCCA passage. Remote edge recovery is implemented separately in parser.submodel.remote_parser.py.
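
To give a feel for the graph-to-tree direction, here is a toy illustration on a hand-built graph. It is not the code behind UCCA2tree and does not use the ucca library: the edge tuples, node names and the `graph_to_tree` helper are all made up for this example. It assumes one simple scheme, in which remote edges are set aside and each remaining edge label is moved onto its child node.

```python
from collections import defaultdict

# Toy graph for "John wants to leave": (parent, child, edge label, is_remote).
EDGES = [
    ("root", "n1", "H", False),
    ("n1", "t1", "A", False),
    ("n1", "t2", "P", False),
    ("n1", "n2", "A", False),
    ("n2", "t1", "A", True),   # remote edge: "John" is reused as a participant
    ("n2", "t3", "F", False),
    ("n2", "t4", "P", False),
]
TOKENS = {"t1": "John", "t2": "wants", "t3": "to", "t4": "leave"}


def graph_to_tree(edges, tokens, root="root"):
    """Return (bracketed tree string, list of remote edges that were set aside)."""
    children = defaultdict(list)
    remote = []
    for parent, child, label, is_remote in edges:
        if is_remote:
            remote.append((parent, child, label))
        else:
            children[parent].append((child, label))

    def render(node, label):
        # The incoming edge label becomes the label of the child constituent.
        if node in tokens:
            return "(%s %s)" % (label, tokens[node])
        inner = " ".join(render(c, lab) for c, lab in children[node])
        return "(%s %s)" % (label, inner)

    return render(root, "ROOT"), remote


tree, remote_edges = graph_to_tree(EDGES, TOKENS)
print(tree)  # (ROOT (H (A John) (P wants) (A (F to) (P leave))))
```

The remote edges returned alongside the tree are exactly the information a separate recovery step would have to predict again when converting back to a graph.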