Project author: ibiscp

Project description:
Chinese word segmentation using a bidirectional LSTM
Language: Python
Repository: git://github.com/ibiscp/Chinese-Word-Segmentation.git
Created: 2019-04-07T12:55:04Z
Project page: https://github.com/ibiscp/Chinese-Word-Segmentation

Chinese Word Segmentation

The goal of this project is to train a bidirectional LSTM model that segments a Chinese sentence into words.
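
Segmentation is commonly framed as tagging each character with its position in a word. The README does not state which label scheme the model uses, so the BIES scheme below (Begin, Inside, End, Single) is an assumption, and the helper names `words_to_bies` and `bies_to_words` are hypothetical. A minimal sketch of the conversion in both directions:

```python
def words_to_bies(words):
    """Map a list of words to one BIES tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        elif len(word) > 1:
            # first char B, last char E, everything between I
            tags.append("B" + "I" * (len(word) - 2) + "E")
    return "".join(tags)


def bies_to_words(chars, tags):
    """Rebuild the word list from raw characters and their BIES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


segmented = "我 喜欢 自然语言处理".split()
tags = words_to_bies(segmented)
print(tags)  # SBEBIIIIE
print(bies_to_words("".join(segmented), tags))
```

Under this framing the network only has to predict one of four classes per character, and the segmented sentence is recovered by cutting after every E or S tag.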

The training data is the concatenation of four datasets: AS (Traditional Chinese), CITYU (Traditional Chinese), MSR (Simplified Chinese) and PKU (Simplified Chinese).
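
Merging the corpora could look roughly like the sketch below. The function name `concatenate_corpora` and the file names are hypothetical, and the sketch assumes each corpus is a UTF-8 text file with one segmented sentence per line. It does not convert between Traditional and Simplified Chinese.

```python
import tempfile
from pathlib import Path


def concatenate_corpora(paths, output_path):
    """Copy every non-empty line of each corpus into one output file.

    Returns the number of lines written.
    """
    total = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if line:  # skip blank lines
                        out.write(line + "\n")
                        total += 1
    return total


# Small demonstration with temporary files standing in for the real corpora.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "corpus_a.utf8"    # hypothetical file name
    second = Path(tmp) / "corpus_b.utf8"   # hypothetical file name
    first.write_text("我 喜欢 你\n\n", encoding="utf-8")
    second.write_text("他 是 學生\n", encoding="utf-8")
    merged = Path(tmp) / "merged.utf8"
    print(concatenate_corpora([first, second], merged))  # 2
```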

Training ran on a Google Compute Engine instance with a Tesla K80 GPU.

Instructions

  • Generate dictionary

python preprocess.py [resources_path] [sentence_size]

  • Train

python train.py [resources_path] [sentence_size]

  • Predict

python predict.py [input_path] [output_path] [resources_path]

  • Score

python score.py [prediction_file] [gold_file]
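
The dictionary and scoring steps above can be illustrated with a small, self-contained sketch. It is not taken from the repository: the function names, the `<PAD>`/`<UNK>` tokens, and the per-character accuracy metric are all assumptions about what such steps typically involve. Only `sentence_size` comes from the commands above, where it is taken here to mean the fixed length every sentence is cut or padded to.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # assumed special tokens


def build_vocab(sentences, min_count=1):
    """Assign an integer id to every character seen at least min_count times."""
    counts = Counter(ch for sentence in sentences for ch in sentence)
    vocab = {PAD: 0, UNK: 1}
    for ch, n in counts.most_common():
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(sentence, vocab, sentence_size):
    """Turn a sentence into exactly sentence_size ids (truncate or pad)."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in sentence[:sentence_size]]
    return ids + [vocab[PAD]] * (sentence_size - len(ids))


def tag_accuracy(predicted, gold):
    """Fraction of characters whose predicted tag equals the gold tag."""
    correct = total = 0
    for pred_line, gold_line in zip(predicted, gold):
        if len(pred_line) != len(gold_line):
            raise ValueError("prediction and gold lines differ in length")
        correct += sum(p == g for p, g in zip(pred_line, gold_line))
        total += len(gold_line)
    return correct / total if total else 0.0


vocab = build_vocab(["我喜欢你", "我爱你"])
ids = encode("我爱你们", vocab, sentence_size=6)
print(len(ids), ids[3], ids[-2:])  # 6 1 [0, 0]  ('们' is unknown, tail is padding)
print(tag_accuracy(["BESS", "BIE"], ["BEBE", "BIE"]))  # 5 of 7 tags match
```

Padding every sentence to one length lets sentences be stacked into fixed-shape batches for the LSTM. Characters that never appeared in training fall back to the `<UNK>` id instead of raising an error at prediction time.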