Deep neural network for Chinese word segmentation
For the method used, see https://github.com/zhengyuan-liu/DNN-for-CWS/blob/master/Reference-Paper-Summary.pdf
dl_for_cws.py: initializes and trains a DNN for Chinese word segmentation on the training data set (NN1)
dl_for_cws_pretrained.py: initializes a DNN with pre-trained character embeddings and trains it on the training data set (NN2)
nn1_test.py: generates segmentation results on the test data using NN1
nn2_test.py: generates segmentation results on the test data using NN2
segment_score.py: computes precision P, recall R, and F1-score F for the segmentation task
build_unlabeled_corpus.py: builds the unlabeled corpus from the PKU and MSRA training data sets
word2vec_pretrain.py: produces pre-trained character embeddings with the word2vec toolkit
nn1: DNN trained by dl_for_cws.py
nn2: DNN trained by dl_for_cws_pretrained.py
word2vector.vector: pre-trained character embeddings produced by the word2vec toolkit
pku_training (.txt and .utf8): PKU training data set
pku_test (.txt and .utf8): PKU test data set
pku_test_gold (.txt and .utf8): gold segmentation of the PKU test data set
msr_training (.txt and .utf8): MSRA training data set
msr_test (.txt and .utf8): MSRA test data set
unlabeled_corpus.utf8: unlabeled corpus used to train the word2vec model
pku_test_result1.utf8: segmentation result on the PKU test data set (NN1)
pku_test_result2.utf8: segmentation result on the PKU test data set (NN2)
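
For readers unfamiliar with how P, R, and F are usually defined for word segmentation, the following is a minimal sketch of word-level scoring. It is not the code in segment_score.py; the function names `to_spans` and `score` are hypothetical, and it assumes the gold and predicted files hold one sentence per line with words separated by whitespace. A predicted word counts as correct only if both of its character boundaries match a gold word.

```python
def to_spans(words):
    """Map a list of words to a set of (start, end) character offsets."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def score(gold_lines, pred_lines):
    """Return (precision, recall, f1) over paired gold and predicted lines."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_lines, pred_lines):
        gold_spans = to_spans(gold.split())
        pred_spans = to_spans(pred.split())
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        # A word is correct when the same span appears in both segmentations.
        n_correct += len(gold_spans & pred_spans)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative sentence pair (not taken from the PKU or MSRA data).
gold = ["共同 创造 美好 的 新 世纪"]
pred = ["共同 创造 美 好 的 新世纪"]
p, r, f = score(gold, pred)
print(p, r, f)
```

In this example three of the six predicted words match gold spans, and the gold segmentation also has six words, so precision, recall, and F1 all come out to 0.5.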