Chinese word segmentation with CRF++.
If running in a different environment, this program may not work properly.
For instance, because Windows and Linux use different newline characters, you need to modify line 32 of `segment.py`.
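What line 32 of `segment.py` actually does is not shown here, so the following is only a sketch of one platform-neutral way to split text into lines in Python. The helper name `split_lines` is hypothetical; the idea is to normalize CRLF and CR to LF before splitting, so the same code accepts both Windows and Linux input.

```python
def split_lines(text):
    """Split text into lines, accepting both CRLF (Windows) and LF (Linux) newlines."""
    # Normalize every newline variant to LF first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = normalized.split("\n")
    # Drop the empty string produced by a trailing newline.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


windows_text = "我 爱\r\n北京\r\n"
linux_text = "我 爱\n北京\n"
same = split_lines(windows_text) == split_lines(linux_text)
```

When reading from a file, Python's text mode (`open(path, encoding="utf-8")`) already translates newlines on input by default, so explicit normalization like this mainly matters for text obtained some other way, such as captured program output.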
Run `python prepare.py`. `template.utf8` instead.

Run `CRF++-0.58/crf_learn.exe` (Windows only) to train your own model, or use the models in `CRF_Model/`.

Run `python segment.py` to segment.

Using the two models in `CRF_Model/`, the F1 scores on the test files are as follows:

| Test file | crf_model_pku | crf_model_both |
|---|---|---|
| pku_test | 0.931 | 0.880 |
| msr_test | 0.857 | 0.936 |
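
The scoring script behind these numbers is not shown here. As a sketch of how word-level F1 is commonly computed for segmentation, the code below compares the character spans of predicted words against those of the gold words. The function names and the example sentence are hypothetical and are not taken from this repository.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_sentences, pred_sentences):
    """Word-level precision, recall, and F1 over parallel lists of segmented sentences."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans = word_spans(gold)
        pred_spans = word_spans(pred)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        # A predicted word is correct only if both of its boundaries match.
        n_correct += len(gold_spans & pred_spans)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [["我", "爱", "北京", "天安门"]]
pred = [["我", "爱", "北京", "天安", "门"]]
precision, recall, f1 = segmentation_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case three of the five predicted words match the four gold words, so precision is 3/5 and recall is 3/4.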
Adjusting the template and training parameters might improve the results, but since that takes too much time, I stopped here.
Anyway, this is just the beginning; there is much more to explore.
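
As one possible starting point for that exploration, here is a minimal sketch that builds `crf_learn` command lines for a small grid over two of CRF++'s training options: `-c` (the hyper-parameter that trades off fitting the training data against overfitting) and `-f` (the feature frequency cutoff). The template, training-file, and model-file names are placeholders rather than files confirmed to exist in this repository, and the grid values are arbitrary. The commands are only printed, not executed.

```python
from itertools import product

CRF_LEARN = "CRF++-0.58/crf_learn.exe"  # path given in this README (Windows only)
TEMPLATE = "template"                   # placeholder file name (assumption)
TRAIN_DATA = "train.data"               # placeholder file name (assumption)


def build_commands(c_values, f_values):
    """Return one crf_learn argument list per (c, f) combination."""
    commands = []
    for c, f in product(c_values, f_values):
        model = f"model_c{c}_f{f}"      # hypothetical output model name
        commands.append(
            [CRF_LEARN, "-c", str(c), "-f", str(f), TEMPLATE, TRAIN_DATA, model]
        )
    return commands


commands = build_commands([1.0, 4.0], [1, 3])
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run` to train a model, and each resulting model scored on the test files to compare settings.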