Project author: gsriram7

Project description:
Part-of-speech tagger using an HMM and the Viterbi algorithm
Language: Python
Repository: git://github.com/gsriram7/POS.git
Created: 2018-03-06T04:15:11Z
Project community: https://github.com/gsriram7/POS

License:


Part of Speech Tagger

The tagger uses a Hidden Markov Model (HMM) to encode a language corpus in which each word is labeled with its part-of-speech tag.
It then uses the Viterbi algorithm to decode and tag sentences from the test data.
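
The decoding step can be illustrated with a minimal Viterbi sketch. It is not the project's code: the function name, the dict-based probability tables, the `floor` fallback for unseen events, and the toy model are all assumptions made for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a bigram HMM.

    Probabilities are nested dicts. Missing entries fall back to `floor`,
    a crude stand-in for smoothing (an assumption, not the project's method).
    """
    if not words:
        return []

    def logp(table, key, sub):
        return math.log(table.get(key, {}).get(sub, floor))

    # best[i][t]: best log-probability of any tag path ending in t at position i
    best = [{t: math.log(start_p.get(t, floor)) + logp(emit_p, t, words[0])
             for t in tags}]
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the previous tag that maximizes the path score into t.
            prev, score = max(
                ((p, best[i - 1][p] + logp(trans_p, p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + logp(emit_p, t, words[i])
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy, hand-written model (hypothetical values, not taken from the project).
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
TRANS = {
    "DT": {"NN": 0.9, "VB": 0.05, "DT": 0.05},
    "NN": {"VB": 0.8, "NN": 0.1, "DT": 0.1},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.6, "barks": 0.4},
    "VB": {"barks": 0.9, "dog": 0.1},
}

tagged = viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT)
```

Working in log space avoids numeric underflow on long sentences, which is why the sketch adds log-probabilities instead of multiplying raw probabilities.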

The encoder is generic and works for any language.

The encoder models the corpus and writes the probabilities to hmmmodel.txt.
The decoder reads the model, tags the test data, and writes the output to hmmoutput.txt.
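
As a rough illustration of the encoding step, the sketch below estimates start, transition, and emission probabilities by relative frequency from a tagged corpus. The function names, the in-memory corpus layout, the lack of smoothing, and the JSON serialization are assumptions; the actual format of hmmmodel.txt is not described here.

```python
import json
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Estimate HMM parameters by relative frequency (no smoothing).

    `tagged_sentences` is a list of sentences, each a list of (word, tag) pairs.
    """
    start = Counter()             # tag counts at sentence start
    trans = defaultdict(Counter)  # trans[prev_tag][tag] counts
    emit = defaultdict(Counter)   # emit[tag][word] counts

    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return {
        "start": normalize(start),
        "transition": {t: normalize(c) for t, c in trans.items()},
        "emission": {t: normalize(c) for t, c in emit.items()},
    }


def save_model(model, path):
    """Write the model as JSON (assumed format, chosen for readability)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, ensure_ascii=False, indent=2)


def load_model(path):
    """Read a model previously written by save_model."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny hypothetical corpus, for illustration only.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("dogs", "NN"), ("bark", "VB")],
]
model = train_hmm(corpus)
```

Under these assumptions, calling `save_model(model, "hmmmodel.txt")` would play the role of the encoder's output step, and `load_model("hmmmodel.txt")` the decoder's input step. Because the code only counts (word, tag) pairs and never inspects the words themselves, the same routine applies to any tagged corpus regardless of language.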

Accuracy of the model trained on the given corpora:

  • English - 88.93%
  • Chinese - 87.08%
  • Hindi - 92.34%

    These accuracies were obtained using a single generic encoder for all three languages.
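
How the percentages above were measured is not stated. A common choice is token-level accuracy, the share of words whose predicted tag matches the reference tag. The sketch below assumes that metric; the function name and input layout are hypothetical.

```python
def tagging_accuracy(gold, predicted):
    """Percentage of tokens whose predicted tag equals the gold tag.

    `gold` and `predicted` are parallel lists of sentences, each a list of tags.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of sentences")
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ")
        total += len(gold_sent)
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
    return 100.0 * correct / total if total else 0.0


# Hypothetical tags: 4 of 5 predictions match the reference.
gold = [["DT", "NN", "VB"], ["NN", "VB"]]
pred = [["DT", "NN", "NN"], ["NN", "VB"]]
score = tagging_accuracy(gold, pred)
```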