Project author: yunkang1989

Project description:
Creating a part of speech tagger
Primary language: Jupyter Notebook
Repository: git://github.com/yunkang1989/Part-of-Speech-tagging.git
Created: 2020-07-02T13:34:45Z
Project page: https://github.com/yunkang1989/Part-of-Speech-tagging

Part-of-Speech-tagging

In this project, I use the Pomegranate library to build a hidden Markov model (HMM) for part-of-speech tagging with a “universal” tagset. I achieved >96% tag accuracy with larger tagsets on realistic text corpora. The project has three steps:

1. Process the raw text.
2. Build a Most Frequent Class (MFC) tagger to use as a baseline.
3. Build an HMM part-of-speech tagger and compare it to the MFC baseline.
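
To make the three steps concrete, here is a minimal sketch in plain Python. It is not the notebook's code and does not use Pomegranate: it assumes a tiny hand-written toy corpus in place of the real data, estimates the HMM from counts with add-alpha smoothing, and decodes with Viterbi. All function names (`train_mfc`, `mfc_tag`, `train_hmm`, `viterbi`, `accuracy`) and the default tag for unknown words are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Step 1 (stand-in): a toy tagged corpus. Hypothetical data, not the project's corpus.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

# Step 2: Most Frequent Class baseline.
def train_mfc(sentences):
    """Map each word to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def mfc_tag(model, words, default="NOUN"):
    """Tag each word independently; unknown words get an assumed default tag."""
    return [model.get(w, default) for w in words]

# Step 3: count-based HMM with add-alpha smoothing, decoded with Viterbi.
def train_hmm(sentences, alpha=1.0):
    tags = sorted({t for s in sentences for _, t in s})
    vocab = {w for s in sentences for w, _ in s}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        start[sent[0][1]] += 1
        for word, tag in sent:
            emit[tag][word] += 1
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            trans[t1][t2] += 1

    def logp(counter, key, n_outcomes):
        total = sum(counter.values()) + alpha * n_outcomes
        return math.log((counter[key] + alpha) / total)

    return {
        "tags": tags,
        "start": {t: logp(start, t, len(tags)) for t in tags},
        "trans": {a: {b: logp(trans[a], b, len(tags)) for b in tags} for a in tags},
        # One extra outcome reserves probability mass for unseen words.
        "emit": lambda t, w: logp(emit[t], w, len(vocab) + 1),
    }

def viterbi(hmm, words):
    """Return the most likely tag sequence for a non-empty list of words."""
    tags = hmm["tags"]
    # best[t] = (log-probability of the best path ending in tag t, that path)
    best = {t: (hmm["start"][t] + hmm["emit"](t, words[0]), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (best[p][0] + hmm["trans"][p][t], best[p][1]) for p in tags
            )
            new[t] = (score + hmm["emit"](t, w), path + [t])
        best = new
    return max(best.values())[1]

def accuracy(predict, sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        correct += sum(p == g for p, g in zip(predict(words), gold))
        total += len(gold)
    return correct / total

mfc = train_mfc(train)
hmm = train_hmm(train)
held_out = [[("a", "DET"), ("cat", "NOUN"), ("barks", "VERB")]]
print("MFC accuracy:", accuracy(lambda ws: mfc_tag(mfc, ws), held_out))
print("HMM accuracy:", accuracy(lambda ws: viterbi(hmm, ws), held_out))
```

The baseline tags each word in isolation, while the HMM also scores tag-to-tag transitions, so it can use context to tag words it has never seen. On a corpus this small the two are indistinguishable; the comparison only becomes meaningful on real data.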

All code is in the Jupyter notebook.