Project author: IlyaGusev

Project description:
Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems.
Programming language: Python
Repository: git://github.com/IlyaGusev/rnnmorph.git
Created: 2017-09-13T12:53:16Z
Project community: https://github.com/IlyaGusev/rnnmorph

License: Apache License 2.0

rnnmorph

Important: please see https://github.com/natasha/slovnet#morphology-1

Morphological analyzer (POS tagger) for Russian and English languages based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Contacts

Russian language, MorphoRuEval-2017 test dataset, accuracy

Domain         Full tag   PoS tag   F.t. + lemma   Sentence f.t.   Sentence f.t.l.
Lenta (news)   96.31%     98.01%    92.96%         77.93%          52.79%
VK (social)    95.20%     98.04%    92.06%         74.30%          60.56%
JZ (lit.)      95.87%     98.71%    90.45%         73.10%          43.15%
All            95.81%     98.26%    N/A            74.92%          N/A

(F.t. = full tag; f.t.l. = full tag + lemma.)

English language, UD EWT test, accuracy

Dataset       Full tag   PoS tag   F.t. + lemma   Sentence f.t.   Sentence f.t.l.
UD EWT test   91.57%     94.10%    87.02%         63.17%          50.99%

Speed and memory consumption

Speed: from 200 to 600 words per second using CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
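
To put the quoted throughput in perspective, the following is a rough back-of-the-envelope sketch. The corpus size of one million words is a hypothetical figure chosen for illustration, and `estimate_seconds` is a helper defined here, not part of rnnmorph. Real timings will depend on hardware, sentence length, and batching.

```python
def estimate_seconds(num_words: int, words_per_second: float) -> float:
    """Return the approximate processing time in seconds at a fixed throughput."""
    return num_words / words_per_second


# Hypothetical corpus size, used only for illustration.
corpus_size = 1_000_000

# The 200-600 words/second range is the CPU figure quoted above.
fastest = estimate_seconds(corpus_size, 600)
slowest = estimate_seconds(corpus_size, 200)

print(f"{fastest / 60:.0f} to {slowest / 60:.0f} minutes")  # 28 to 83 minutes
```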

Install

    pip install rnnmorph

Usage

Example:

    from rnnmorph.predictor import RNNMorphPredictor

    # Load the Russian model and tag a tokenized sentence.
    predictor = RNNMorphPredictor(language="ru")
    forms = predictor.predict(["мама", "мыла", "раму"])

    print(forms[0].pos)          # NOUN
    print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
    print(forms[0].normal_form)  # мама
    print(forms[0].vector)
    # [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
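
The `tag` value shown above is a single string of `Key=Value` pairs joined by `|`. When individual features are needed, it can be split into a dictionary. This is a minimal sketch: `parse_tag` is a hypothetical helper written for this example, not part of rnnmorph, and treating `_` as "no features" is an assumption borrowed from the CoNLL-U convention.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 'Key=Value|Key=Value' feature string into a dict."""
    # Assumption: an empty string or "_" means the word has no features.
    if tag in ("", "_"):
        return {}
    features = {}
    for pair in tag.split("|"):
        # partition() splits on the first "=" only.
        key, _, value = pair.partition("=")
        features[key] = value
    return features


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])  # Nom
```

With the predictor from the example, this would be called as `parse_tag(forms[0].tag)`.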

Training

Simple model training: see the Colab notebook linked from the project repository.

Acknowledgements