Project author: ankane

Project description:
Named-entity recognition for Ruby
Language: Ruby
Repository: git://github.com/ankane/mitie.git
Created: 2020-09-14T20:03:01Z
Community: https://github.com/ankane/mitie

License: Boost Software License 1.0



MITIE Ruby

MITIE - named-entity recognition, binary relation detection, and text categorization - for Ruby

  • Finds people, organizations, and locations in text
  • Detects relationships between entities, like PERSON was born in LOCATION


Installation

Add this line to your application’s Gemfile:

  gem "mitie"

And download the pre-trained models for your language:

Getting Started

Named Entity Recognition

Load an NER model

  model = Mitie::NER.new("ner_model.dat")

Create a document

  doc = model.doc("Nat works at GitHub in San Francisco")

Get entities

  doc.entities

This returns

  [
    {text: "Nat", tag: "PERSON", score: 0.3112371212688382, offset: 0},
    {text: "GitHub", tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
    {text: "San Francisco", tag: "LOCATION", score: 1.3890524313885309, offset: 23}
  ]
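In this example, `offset` matches the character index where each entity begins in the original string ("GitHub" starts at index 13, "San Francisco" at 23). As a minimal sketch, assuming the entity hashes have exactly the shape shown above, the offsets can be used to mark entities inline. `annotate` is a hypothetical helper, not part of the gem, and the offset check has only been reasoned through for this ASCII example, not for multibyte text.

```ruby
# Hypothetical helper (not part of the mitie gem): wraps each entity as
# [text](TAG) using the offset and text fields returned by doc.entities.
def annotate(text, entities)
  result = +""
  cursor = 0
  entities.sort_by { |e| e[:offset] }.each do |e|
    # Sanity check: the entity text should appear at its offset.
    unless text[e[:offset], e[:text].length] == e[:text]
      raise ArgumentError, "offset mismatch for #{e[:text].inspect}"
    end
    result << text[cursor...e[:offset]] # untouched text before the entity
    result << "[#{e[:text]}](#{e[:tag]})"
    cursor = e[:offset] + e[:text].length
  end
  result << text[cursor..] # remainder after the last entity
end

text = "Nat works at GitHub in San Francisco"
entities = [
  {text: "Nat", tag: "PERSON", score: 0.3112371212688382, offset: 0},
  {text: "GitHub", tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
  {text: "San Francisco", tag: "LOCATION", score: 1.3890524313885309, offset: 23}
]
puts annotate(text, entities)
# prints: [Nat](PERSON) works at [GitHub](ORGANIZATION) in [San Francisco](LOCATION)
```

Sorting by offset first means the helper does not depend on the order in which entities are returned.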

Get tokens

  doc.tokens

Get tokens and their offset

  doc.tokens_with_offset

Get all tags for a model

  model.tags
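Because `model.tags` lists every tag the model can emit, it can seed a per-tag index of a document's entities. A minimal sketch: `entities_by_tag` is a hypothetical helper, and the `min_score` cutoff is an assumption you would need to tune, since the scores shown above are not probabilities (one of them exceeds 1).

```ruby
# Hypothetical helper: groups entity text by tag, keeping only entities whose
# score is at least min_score. Every tag from model.tags gets a key, even if
# no entity in the document carries it.
def entities_by_tag(entities, tags, min_score: 0.0)
  index = tags.to_h { |tag| [tag, []] }
  entities.each do |e|
    next if e[:score] < min_score
    (index[e[:tag]] ||= []) << e[:text]
  end
  index
end

# With a loaded model and document, for example:
#   entities_by_tag(doc.entities, model.tags, min_score: 0.5)
```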

Training

Load a word feature extractor into a trainer

  trainer = Mitie::NERTrainer.new("total_word_feature_extractor.dat")

Create training instances

  tokens = ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
  instance = Mitie::NERTrainingInstance.new(tokens)
  instance.add_entity(3..4, "topic") # machine learning
  instance.add_entity(6..6, "language") # Ruby
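The ranges passed to `add_entity` are inclusive token indices: `3..4` covers tokens 3 and 4, "machine" and "learning". Writing those indices by hand gets error-prone for larger datasets, so one option is to annotate sentences inline and derive tokens and ranges automatically. The `[words|tag]` markup and the `parse_annotated` helper below are hypothetical conventions for illustration, not part of the gem, and the plain whitespace split will not match MITIE's own tokenizer when punctuation is attached to a word.

```ruby
# Hypothetical markup: "[machine learning|topic]" marks an entity spanning
# one or more whitespace-separated words; other text is split on whitespace.
ANNOTATION = /\[([^\]|]+)\|([^\]]+)\]|\S+/

# Returns [tokens, [[range, tag], ...]] with inclusive token-index ranges,
# matching the (range, tag) arguments used with add_entity above.
def parse_annotated(text)
  tokens = []
  entities = []
  text.scan(ANNOTATION) do |words, tag|
    if tag
      start = tokens.length
      tokens.concat(words.split)
      entities << [start..(tokens.length - 1), tag]
    else
      tokens << Regexp.last_match(0) # plain token matched by \S+
    end
  end
  [tokens, entities]
end

tokens, entities = parse_annotated("You can do [machine learning|topic] in [Ruby|language] !")
p tokens   # => ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
p entities # => [[3..4, "topic"], [6..6, "language"]]
```

Each `[range, tag]` pair could then be passed to `instance.add_entity(range, tag)` on a `Mitie::NERTrainingInstance.new(tokens)`.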

Add the training instances to the trainer

  trainer.add(instance)

Train the model

  model = trainer.train

Save the model

  model.save_to_disk("ner_model.dat")

Binary Relation Detection

Detect relationships between two entities, like:

  • PERSON was born in LOCATION
  • ORGANIZATION was founded in LOCATION
  • FILM was directed by PERSON

There are 21 detectors for English. You can find them in the binary_relations directory in the model download.
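Judging from the detector loaded below, file names appear to follow the pattern `rel_classifier_<relation>.svm`. A minimal sketch for discovering the available detectors, assuming every file in the `binary_relations` directory uses that naming scheme (only one file name is confirmed here); `relation_name` and `detector_paths` are hypothetical helpers, and the directory path you pass is up to you.

```ruby
# Hypothetical helpers for browsing the binary_relations directory.

# "rel_classifier_organization.organization.place_founded.svm"
#   -> "organization.organization.place_founded"
def relation_name(path)
  File.basename(path, ".svm").delete_prefix("rel_classifier_")
end

# Maps relation names to detector file paths found in dir (empty if none).
def detector_paths(dir)
  Dir.glob(File.join(dir, "rel_classifier_*.svm")).sort.to_h { |path| [relation_name(path), path] }
end

p relation_name("rel_classifier_organization.organization.place_founded.svm")
# => "organization.organization.place_founded"
```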

Load a detector

  detector = Mitie::BinaryRelationDetector.new("rel_classifier_organization.organization.place_founded.svm")

And create a document with an NER model

  doc = model.doc("Shopify was founded in Ottawa")

Get relations

  detector.relations(doc)

This returns

  [{first: "Shopify", second: "Ottawa", score: 0.17649169745814464}]
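Each result pairs the two entity strings with a score. A minimal sketch for turning the results from one detector into (subject, relation, object) triples: the relation label is supplied by you (for example, taken from the detector's file name), the `min_score` cutoff is an assumption rather than a documented threshold, and `to_triples` is a hypothetical helper.

```ruby
# Hypothetical helper: converts detector.relations(doc) output into
# [first, relation, second] triples, best-scoring first.
def to_triples(relations, relation, min_score: 0.0)
  relations
    .select { |r| r[:score] >= min_score }
    .sort_by { |r| -r[:score] }
    .map { |r| [r[:first], relation, r[:second]] }
end

relations = [{first: "Shopify", second: "Ottawa", score: 0.17649169745814464}]
p to_triples(relations, "place_founded")
# => [["Shopify", "place_founded", "Ottawa"]]
```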

Training

Load an NER model into a trainer

  trainer = Mitie::BinaryRelationTrainer.new(model)

Add positive and negative examples to the trainer

  tokens = ["Shopify", "was", "founded", "in", "Ottawa"]
  trainer.add_positive_binary_relation(tokens, 0..0, 4..4)
  trainer.add_negative_binary_relation(tokens, 4..4, 0..0)
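In this example the negative is the positive pair with its arguments swapped, presumably so the detector learns that argument order matters. With more than two entities in a sentence, one way to generate negatives (an assumption about training practice, not documented behavior of the gem) is to treat every other ordered pair of entity ranges as a negative. `relation_pairs` is a hypothetical helper that only does the bookkeeping; its output would go to the trainer calls shown above.

```ruby
# Hypothetical helper: given the entity ranges in one sentence and the
# ordered pairs that express the relation, returns [positives, negatives],
# where negatives are all remaining ordered pairs of distinct entities.
def relation_pairs(entity_ranges, positive_pairs)
  all_pairs = entity_ranges.permutation(2).to_a
  [positive_pairs, all_pairs - positive_pairs]
end

entity_ranges = [0..0, 4..4] # "Shopify", "Ottawa"
positives, negatives = relation_pairs(entity_ranges, [[0..0, 4..4]])
p negatives # => [[4..4, 0..0]]

# Then, with the trainer and tokens from above:
#   positives.each { |a, b| trainer.add_positive_binary_relation(tokens, a, b) }
#   negatives.each { |a, b| trainer.add_negative_binary_relation(tokens, a, b) }
```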

Train the detector

  detector = trainer.train

Save the detector

  detector.save_to_disk("binary_relation_detector.svm")

Text Categorization

Load a word feature extractor into a trainer

  trainer = Mitie::TextCategorizerTrainer.new("total_word_feature_extractor.dat")

Add labeled text to the trainer

  trainer.add("This is super cool", "positive")
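A single example will not train a useful categorizer, so in practice the trainer is usually fed from a labeled dataset. A minimal sketch, assuming only the `trainer.add(text, label)` call shown above; `add_examples` is hypothetical, the sentences other than the first are made up, and it returns per-label counts so an unbalanced dataset is easy to spot.

```ruby
# Hypothetical helper: adds each [text, label] pair via trainer.add and
# returns how many examples each label received.
def add_examples(trainer, examples)
  examples.each { |text, label| trainer.add(text, label) }
  examples.map { |_, label| label }.tally
end

examples = [
  ["This is super cool", "positive"],
  ["What a great idea", "positive"],      # made-up example
  ["This is really annoying", "negative"] # made-up example
]
# add_examples(trainer, examples) # => {"positive" => 2, "negative" => 1}
```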

Train the model

  model = trainer.train

Save the model

  model.save_to_disk("text_categorization_model.dat")

Load a saved model

  model = Mitie::TextCategorizer.new("text_categorization_model.dat")

Categorize text

  model.categorize("What a super nice day")
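To check a trained categorizer, it helps to hold some labeled examples back from training and measure how often the prediction matches. The return value of `categorize` is not shown above, so this sketch (a hypothetical `accuracy` helper) takes a block that returns the predicted label for a text, leaving it to you to extract the label from whatever `model.categorize` returns.

```ruby
# Hypothetical helper: fraction of held-out [text, label] pairs for which
# the block's predicted label matches the expected one.
def accuracy(examples)
  return 0.0 if examples.empty?
  correct = examples.count { |text, label| yield(text) == label }
  correct.fdiv(examples.length)
end

held_out = [["What a super nice day", "positive"], ["This is broken", "negative"]]
# With a real model, the block would call model.categorize(text) and pull out
# the label, e.g. (extract_label is hypothetical):
#   accuracy(held_out) { |text| extract_label(model.categorize(text)) }
```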

Deployment

Check out Trove for deploying models.

  trove push ner_model.dat

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

  git clone https://github.com/ankane/mitie-ruby.git
  cd mitie-ruby
  bundle install
  bundle exec rake vendor:all
  export MITIE_MODELS_PATH=path/to/MITIE-models/english
  bundle exec rake test