Project author: AnalyzePlatypus

Project description:
Hebrew-English Transliteration Engine
Language: Ruby
Repository: git://github.com/AnalyzePlatypus/TranslitKit.git
Created: 2017-02-23T01:13:07Z
Project page: https://github.com/AnalyzePlatypus/TranslitKit

License: MIT License



TranslitKit


TranslitKit is a framework for Hebrew-English transliteration.

Installation

  gem install translit_kit

Or add it to your Gemfile:

  gem 'translit_kit'

Requires Ruby 2.2 or later.

Usage

Basic transliteration

  require 'translit_kit'
  word = HebrewWord.new "אַברָהָם"
  word.transliterate(:single)
  # => ["avrohom"]

  # Shortcut
  word.t(:single)
  # => ["avrohom"]

Transliteration is driven by phoneme maps: files that map Hebrew phonemes (units of sound) to English characters (see below).
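To make the idea concrete, here is a minimal sketch of how a phoneme map can turn one Hebrew word into several English spellings: each phoneme contributes a list of alternatives, and the results are every way of picking one alternative per phoneme (a Cartesian product). This is not TranslitKit's internal API. The `expand` helper and the toy map are hypothetical, the word is assumed to be already split into phonemes, and unknown phonemes are assumed to contribute an empty string.

```ruby
# Hypothetical sketch, NOT TranslitKit's internal API.
# Assumes the word is already split into phonemes; unknown phonemes
# contribute "" (an assumption made for this illustration).
def expand(phonemes, phoneme_map)
  return [""] if phonemes.empty?
  # One list of English alternatives per phoneme.
  options = phonemes.map { |ph| phoneme_map.fetch(ph, [""]) }
  first, *rest = options
  # Cartesian product: every way of choosing one alternative per phoneme.
  first.product(*rest).map(&:join)
end

BET        = "\u05D1"       # ב
BET_DAGESH = "\u05D1\u05BC" # בּ
KAMATZ     = "\u05B8"       # kamatz vowel point

# Toy values for illustration only; not the real :short map.
TOY_MAP = { BET => ["v"], BET_DAGESH => ["b", "bb"], KAMATZ => ["o", "a"] }

p expand([BET_DAGESH, KAMATZ, BET], TOY_MAP) # => ["bov", "bav", "bbov", "bbav"]
```

Because the alternatives multiply, even a few extra options per phoneme grow the result list quickly, which is consistent with the counts shown below.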

Three phoneme maps are provided: :long, :short, and :single.
You can also add your own (see below).

  word.t(:single)
  # => ["avrohom"]
  word.t(:short)
  # => ["avroom", "avroam", "avroem", "avrohom", "avroham",
  #     "avrohem", "avraom", "avraam", "avraem", "avrahom",
  #     "avraham", "avrahem", "avreom", "avream", "avreem",
  #     "avrehom", "avreham", "avrehem"]
  word.t(:long)
  # => ["avroom", "avrooom", "avroohm", ... ] # 5,997 more!

The default is :short:

  word.t == word.t(:short)
  # => true

To see how many results each built-in map produces, call HebrewWord#inspect:

  word.inspect
  # => "אַברָהָם: Permutations: 1 single | 18 short | 6000 long"
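The counts follow from multiplying the number of alternatives per phoneme. In the 18 :short results listed above, three positions vary: the first vowel (o, a, e), an optional h, and the second vowel (o, a, e), giving 3 × 2 × 3 = 18. The sketch below computes such a count without generating the strings. It is hypothetical, not TranslitKit's code, and assumes every combination is kept (no de-duplication); the toy map is illustrative only.

```ruby
# Hypothetical sketch, not TranslitKit's implementation.
# Assumes every combination of alternatives is kept (no de-duplication),
# so the count is the product of the per-phoneme option counts.
def permutation_count(phonemes, phoneme_map)
  phonemes.map { |ph| phoneme_map.fetch(ph, [""]).size }.reduce(1, :*)
end

COUNT_TOY_MAP = {
  "\u05D1"       => ["v"],             # ב
  "\u05D1\u05BC" => ["b", "bb"],       # בּ
  "\u05B8"       => ["o", "a", "aw"]   # kamatz
}

# 2 options * 3 options * 1 option
puts permutation_count(["\u05D1\u05BC", "\u05B8", "\u05D1"], COUNT_TOY_MAP) # prints 6
```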

Adding Custom Phoneme Maps

Format

Phoneme maps are plain JSON files placed in the lib/phoneme_maps directory.

Each key is a String (a phoneme), and each value is an Array of English replacement strings.

  {
    "ב": ["v"],
    "בּ": ["b", "bb"]
  }

A phoneme can be a Hebrew letter (א), a nekuda (vowel point, such as ָ), or a letter with modifiers, such as a dagesh (בּ). Keep in mind that many characters are normalized before lookup (see Appendix: Pre-Processing below).
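Before installing a hand-written map, it can help to check that it has the shape described above. The following is a minimal sketch using Ruby's standard `json` library; `valid_phoneme_map?` is a hypothetical helper, not part of TranslitKit, and it checks only the structure (String keys, Array-of-String values), not whether the phonemes themselves are meaningful.

```ruby
require "json"

# Hypothetical helper, not part of TranslitKit: checks that a phoneme map
# is a JSON object whose values are Arrays of Strings.
def valid_phoneme_map?(json_text)
  map = JSON.parse(json_text)
  map.is_a?(Hash) &&
    map.all? { |phoneme, replacements|
      phoneme.is_a?(String) &&
        replacements.is_a?(Array) &&
        replacements.all? { |r| r.is_a?(String) }
    }
rescue JSON::ParserError
  false # not valid JSON at all
end

puts valid_phoneme_map?('{ "ב": ["v"], "בּ": ["b", "bb"] }') # prints true
puts valid_phoneme_map?('{ "ב": "v" }')                        # prints false
```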

Installation

To install your custom map, place the file in lib/resources.

Your map will then be available as the symbol :<filename>, without the .json extension.

Example: klingon.json becomes :klingon.
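The naming rule above (file name minus `.json`, as a Symbol) can be expressed in a couple of lines. This is a hypothetical sketch of the rule, not TranslitKit's loader; how the library actually discovers map files is not described in this README, and `map_name` and `available_maps` are illustrative names.

```ruby
# Hypothetical sketch of the naming rule, not TranslitKit's loader.
def map_name(path)
  # "lib/resources/klingon.json" -> :klingon
  File.basename(path, ".json").to_sym
end

# Lists the symbols that the .json files in a directory would map to.
def available_maps(dir)
  Dir.glob(File.join(dir, "*.json")).map { |path| map_name(path) }.sort
end

p map_name("lib/resources/klingon.json") # => :klingon
```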

Now you can use it anywhere:

  word.transliterate(:klingon)
  # => (Results)

At present, custom maps are not included in the permutation counts shown by HebrewWord#inspect.

Contributing

TranslitKit is currently maintained by @AnalyzePlatypus.
Contributions welcome!

Appendix: Pre-Processing

When a word is transliterated, it is pre-processed to normalize certain characters.
Specifically:

  • Whitespace is stripped
  • The final letters [םןךףץ] are normalized to their standard forms [מנכפצ]
  • CHATAF nekudos ['ֲ','ֳ','ֱ'] are normalized to their standard forms
  • Full (malei) CHIRIK, TZEIREI, and CHOLOM nekudos have their accompanying vowel letters (י or ו) removed
  • DAGESH characters are removed from all letters except [בוכפת]
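Most of these steps are simple character substitutions. The sketch below illustrates them with `String#strip`, `String#tr`, and a regex that walks each letter together with its following combining marks (needed because, in canonical Unicode order, a vowel point can sit between a letter and its dagesh). It is a hypothetical sketch, not TranslitKit's code: it assumes "stripped" means trimming the ends, and it omits the full CHIRIK/TZEIREI/CHOLOM step because its exact rules are not spelled out here.

```ruby
# Hypothetical sketch of the pre-processing steps; not TranslitKit's code.
# The "full" CHIRIK/TZEIREI/CHOLOM step is intentionally omitted.
FINAL_LETTERS    = "\u05DD\u05DF\u05DA\u05E3\u05E5" # ם ן ך ף ץ
STANDARD_LETTERS = "\u05DE\u05E0\u05DB\u05E4\u05E6" # מ נ כ פ צ
CHATAF_POINTS    = "\u05B2\u05B3\u05B1" # chataf patach, chataf kamatz, chataf segol
PLAIN_POINTS     = "\u05B7\u05B8\u05B6" # patach, kamatz, segol
DAGESH           = "\u05BC"
KEEP_DAGESH      = "\u05D1\u05D5\u05DB\u05E4\u05EA" # ב ו כ פ ת

def preprocess(word)
  text = word.strip                               # assumption: trim ends only
  text = text.tr(FINAL_LETTERS, STANDARD_LETTERS) # final letters -> standard
  text = text.tr(CHATAF_POINTS, PLAIN_POINTS)     # chataf -> plain vowel
  # Each letter plus the combining marks that follow it; drop the dagesh
  # unless the letter is one of בוכפת.
  text.gsub(/([\u05D0-\u05EA])([\u0591-\u05C7]*)/) do
    letter, marks = $1, $2
    KEEP_DAGESH.include?(letter) ? letter + marks : letter + marks.delete(DAGESH)
  end
end

puts preprocess("  אָדָם ") # prints אָדָמ (final mem normalized)
```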