Project author: 3aransia

Project description:
Transliteration for languages and dialects
Language: Python
Repository: git://github.com/3aransia/3aransia.git
Created: 2019-10-22T01:13:45Z
Project community: https://github.com/3aransia/3aransia

License: Apache License 2.0


3aransia

Transliteration of languages and dialects

Contribution

To contribute, refer to CONTRIBUTING.md.

Features

  • Fast and reliable - it uses default variables to access data
  • Bulk transliteration
  • API available
  • Multilanguage transliteration available
  • 70 languages and dialects supported

Languages and dialects supported

  1. Afrikaans  2. Algerian  3. Arabic
  4. Azerbaijani  5. Bosnian  6. Catalan
  7. Corsican  8. Czech  9. Welsh
  10. Danish  11. German  12. Greek
  13. English  14. Esperanto  15. Spanish
  16. Estonian  17. Basque  18. Persian
  19. Finnish  20. French  21. Frisian
  22. Irish  23. Gaelic  24. Galician
  25. Hausa  26. Croatian  27. Creole
  28. Hungarian  29. Hawaiian  30. Indonesian
  31. Igbo  32. Icelandic  33. Italian
  34. Kinyarwanda  35. Kurdish  36. Latin
  37. Libyan  38. Lithuanian  39. Luxembourgish
  40. Latvian  41. Moroccan  42. Malagasy
  43. Maori  44. Malay  45. Maltese
  46. Dutch  47. Norwegian  48. Polish
  49. Portuguese  50. Romanian  51. Samoan
  52. Shona  53. Slovak  54. Slovenian
  55. Somali  56. Albanian  57. Sesotho
  58. Sundanese  59. Swedish  60. Swahili
  61. Filipino  62. Tunisian  63. Turkish
  64. Turkmen  65. Urdu  66. Uzbek
  67. Vietnamese  68. Xhosa  69. Yoruba
  70. Zulu

Installation

pip install aaransia

Usage

Transliterate from one language or dialect to another

    from aaransia import transliterate
    ARABIC_SENTENCE = "كتب بلعربيا هنايا شحال ما بغيتي"
    print(transliterate(ARABIC_SENTENCE, source='ar', target='ma'))

    >>> ktb bl3rbya hnaya ch7al ma bghiti
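
The feature list mentions bulk transliteration, but this README does not show a dedicated bulk API. A minimal sketch, assuming only the `transliterate(text, source=..., target=...)` call shown above, is to map that function over a list. The helper name `transliterate_many` and the stand-in `fake_transliterate` are hypothetical and exist only so the example runs without the package installed.

```python
def transliterate_many(sentences, transliterate_fn, source, target):
    """Apply a transliterate-style callable to every sentence, preserving order."""
    return [transliterate_fn(s, source=source, target=target) for s in sentences]


def fake_transliterate(text, source, target):
    """Hypothetical stand-in with the same keyword arguments as the documented call."""
    return f"[{source}->{target}] {text}"


if __name__ == "__main__":
    # In real use, pass aaransia.transliterate instead of the stand-in.
    for line in transliterate_many(["first", "second"], fake_transliterate, "ar", "ma"):
        print(line)
```

Passing the function as an argument keeps the helper independent of the package, so it can be exercised in tests with a stub.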

Transliterate across languages and dialects, using the universal parameter

    from aaransia import transliterate, SourceLanguageError
    MOROCCAN_ARABIC_SENTENCE = "ktb بلعربيا hnaya شحال ما بغيتي"
    try:
        print(transliterate(MOROCCAN_ARABIC_SENTENCE, source='ar', target='ma'))
    except SourceLanguageError as source_language_error:
        print(source_language_error)
    print(transliterate(MOROCCAN_ARABIC_SENTENCE, source='ar', target='ma', universal=True))
    print(transliterate(MOROCCAN_ARABIC_SENTENCE, source='ma', target='ar', universal=True))

    >>> Source alphabet language doesn't match the input text: ar
    >>> ktb bl3rbya hnaya chhal ma bghyty
    >>> كتب بلعربيا هنايا شحال ما بغيتي
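
The sample sentence mixes Latin and Arabic letters, which is presumably why the strict call reports that the source alphabet does not match. The sketch below is not the library's check; it is a standard-library heuristic (taking the first word of each letter's Unicode name) that shows which scripts a string contains. The function name `scripts_in` is hypothetical.

```python
import unicodedata


def scripts_in(text):
    """Return a rough set of script names for the letters in text.

    Heuristic: the first word of a letter's Unicode name, for example
    'ARABIC' or 'LATIN'. Digits, spaces and punctuation are ignored.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


if __name__ == "__main__":
    mixed = "ktb بلعربيا hnaya"
    print(sorted(scripts_in(mixed)))  # ['ARABIC', 'LATIN']
```

A string that yields more than one script is a candidate for `universal=True`, under the assumption that the parameter behaves as the example output above suggests.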

Get all alphabet codes

    from aaransia import get_alphabets_codes
    print(len(get_alphabets_codes()))
    print(get_alphabets_codes())

    >>> 70
    >>> ['ar', 'af', 'sq', 'al', 'az', 'eu', 'bo', 'ca', 'co', 'hr', 'cs', 'da',
    'nl', 'en', 'eo', 'et', 'tl', 'fi', 'fr', 'fs', 'gl', 'de', 'ht', 'ha', 'hw',
    'hu', 'is', 'ig', 'id', 'ga', 'it', 'ki', 'ku', 'la', 'lv', 'li', 'lt', 'lu',
    'ma', 'mg', 'ms', 'mt', 'mo', 'no', 'pl', 'pt', 'ro', 'sa', 'gc', 'el',
    'ss', 'sh', 'sk', 'sl', 'so', 'es', 'su', 'sw', 'sv', 'tn', 'tr', 'tu',
    'uz', 'vi', 'cy', 'xh', 'yo', 'zu', 'fa', 'ur']

Get all alphabets

    from aaransia import get_alphabets
    print(get_alphabets())

    >>> {
    >>> 'af': 'Afrikaans Alphabet',
    >>> 'al': 'Algerian Alphabet',
    >>> 'ar': 'Arabic Alphabet',
    >>> 'az': 'Azerbaijani Alphabet',
    >>> 'bo': 'Bosnian Alphabet',
    >>> 'ca': 'Catalan Alphabet',
    >>> 'co': 'Corsican Alphabet',
    >>> 'cs': 'Czech Alphabet',
    >>> 'cy': 'Welsh Alphabet',
    >>> 'da': 'Danish Alphabet',
    >>> 'de': 'German Alphabet',
    >>> 'el': 'Greek Alphabet',
    >>> 'en': 'English Alphabet',
    >>> 'eo': 'Esperanto Alphabet',
    >>> 'es': 'Spanish Alphabet',
    >>> 'et': 'Estonian Alphabet',
    >>> 'eu': 'Basque Alphabet',
    >>> 'fa': 'Persian Alphabet',
    >>> 'fi': 'Finnish Alphabet',
    >>> 'fr': 'French Alphabet',
    >>> 'fs': 'Frisian Alphabet',
    >>> 'ga': 'Irish Alphabet',
    >>> 'gc': 'Gaelic Alphabet',
    >>> 'gl': 'Galician Alphabet',
    >>> 'ha': 'Hausa Alphabet',
    >>> 'hr': 'Croatian Alphabet',
    >>> 'ht': 'Creole Alphabet',
    >>> 'hu': 'Hungarian Alphabet',
    >>> 'hw': 'Hawaiian Alphabet',
    >>> 'id': 'Indonesian Alphabet',
    >>> 'ig': 'Igbo Alphabet',
    >>> 'is': 'Icelandic Alphabet',
    >>> 'it': 'Italian Alphabet',
    >>> 'ki': 'Kinyarwanda Alphabet',
    >>> 'ku': 'Kurdish Alphabet',
    >>> 'la': 'Latin Alphabet',
    >>> 'li': 'Libyan Alphabet',
    >>> 'lt': 'Lithuanian Alphabet',
    >>> 'lu': 'Luxembourgish Alphabet',
    >>> 'lv': 'Latvian Alphabet',
    >>> 'ma': 'Moroccan Alphabet',
    >>> 'mg': 'Malagasy Alphabet',
    >>> 'mo': 'Maori Alphabet',
    >>> 'ms': 'Malay Alphabet',
    >>> 'mt': 'Maltese Alphabet',
    >>> 'nl': 'Dutch Alphabet',
    >>> 'no': 'Norwegian Alphabet',
    >>> 'pl': 'Polish Alphabet',
    >>> 'pt': 'Portuguese Alphabet',
    >>> 'ro': 'Romanian Alphabet',
    >>> 'sa': 'Samoan Alphabet',
    >>> 'sh': 'Shona Alphabet',
    >>> 'sk': 'Slovak Alphabet',
    >>> 'sl': 'Slovenian Alphabet',
    >>> 'so': 'Somali Alphabet',
    >>> 'sq': 'Albanian Alphabet',
    >>> 'ss': 'Sesotho Alphabet',
    >>> 'su': 'Sundanese Alphabet',
    >>> 'sv': 'Swedish Alphabet',
    >>> 'sw': 'Swahili Alphabet',
    >>> 'tl': 'Filipino Alphabet',
    >>> 'tn': 'Tunisian Alphabet',
    >>> 'tr': 'Turkish Alphabet',
    >>> 'tu': 'Turkmen Alphabet',
    >>> 'ur': 'Urdu Alphabet',
    >>> 'uz': 'Uzbek Alphabet',
    >>> 'vi': 'Vietnamese Alphabet',
    >>> 'xh': 'Xhosa Alphabet',
    >>> 'yo': 'Yoruba Alphabet',
    >>> 'zu': 'Zulu Alphabet'
    >>> }
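
The codes are specific to this library rather than standard language codes: in the dictionary above, 'al' is Algerian and 'sq' is Albanian. A reverse lookup from a language name to its code can help avoid passing the wrong one. The sketch below works on a small excerpt copied from the output above so that it runs without the package; the function name `code_for` is hypothetical, and in real use the result of `get_alphabets()` could presumably be passed instead.

```python
# Excerpt of the dictionary printed above; not the full mapping.
ALPHABETS_EXCERPT = {
    'al': 'Algerian Alphabet',
    'ar': 'Arabic Alphabet',
    'ma': 'Moroccan Alphabet',
    'sq': 'Albanian Alphabet',
}


def code_for(language, alphabets):
    """Return the code whose value is '<language> Alphabet' (case-insensitive)."""
    wanted = f"{language.strip().lower()} alphabet"
    for code, name in alphabets.items():
        if name.lower() == wanted:
            return code
    raise KeyError(f"no alphabet named {language!r}")


if __name__ == "__main__":
    print(code_for("Moroccan", ALPHABETS_EXCERPT))  # ma
    print(code_for("albanian", ALPHABETS_EXCERPT))  # sq
```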

Adding a language or a dialect

  1. Add it to the alphabet CSV file
  2. Generate the whole alphabet with the construct_alphabet function from data.py
  3. Update defaults.py (the order is to be respected)
    1. Add the alphabet code
    2. Add the alphabet name
    3. Add both of them to the alphabet dictionary
    4. Add the double letters if there are any
  4. Test a text with the language just added against all other languages in test.py
    1. Add a language text to test in text_samples (the order is to be respected)
    2. Add test handling for the new language
    3. Run the tests with the command python -m unittest discover -s aaransia from the 3aransia repository
    4. Fix the bugs
  5. Validate it semantically and phonetically
  6. Make a pull request
  7. Wait for the PR to be confirmed, then add your name to the collaborators
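
The step about double letters matters because a two-letter sequence such as "ch" can stand for a single letter in the target alphabet, so it has to be matched before its individual characters. The sketch below illustrates that idea with a longest-match-first loop. It is not the library's implementation: the tables `SINGLE` and `DOUBLE` and the function `toy_transliterate` are hypothetical and do not reflect the contents of the alphabet CSV file or defaults.py.

```python
# Hypothetical toy tables, for illustration only.
SINGLE = {"k": "ك", "t": "ت", "b": "ب", "h": "ه"}
DOUBLE = {"ch": "ش", "gh": "غ"}


def toy_transliterate(text, single, double):
    """Map text character by character, trying two-letter keys first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in double:
            # A double letter consumes two input characters.
            out.append(double[pair])
            i += 2
        else:
            # Unknown characters pass through unchanged.
            out.append(single.get(text[i], text[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(toy_transliterate("ch", SINGLE, DOUBLE))  # one letter for the pair
    print(toy_transliterate("ch", SINGLE, {}))      # 'c' passes through, 'h' maps alone
```

Without the double-letter table, "ch" is split into two separate characters, which is the kind of error the semantic and phonetic validation step is meant to catch.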

Fixing bugs and adding features

  • Run pylint on the code before opening a PR
  • Contributions can also be made by opening issues