Author: Pusnow

Description: Python implementation for KS X 1026-1 and KS X 1026-2
Language: Python
Repository: git://github.com/Pusnow/KS-X-1026-Python.git
Created: 2016-08-22T12:32:22Z
Project page: https://github.com/Pusnow/KS-X-1026-Python

License: MIT License

KS X 1026 Python

Python implementation for KS X 1026-1.

KS X 1026-1

KS X 1026-1 is a Korean standard that serves as a guide to processing Hangul for information interchange. More information is available here.

Installation

KS X 1026 Python is available from PyPI:

    pip install ksx1026

or can be installed from source with setup.py:

    python setup.py install

Normalizations

Hangul Decomposition

Returns the Johab Modern Hangul Syllable Block (a sequence of Hangul letters) for the given Wanseong (precomposed) Modern Hangul Syllable Block.

char S: a single Hangul syllable character. Any other input is returned unchanged.

    >>> from ksx1026.normalization import decomposeHangul
    >>> c = "\uAC01"
    >>> d = decomposeHangul(c)
    >>> print(d.encode('raw_unicode_escape'))
    b'\\u1100\\u1161\\u11a8'
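For a modern syllable, this decomposition can be computed arithmetically from the code point. The following is a minimal sketch of the standard Unicode Hangul syllable arithmetic, not the library's own code; the function name `decompose_syllable_sketch` is hypothetical.

```python
# Constants from the Unicode Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed modern syllables


def decompose_syllable_sketch(ch):
    """Hypothetical helper: split one precomposed syllable into L V (T)."""
    if len(ch) != 1:
        return ch  # not a single character: return the input unchanged
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed modern syllable
    leading = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trailing = chr(T_BASE + t_index) if t_index else ""
    return leading + vowel + trailing


print(decompose_syllable_sketch("\uAC01").encode("raw_unicode_escape"))
# b'\\u1100\\u1161\\u11a8'
```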

Hangul Decomposition String

Returns the Johab Modern Hangul Syllable string for the given Wanseong Modern Hangul Syllable string.

string source: a Unicode string.

    >>> from ksx1026.normalization import decomposeHangulStr
    >>> source = "\uAC01\uAC01"
    >>> d = decomposeHangulStr(source)
    >>> print(d.encode('raw_unicode_escape'))
    b'\\u1100\\u1161\\u11a8\\u1100\\u1161\\u11a8'
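As a point of comparison, the standard library's NFD normalization yields the same letter sequence for this input. The snippet below uses only `unicodedata` and makes no claim about how the two behave on other input.

```python
import unicodedata

source = "\uAC01\uAC01"
# NFD splits each precomposed modern syllable into its conjoining letters.
decomposed = unicodedata.normalize("NFD", source)
print(decomposed.encode("raw_unicode_escape"))
# b'\\u1100\\u1161\\u11a8\\u1100\\u1161\\u11a8'
```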

Hangul Composition

Returns a Wanseong Modern Hangul Syllable Block for the given Johab Modern Hangul Syllable Block. Unlike UAX #15, when a portion of an Old Hangul Syllable Block is itself a Modern Hangul Syllable Block, that portion is not transformed into a Wanseong Modern Hangul Syllable Block.

string source: a Unicode string.

    >>> from ksx1026.normalization import composeHangul
    >>> source = "\u1100\u1161\u11a8"
    >>> d = composeHangul(source)
    >>> print(d.encode('raw_unicode_escape'))
    b'\\uac01'
    >>> source = "\u1100\u1161\u11c3"
    >>> d = composeHangul(source)
    >>> print(d.encode('raw_unicode_escape'))
    b'\\u1100\\u1161\\u11c3'
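The rule described above can be sketched for a single syllable block as follows. This is an illustration under stated assumptions, not the library's implementation: the name `compose_block_sketch` is hypothetical, and the function handles exactly one L V or L V T block rather than a whole string.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_block_sketch(block):
    """Hypothetical helper: compose one L V or L V T block.

    The block is returned unchanged if any letter is not a modern letter,
    so an Old Hangul block is never partially composed.
    """
    if len(block) not in (2, 3):
        return block
    l_index = ord(block[0]) - L_BASE
    v_index = ord(block[1]) - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        return block
    t_index = 0
    if len(block) == 3:
        t_index = ord(block[2]) - T_BASE
        if not 1 <= t_index < T_COUNT:
            return block  # trailing letter is not modern: leave as letters
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print(compose_block_sketch("\u1100\u1161\u11a8").encode("raw_unicode_escape"))
# b'\\uac01'
print(compose_block_sketch("\u1100\u1161\u11c3").encode("raw_unicode_escape"))
# b'\\u1100\\u1161\\u11c3'
```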

Hangul Recomposition

If a UAX #15 algorithm is used for normalization instead of the composeHangul function above, an Old Hangul Syllable Block can be split into a Wanseong Modern Hangul Syllable Block followed by Johab Hangul Letter(s). In such cases, after applying the UAX #15 normalization, the following recomposition algorithm can be used to restore a character string in Normalization Form NFC or NFKC to the L V T format.

string source: a Unicode string.

    >>> from ksx1026.normalization import recomposeHangul
    >>> source = "\uac00\u11c3"
    >>> d = recomposeHangul(source)
    >>> print(d.encode('raw_unicode_escape'))
    b'\\u1100\\u1161\\u11c3'
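To see where such input comes from, the snippet below first reproduces the UAX #15 behaviour with the standard library's NFC, then undoes it with a rough sketch. The helper names `is_old_trailing` and `recompose_sketch` are hypothetical, and the sketch assumes only the simplest case: a precomposed syllable without a trailing consonant, followed by an old trailing consonant (U+11C3..U+11FF or U+D7CB..U+D7FB). It is not the library's algorithm.

```python
import unicodedata

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28

# NFC composes L + V into a precomposed syllable but cannot absorb the old
# trailing consonant U+11C3, so the Old Hangul block ends up split.
nfc = unicodedata.normalize("NFC", "\u1100\u1161\u11c3")
print(nfc.encode("raw_unicode_escape"))  # b'\\uac00\\u11c3'


def is_old_trailing(ch):
    """Hypothetical helper: True for an old trailing consonant."""
    code = ord(ch)
    return 0x11C3 <= code <= 0x11FF or 0xD7CB <= code <= 0xD7FB


def recompose_sketch(text):
    """Hypothetical helper: re-split an LV syllable before an old trailing letter."""
    out = []
    for i, ch in enumerate(text):
        s_index = ord(ch) - S_BASE
        is_lv = 0 <= s_index < S_COUNT and s_index % T_COUNT == 0
        has_old_t = i + 1 < len(text) and is_old_trailing(text[i + 1])
        if is_lv and has_old_t:
            out.append(unicodedata.normalize("NFD", ch))  # back to L V
        else:
            out.append(ch)
    return "".join(out)


print(recompose_sketch(nfc).encode("raw_unicode_escape"))
# b'\\u1100\\u1161\\u11c3'
```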

Normalization of Compatibility/Halfwidth Hangul Letters and Hangul-embedded symbols

Normalizes Compatibility/Halfwidth Hangul Letters and Hangul-embedded symbols (NormalizeJamoKDKC).

string source: a Unicode string.

    >>> from ksx1026.normalization import normalizeJamoKDKC
    >>> source = "\u3200"
    >>> d = normalizeJamoKDKC(source)
    >>> print(d.encode('raw_unicode_escape'))
    b'(\\u1100\\u1160)'
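For contrast, the standard library's NFKD maps the same symbol to the bare leading consonant, without the U+1160 (HANGUL JUNGSEONG FILLER) that appears in the result above. The snippet uses only `unicodedata` and is meant purely as a comparison.

```python
import unicodedata

source = "\u3200"  # PARENTHESIZED HANGUL KIYEOK
# The compatibility decomposition is "(" + U+1100 + ")", with no filler added.
nfkd = unicodedata.normalize("NFKD", source)
print(nfkd.encode("raw_unicode_escape"))
# b'(\\u1100)'
```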