Project author: UAlbertaALTLab

Project description:
Mirror of the source code for the Plains Cree morphological analyzer/generator.
Primary language: Makefile
Repository: git://github.com/UAlbertaALTLab/plains-cree-fsts.git
Created: 2019-02-17T07:28:00Z
Project page: https://github.com/UAlbertaALTLab/plains-cree-fsts

License: Other

Plains Cree FSTs

No longer maintained: please see https://github.com/giellalt/lang-crk

kîkwây ôma?

This is a mirror of the source code for the Plains Cree morphological
finite-state transducers (FSTs). The FSTs can analyze and generate
nêhiyawêwin word forms.

âh?

You can use the FSTs to explain the grammar (analysis) of nêhiyawêwin words:

    kohkom -> nôhkom+N+A+D+Px2Sg+Sg

And you can use the models to generate a word, based on
a grammatical description:

    nôhkom+N+A+D+Px1Pl+Sg -> nôhkominân

The canonical source code for the FSTs, including derivational FSTs and more,
is available at https://gtsvn.uit.no/langtech/trunk/langs/crk/.

Download the FSTs

Download compiled FSTs from the releases page!

You can use *.hfstol files with hfst-optimized-lookup and *.fomabin files
with flookup. You can also use the *.fomabin and *.hfstol files in Python,
using fst-lookup and hfstol respectively.
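
As a rough illustration of the pairing above, the following Python sketch picks the command-line tool from a file's extension. The helper name `lookup_command` is hypothetical; only the two tool names and the two extensions come from this README.

```python
from pathlib import Path

# Pairing taken from the paragraph above: file extension -> command-line tool.
TOOL_FOR_SUFFIX = {
    ".hfstol": "hfst-optimized-lookup",
    ".fomabin": "flookup",
}


def lookup_command(fst_path):
    """Return an argv list that runs the matching lookup tool on fst_path.

    Hypothetical helper for illustration; it only builds the command and
    does not check that the tool or the file exists.
    """
    suffix = Path(fst_path).suffix
    try:
        tool = TOOL_FOR_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unrecognized FST file type: {fst_path!r}") from None
    return [tool, str(fst_path)]


print(lookup_command("crk-descriptive-analyzer.hfstol"))
print(lookup_command("crk-normative-generator.fomabin"))
```

The returned list can be passed to `subprocess.run` once the corresponding tool is installed.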

Usage

Using the HFST application suite:

    $ echo "ewapamat" | hfst-optimized-lookup -q crk-descriptive-analyzer.hfstol
    ewapamat PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO+Err/Orth
    ewapamat PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO+Err/Orth
    $ echo "PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO" | hfst-optimized-lookup crk-normative-generator.hfstol
    PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO ê-wâpamât

Using Foma:

    $ echo "ewapamat" | flookup crk-descriptive-analyzer.fomabin
    ewapamat PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO+Err/Orth
    ewapamat PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO+Err/Orth
    $ echo "PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO" | flookup crk-normative-generator.fomabin
    PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO ê-wâpamât

Using fst-lookup:

    from fst_lookup import FST

    # Analyze a word form; each analysis is a tuple of tags and the lemma.
    analyzer = FST.from_file('crk-descriptive-analyzer.fomabin')
    for analysis in analyzer.analyze('ewapamat'):
        print(analysis)
    # prints: ('PV/e+', 'wâpamêw', '+V', '+TA', '+Cnj', '+Prs', '+2Sg', '+3SgO', '+Err/Orth')
    #         ('PV/e+', 'wâpamêw', '+V', '+TA', '+Cnj', '+Prs', '+3Sg', '+4Sg/PlO', '+Err/Orth')

    # NB: You must invert the labels on the generator because this FST is "upside-down"!
    generator = FST.from_file('crk-normative-generator.fomabin', labels='invert')
    for wordform in generator.generate('PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO'):
        print(wordform)
    # prints: ê-wâpamât
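
The tuples printed above mix tags with the lemma. A minimal sketch of separating them follows. It assumes only the shape shown in that output: prefix tags end with `+`, suffix tags start with `+`, and the one remaining element is the lemma. The helper name `split_analysis` is hypothetical and not part of fst-lookup.

```python
def split_analysis(analysis):
    """Split an analysis tuple into (prefix_tags, lemma, suffix_tags).

    Assumes the shape shown above: prefix tags end with '+', suffix tags
    start with '+', and exactly one element (the lemma) does neither.
    """
    prefixes = [part for part in analysis if part.endswith("+")]
    suffixes = [part for part in analysis if part.startswith("+")]
    lemmas = [
        part for part in analysis
        if not part.endswith("+") and not part.startswith("+")
    ]
    if len(lemmas) != 1:
        raise ValueError(f"expected exactly one lemma in {analysis!r}")
    return prefixes, lemmas[0], suffixes


example = ('PV/e+', 'wâpamêw', '+V', '+TA', '+Cnj', '+Prs', '+3Sg', '+4Sg/PlO')
prefixes, lemma, suffixes = split_analysis(example)
print(lemma)             # wâpamêw
print("".join(example))  # PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO
```

Joining the tuple back together gives the single-string form of the analysis used with the generator.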

Bulk lookups

If you want to generate a large number of word forms at once, use the
hfst-optimized-lookup command, as it is the fastest way to perform
lookups.
Provide the analyses one per line. For example, to conjugate mîcisow,
put the following analyses in a file called conjugations.txt:

    mîcisow+V+AI+Ind+Prs+1Sg
    mîcisow+V+AI+Ind+Prs+2Sg
    mîcisow+V+AI+Ind+Prs+3Sg
    PV/e+mîcisow+V+AI+Cnj+Prs+1Sg
    PV/e+mîcisow+V+AI+Cnj+Prs+2Sg
    PV/e+mîcisow+V+AI+Cnj+Prs+3Sg
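
A list like this can also be produced by a short script instead of typed by hand. The sketch below rebuilds the same six lines for a given lemma. The function name `analyses_for` is hypothetical, and the tag strings are copied from the example above rather than taken from a full tag inventory.

```python
def analyses_for(lemma):
    """Build the six AI-verb analyses listed above for one lemma.

    The tag strings are copied from the example; whether the generator
    accepts them for any given lemma is not checked here.
    """
    persons = ["1Sg", "2Sg", "3Sg"]
    independent = [f"{lemma}+V+AI+Ind+Prs+{p}" for p in persons]
    conjunct = [f"PV/e+{lemma}+V+AI+Cnj+Prs+{p}" for p in persons]
    return independent + conjunct


# One analysis per line, the layout the lookup tools read.
print("\n".join(analyses_for("mîcisow")))
```

Redirecting the script's output to conjugations.txt should reproduce the file shown above.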

You can pipe this into hfst-optimized-lookup:

    $ cat conjugations.txt | hfst-optimized-lookup crk-normative-generator.hfstol
    mîcisow+V+AI+Ind+Prs+1Sg nimîcison
    mîcisow+V+AI+Ind+Prs+2Sg kimîcison
    mîcisow+V+AI+Ind+Prs+3Sg mîcisow
    PV/e+mîcisow+V+AI+Cnj+Prs+1Sg ê-mîcisoyân
    PV/e+mîcisow+V+AI+Cnj+Prs+2Sg ê-mîcisoyan
    PV/e+mîcisow+V+AI+Cnj+Prs+3Sg ê-mîcisot

You can use the two-column output to map the input to the generated word
form. This is useful, since some analyses have multiple possible word
forms (e.g., cactus+Pl in English can be “cactuses” or “cacti”).
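
One way to do that mapping in Python is sketched below. It collects every word form produced for each analysis into a list. It assumes that the output fields are separated by tabs and that any field after the second (for example, a weight) can be ignored; the function name `parse_lookup_output` is hypothetical.

```python
from collections import defaultdict


def parse_lookup_output(text):
    """Map each input analysis to the list of word forms generated for it.

    Assumes one result per line with tab-separated fields: the input first,
    the output second. Any further fields are ignored, as are blank lines.
    """
    forms = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # blank or malformed line
        analysis, wordform = fields[0], fields[1]
        if wordform not in forms[analysis]:
            forms[analysis].append(wordform)
    return dict(forms)


sample = (
    "mîcisow+V+AI+Ind+Prs+1Sg\tnimîcison\n"
    "\n"
    "mîcisow+V+AI+Ind+Prs+2Sg\tkimîcison\n"
)
print(parse_lookup_output(sample))
```

Keeping a list per analysis, rather than a single string, preserves the cases where one analysis yields more than one word form.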

Working on the FSTs

The following instructions assume you’re working in a Linux/macOS/Unix
command line.

Dependencies

You’ll need (GNU) Make, and HFST. If you’re on macOS/Linux, you probably already have make
installed. HFST can be installed on macOS with Homebrew by typing:

    brew install ualbertaaltlab/hfst/hfst

Building

To build the FSTs from scratch, type the following in the root
directory:

    make -j fsts

The resultant *.hfstol and *.foma files will be placed in src/.

Explanation:

  • make: run GNU Make
  • -j: run jobs in parallel; with no number given, GNU Make does not limit how many jobs run at once
  • fsts: the target to build, that is, the *.hfstol and *.foma FSTs

If you see the following message, the FSTs are up-to-date and there is
no need to remake them:

    make[1]: Nothing to be done for `fsts'.

If you want to remake them anyway, add the -B flag when using make:

    make -j -B fsts

Modifying

Change the *.lexc, *.regexp, and *.twolc files in src/, then run
make -j fsts to see the changes.

Citation

If you use this work in an academic context, use this to cite the
morphological FST:

    @misc{arppe2019finite,
      Author={Arppe, Antti and Harrigan, Atticus and Schmirler, Katherine and Antonsen, Lene and Trosterud, Trond and N{\o}rsteb{\o} Moshagen, Sjur and Silfverberg, Miikka and Wolvengrey, Arok and Snoek, Conor and Lachler, Jordan and Santos, Eddie Antonio and Okim{\=a}sis, Jean and Thunder, Dorothy},
      Howpublished={\url{https://gtsvn.uit.no/langtech/trunk/langs/crk/}},
      Title={Finite-state transducer-based computational model of {Plains Cree} morphology},
      Year={2014--2019}
    }

You may also cite these publications:

    @inproceedings{snoek2014modeling,
      title={Modeling the noun morphology of Plains Cree},
      author={Snoek, Conor and Thunder, Dorothy and Loo, Kaidi and Arppe, Antti and Lachler, Jordan and Moshagen, Sjur and Trosterud, Trond},
      booktitle={Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages},
      pages={34--42},
      year={2014}
    }

    @article{harrigan2017learning,
      title={Learning from the computational modelling of Plains Cree verbs},
      author={Harrigan, Atticus G and Schmirler, Katherine and Arppe, Antti and Antonsen, Lene and Trosterud, Trond and Wolvengrey, Arok},
      journal={Morphology},
      volume={27},
      number={4},
      pages={565--598},
      year={2017},
      publisher={Springer}
    }

Maintainer tools

To sync the FST sources with the upstream SVN repository, re-download
the sources list:

    make -B src/morphological-fst-sources.mk

Then download all the sources again:

    make -j -B download

Then build the FSTs as usual:

    make -j fsts

License

The FSTs and their sources are distributed under the terms of the GNU
Affero General Public License:

Copyright (C) 2015-2019 Alberta Language Technology Lab (ALTLab) <altlab@ualberta.ca>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see http://www.gnu.org/licenses.