A Foma implementation of Buryat inflectional morphology (currently under development).
You'll need Foma FST installed to run the analyzer. An example session:
foma[0]: source dial_morph.foma
Opening file 'dial_morph.foma'.
defined frntShrt: 289 bytes. 2 states, 2 arcs, 2 paths.
defined frntLng: 338 bytes. 2 states, 3 arcs, 3 paths.
...
defined Grammar: 6.3 MB. 10098 states, 413397 arcs, more than 9223372036854775807 paths.
6.3 MB. 10098 states, 413397 arcs, more than 9223372036854775807 paths.
foma[1]: source translit.foma
Opening file 'translit.foma'.
...
defined tsl=Grammar: 154.2 kB. 373 states, 9657 arcs, Cyclic.
154.2 kB. 373 states, 9657 arcs, Cyclic.
foma[2]: source harm_check.foma
Opening file 'harm_check.foma'.
...
defined fsaHarmCheck: 5.6 kB. 8 states, 275 arcs, Cyclic.
5.6 kB. 8 states, 275 arcs, Cyclic.
foma[3]: define myGrammar tsl=Grammar .o. Grammar .o. fsaHarmCheck .o. tsl=Grammar.i ;
defined myGrammar: 12.5 MB. 20057 states, 821063 arcs, more than 9223372036854775807 paths.
foma[3]: push myGrammar
12.5 MB. 20057 states, 821063 arcs, more than 9223372036854775807 paths.
foma[4]: apply down
apply down> басаган<DAT>
басаганда
apply down> хото<PL><DAT><REFL>
хотонуудтаа
apply down> олошор<CAUS3><PASS><PRS>
олошоруулагдана
apply down> гэр<DAT>
гэртэ
apply down> гэр<DAT><ADJ><PL>
гэртэхинүүд
гэртэхид
apply down> гэр<DAT><ADJ><PL><GA2><ACC>
гэртэхинүүдые
гэртэхидые
apply down> гэр<DAT><ADJ><PL><GA2><ACC><1SGn>
гэртэхинүүдыем
гэртэхинүүдыемни
гэртэхидыем
гэртэхидыемни
apply up> хэг
хэг
хэг<.ACC>
хэ<JUSS>
apply up> танилсажа
танилсажа
танилсажан<.C><.ACC>
танилсажа<.ACC>
танилсажа<IMP>
танилса<CONV1>
тани<SOC><CONV1>
Applying up (analysis) can return many candidate tag strings for a single word form. To reduce their number, we suggest filtering the output against a dictionary of lemmas and names.
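As a rough illustration of dictionary-based filtering, the sketch below post-processes analysis strings in Python and keeps only those whose lemma is in a lexicon. It is not part of this repository: the `LEXICON` contents and the helper names `lemma_of` and `filter_analyses` are hypothetical, and the sketch assumes that the lemma is everything before the first `<...>` tag.

```python
import re

# Hypothetical lemma list, chosen only for illustration;
# a real one would be loaded from a dictionary file.
LEXICON = {"танилса", "тани"}

# A tag is anything enclosed in angle brackets, e.g. <CONV1> or <.ACC>.
TAG = re.compile(r"<[^<>]*>")


def lemma_of(analysis):
    """Return the part of an analysis string that precedes its first tag."""
    match = TAG.search(analysis)
    return analysis if match is None else analysis[:match.start()]


def filter_analyses(analyses, lexicon):
    """Keep only the analyses whose lemma is listed in the lexicon."""
    return [a for a in analyses if lemma_of(a) in lexicon]


# Candidate analyses as printed by `apply up` for танилсажа.
analyses = [
    "танилсажа",
    "танилсажан<.C><.ACC>",
    "танилсажа<.ACC>",
    "танилсажа<IMP>",
    "танилса<CONV1>",
    "тани<SOC><CONV1>",
]

kept = filter_analyses(analyses, LEXICON)
print(kept)
```

With this hypothetical lexicon, only the two converb analyses remain. The same restriction could also be expressed inside the transducer itself, which this sketch does not attempt.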
This repository contains the following FSMs:
- dial_morph.foma: a finite-state transducer for Barguzin Buryat that produces the morphological tag strings for a given word form when applied up and generates the word form for a given lemma+tag string when applied down
- stdrd_morph.foma: a similar FST for standard Buryat
- translit.foma: an FST that converts strings from the traditional Cyrillic alphabet to an IPA-based notation and back
- harm_check.foma: an FSA ensuring backness and roundedness harmony of the lemma.
The default way to build a transducer from these parts is as follows:
| tsl=Grammar | .o. | fsaHarmCheck | .o. | Grammar | .o. | tsl=Grammar.i |
|---|---|---|---|---|---|---|
| translit.foma | | harm_check.foma | | dial_morph.foma | | translit.foma |
However, in some cases, the harmony checking FSA can be omitted, e.g. when dealing with loan words where several harmony domains are possible.
For IPA input and/or output, remove the corresponding transliteration transducer(s) from the composition.
Please note that the harmony checker should only be used to check the harmony of a lemma: several Buryat affixes open a new harmony domain, so different backness and roundedness values can occur within a single word form.
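The composition variants described above (with or without the harmony checker, with or without transliteration on either side) can be sketched as a small Python helper that prints the matching foma `define` command. The helper and its parameter names are hypothetical and not part of this repository. It follows the default order shown in the table above and assumes that `tsl=Grammar` sits on the lemma+tag (upper) side and `tsl=Grammar.i` on the word-form (lower) side.

```python
def compose_expression(harmony_check=True, translit_upper=True, translit_lower=True):
    """Return a foma regex that composes the selected parts in the default order."""
    parts = []
    if translit_upper:
        # Transliteration on the lemma+tag (upper) side.
        parts.append("tsl=Grammar")
    if harmony_check:
        # Harmony check on the lemma; may be dropped, e.g. for loan words.
        parts.append("fsaHarmCheck")
    # The morphological transducer itself is always included.
    parts.append("Grammar")
    if translit_lower:
        # Inverse transliteration on the word-form (lower) side.
        parts.append("tsl=Grammar.i")
    return " .o. ".join(parts)


def define_command(name, **options):
    """Return a complete foma `define` command for the chosen variant."""
    return f"define {name} {compose_expression(**options)} ;"


# Default build, then a variant without the harmony checker.
print(define_command("myGrammar"))
print(define_command("myGrammar", harmony_check=False))
```

The printed lines can be pasted at the foma prompt once the three source files have been loaded. The name `myGrammar` is taken from the session above; any other name works as well.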
The data for the analyzer was collected during fieldwork in Baragkhan, Buryatia, Russia in 2015-2018.