katja-kolos/bxr_morph

bxr_morph

A Foma implementation of Buryat inflectional morphology (currently under development).

You'll need Foma FST installed to run the analyzer.

Simple usage example

foma[0]: source dial_morph.foma
Opening file 'dial_morph.foma'.
defined frntShrt: 289 bytes. 2 states, 2 arcs, 2 paths.
defined frntLng: 338 bytes. 2 states, 3 arcs, 3 paths.
...
defined Grammar: 6.3 MB. 10098 states, 413397 arcs, more than 9223372036854775807 paths.
6.3 MB. 10098 states, 413397 arcs, more than 9223372036854775807 paths.
foma[1]: source translit.foma
Opening file 'translit.foma'.
...
defined tsl=Grammar: 154.2 kB. 373 states, 9657 arcs, Cyclic.
154.2 kB. 373 states, 9657 arcs, Cyclic.
foma[2]: source harm_check.foma
Opening file 'harm_check.foma'.
...
defined fsaHarmCheck: 5.6 kB. 8 states, 275 arcs, Cyclic.
5.6 kB. 8 states, 275 arcs, Cyclic.
foma[3]: define myGrammar tsl=Grammar .o. Grammar .o. fsaHarmCheck .o. tsl=Grammar.i ;
defined myGrammar: 12.5 MB. 20057 states, 821063 arcs, more than 9223372036854775807 paths.
foma[3]: push myGrammar
12.5 MB. 20057 states, 821063 arcs, more than 9223372036854775807 paths.
foma[4]: apply down
apply down> басаган<DAT>
басаганда
apply down> хото<PL><DAT><REFL>
хотонуудтаа
apply down> олошор<CAUS3><PASS><PRS>
олошоруулагдана
apply down> гэр<DAT>
гэртэ
apply down> гэр<DAT><ADJ><PL>
гэртэхинүүд
гэртэхид
apply down> гэр<DAT><ADJ><PL><GA2><ACC>
гэртэхинүүдые
гэртэхидые
apply down> гэр<DAT><ADJ><PL><GA2><ACC><1SGn>
гэртэхинүүдыем
гэртэхинүүдыемни
гэртэхидыем
гэртэхидыемни
apply up> хэг
хэг
хэг<.ACC>
хэ<JUSS>
apply up> танилсажа
танилсажа
танилсажан<.C><.ACC>
танилсажа<.ACC>
танилсажа<IMP>
танилса<CONV1>
тани<SOC><CONV1>

To reduce the number of possible tag strings produced when applying up (i.e. performing analysis), we suggest using a dictionary of lemmas and names.
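One possible way to apply such a dictionary is to post-filter the analyses that apply up returns. The sketch below is not part of this repository: the helper names `lemma_of` and `filter_by_lexicon` and the two-entry lexicon are hypothetical. It assumes that the lemma is everything before the first tag, as in the analyses shown above.

```python
def lemma_of(analysis: str) -> str:
    """Return the part of an analysis string that precedes its first tag."""
    return analysis.split("<", 1)[0]


def filter_by_lexicon(analyses, lexicon):
    """Keep only the analyses whose lemma is listed in the lexicon."""
    return [a for a in analyses if lemma_of(a) in lexicon]


# Analyses of "танилсажа" copied from the apply up example above.
analyses = [
    "танилсажа",
    "танилсажан<.C><.ACC>",
    "танилсажа<.ACC>",
    "танилсажа<IMP>",
    "танилса<CONV1>",
    "тани<SOC><CONV1>",
]

# Hypothetical lexicon, for illustration only.
lexicon = {"тани", "танилса"}

print(filter_by_lexicon(analyses, lexicon))
# → ['танилса<CONV1>', 'тани<SOC><CONV1>']
```

Filtering after the fact leaves the transducer unchanged. Composing a lexicon into the FST itself is an alternative that this sketch does not cover.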

Components

This repository contains the following finite-state machines:

  • dial_morph.foma: a finite-state transducer (FST) for Barguzin Buryat that produces morphological tag strings for a given word form when applied up, and generates the correct word form(s) for a lemma followed by a tag string when applied down
  • stdrd_morph.foma: a similar FST for standard Buryat
  • translit.foma: an FST that converts strings from the traditional Cyrillic alphabet to an IPA-based notation and back
  • harm_check.foma: a finite-state automaton (FSA) that enforces backness and roundedness harmony within the lemma

The default way to build a transducer from these parts is as follows:

tsl=Grammar   .o. fsaHarmCheck    .o. Grammar         .o. tsl=Grammar.i
translit.foma     harm_check.foma     dial_morph.foma     translit.foma

However, the harmony-checking FSA can be omitted in some cases, e.g. when dealing with loanwords, which may contain several harmony domains.

For IPA input and/or output, remove the corresponding transliteration transducer from the composition: tsl=Grammar on the lemma side, tsl=Grammar.i on the word-form side.

Please note that the harmony checker should only be used to check the harmony of a lemma: several Buryat affixes open a new harmony domain, so different backness and roundedness values can occur within a single word form.
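The variants described above (dropping the harmony check, or dropping one or both transliteration layers) can be sketched as a small helper that prints the matching foma `define` line. The function `build_definition` and its parameters are hypothetical. The component order follows the default composition listed in this section, whereas the session log in the usage example places Grammar before fsaHarmCheck.

```python
def build_definition(name="myGrammar", harmony=True,
                     cyrillic_lemma=True, cyrillic_form=True):
    """Build a foma `define` line for the chosen combination of components.

    harmony:        include the lemma harmony checker (fsaHarmCheck)
    cyrillic_lemma: lemma side is Cyrillic, so keep tsl=Grammar
    cyrillic_form:  word-form side is Cyrillic, so keep tsl=Grammar.i
    """
    parts = []
    if cyrillic_lemma:
        parts.append("tsl=Grammar")
    if harmony:
        parts.append("fsaHarmCheck")
    parts.append("Grammar")
    if cyrillic_form:
        parts.append("tsl=Grammar.i")
    return f"define {name} " + " .o. ".join(parts) + " ;"


# Default composition, Cyrillic on both sides.
print(build_definition())
# → define myGrammar tsl=Grammar .o. fsaHarmCheck .o. Grammar .o. tsl=Grammar.i ;

# Loanword-friendly variant without the harmony checker.
print(build_definition(harmony=False))
# → define myGrammar tsl=Grammar .o. Grammar .o. tsl=Grammar.i ;

# IPA on both sides: no transliteration layers.
print(build_definition(cyrillic_lemma=False, cyrillic_form=False))
# → define myGrammar fsaHarmCheck .o. Grammar ;
```

The printed line can be pasted at the foma prompt after the relevant .foma files have been sourced.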

The data for the analyzer was collected during fieldwork in Baragkhan, Buryatia, Russia in 2015-2018.

About

Bachelor's Project
