Shughni language morphological analyzator and generator made with lexd and twol.
hfsttoolslexdcompilerpython(only for testing and metrics scripts, FST binaries are compiled without python)
Compile all .hfstol binaries
make allCompile any specific FST binary, all possible binaries are listed at the end
make sgh_analyze_stem_segm_cyr.hfstCompile all .hfst binaries
make all_hfstRun tests (depends on python)
make testEvaluate all metrics (depends on python, requirements.txt packages)
make metricsThe Makefile was tested on Debian, other platforms are not guaranteed to be compatible. Tested with
hfst 3.16.0lexd 1.3.1python 3.11.2
All files have name with the following format:
<direction>_<lemma type>_<segmentation>_<script>
Where
directioncan be 'gen' or 'analyze'- 'gen':
дарйо<n>><pl>:дарйойен, tagged form -> surface form - 'analyze':
дарйойен:дарйо<n>><pl>, surface form -> tagged form
- 'gen':
lemma typecan be 'stem' or 'rulem'. It sets tagged form's main word type- 'stem':
дарйойен:дарйо<n>><pl>, a shughni stem will be used as a lemma - 'rulem':
дарйойен:река<n>><pl>, a russian lemma will be used as a lemma
- 'stem':
segmentationcan be 'segm' or 'word'. It sets surface form morpheme border behavior- 'segm':
дарйо<n>><pl>:дарйо>йен, surface forms are segmented: they have>separator between all the morphemes - 'word':
дарйо<n>><pl>:дарйойен, surface forms are not segmented
- 'segm':
scriptcan be 'cyr' or 'lat', dictates surface form's script- 'cyr':
дарйо<n>><pl>:дарйойен - 'lat':
дарйо<n>><pl>:daryoyen
- 'cyr':
A .hfstol version is also available for every possible .hfst file.
| File | Input | Output |
|---|---|---|
| Generators | ||
| sgh_gen_stem_segm_cyr | дарйо<n>><pl> |
дарйо>йен |
| sgh_gen_stem_word_cyr | дарйо<n>><pl> |
дарйойен |
| sgh_gen_rulem_segm_cyr | река<n>><pl> |
дарйо>йен |
| sgh_gen_rulem_word_cyr | река<n>><pl> |
дарйойен |
| sgh_gen_stem_segm_lat | дарйо<n>><pl> |
daryo>yen |
| sgh_gen_stem_word_lat | дарйо<n>><pl> |
daryoyen |
| sgh_gen_rulem_segm_lat | река<n>><pl> |
daryo>yen |
| sgh_gen_rulem_word_lat | река<n>><pl> |
daryoyen |
| Analyzers | ||
| sgh_analyze_stem_segm_cyr | дарйо>йен |
дарйо<n>><pl> |
| sgh_analyze_stem_word_cyr | дарйойен |
дарйо<n>><pl> |
| sgh_analyze_rulem_segm_cyr | дарйо>йен |
река<n>><pl> |
| sgh_analyze_rulem_word_cyr | дарйойен |
река<n>><pl> |
| sgh_analyze_stem_segm_lat | daryo>yen |
дарйо<n>><pl> |
| sgh_analyze_stem_word_lat | daryoyen |
дарйо<n>><pl> |
| sgh_analyze_rulem_segm_lat | daryo>yen |
река<n>><pl> |
| sgh_analyze_rulem_word_lat | daryoyen |
река<n>><pl> |
Author: Elen Kartina (elenkartina@proton.me)
Scientific supervisor: George Moroz
Advisor: Max Melenchenko