
Normalization pattern for Komi transcription system #4

@nikopartanen

The transcription convention (which still needs to be described) needs a normalization model. The goal should be one or both of the following:

  1. Normalize the Iźva transcription to standard Komi
  2. Normalize the Iźva transcription to an exact phonemic level

The normalized result doesn't necessarily need to be stored anywhere, but we need it as an intermediate stage in several places. The FST would perform better if the input were closer to standard Komi, and for the phonetic work we discussed, this kind of phonemic level could be useful, since searching for individual phonemes would then be easier.

Should this be implemented as a GT preprocessor? Would the already existing standard Komi > Molodcov conversion be enough?

Example from the ö ~ e alternation

One can probably argue that the ö ~ e distinction is not present in suffixal positions and is generally rare in non-initial syllables, but we still have cases like:

Висер

Because of this we can't simply turn every non-initial-syllable e into ö. Normally, changes like

чолэм > чолӧм

would work 100% of the time.

Now the question is whether all stems containing a non-initial-syllable e (I don't think there are verb stems with this property) are present in the GT dictionaries, or whether they could be. Their number is finite anyway, so a special rule could perhaps be devised around them.

There are maybe ~15 rules like these that would bring the Iźva transcription quite close to standard Komi.
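
To make the idea of a rule with a finite exception list concrete, here is a minimal Python sketch. It is not an existing GT component: the names `normalize_word`, `E_TO_OE` and `EXCEPTION_STEMS` are hypothetical, the vowel inventory is an assumption about the Cyrillic input, and the exception set only contains the one stem mentioned above as a placeholder for whatever list could be pulled from the dictionaries. The default mapping covers only the letter э; how to treat е after palatalized consonants is left open here, although a mapping for it can be passed in.

```python
# Assumed Cyrillic vowel letters for counting syllables (hypothetical inventory).
VOWELS = set("аеёиіоӧуыэюя")

# Default rule: э in a non-initial syllable becomes ӧ.
E_TO_OE = {"э": "ӧ"}

# Placeholder exception list: stems that keep their non-initial-syllable e.
EXCEPTION_STEMS = {"висер"}


def normalize_word(word, mapping=None, exceptions=None):
    """Rewrite mapped vowels in non-initial syllables unless the word is an exception."""
    mapping = E_TO_OE if mapping is None else mapping
    exceptions = EXCEPTION_STEMS if exceptions is None else exceptions

    # Words that begin with a listed stem are left untouched.
    lowered = word.lower()
    if any(lowered.startswith(stem) for stem in exceptions):
        return word

    out = []
    seen_vowel = False  # becomes True once the first-syllable vowel has passed
    for ch in word:
        low = ch.lower()
        if low in VOWELS:
            if seen_vowel and low in mapping:
                repl = mapping[low]
                ch = repl.upper() if ch.isupper() else repl
            seen_vowel = True
        out.append(ch)
    return "".join(out)


print(normalize_word("чолэм"))
```

The remaining rules could presumably be written in the same shape and applied in a fixed order, each with its own mapping and exception set; whether that belongs in a preprocessor or in the FST itself is the open question above.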
