Refactor the Halkomelem mapping and connect it to sal-apa for path to ipa and arpabet #422

@joanise

hur_apa_to_hur_orthog.json contains several groups of rules that differ only in their context and could be collapsed into fewer rules by using "or"s in context_before and context_after, e.g.:

    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "i", "prevent_feeding": true},
    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "a", "prevent_feeding": true},
    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "ə", "prevent_feeding": true},
    {"in": "y̓", "out": "’y", "context_before": "i", "context_after": "a", "prevent_feeding": true},
    {"in": "y̓", "out": "’y", "context_before": "i", "context_after": "ə", "prevent_feeding": true},
    {"in": "y̓", "out": "’y", "context_before": "a", "context_after": "ə", "prevent_feeding": true},

Note that this would have to be done carefully so as not to change which contexts are recognized: data/hur.psv clearly shows that "between any pair of vowels" would not be correct.
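As a rough sketch of the kind of refactoring meant here, the six rules above can be grouped by context_before and their context_after values joined with "|". This assumes the context fields accept regular-expression alternation; `merge_context_after` and `accepted_pairs` are hypothetical helpers written for this illustration, not part of g2p. The second helper enumerates which (before, after) vowel pairs are accepted, so the merged rules can be checked against the originals.

```python
import re
from collections import defaultdict

# The six rules quoted above, copied from hur_apa_to_hur_orthog.json.
RULES = [
    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "i", "prevent_feeding": True},
    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "a", "prevent_feeding": True},
    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "ə", "prevent_feeding": True},
    {"in": "y̓", "out": "’y", "context_before": "i", "context_after": "a", "prevent_feeding": True},
    {"in": "y̓", "out": "’y", "context_before": "i", "context_after": "ə", "prevent_feeding": True},
    {"in": "y̓", "out": "’y", "context_before": "a", "context_after": "ə", "prevent_feeding": True},
]
VOWELS = ["e", "i", "a", "ə"]


def merge_context_after(rules):
    """Merge rules that differ only in context_after into one rule with alternation."""
    groups = defaultdict(list)
    for rule in rules:
        key = (rule["in"], rule["out"], rule["context_before"], rule["prevent_feeding"])
        groups[key].append(rule["context_after"])
    merged = []
    for (in_, out, before, prevent), afters in groups.items():
        after = afters[0] if len(afters) == 1 else "(" + "|".join(afters) + ")"
        merged.append({"in": in_, "out": out, "context_before": before,
                       "context_after": after, "prevent_feeding": prevent})
    return merged


def accepted_pairs(rules, vowels):
    """Enumerate the (before, after) vowel pairs that at least one rule accepts."""
    return {
        (b, a)
        for b in vowels for a in vowels for r in rules
        if re.fullmatch(r["context_before"], b) and re.fullmatch(r["context_after"], a)
    }


merged = merge_context_after(RULES)
for rule in merged:
    print(rule["context_before"], rule["context_after"])
# The merged rules must accept exactly the same vowel pairs as the originals.
print(accepted_pairs(merged, VOWELS) == accepted_pairs(RULES, VOWELS))
```

Grouping on context_before alone yields three rules instead of six. Merging both sides into `(e|i|a)` and `(i|a|ə)` would be shorter but would also accept pairs such as i_i and a_a that the current file does not.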

Also, this mapping is configured with rule_ordering: apply-longest-first, but the ordering between rules of the same input length is critical here for correct results. For example, all the "in": "y̓" cases quoted above are clearly meant to be applied before {"in": "y̓", "out": "y’"}. So as-written would seem more appropriate to me, which would require analyzing the file to make sure everything else is ordered correctly. When we sort, do we use a stable sort? I.e., with apply-longest-first, is as-written a second-order criterion, either by design or by accident?
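For reference, here is what a stable longest-first sort looks like in plain Python. This only illustrates the language guarantee (`sorted` is stable, including with `reverse=True`); whether the library's apply-longest-first actually sorts this way is exactly the open question, and `longest_first` is a hypothetical stand-in, not the real implementation.

```python
# Three rules in as-written order: a short rule first, then the contextual
# y̓ rule, then the general y̓ rule that must come after it.
rules = [
    {"in": "a", "out": "a"},
    {"in": "y̓", "out": "’y", "context_before": "e", "context_after": "i"},
    {"in": "y̓", "out": "y’"},
]


def longest_first(rules):
    """Sort by input length, longest first; ties keep their as-written order."""
    return sorted(rules, key=lambda rule: len(rule["in"]), reverse=True)


ordered = longest_first(rules)
for rule in ordered:
    print(rule["in"], "->", rule["out"])
```

With a stable sort, the two "y̓" rules keep their relative order, so the contextual rule still applies before the general one. A sort that is not stable, or that uses a different tie-breaker, gives no such guarantee.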

Next, starting from hur_orthog_to_hur_apa.json, it would be easy to write a mapping that feeds into sal-apa to connect hur to IPA and ARPABET for cheap. The requirement would be to make sure every "out": in hur_orthog_to_hur_apa.json is either already handled as an "in": in sal_apa_to_ipa.csv or gets mapped to something that is. In the process, we could make sal_apa_to_ipa.csv more general.
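A rough way to check that requirement is sketched below. It tests whether each output string can be segmented into known inputs by greedy longest match. The sample data is made up for illustration and is not taken from either mapping file; in practice the two collections would be loaded from the "out" values of hur_orthog_to_hur_apa.json and the "in" column of sal_apa_to_ipa.csv. `uncovered_outputs` is a hypothetical helper.

```python
def uncovered_outputs(outputs, known_inputs):
    """Return the outputs that cannot be segmented into known inputs.

    Segmentation is greedy longest-match without backtracking, so this is a
    quick screening check rather than an exhaustive one.
    """
    longest = max((len(item) for item in known_inputs), default=0)
    missing = []
    for out in outputs:
        pos = 0
        while pos < len(out):
            for size in range(min(longest, len(out) - pos), 0, -1):
                if out[pos:pos + size] in known_inputs:
                    pos += size
                    break
            else:
                # No known input starts at this position.
                missing.append(out)
                break
    return missing


# Hypothetical sample data, for illustration only.
sample_outputs = ["y̓", "əy̓", "x"]
sample_inputs = {"y̓", "ə", "kʷ"}
print(uncovered_outputs(sample_outputs, sample_inputs))
```

Anything this reports would need either a new rule on the sal-apa side or an intermediate mapping to a symbol that is already handled.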

PS: I'm not sure it was correct to use the code hur for the orthog here, since this mapping is explicitly the Island dialect, but hur covers Island, Upriver/Stó꞉lō and Downriver/Musqueam dialects. We will have to split this code if support for the other hur dialects is requested.
