hur_apa_to_hur_orthog.json has a number of lines that could be merged by using "or"s (alternation) in context_before and context_after, e.g.:
{"in": "y̓", "out": "’y", "context_before": "e", "context_after": "i", "prevent_feeding": true},
{"in": "y̓", "out": "’y", "context_before": "e", "context_after": "a", "prevent_feeding": true},
{"in": "y̓", "out": "’y", "context_before": "e", "context_after": "ə", "prevent_feeding": true},
{"in": "y̓", "out": "’y", "context_before": "i", "context_after": "a", "prevent_feeding": true},
{"in": "y̓", "out": "’y", "context_before": "i", "context_after": "ə", "prevent_feeding": true},
{"in": "y̓", "out": "’y", "context_before": "a", "context_after": "ə", "prevent_feeding": true},
Note that this would have to be done carefully so as not to change which contexts are recognized: data/hur.psv clearly shows that "between any pair of vowels" would not be correct.
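To make the "carefully" part concrete, here is a minimal Python sketch that merges the six quoted rules by grouping on context_before and then checks that the merged rules accept exactly the same (before, after) pairs. It assumes the context fields accept regular-expression alternation such as `(i|a|ə)`, which should be confirmed against g2p itself; `merge_by_before`, `accepted_pairs` and `VOWELS` are hypothetical names used only for this illustration.

```python
import re
from collections import defaultdict

# The six rules quoted above, reduced to (context_before, context_after) pairs.
PAIRS = [("e", "i"), ("e", "a"), ("e", "ə"), ("i", "a"), ("i", "ə"), ("a", "ə")]
# Only the vowels that occur in the quoted rules (assumption: enough for this check).
VOWELS = "eiaə"


def merge_by_before(pairs):
    """Group pairs by context_before and join the context_after values with '|'."""
    grouped = defaultdict(list)
    for before, after in pairs:
        grouped[before].append(after)
    merged = []
    for before, afters in grouped.items():
        # A single alternative needs no parentheses.
        if len(afters) > 1:
            context_after = "(" + "|".join(afters) + ")"
        else:
            context_after = afters[0]
        merged.append({
            "in": "y\u0313",  # y + combining comma above
            "out": "’y",
            "context_before": before,
            "context_after": context_after,
            "prevent_feeding": True,
        })
    return merged


def accepted_pairs(rules, vowels):
    """Return every (before, after) vowel pair accepted by at least one rule."""
    return {
        (b, a)
        for rule in rules
        for b in vowels
        for a in vowels
        if re.fullmatch(rule["context_before"], b)
        and re.fullmatch(rule["context_after"], a)
    }


merged = merge_by_before(PAIRS)
# The merge must not widen or narrow the recognized contexts.
assert accepted_pairs(merged, VOWELS) == set(PAIRS)
```

This yields three rules instead of six. A single rule with a vowel class on both sides would instead accept all sixteen pairs over these four vowels, which is the over-generalization warned about above.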
Also, this mapping is configured with rule_ordering: apply-longest-first, but the ordering between rules of the same input length is critical here for correct results. For example, all the "in": "y̓" rules quoted above are clearly meant to be applied before {"in": "y̓", "out": "y’"}. as-written would therefore seem more appropriate to me, although switching would require analyzing the file to make sure everything else is ordered correctly. When we sort, do we use a stable sort? That is, with apply-longest-first, is as-written a second-order criterion, either by design or by accident?
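For reference, this is what a stable longest-first sort would guarantee, sketched in plain Python. It does not show how g2p actually sorts, which is exactly the open question; the miniature rule list and the `longest_first` helper are hypothetical.

```python
# Hypothetical miniature rule list in as-written order: a shorter rule first,
# then the contextual rule, then the context-free fallback with the same "in".
rules = [
    {"in": "a", "out": "a"},
    {"in": "y\u0313", "out": "’y", "context_before": "e", "context_after": "i"},
    {"in": "y\u0313", "out": "y’"},
]


def longest_first(rules):
    """Sort by input length, longest first.

    Python's sorted() is stable, and stays stable with reverse=True, so rules
    whose "in" has the same length keep their as-written relative order.
    """
    return sorted(rules, key=lambda r: len(r["in"]), reverse=True)


ordered = longest_first(rules)
# The contextual rule still precedes the fallback; the shorter rule moves last.
assert [r["out"] for r in ordered] == ["’y", "y’", "a"]
```

If the sort in use is stable like this one, as-written is effectively the tie-breaker. If it is not, the relative order of the two "y̓" rules is undefined.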
Next, starting from hur_orthog_to_hur_apa.json, it would be easy to write a mapping that feeds into sal-apa, connecting hur to IPA and ARPABET cheaply. The requirement would be to make sure every "out": in hur_orthog_to_hur_apa.json is either already handled as an "in": in sal_apa_to_ipa.csv or gets mapped to something that is. In the process, we could make sal_apa_to_ipa.csv more general.
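The coverage requirement can be checked mechanically. Below is a minimal sketch, assuming the rules have already been loaded into a list of dicts and the sal-apa inputs into a set; the sample data is made up and not taken from the real files, and `unhandled_outputs` is a hypothetical helper. It only tests exact matches, so a multi-character "out" that sal-apa would handle as several shorter inputs is reported as missing and needs a manual look.

```python
def unhandled_outputs(hur_rules, sal_inputs):
    """Return, in first-seen order, every "out" value that is not in sal_inputs."""
    missing = []
    for rule in hur_rules:
        out = rule["out"]
        if out not in sal_inputs and out not in missing:
            missing.append(out)
    return missing


# Made-up placeholder data for illustration only.
hur_rules = [
    {"in": "a", "out": "a"},
    {"in": "b", "out": "X"},
    {"in": "c", "out": "X"},
]
sal_inputs = {"a", "e"}

missing = unhandled_outputs(hur_rules, sal_inputs)
# "X" is produced by two rules but has no sal-apa input, so it is listed once.
assert missing == ["X"]
```

Each value reported here would need either a new row in sal_apa_to_ipa.csv or a rule in the intermediate mapping that rewrites it to something sal-apa already accepts.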
PS: I'm not sure it was correct to use the code hur for the orthography here, since this mapping is explicitly for the Island dialect, but hur covers the Island, Upriver/Stó꞉lō and Downriver/Musqueam dialects. We will have to split this code if support for the other hur dialects is requested.