Correctly find the IPA language code for sal-apa by joanise · Pull Request #790 · EveryVoiceTTS/EveryVoice

joanise · 2026-04-24T20:53:44Z

PR Goal?

Correctly find the IPA language code for sal-apa

Fixes?

Fixes #789

Feedback sought?

regular review

Priority?

high

Tests added?

yes

How to test?

Run through the wizard with data in sal-apa and confirm it gets past the language selection step (before this PR, the wizard dumps an exception there).

Confidence?

high

Version change?

no, but we're due

Related PRs?

NRC-ILT/g2p#489

semanticdiff-com · 2026-04-24T20:53:47Z

Review changes with

Changed Files

File	Status
everyvoice/text/phonemizer.py	28% smaller
.github/workflows/matrix-tests.yml	0% smaller
.github/workflows/test.yml	0% smaller
everyvoice/model/aligner/wav2vec2aligner	0% smaller
everyvoice/tests/test_custom_g2p.py	0% smaller

github-actions · 2026-04-24T20:59:09Z

CLI load time: 0:00.20
Pull Request HEAD: 341983224c5f1c555886c61b39cbc46b04fd6d88
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

codecov · 2026-04-24T20:59:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.99%. Comparing base (459f4f1) to head (3419832).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #790      +/-   ##
==========================================
+ Coverage   82.97%   82.99%   +0.02%     
==========================================
  Files          47       47              
  Lines        4158     4163       +5     
  Branches      611      612       +1     
==========================================
+ Hits         3450     3455       +5     
  Misses        576      576              
  Partials      132      132


roedoejet

Just a couple of comments for now. Not necessarily requesting these changes, but let's discuss next week.

roedoejet · 2026-04-24T21:25:55Z

+        sal_apa_g2p = get_g2p_engine("sal-apa")
+        self.assertEqual(sal_apa_g2p("ac"), list("ats"))
+
+        # but iku-sro goes to iku-sro-ipa, not iku-ipa


Hm, why does this not go to iku-ipa?

I guess we kind of assumed this *-ipa convention that isn't enforced

Running g2p show-mappings | grep iku will tell you that we have iku->iku-equiv->iku-ipa->eng-ipa as the path from syllabics, and iku-sro->iku-sri-ipa->iku-sro-ipa->eng-ipa as the path for romanized, and those two paths are just not connected. I don't know why we made that choice, but since we never had an official policy or way to declare "this is the IPA code for language X", whoever wrote the mapping presumably picked what seemed intuitive to them.

Actually, the git logs tell me that's from back in 2019, with a commit log "first attempt at consolidating langs", so I'm going to guess this might have been an artefact of the merging process. We could change things in g2p, and we should probably add a function to the API that returns the IPA code for any non-IPA code that leads to IPA one way or another. But my problem would remain: any such solution would only work going forward and would not be compatible with older versions of g2p, hence my solution here.
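To make the "not connected" point concrete, here is a minimal sketch that walks a toy directed graph. The edges are transcribed from the two paths quoted in this thread, not loaded from g2p, and `TOY_EDGES` and `reachable` are hypothetical names used only for illustration.

```python
from collections import deque

# Toy graph copied from the two paths quoted in the comment above.
# Assumption: this is NOT read from g2p; the real network may differ.
TOY_EDGES = {
    "iku": ["iku-equiv"],
    "iku-equiv": ["iku-ipa"],
    "iku-ipa": ["eng-ipa"],
    "iku-sro": ["iku-sri-ipa"],
    "iku-sri-ipa": ["iku-sro-ipa"],
    "iku-sro-ipa": ["eng-ipa"],
    "eng-ipa": [],
}


def reachable(start, edges=TOY_EDGES):
    """Return every node reachable from `start` by following directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only the nodes downstream of start
    return seen


if __name__ == "__main__":
    print(sorted(reachable("iku")))
    print(sorted(reachable("iku-sro")))
```

In this toy graph, iku-ipa is downstream of iku but not of iku-sro, so a lookup that starts from the romanized code cannot land on iku-ipa by following mappings alone.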

roedoejet · 2026-04-24T21:29:14Z

+        if lang_id + "-ipa" in LANGS_NETWORK.nodes:
+            return lang_id + "-ipa"
+        else:
+            return lang_id[:3] + "-ipa"


Hm, what about lang_id.split('-')[0] + "-ipa"? Do we enforce anywhere that the initial code will be 3 letters? (I mean, ISO 639-3 stipulates this, but I don't think we actually test it anywhere.)

NRC-ILT/g2p#489 addresses this question. I was hoping you'd review both PRs together. I'm not attached to any given solution, as long as we pick one and apply it identically to the two PRs.

But actually, splitting on the dash is a nice idea, more forward-looking than taking the first 3 characters, so yeah, I'll make this change in both PRs.
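The difference between the two fallbacks discussed in this thread can be sketched as follows. This is an illustration only: `TOY_NODES` is a hypothetical stand-in for `LANGS_NETWORK.nodes`, the function names are invented here, and the merged code in everyvoice/text/phonemizer.py may differ.

```python
# Hypothetical stand-in for LANGS_NETWORK.nodes; not real g2p data.
TOY_NODES = {"iku-sro", "iku-sro-ipa", "sal-apa", "sal-ipa"}


def ipa_code_by_prefix(lang_id, nodes):
    """Fallback from the diff under review: keep the first 3 characters."""
    if lang_id + "-ipa" in nodes:
        return lang_id + "-ipa"
    return lang_id[:3] + "-ipa"


def ipa_code_by_split(lang_id, nodes):
    """Fallback suggested in review: keep everything before the first dash."""
    if lang_id + "-ipa" in nodes:
        return lang_id + "-ipa"
    return lang_id.split("-")[0] + "-ipa"


if __name__ == "__main__":
    # "xx-yyy" is a made-up code whose first part is not 3 letters long.
    for code in ("iku-sro", "sal-apa", "xx-yyy"):
        print(code, ipa_code_by_prefix(code, TOY_NODES), ipa_code_by_split(code, TOY_NODES))
```

Both variants agree whenever the first part of the code is exactly three letters. They diverge only for codes like the made-up "xx-yyy", where slicing keeps the dash and yields "xx--ipa" while splitting yields "xx-ipa".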

roedoejet

Approving, but please rebase https://github.com/EveryVoiceTTS/wav2vec2aligner/tree/dev.ej/ci-fix-ffmpeg onto https://github.com/EveryVoiceTTS/wav2vec2aligner/tree/main

roedoejet · 2026-04-27T22:02:01Z

needs to rebase onto main

joanise force-pushed the dev.ej/fix-sal-apa branch from 794b117 to c8cd90e Compare April 24, 2026 20:54

joanise changed the title ~~Dev.ej/fix sal apa~~ Correctly find the IPA language code for sal-apa Apr 24, 2026

joanise force-pushed the dev.ej/fix-sal-apa branch from c8cd90e to 7e59f4e Compare April 24, 2026 21:10

joanise requested a review from roedoejet April 24, 2026 21:12

roedoejet reviewed Apr 24, 2026

joanise force-pushed the dev.ej/fix-sal-apa branch 2 times, most recently from 8c5b7ad to 0fba6df Compare April 27, 2026 19:12

joanise and others added 4 commits April 27, 2026 16:17

fix: correctly find IPA lang code for sal-apa and oji-syl (#790)

c1eba30

using the technique documented and tested in NRC-ILT/g2p#489 Fixes #789

fix(ci): use setup-ffmpeg v3.1 as v2 seems definitely broken

36bb508

I was keeping v2 for its speed, but v3.1 uses a cache correctly so it is fast again.

chore: update wav2vec2a submodule for fixed ffpmeg in ci

fa9a46a

ci: licensecheck no longer needs to ignore removed pysdtw dep

3419832

joanise force-pushed the dev.ej/fix-sal-apa branch from 0fba6df to 3419832 Compare April 27, 2026 20:20

roedoejet self-requested a review April 27, 2026 20:58

roedoejet approved these changes Apr 27, 2026

joanise merged commit 3419832 into main Apr 28, 2026
13 checks passed

joanise deleted the dev.ej/fix-sal-apa branch April 28, 2026 12:44
