Multipron align #57
Merged
When the cached sysroot path is missing, set CMAKE_OSX_SYSROOT from xcrun. Before add_subdirectory(src), clear cached FindBLAS, FindLAPACK, and MATH paths that reference a removed SDK or a different SDK tree than the active sysroot. Remove duplicate find_library(MATH_LIBRARY) in src/CMakeLists.txt.
Return -1 from align_frame on bad alignment instead of asserting. On failure, emit a reference outsent line when possible and continue. Log how many outsent fallbacks were written at the end of a run.
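The fallback flow described above can be sketched roughly as follows. This is written in Python only for illustration; sphinx3_align itself is C, and the names `align_corpus` and `align_fn`, as well as the "words (uttid)" line layout, are assumptions rather than the tool's actual code.

```python
def align_corpus(utterances, align_fn):
    """Align each utterance; fall back to the reference words on failure.

    utterances: iterable of (utt_id, reference_words) pairs.
    align_fn:   hypothetical aligner returning the aligned word string,
                or None when alignment fails (standing in for a -1 return).
    Returns (outsent_lines, fallback_count).
    """
    lines = []
    fallbacks = 0
    for utt_id, reference in utterances:
        aligned = align_fn(reference)
        if aligned is None:
            # Do not abort: emit the reference transcript instead and count it.
            aligned = reference
            fallbacks += 1
        lines.append(f"{aligned} ({utt_id})")
    return lines, fallbacks


if __name__ == "__main__":
    # Toy aligner that fails on any utterance containing "BAD".
    toy = lambda words: None if "BAD" in words else words.replace("READ", "READ(2)")
    out, n = align_corpus([("u1", "I READ IT"), ("u2", "BAD AUDIO")], toy)
    print("\n".join(out))
    print(f"outsent fallbacks written: {n}")
```

Counting the fallbacks rather than silently substituting them keeps the end-of-run log useful for spotting a beam that is too narrow.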
Add CFG_MULTIPRON to sphinx_train.cfg: when yes, Baum-Welch and related steps use BASE_DIR/multipron_align/EXPT.multipron.transcription. Extend GetLists and the untied-HMM, lattice, and BW slaves accordingly. Add multipron_align.pl and multipron_align.py to merge dicts, run sphinx3_align, and write that transcript (beam from CFG_FORCE_ALIGN_BEAM or default 1e-308).
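The transcript choice this option controls might look like the sketch below. It is a Python paraphrase, not the Perl in GetLists or Util; it assumes the configuration is available as a plain dict, and it folds in the file-existence check and the "unset keeps legacy behavior" rule described for stage 21. The function name `choose_transcript` is hypothetical.

```python
import os


def choose_transcript(cfg):
    """Return the transcript path a post-CI training stage would read.

    cfg is assumed to be a dict holding CFG_BASE_DIR, CFG_EXPTNAME,
    CFG_TRANSCRIPTFILE and, optionally, CFG_MULTIPRON.
    """
    reference = cfg["CFG_TRANSCRIPTFILE"]
    multipron = cfg.get("CFG_MULTIPRON")
    # Unset or 'no' keeps the legacy single-transcript behavior.
    if multipron is None or multipron == "no":
        return reference
    candidate = os.path.join(
        cfg["CFG_BASE_DIR"],
        "multipron_align",
        cfg["CFG_EXPTNAME"] + ".multipron.transcription",
    )
    # Use the multipron transcript only once the alignment stage has written it.
    return candidate if os.path.exists(candidate) else reference
```

Falling back to the reference transcript when the file is absent means stages that run before the alignment step need no special casing.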
Python reduces the dictionary to words in the transcript vocabulary while keeping pronunciation variants. Perl script is invoked from the optional 00a.vocab_dict Makefile step; uses python like other SphinxTrain drivers. Define CFG_VOCAB_DICT and CFG_VOCAB_DICTIONARY in sphinx_train.cfg.
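A minimal sketch of that reduction, assuming the usual CMU-style dictionary layout (one entry per line, variants written as WORD(2), WORD(3), ...) and transcripts whose last token is a parenthesized utterance id. The function names are hypothetical, and matching is case-sensitive here, which may differ from the actual script.

```python
import re

# Matches a pronunciation-variant suffix such as "(2)" at the end of a dict word.
_VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def transcript_vocabulary(transcript_lines):
    """Collect the set of words used in the transcript lines."""
    vocab = set()
    for line in transcript_lines:
        tokens = line.split()
        # Drop a trailing "(utt_id)" token if present.
        if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
            tokens = tokens[:-1]
        vocab.update(tokens)
    return vocab


def restrict_dictionary(dict_lines, vocab):
    """Keep dictionary entries whose base word is in vocab, variants included."""
    kept = []
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue
        base = _VARIANT_SUFFIX.sub("", fields[0])
        if base in vocab:
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    dictionary = [
        "HELLO HH AH L OW",
        "HELLO(2) HH EH L OW",
        "WORLD W ER L D",
        "ZEBRA Z IY B R AH",
    ]
    vocab = transcript_vocabulary(["<s> HELLO WORLD </s> (utt1)"])
    print("\n".join(restrict_dictionary(dictionary, vocab)))
```

Stripping only the numeric suffix before the lookup is what lets HELLO(2) survive alongside HELLO while ZEBRA is dropped.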
Keeps optional local design notes out of the repository.
Insert training stage 21.multipron_align after CI to run multipron_align.pl when CFG_MULTIPRON is set and not no. Use multipron transcript for CD and later only when the file exists (Util::ShouldUseMultipronTranscript). Default CFG_MULTIPRON to yes in sphinx_train.cfg; omit or unset variable keeps legacy behavior without stage 21. Stage 21 driver: import dirname for lib path; do not pass an extra etc path to multipron_align.pl. sphinxtrain decodes POSIX wait status from os.system() so failed Perl stages fail the shell on Unix.
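On Unix, os.system() returns a wait status rather than the child's exit code, so a child that exits with code 1 yields 256, and passing that value to sys.exit() can wrap around to 0. A sketch of the decoding, using only standard os helpers; the function name and the 128 + signal convention are assumptions here, not necessarily what scripts/sphinxtrain does:

```python
import os


def exit_code_from_wait_status(status):
    """Convert an os.system() return value into a shell-style exit code."""
    if os.name != "posix":
        # On Windows os.system() already returns the exit code.
        return status
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Mirror the common shell convention for signal deaths.
        return 128 + os.WTERMSIG(status)
    return 1


if __name__ == "__main__":
    import sys
    status = os.system("exit 3")
    sys.exit(exit_code_from_wait_status(status))
```

Python 3.9 and later also provide os.waitstatus_to_exitcode(), which reports a signal death as a negative number instead.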
Describe default multipron alignment after CI, sphinx3_align, and how to disable via CFG_MULTIPRON. Fix Acknowledgments heading spelling.
trigger ci
CMake installs SCRIPTDIRS including 22.ci_hmm_multipron; the directory was listed but not in the tree on CI, so cmake --install failed.
Multipron alignment, robust builds, optional vocab dict
Summary
Multipron is on by default: after CI (stage 20), stage 21 runs multipron force alignment (sphinx3_align via multipron_align) so CD and later stages can use $CFG_BASE_DIR/multipron_align/$CFG_EXPTNAME.multipron.transcription when that file exists. Set CFG_MULTIPRON = 'no' in etc/sphinx_train.cfg to skip stage 21 and use only the reference transcript everywhere.

Stage 21 driver and driver UX: the Perl wrapper imports dirname for the lib path and does not pass a redundant etc argument into multipron_align.pl (which would confuse multipron_align.py). scripts/sphinxtrain decodes the POSIX os.system() wait status so failed Perl stages return a non-zero shell exit code on Unix.

Also tightens CMake on macOS when Xcode or SDK paths change, hardens sphinx3_align (no assert on bad alignment; fallback outsent), and adds an optional vocabulary-restricted dictionary (00a.vocab_dict) / cmusphinx.vocab_dict.

Docs: README.md describes default multipron behavior and how to disable it.

User-visible behavior
CFG_MULTIPRON: default 'yes'. 'no' disables automatic multipron alignment and multipron transcripts. Training uses the multipron file only when it exists (after stage 21); CI still uses CFG_TRANSCRIPTFILE until then.
sphinx3_align: alignment failures do not abort the whole run where fallback applies.
CFG_VOCAB_DICT / CFG_VOCAB_DICTIONARY: optional vocabulary-restricted dict step; defaults documented in sphinx_train.cfg.

Migration
Projects that want the old single-transcript-only behavior: set CFG_MULTIPRON = 'no' in etc/sphinx_train.cfg.

Testing
sphinx3_align.
if(APPLE); Ubuntu builds should behave as before.