These are some of the features I'd like to eventually have in mimic-core. Anyone is very welcome to contribute by adding ideas to the list or implementing them 😃.
Good documentation for the Heterogeneous Relation Graph formalism, which lives in src/hrg. It is the core of the core of mimic. This means documenting the following mimic objects and how to manage them: Utterance, Relation, Item, Feature and Value. Use the Edinburgh Speech Tools documentation as a starting point. The Flite manual is also worth reading.
Better SSML parsing. Consider using a standard XML parsing library (written in ANSI C) instead of the custom XML parser inherited from Flite, to prevent security bugs caused by malformed input. Map all the concepts expressed in the SSML spec to the HRG formalism by adding each of the SSML tags and attributes to the corresponding relations and items in the Utterance.
Web frontend for correcting phonetic transcriptions, part of speech tagging, syllabification, etc. Study the voice models to visualize the most important linguistic features for each of them, and provide a way for our users to report errors. At a minimum it should cover phonetic transcription errors; potentially it could also be used to take local dialectal variants into account.
Provide a way for mimic plugins to declare their features: Currently, all the installed plugins are loaded at mimic-core initialization. Instead, each plugin should state its dependencies and what functionality it provides (e.g. English language support or a French female HTS voice at 44kHz...) in some sort of metadata file, so mimic-core knows the features it can use. Maybe the metadata could be in XML format, if the SSML parser also uses an XML library.
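As a minimal sketch of what such a declaration could look like once loaded into memory: the type `mimic_plugin_info`, the function `mimic_plugin_provides` and the feature strings below are all hypothetical and do not exist in mimic-core. The only point is that a loader could inspect what a plugin provides and depends on before deciding to load it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata record; not an existing mimic-core type. */
typedef struct {
    const char *name;            /* plugin identifier */
    const char *const *provides; /* NULL-terminated list of features */
    const char *const *depends;  /* NULL-terminated list of dependencies */
} mimic_plugin_info;

/* Return 1 if the plugin declares the given feature, 0 otherwise. */
int mimic_plugin_provides(const mimic_plugin_info *info, const char *feature)
{
    size_t i;
    if (info == NULL || feature == NULL || info->provides == NULL)
        return 0;
    for (i = 0; info->provides[i] != NULL; i++) {
        if (strcmp(info->provides[i], feature) == 0)
            return 1;
    }
    return 0;
}

/* Made-up declaration that a French HTS voice plugin might ship. */
static const char *const example_provides[] = { "voice:fr:female:hts:44100", NULL };
static const char *const example_depends[] = { "lang:fr", NULL };
const mimic_plugin_info example_plugin = {
    "example-fr-hts-voice", example_provides, example_depends
};
```

With something along these lines, mimic-core could call `mimic_plugin_provides(&example_plugin, "voice:fr:female:hts:44100")` while scanning metadata and defer loading the plugin itself until that voice is requested.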
Convert all the audio modules to mimic plugins. We can keep them in mimic-core, but we could follow the approach of the sox package, which ships optional libsox-fmt-alsa, libsox-fmt-pulse ... packages that can be installed or not. With this, the end user would be able to choose the audio module at run time.
Drop cst_wchar.h, after checking whether it is still used anywhere.
Allow for phoneset mappings. SSML mandates that we should accept the IPA alphabet for phonetic transcriptions. This means that when an SSML phonetic transcription is given in IPA, it should be mapped to the SAMPA phoneset of the language in use.
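A mapping of this kind could start as a plain lookup table. The sketch below is only an illustration: the names `phone_map_entry` and `ipa_to_sampa` are hypothetical, the table holds just a handful of standard IPA/SAMPA correspondences, and a real implementation would need one table per language phoneset plus handling for multi-character symbols and diacritics.

```c
#include <stddef.h>
#include <string.h>

/* One IPA symbol (UTF-8 encoded) and its SAMPA equivalent. */
typedef struct {
    const char *ipa;
    const char *sampa;
} phone_map_entry;

/* A few standard IPA-to-SAMPA correspondences; a real table would be
 * defined per language and be far larger. */
static const phone_map_entry ipa_to_sampa_table[] = {
    { "ə", "@" },
    { "ʃ", "S" },
    { "θ", "T" },
    { "ŋ", "N" },
    { "æ", "{" }
};

/* Look up a single IPA symbol; returns NULL when it is not in the table. */
const char *ipa_to_sampa(const char *ipa)
{
    size_t i;
    size_t n = sizeof(ipa_to_sampa_table) / sizeof(ipa_to_sampa_table[0]);
    if (ipa == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        if (strcmp(ipa_to_sampa_table[i].ipa, ipa) == 0)
            return ipa_to_sampa_table[i].sampa;
    }
    return NULL;
}
```

Returning NULL for unknown symbols leaves it to the caller to decide whether to skip the phone, fall back to the default transcription, or report an error.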
Avoid global variables. mimic_core_init() would create a structure holding the "state of the app", which would then be passed to the functions that need it. See Mimic context #13.
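To make the idea concrete, here is a rough sketch of the explicit-context pattern. Everything in it is an assumption for illustration: the `mimic_context` type, its fields, the default values and the `mimic_context_*` functions are hypothetical, and the sketch says nothing about how mimic_core_init() is or would be declared.

```c
#include <stdlib.h>

/* Hypothetical context type: whatever is global today would become a
 * field here. The fields are illustrative only. */
typedef struct {
    const char *default_voice;
    const char *audio_module;
    int sample_rate;
} mimic_context;

/* Allocate a context with illustrative defaults; returns NULL on failure. */
mimic_context *mimic_context_new(void)
{
    mimic_context *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    ctx->default_voice = NULL;
    ctx->audio_module = "none";
    ctx->sample_rate = 16000;
    return ctx;
}

/* Functions receive the context explicitly instead of reading globals. */
int mimic_context_set_sample_rate(mimic_context *ctx, int rate)
{
    if (ctx == NULL || rate <= 0)
        return -1;
    ctx->sample_rate = rate;
    return 0;
}

/* Release a context created by mimic_context_new(). */
void mimic_context_free(mimic_context *ctx)
{
    free(ctx);
}
```

One benefit of this shape is that two contexts can coexist in the same process, for example with different voices or audio modules, which global state makes difficult.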
Config file. With so many settings (audio module, language, voice...), having a config file to choose the default language, voice, audio module, soundcard to use... would be great. It could even define custom phonetic transcriptions to easily correct transcription errors. Probably something layered (command line > config at $HOME > systemwide config > mimic defaults) would be needed. A candidate INI parser: https://github.com/benhoyt/inih
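The layered lookup can be sketched independently of the file format. In the example below, the `setting_layers` type and the `resolve_setting` function are hypothetical names; the sketch assumes each layer has already been parsed and that an unset value is represented by NULL.

```c
#include <stddef.h>

/* Hypothetical: the value of one setting in each layer; NULL means the
 * layer does not set it. */
typedef struct {
    const char *command_line;
    const char *user_config;     /* config at $HOME */
    const char *system_config;   /* systemwide config */
    const char *builtin_default; /* mimic defaults */
} setting_layers;

/* Return the value from the highest-priority layer that sets it. */
const char *resolve_setting(const setting_layers *s)
{
    if (s == NULL)
        return NULL;
    if (s->command_line != NULL)
        return s->command_line;
    if (s->user_config != NULL)
        return s->user_config;
    if (s->system_config != NULL)
        return s->system_config;
    return s->builtin_default;
}
```

For instance, if the voice is set in both the user and the systemwide config but not on the command line, `resolve_setting` returns the user's value.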
Document the voice types: Diphone, clunits, clustergen and hts are the ones supported. Each of those may support a different subset of SSML features, and that should be documented. (For instance, some voice types might not allow pitch modification.)
Improve voice types. I'd like to try HTS speech synthesis with the World vocoder.