Wishlist features for mimic-core

These are some of the features I'd like mimic-core to have eventually. Anyone is very welcome to contribute by adding ideas to the list or implementing them :smiley:.

- [ ] Good documentation for the *Heterogeneous Relation Graph* (HRG) formalism, which lives at `src/hrg`. It is the core of the core of `mimic`. This means documenting the following mimic objects and how to manage them: `Utterance`, `Relation`, `Item`, `Feature` and `Value`. Use the [Edinburgh Speech Tools documentation](http://zeehio.github.io/speech-tools/estling.html) as a starting point. The [Flite manual](http://www.festvox.org/flite/doc/flite.pdf) is also worth reading.

- [ ] Better SSML parsing. Consider using a standard XML parsing library (written in ANSI C) instead of the custom XML parser inherited from Flite, to reduce the risk of security bugs triggered by malformed input. Map all the concepts expressed in the [SSML spec](https://www.w3.org/TR/speech-synthesis/) to the HRG formalism by adding each of the SSML tags and attributes to the corresponding relations and items in the Utterance.

- [ ] Web frontend for correcting phonetic transcriptions, part-of-speech tagging, syllabification, etc. Study the voice models to visualize the most important linguistic features for each of them, and provide a way for our users to report errors. At a minimum it should cover phonetic transcription errors, but it could potentially also be used to account for local dialectal variants.

- [ ] Provide a way for mimic plugins to declare their features. Currently, all the installed plugins are loaded when mimic-core initializes ([code](https://github.com/MycroftAI/mimic-core/blob/development/src/utils/cst_plugins.c#L272)). Instead, each plugin should state its dependencies and the functionality it provides (e.g. English language support, or a French female HTS voice at 44kHz) in some sort of metadata file, so mimic-core knows which features it can use. The metadata could be in XML format if the SSML parser also uses an XML library.

- [ ] Convert all the audio modules to mimic plugins. We can keep them in mimic-core, but package them the way `sox` does, with separate `libsox-fmt-alsa`, `libsox-fmt-pulse`, ... packages that can be installed or not. This would let the end user choose the audio module at run time.

- [ ] Drop `cst_wchar.h`, if it is used anywhere.

- [ ] Allow for phoneset mappings. SSML mandates that we accept the IPA alphabet for phonetic transcriptions. This means that when an SSML phonetic transcription is given in IPA, it should be mapped to the SAMPA phoneset of the corresponding language.

- [ ] Avoid global variables. [`mimic_core_init()`](https://github.com/MycroftAI/mimic-core/blob/development/src/synth/mimic.c#L64) should create a structure holding the "state of the app" and pass it to the functions that need it. See https://github.com/MycroftAI/mimic-core/pull/13

- [ ] Config file. With so many settings (audio module, language, voice...), a config file to choose the default language, voice, audio module, sound card, etc. would be great. It could even define custom phonetic transcriptions to easily correct transcription errors. Something layered (command line > config at `$HOME` > system-wide config > mimic defaults) would probably be needed. A candidate parser: https://github.com/benhoyt/inih

- [ ] Document the voice types. Diphone, clunits, clustergen and hts are the ones supported. Each of them may support a different subset of SSML features, and that should be documented (for instance, some voice types might not allow pitch modification).

- [ ] Improve voice types. I'd like to try HTS speech synthesis with the World vocoder.
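
To make the phoneset mapping item more concrete, here is a minimal sketch of what an IPA-to-SAMPA lookup could look like. Everything in it is an assumption made for illustration: the `phone_map_entry` type, the `ipa_to_sampa_string()` function and the table are hypothetical names, and the handful of symbols shown is nowhere near a complete per-language phoneset. It uses longest-prefix matching so that a multi-character entry such as `tʃ` wins over its first symbol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping entry: one IPA symbol (UTF-8) and its SAMPA equivalent. */
typedef struct {
    const char *ipa;
    const char *sampa;
} phone_map_entry;

/* Illustrative subset only; a real table would be defined per language. */
static const phone_map_entry ipa_to_sampa[] = {
    { "tʃ", "tS" },
    { "ʃ",  "S"  },
    { "θ",  "T"  },
    { "ð",  "D"  },
    { "ŋ",  "N"  },
    { "ə",  "@"  },
    { "ɪ",  "I"  },
    { "p",  "p"  },
    { "t",  "t"  },
    { "s",  "s"  }
};

/*
 * Convert a UTF-8 IPA string into space-separated SAMPA symbols.
 * Returns 0 on success, -1 if a symbol is not in the table or if `out`
 * is too small. On failure `out` may hold a partial result.
 */
int ipa_to_sampa_string(const char *ipa, char *out, size_t out_size)
{
    const size_t n_entries = sizeof(ipa_to_sampa) / sizeof(ipa_to_sampa[0]);
    size_t used = 0;

    if (ipa == NULL || out == NULL || out_size == 0)
        return -1;
    out[0] = '\0';

    while (*ipa != '\0') {
        const phone_map_entry *best = NULL;
        size_t best_len = 0;
        size_t sampa_len, need, i;

        /* Pick the longest table entry that prefixes the remaining input. */
        for (i = 0; i < n_entries; i++) {
            size_t len = strlen(ipa_to_sampa[i].ipa);
            if (len > best_len && strncmp(ipa, ipa_to_sampa[i].ipa, len) == 0) {
                best = &ipa_to_sampa[i];
                best_len = len;
            }
        }
        if (best == NULL)
            return -1;              /* unknown IPA symbol */

        sampa_len = strlen(best->sampa);
        need = sampa_len + (used > 0 ? 1 : 0);   /* symbol plus separator */
        if (used + need + 1 > out_size)
            return -1;              /* no room for symbol and terminator */

        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, best->sampa, sampa_len);
        used += sampa_len;
        out[used] = '\0';
        ipa += best_len;
    }
    return 0;
}
```

With this table, calling `ipa_to_sampa_string("tʃɪp", buf, sizeof(buf))` leaves `tS I p` in `buf`. Returning an error for unknown symbols, rather than silently skipping them, is one possible choice; it would make gaps in a language's mapping table visible early.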

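For the "avoid global variables" item, a rough sketch of the pattern I have in mind: an init function allocates a context structure and every function that needs shared state receives it as a parameter. The `mimic_context` type, its fields and the three functions are hypothetical names chosen for this example; they are not the current mimic-core API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical "state of the app"; the fields are only examples. */
typedef struct mimic_context {
    char *default_voice;   /* owned copy of the voice name, may be NULL */
    int   plugins_loaded;  /* number of plugins registered in this context */
} mimic_context;

/* Allocate a context instead of initializing file-level globals. */
mimic_context *mimic_context_new(const char *default_voice)
{
    mimic_context *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    if (default_voice != NULL) {
        size_t len = strlen(default_voice) + 1;
        ctx->default_voice = malloc(len);
        if (ctx->default_voice == NULL) {
            free(ctx);
            return NULL;
        }
        memcpy(ctx->default_voice, default_voice, len);
    }
    return ctx;
}

/* Functions that used to touch a global take the context explicitly.
 * Returns the new plugin count, or -1 if ctx is NULL. */
int mimic_context_register_plugin(mimic_context *ctx)
{
    if (ctx == NULL)
        return -1;
    ctx->plugins_loaded++;
    return ctx->plugins_loaded;
}

/* Release everything the context owns. Safe to call with NULL. */
void mimic_context_free(mimic_context *ctx)
{
    if (ctx == NULL)
        return;
    free(ctx->default_voice);
    free(ctx);
}
```

A side effect of this design is that two contexts can coexist in the same process without sharing state, which is hard to guarantee with globals.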

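For the layered config item, the lookup order (command line > config at `$HOME` > system-wide config > mimic defaults) could be sketched roughly as below. The `config_entry` and `config_layer` types and the `config_lookup()` function are hypothetical, and the sketch assumes each layer has already been parsed into key/value pairs by whatever parser ends up being used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair, already parsed from some source. */
typedef struct {
    const char *key;
    const char *value;
} config_entry;

/* One configuration source, e.g. the command line or a config file. */
typedef struct {
    const char *name;
    const config_entry *entries;
    size_t count;
} config_layer;

/* Return the value for `key` in a single layer, or NULL if absent. */
static const char *layer_get(const config_layer *layer, const char *key)
{
    size_t i;
    for (i = 0; i < layer->count; i++) {
        if (strcmp(layer->entries[i].key, key) == 0)
            return layer->entries[i].value;
    }
    return NULL;
}

/*
 * Layers must be ordered from highest to lowest priority.
 * The first layer that defines `key` wins; NULL means no layer defines it.
 */
const char *config_lookup(const config_layer *layers, size_t n_layers,
                          const char *key)
{
    size_t i;
    for (i = 0; i < n_layers; i++) {
        const char *value = layer_get(&layers[i], key);
        if (value != NULL)
            return value;
    }
    return NULL;
}
```

Keeping the layers separate, instead of merging them into one table at startup, makes it easy to report which source a setting came from when debugging a configuration.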