Wishlist features for mimic-core #8

@zeehio

These are some of the features I'd like to eventually have in mimic-core. Anyone is very welcome to contribute by adding ideas to the list or implementing them 😃.

  • Good documentation for the Heterogeneous Relation Graph formalism, that lives at src/hrg. It is the core of the core of mimic. This means documenting the following mimic objects and how to manage them: Utterance, Relation, Item, Feature and Value. Use the Edinburgh Speech Tools documentation as a starting point. The Flite manual is also worth reading.

  • Better SSML parsing. Consider using a standard XML parsing library (written in ANSI C) instead of Flite's inherited custom XML parser to prevent security bugs due to corrupted input. Map all the concepts expressed in the SSML spec to the HRG formalism, by adding each of the SSML tags and attributes to the corresponding relations and items in the Utterance.

  • Web frontend for correcting phonetic transcriptions, part of speech tagging, syllabification, etc. Study the voice models to visualize the most important linguistic features for each of them, and provide a way for our users to report errors. At least it would be nice to have it for phonetic transcription errors, but potentially it could be used for considering local dialectal variants.

  • Provide a way for mimic plugins to declare their features. Currently, all the installed plugins are loaded when mimic-core initializes. Instead, each plugin should state its dependencies and the functionality it provides (e.g. English language support, or a French female HTS voice at 44 kHz) in some sort of metadata file, so mimic-core knows which features it can use. The metadata could be in XML format if the SSML parser also uses an XML library.

  • Convert all the audio modules to mimic plugins. We can keep them in mimic-core, but package them the way sox does, with separate libsox-fmt-alsa, libsox-fmt-pulse ... packages that can be installed or not. The end user would then be able to choose the audio module at run time.

  • Drop cst_wchar.h, if it is used anywhere.

  • Allow for phoneset mappings. SSML mandates that we accept the IPA alphabet for phonetic transcriptions. This means that when a transcription is given in IPA, it has to be mapped to the SAMPA phoneset of each language.

  • Avoid global variables. mimic_core_init() would create a structure holding the "state of the app", which would then be passed to the functions that need it. See Mimic context #13.

  • Config file. With so many settings (audio module, language, voice...), a config file for choosing the default language, voice, audio module and soundcard would be great. It could even define custom phonetic transcriptions to easily correct transcription errors. Something layered (command line > config at $HOME > systemwide config > mimic defaults) would probably be needed. See https://github.com/benhoyt/inih

  • Document the voice types: Diphone, clunits, clustergen and hts are the ones supported. Each of those may support a different SSML feature subset and that should be documented. (For instance, some voice types might not allow pitch modification)

  • Improve voice types. I'd like to try HTS speech synthesis with the World vocoder.
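
As a rough illustration of the plugin metadata idea above, here is a minimal sketch of an in-memory descriptor and a dependency check. Everything in it is an assumption made for illustration: the `plugin_info` type, its fields, the capability strings and both functions are hypothetical and do not exist in mimic-core. How the metadata file would be parsed is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin descriptor; none of these names exist in mimic-core. */
typedef struct {
    const char *name;     /* plugin identifier, e.g. "lang-en" */
    const char *provides; /* capability offered, e.g. "language:en" */
    const char *needs;    /* capability required, or NULL if none */
} plugin_info;

/* Return 1 if any known plugin provides `capability`, else 0. */
int capability_available(const plugin_info *plugins, size_t n,
                         const char *capability)
{
    assert(capability != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(plugins[i].provides, capability) == 0)
            return 1;
    }
    return 0;
}

/* Return 1 if the plugin's single dependency (if any) is provided. */
int plugin_loadable(const plugin_info *plugins, size_t n,
                    const plugin_info *p)
{
    assert(p != NULL);
    if (p->needs == NULL)
        return 1;
    return capability_available(plugins, n, p->needs);
}
```

With descriptors like these, mimic-core could decide which plugins to load from the metadata alone, without loading every installed shared object first.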
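
The "avoid global variables" item could look roughly like the sketch below: state lives in a structure that the caller owns and passes explicitly. The `mimic_context` type, its fields and the three functions are hypothetical names chosen for this example; the actual design is whatever Mimic context #13 settles on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "state of the app"; the fields are illustrative only. */
typedef struct mimic_context {
    char default_voice[64];
    char audio_module[32];
    int sample_rate_hz;
} mimic_context;

/* Allocate a context with placeholder defaults; NULL on allocation failure. */
mimic_context *mimic_context_new(void)
{
    mimic_context *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    strcpy(ctx->audio_module, "none");
    ctx->sample_rate_hz = 16000; /* placeholder value, not a mimic default */
    return ctx;
}

/* Functions take the context as a parameter instead of reading globals.
 * Returns 0 on success, -1 if the name is missing or too long. */
int mimic_context_set_voice(mimic_context *ctx, const char *name)
{
    assert(ctx != NULL);
    if (name == NULL || strlen(name) >= sizeof ctx->default_voice)
        return -1;
    strcpy(ctx->default_voice, name);
    return 0;
}

void mimic_context_free(mimic_context *ctx)
{
    free(ctx);
}
```

Because nothing is shared implicitly, two contexts with different voices or audio modules could coexist in the same process.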
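
The layered lookup proposed for the config file (command line > config at $HOME > systemwide config > mimic defaults) can be sketched as a merge over an array of layers. The `settings_layer` type and `resolve_settings` function are assumptions for illustration, and reading the files themselves (for example with an INI parser) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings record; NULL means "not set in this layer". */
typedef struct {
    const char *language;
    const char *voice;
    const char *audio_module;
} settings_layer;

/* Merge layers ordered from highest precedence (index 0, e.g. the command
 * line) to lowest (e.g. built-in defaults). Walking from the last layer to
 * the first lets higher-precedence layers overwrite lower ones. */
settings_layer resolve_settings(const settings_layer *layers, size_t n_layers)
{
    settings_layer out = { NULL, NULL, NULL };
    assert(layers != NULL || n_layers == 0);
    for (size_t i = n_layers; i-- > 0; ) {
        if (layers[i].language != NULL)
            out.language = layers[i].language;
        if (layers[i].voice != NULL)
            out.voice = layers[i].voice;
        if (layers[i].audio_module != NULL)
            out.audio_module = layers[i].audio_module;
    }
    return out;
}
```

Each setting is resolved independently, so a user can override only the voice on the command line while the language still comes from the config at $HOME and the audio module from the defaults.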
