Description
Hi there,
Thank you for your work on langram; it seems very cool and promising. Thank you also for responding at odilia-app/odilia#21 (comment). I would like to use langram as a replacement for lingua-rs in odilia-app/ssip-client-async#22, and I wanted to clarify a few things:
- Do you have any recommendations for how I could adapt this library to output ISO language codes (if that is possible)?
  - For SSIP on Linux, you need to tell the server which language to speak the given text in, using either a 2- or 3-letter language code.
  - I saw that "One language can be written in multiple scripts, so it will be detected as a different ScriptLanguage", which makes sense. I'm just not sure how to map the results to ISO codes without enumerating every ScriptLanguage variant and mapping each one by hand.
- Would it be possible to add a few short example programs to this repo that show idiomatic API usage?
- I'm also curious whether you use this library in other projects you are working on, and whether there are any I could look at.
- Does this library support detecting multiple languages in the same block of text and returning the indices where each one starts and stops?
  - e.g. 你好 Hello -> Chinese: 1-2, English: 3-7
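On the ISO mapping question: one way to avoid a hand-maintained lookup table is a single exhaustive match that collapses every script variant of a language onto one code, so the compiler reports any variant left unmapped. This is a minimal sketch assuming a hypothetical DetectedLanguage enum; its variant names are illustrative and are not langram's actual ScriptLanguage type. Only the ISO 639-1 and ISO 639-3 codes for English, Serbian and Chinese are real. If the upstream enum were marked #[non_exhaustive], a fallback arm would be required and that compile-time check would be lost.

```rust
/// Hypothetical detector output: one variant per (language, script) pair.
/// These names are illustrative only; they are NOT langram's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DetectedLanguage {
    English,
    SerbianCyrillic,
    SerbianLatin,
    ChineseSimplified,
    ChineseTraditional,
}

impl DetectedLanguage {
    /// ISO 639-1 (two-letter) code. All script variants of a language
    /// collapse to the same code; the match must cover every variant.
    fn iso_639_1(self) -> &'static str {
        match self {
            Self::English => "en",
            Self::SerbianCyrillic | Self::SerbianLatin => "sr",
            Self::ChineseSimplified | Self::ChineseTraditional => "zh",
        }
    }

    /// ISO 639-3 (three-letter) code, with the same collapsing rule.
    fn iso_639_3(self) -> &'static str {
        match self {
            Self::English => "eng",
            Self::SerbianCyrillic | Self::SerbianLatin => "srp",
            Self::ChineseSimplified | Self::ChineseTraditional => "zho",
        }
    }
}

fn main() {
    let all = [
        DetectedLanguage::English,
        DetectedLanguage::SerbianCyrillic,
        DetectedLanguage::SerbianLatin,
        DetectedLanguage::ChineseSimplified,
        DetectedLanguage::ChineseTraditional,
    ];
    // Print each hypothetical variant with its two- and three-letter codes.
    for lang in all {
        println!("{:?} -> {} / {}", lang, lang.iso_639_1(), lang.iso_639_3());
    }
}
```

If the script matters for voice selection, the same match could instead return a BCP 47 tag with a script subtag (for example sr-Latn / sr-Cyrl or zh-Hans / zh-Hant); whether a given SSIP server accepts such tags is not something this sketch checks.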
My background context
I am trying to get automatic language detection working for Speech Dispatcher, the speech synthesis / TTS server for Linux, which people who are blind or use a screen reader rely on. In other words, I'm writing a library that takes a sentence, which may be multilingual, and tells me which language each part of it is written in.
I am essentially looking for a solution that can disambiguate very small amounts of text at a time, if possible. For example, a blind user learning a language could use my library to have their TTS server speak multiple languages intelligently.
Lingua did not work for me because it failed to disambiguate languages in short sentences, even in trivial cases like "你好 Hello".
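For mixed-script input such as "你好 Hello", a cheap pre-pass is to split the text into runs of a single Unicode script and then run a language detector on each run separately. The sketch below is a heuristic chosen for illustration, not a langram feature: it recognises only Han (the CJK Unified Ideographs, Extension A and Compatibility Ideographs blocks) and Latin letters, skips spaces, digits and punctuation, and reports 0-based, end-exclusive char indices. The Script and Run types and the script_runs function are hypothetical names.

```rust
/// Coarse script classes used to split text before language detection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Han,
    Latin,
}

/// A run of same-script characters: `start`/`end` are 0-based char
/// indices into the input, with `end` exclusive.
#[derive(Debug, PartialEq, Eq)]
struct Run {
    script: Script,
    start: usize,
    end: usize,
}

/// Classify one char. Returns None for whitespace, digits, punctuation
/// and any script this sketch does not handle.
fn script_of(c: char) -> Option<Script> {
    match c as u32 {
        0x3400..=0x4DBF | 0x4E00..=0x9FFF | 0xF900..=0xFAFF => Some(Script::Han),
        _ if c.is_ascii_alphabetic() => Some(Script::Latin),
        // Latin-1 Supplement through Latin Extended-B, letters only.
        0x00C0..=0x024F if c.is_alphabetic() => Some(Script::Latin),
        _ => None,
    }
}

/// Split `text` into same-script runs. Neutral chars never start a run;
/// a run continues across them until a char of a different script appears.
fn script_runs(text: &str) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (i, c) in text.chars().enumerate() {
        let script = match script_of(c) {
            Some(s) => s,
            None => continue,
        };
        match runs.last_mut() {
            // Same script as the current run: extend it to include this char.
            Some(last) if last.script == script => last.end = i + 1,
            // First classified char, or a script change: start a new run.
            _ => runs.push(Run { script, start: i, end: i + 1 }),
        }
    }
    runs
}

fn main() {
    let text = "你好 Hello";
    for run in script_runs(text) {
        let slice: String = text.chars().skip(run.start).take(run.end - run.start).collect();
        println!("{:?} {}..{} {:?}", run.script, run.start, run.end, slice);
    }
}
```

Script runs cannot separate languages that share a script (English vs. French, or Chinese vs. Japanese kanji), so each run would still need to go through a detector; the pre-pass only guarantees that "你好" and "Hello" are never scored as one piece of text.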