The proposal is to integrate a fully local voice assistant to speed up task scheduling for users. The planned architecture is an end-to-end pipeline: audio input → ASR using fine-tuned Wav2Vec 2.0 → text → NLU via BERT-based intent classification and Named Entity Recognition → RL/Transformer-based dialogue policy → task execution engine → response generation.
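As a rough sketch of how these stages could chain together, here is a minimal skeleton with stubbed-out components. Everything in it is an assumption for illustration: the names `stub_asr`, `stub_nlu`, `stub_policy`, `TaskStore`, and `handle_utterance` are hypothetical, the "audio" is just UTF-8 text, and keyword rules stand in for Wav2Vec 2.0, the BERT models, and the learned dialogue policy. The point is only the stage boundaries, so each stub could later be swapped for a real local model.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NLUResult:
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


def stub_asr(audio: bytes) -> str:
    # Stand-in for a fine-tuned Wav2Vec 2.0 model (assumption: the "audio"
    # is UTF-8 text so the sketch runs without any model weights).
    return audio.decode("utf-8")


def stub_nlu(text: str) -> NLUResult:
    # Stand-in for BERT-based intent classification and NER:
    # a keyword rule picks the intent, a regex extracts a time entity.
    lowered = text.lower()
    intent = "schedule_task" if "schedule" in lowered else "unknown"
    entities: Dict[str, str] = {}
    match = re.search(r"\bat (\d{1,2}(?::\d{2})?\s?(?:am|pm)?)", lowered)
    if match:
        entities["time"] = match.group(1).strip()
    return NLUResult(intent, entities)


def stub_policy(nlu: NLUResult) -> str:
    # Stand-in for the RL/Transformer-based dialogue policy:
    # hand-written rules choose the next action.
    if nlu.intent == "schedule_task" and "time" in nlu.entities:
        return "create_task"
    if nlu.intent == "schedule_task":
        return "ask_time"
    return "fallback"


class TaskStore:
    """Hypothetical task execution engine backed by an in-memory list."""

    def __init__(self) -> None:
        self.tasks: List[Dict[str, str]] = []

    def execute(self, action: str, text: str, nlu: NLUResult) -> None:
        if action == "create_task":
            self.tasks.append({"text": text, "time": nlu.entities["time"]})


def stub_response(action: str, nlu: NLUResult) -> str:
    # Template-based response generation; a TTS stage would consume this text.
    if action == "create_task":
        return f"Scheduled for {nlu.entities['time']}."
    if action == "ask_time":
        return "What time should I schedule it for?"
    return "Sorry, I did not understand that."


def handle_utterance(audio: bytes, store: TaskStore) -> str:
    # audio -> ASR -> text -> NLU -> policy -> execution -> response
    text = stub_asr(audio)
    nlu = stub_nlu(text)
    action = stub_policy(nlu)
    store.execute(action, text, nlu)
    return stub_response(action, nlu)
```

For example, calling `handle_utterance(b"Schedule a meeting at 3pm", TaskStore())` walks the utterance through every stage and returns the text that would be handed to the TTS stage.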
Voice output would be handled by neural TTS (Tacotron2 with a HiFi-GAN vocoder), and memory would use vector embeddings indexed with FAISS for context retention and personalization. The whole pipeline is designed to run locally with no dependency on external services, and the individual component choices are open to adjustment.
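To illustrate the memory idea, here is a minimal sketch of embedding-based recall in plain Python. It is not FAISS: a brute-force inner-product search over normalized vectors stands in for what an exact FAISS index such as `IndexFlatIP` would do, and the bag-of-words `embed` function is a toy assumption in place of a real sentence-embedding model. The names `embed` and `MemoryIndex` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str, vocab: List[str]) -> List[float]:
    # Toy embedding (assumption): L2-normalized word counts over a fixed
    # vocabulary. A real system would use a learned embedding model.
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class MemoryIndex:
    """Brute-force nearest-neighbour store standing in for a FAISS index."""

    def __init__(self, vocab: List[str]) -> None:
        self.vocab = vocab
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text, self.vocab)))

    def search(self, query: str, k: int = 1) -> List[str]:
        # Inner product of normalized vectors equals cosine similarity.
        q = embed(query, self.vocab)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for text, vec in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


vocab = ["dentist", "gym", "report", "monday", "friday"]
memory = MemoryIndex(vocab)
memory.add("dentist appointment on monday")
memory.add("submit report on friday")
memory.add("gym session on friday")

# Retrieve the stored utterance most similar to a follow-up question.
best = memory.search("when is the dentist")
```

The retrieved entries could then be passed to the dialogue policy as context, which is one way the assistant might resolve follow-up questions about earlier tasks.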
@Charushi06 could you please assign this task to me?