feat(aura): restore long-term memory broken by recent changes, stabilize persona, and add async prewarm #34
Open
Conversation
Introduce a Provider Abstraction Layer and settings management:
- Add a provider registry and provider adapters (OpenAI-compatible and Anthropic) with retry/fallback logic and provider inference.
- Add settings_service (Supabase-backed) and FastAPI /api/v1/settings routes to get/update settings and API keys; wire the settings router into the app.
- Refactor LLMService to delegate to provider_registry.
- Enhance memory_service to support OpenAI, OpenRouter, and local Ollama embeddings, and to detect the Ollama runtime.
- Update prompter to respect admin/system_prompt from settings.
- Add a dashboard ApiKeys UI, test fixtures, and minor model/asset tweaks.
- Add new env variables to .env.example for Anthropic, Groq, and Ollama.
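The retry/fallback behavior of the provider registry can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; the names (`ProviderRegistry`, `register`, `generate`, `ProviderError`) and the backoff constants are assumptions:

```python
import time


class ProviderError(Exception):
    """Raised by an adapter when a provider call fails."""


class ProviderRegistry:
    """Sketch of a registry that tries providers in order with retry/backoff."""

    def __init__(self):
        self._providers = {}  # name -> adapter
        self._order = []      # fallback order

    def register(self, name, adapter):
        self._providers[name] = adapter
        self._order.append(name)

    def generate(self, prompt, retries=2):
        last_err = None
        for name in self._order:
            adapter = self._providers[name]
            for attempt in range(retries + 1):
                try:
                    return adapter.generate(prompt)
                except ProviderError as err:
                    last_err = err
                    # exponential backoff before retrying the same provider
                    time.sleep(0.1 * (2 ** attempt))
        # every provider exhausted its retries
        raise last_err or ProviderError("no providers registered")
```

The key property is that a failing primary provider degrades into the next registered adapter instead of surfacing an error to the caller.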
Expose real-time streaming and async LLM support across backend and frontend.
Backend:
- chat API: add an SSE StreamingResponse for streaming conversations, include identity and stream flags, emit the emotion event followed by incremental text deltas, and persist interactions asynchronously.
- Make the emotion and generation nodes async, persist the user interaction immediately, and scrub bracketed tokens before storing.
- LLMService: convert generate to async and add a stream helper that proxies provider streams.
- Provider registry: make generate async, add stream() to route streaming providers, run sync provider.generate via asyncio.to_thread with retry/backoff, and use async sleeps.
- MemoryService: improve embedding-provider logging, allow nullable assistant_text on add_interaction, persist user/assistant chunks safely, and add get_long_term_memories for RAG context.
- SettingsService: add simple in-memory caching for settings and API keys, with TTL and invalidation on updates.
- providers/openai_compat: handle streaming chunks more robustly and filter internal 'reasoning' tokens.
- General: better error logging and stream error handling.
Frontend/UI:
- AvatarRenderer: numerous animation/behavior fixes (mouth lock during tongue-out, blink/expression handling, scale adjustments) plus smaller style/format cleanups.
- CallOverlay: refactor to use the shared getOrCreateIdentity, accept conversationId, improve the LiveKit connect flow, redesign the UI around a centered avatar and controls, and harden the component.
- ChatFeed: visual redesign of the empty state and message list, with improved tool display and bubble styling.
- package.json: bump pixi.js to ^8.17.1.
- Add new dashboard components and a user utility (Presence, SlideOver, lib/user.js) and other UI tweaks.
Voice agent:
- Update multiple voice-agent scripts (agent, tts, token server, memory, vtube controller, deps) to align with the backend changes and improve runtime behavior.
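The emotion-then-deltas SSE ordering described above can be sketched as an async generator. The event names (`emotion`, `delta`, `done`) and payload shapes here are illustrative assumptions, not the PR's actual wire format:

```python
import json


async def sse_stream(deltas, emotion):
    """Yield SSE-framed chunks: one emotion event, then incremental text deltas.

    `deltas` is any async iterable of text fragments (e.g. a provider stream).
    """
    # the client learns the detected emotion before any text arrives,
    # so the avatar can react while tokens are still streaming
    yield f"event: emotion\ndata: {json.dumps({'emotion': emotion})}\n\n"
    async for delta in deltas:
        yield f"event: delta\ndata: {json.dumps({'text': delta})}\n\n"
    yield "event: done\ndata: {}\n\n"
```

In FastAPI such a generator would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.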
Why: these changes enable low-latency streaming responses from LLMs to the dashboard, improve persistence and memory retrieval for RAG, reduce blocking sync calls, and refresh the UI/avatar experience for live interactions.
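The SettingsService caching described above (TTL plus invalidation on update) can be sketched as a small in-memory cache. The class and method names are illustrative, not the PR's actual API:

```python
import time


class TTLCache:
    """Minimal sketch of a TTL cache for settings/API keys."""

    def __init__(self, ttl_seconds=30.0):
        self._ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            # stale entry: drop it and force a fresh read from Supabase
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self._ttl)

    def invalidate(self, key=None):
        # called on settings updates so subsequent reads see fresh values
        if key is None:
            self._store.clear()
        else:
            self._store.pop(key, None)
```

Invalidating on write keeps the TTL as a safety net rather than the primary consistency mechanism.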
- Move TTS warmup off the main init path: run warmup in a background thread and switch to a thread-safe asyncio.Event (_tts_ready_event). prewarm now schedules a non-blocking background warmup and sets the event even on failure; voice_session awaits the event with a 60s timeout so it can never hang.
- Track and manage avatar expression tasks in aura_tts: maintain expr_tasks, add done callbacks that remove completed tasks, reset to neutral on cancellation, and cancel/await any pending tasks at the end so the avatar reliably returns to neutral.
- Tighten the VTube reset logic: add debug logging, explicitly toggle off active expressions, reset injected parameters including the common 'sticking' params (TongueOut, MouthOpen, EyeOpenLeft/Right), and clear the injected state.
- Simplify the bilingual emotion keyword lists and clarify comments.
These changes reduce blocking, prevent stuck expressions, and make resets more robust.
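The non-blocking prewarm pattern above can be sketched as follows: the warmup runs in a background thread, signals an asyncio.Event via `call_soon_threadsafe` (a plain `Event.set()` is not thread-safe across the loop boundary), and the session awaits it with a timeout. The class and method names here are illustrative, not the PR's actual code:

```python
import asyncio
import threading


class TTSPrewarm:
    """Sketch of a background TTS warmup that never blocks init or hangs sessions."""

    def __init__(self, warmup_fn):
        self._warmup_fn = warmup_fn
        self._ready = asyncio.Event()

    def prewarm(self):
        """Schedule warmup in a daemon thread; returns immediately."""
        loop = asyncio.get_running_loop()

        def _run():
            try:
                self._warmup_fn()
            finally:
                # set the event even if warmup failed, so sessions never hang;
                # call_soon_threadsafe is required to touch the loop from a thread
                loop.call_soon_threadsafe(self._ready.set)

        threading.Thread(target=_run, daemon=True).start()

    async def wait_ready(self, timeout=60.0):
        """Await readiness with a timeout instead of blocking indefinitely."""
        await asyncio.wait_for(self._ready.wait(), timeout=timeout)
```

Setting the event in `finally` is what prevents the worker-initialization timeout: a failed warmup degrades to a cold start instead of a hung session.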
Laid out groundwork for #27
Summary
Restores long-term memory recall, stabilizes AURA's expressive persona, and implements an asynchronous prewarm mechanism to prevent worker initialization timeouts. This PR also synchronizes all environment and dependency files across Windows and macOS.
Changes
Added new dependencies (httpx, pyvts) and updated ai-service/requirements.txt.
Testing
Verified that the async prewarm completes in the background and worker initialization no longer fails with a TimeoutError.