Add microphone processing, media player, screen management and transcript improvements#76
Mugl3 wants to merge 2 commits into
…r improvements

- Microphone: add noise suppressor, automatic gain control, and acoustic echo canceler support; configurable audio source selection and software gain (0-24 dB) for far-field pick-up; probability cutoff and sliding window size overrides for wake word detection sensitivity
- Media player: expose a full media player entity to Home Assistant with album artwork (via Coil), playback controls (play/pause, skip prev/next), track title/artist, and a seek progress indicator
- Timers: add cancel and +1 minute action buttons directly on timer cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
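To illustrate what a software gain in the 0-24 dB range means for 16-bit PCM microphone data, here is a minimal sketch. It is not the code from this PR; the function names `dbToLinear` and `applyGainDb` are hypothetical, and the clamping to 0-24 dB simply mirrors the range quoted in the commit message.

```kotlin
import kotlin.math.pow
import kotlin.math.roundToInt

/** Converts a gain in decibels to a linear amplitude factor: 10^(dB / 20). */
fun dbToLinear(gainDb: Double): Double = 10.0.pow(gainDb / 20.0)

/**
 * Returns a copy of [samples] amplified by [gainDb].
 * The gain is limited to 0-24 dB (assumed range, taken from the commit message),
 * and each result is saturated to the 16-bit range instead of wrapping around.
 */
fun applyGainDb(samples: ShortArray, gainDb: Double): ShortArray {
    val factor = dbToLinear(gainDb.coerceIn(0.0, 24.0))
    return ShortArray(samples.size) { i ->
        (samples[i] * factor).roundToInt()
            .coerceIn(Short.MIN_VALUE.toInt(), Short.MAX_VALUE.toInt())
            .toShort()
    }
}

fun main() {
    val quiet = shortArrayOf(1000, -1000, 30000)
    // 6 dB roughly doubles the amplitude; the last sample saturates.
    println(applyGainDb(quiet, 6.0).joinToString())
}
```

Saturating rather than wrapping matters here: an overflowed `Short` would flip sign and produce loud clicks, which would likely hurt wake word detection more than mild clipping.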
Screen management:
- Remove the 60-second wake lock timeout so the screen stays on for the entire duration of a voice interaction (Listening/Processing/Responding)
- Add a configurable idle timeout (0-300 s, default 30 s): after an interaction ends the screen stays on for this long before being released
- Add a screen rotation option (FULL_SENSOR) so the device can be used upside down when charging with the cable at the top

Voice transcript:
- Extract STT result text from VOICE_ASSISTANT_STT_END events
- Extract TTS response text from VOICE_ASSISTANT_TTS_START events; also handle incremental streaming text via VOICE_ASSISTANT_INTENT_PROGRESS
- Display a transcript card on the home screen showing "You: …" and "Ava: …" updating in real time as the LLM streams its response
- Add an eye-icon toggle in the top bar to show/hide the transcript card

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
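The transcript behaviour described in this commit message can be pictured as a small state reducer. The sketch below is an illustration only, not the PR's implementation: the event classes `SttEnd`, `IntentProgress` and `TtsStart` are hypothetical stand-ins for the three `VOICE_ASSISTANT_*` events, and it assumes that each progress event carries a new chunk of text rather than the full response so far.

```kotlin
// Hypothetical event model; the real events come from the voice assistant protocol.
sealed interface VoiceEvent
data class SttEnd(val text: String) : VoiceEvent          // VOICE_ASSISTANT_STT_END
data class IntentProgress(val chunk: String) : VoiceEvent // VOICE_ASSISTANT_INTENT_PROGRESS (assumed to be a delta)
data class TtsStart(val text: String) : VoiceEvent        // VOICE_ASSISTANT_TTS_START

data class Transcript(val user: String = "", val assistant: String = "") {
    /** Renders the card text, skipping lines that have no content yet. */
    fun render(): String = listOfNotNull(
        user.takeIf { it.isNotEmpty() }?.let { "You: $it" },
        assistant.takeIf { it.isNotEmpty() }?.let { "Ava: $it" },
    ).joinToString("\n")
}

fun reduce(state: Transcript, event: VoiceEvent): Transcript = when (event) {
    // A new STT result starts a new exchange, so the old response is cleared.
    is SttEnd -> Transcript(user = event.text)
    // Streaming chunks are appended as they arrive.
    is IntentProgress -> state.copy(assistant = state.assistant + event.chunk)
    // The final TTS text replaces whatever was streamed so far.
    is TtsStart -> state.copy(assistant = event.text)
}

fun main() {
    val events = listOf(
        SttEnd("turn on the lights"),
        IntentProgress("Turning "),
        IntentProgress("on the lights."),
        TtsStart("Turning on the lights."),
    )
    println(events.fold(Transcript(), ::reduce).render())
}
```

Keeping the transcript as an immutable value that is replaced on each event makes it straightforward to expose through a `StateFlow` for the UI, though how the PR actually wires this up is not shown here.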
brownard
left a comment
Hi. Thanks for this, I appreciate it. There's some good stuff in here, but it's a bit too large and wholesale for me to consider in one go. I have added some initial comments but had to stop, as there's just too much. If you can break the PR into separate PRs for individual features, I can review them properly, e.g. a separate PR for each of:
- Media playback display
- Microphone processing (but see feedback)
- Timer improvements
- etc
    val sessionId = audioRecord.audioSessionId
    if (enableNoiseSuppressor && NoiseSuppressor.isAvailable()) {
        noiseSuppressor = NoiseSuppressor.create(sessionId)?.also { it.enabled = true }
        Timber.d("NoiseSuppressor enabled: ${noiseSuppressor != null}")
    }
    if (enableAutomaticGainControl && AutomaticGainControl.isAvailable()) {
        automaticGainControl = AutomaticGainControl.create(sessionId)?.also { it.enabled = true }
        Timber.d("AutomaticGainControl enabled: ${automaticGainControl != null}")
    }
    if (enableAcousticEchoCanceler && AcousticEchoCanceler.isAvailable()) {
        acousticEchoCanceler = AcousticEchoCanceler.create(sessionId)?.also { it.enabled = true }
        Timber.d("AcousticEchoCanceler enabled: ${acousticEchoCanceler != null}")
    }
Have you confirmed this actually works on any devices? I have tried implementing something similar, but it had no effect on the two devices (Samsung and Lenovo) I tried it on. I could only get the effects to work by changing the microphone's audio source to VOICE_COMMUNICATION and setting the AudioManager mode to MODE_IN_COMMUNICATION. However, this had the caveat of reducing audio playback quality, a known issue on Android devices.
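For readers following along, the workaround described in this comment might look roughly like the sketch below. It is an assumption-laden illustration, not code from this project or from the reviewer: `CommunicationCapture` and `openCommunicationCapture` are hypothetical names, the 16 kHz mono format is assumed, and the caller is assumed to already hold the `RECORD_AUDIO` permission. Whether the effects actually take hold remains device-dependent.

```kotlin
import android.annotation.SuppressLint
import android.content.Context
import android.media.AudioFormat
import android.media.AudioManager
import android.media.AudioRecord
import android.media.MediaRecorder
import android.media.audiofx.AcousticEchoCanceler
import android.media.audiofx.AudioEffect
import android.media.audiofx.NoiseSuppressor

// Hypothetical holder that keeps the effects referenced for the lifetime of the recorder.
class CommunicationCapture(val record: AudioRecord, private val effects: List<AudioEffect>) {
    fun release() {
        effects.forEach { it.release() }
        record.release()
    }
}

@SuppressLint("MissingPermission") // Assumes RECORD_AUDIO has already been granted.
fun openCommunicationCapture(context: Context, sampleRate: Int = 16_000): CommunicationCapture {
    // Switch the device into communication mode, as described in the review comment.
    val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
    audioManager.mode = AudioManager.MODE_IN_COMMUNICATION

    val minSize = AudioRecord.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
    )
    require(minSize > 0) { "Unsupported recording parameters" }

    // VOICE_COMMUNICATION instead of the default MIC source.
    val record = AudioRecord(
        MediaRecorder.AudioSource.VOICE_COMMUNICATION,
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
        minSize * 2
    )

    // Attach only the effects the device reports as available; create() may still return null.
    val effects: List<AudioEffect> = listOfNotNull(
        if (NoiseSuppressor.isAvailable()) NoiseSuppressor.create(record.audioSessionId) else null,
        if (AcousticEchoCanceler.isAvailable()) AcousticEchoCanceler.create(record.audioSessionId) else null,
    )
    effects.forEach { it.enabled = true }
    return CommunicationCapture(record, effects)
}
```

A caller would also want to restore `AudioManager.MODE_NORMAL` once capture stops, since leaving the device in communication mode is what degrades media playback quality.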
    private fun fetchArtworkFromItunes(artist: String?, title: String?) {
        if (artist == null || title == null) return
        artworkJob?.cancel()
        artworkJob = artworkScope.launch {
            runCatching {
                val query = URLEncoder.encode("$artist $title", "UTF-8")
                val json = JSONObject(URL("https://itunes.apple.com/search?term=$query&entity=song&limit=1&media=music").readText())
                val results = json.getJSONArray("results")
                if (results.length() > 0) {
                    val artwork = results.getJSONObject(0)
                        .optString("artworkUrl100")
                        .replace("100x100bb", "600x600bb") // Upscale to 600px
                        .takeIf { it.isNotEmpty() }
                    if (artwork != null) {
                        Timber.d("iTunes artwork: $artwork")
                        _artworkUri.value = artwork
                    }
                }
            }.onFailure { Timber.w(it, "iTunes artwork lookup failed") }
        }
This is outside the scope of this project. I want to avoid 'phoning out' to external services; everything should be fully local to fit the ethos of Home Assistant.
    override fun startWakeWordListening() {
        _isWakeWordListening = true
        if (!_muted.value && !_isDucked) mediaPlayer.volume = effectiveMediaVolume()
    }

    override fun stopWakeWordListening() {
        _isWakeWordListening = false
        if (!_muted.value && !_isDucked) mediaPlayer.volume = effectiveMediaVolume()
Can you explain the logic for this? I can't seem to find where it's actually called. The comment on the interface definition appears to indicate it will be called when the satellite is connected, but given that the satellite will only ever be playing audio when it's connected (it must be connected to play media at all), it would always be enabled (if it's actually called anywhere), making it superfluous.
Edit: I've seen that the call was added in the follow-on commit. Ideally, changes like this need to be self-contained in their own PRs; otherwise it's too easy to miss things when they're thrown in with lots of other changes. The above comment still stands regarding it seemingly being superfluous.
I am closing this now given there has been no further response. Please feel free to open other pull requests in future.
Summary
Microphone audio processing
Media player
Timer improvements
Screen management
Voice transcript display
VOICE_ASSISTANT_STT_END, TTS text from VOICE_ASSISTANT_TTS_START and incremental streaming via VOICE_ASSISTANT_INTENT_PROGRESS

This was mostly AI-assisted development. These changes have made Ava wife-approved in my home, especially the improvements to the mic pickup. I am using this branch in my home now, as it has a much improved feel overall. Feel free to critique, ask for changes or close the PR if you prefer. Cheers






Attached some screenshots of how the app looks with these changes.
This may close #48