Add microphone processing, media player, screen management and transcript improvements#76

Closed
Mugl3 wants to merge 2 commits into brownard:master from Mugl3:feature/voice-satellite-improvements

Conversation

@Mugl3 Mugl3 commented Apr 7, 2026

Summary

Microphone audio processing

  • Hardware noise suppressor, AGC, and AEC can be individually toggled in settings
  • Configurable audio source (Voice Recognition, Mic, Voice Communication, Unprocessed)
  • Software gain (0–24 dB) for far-field pick-up; a parallel un-boosted detector guards against clipping
  • Wake word probability cutoff and sliding window size overrides
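
As a rough illustration of the software gain item above, a gain in dB can be converted to a linear factor and applied to 16-bit PCM samples with clamping. This is a minimal sketch and not the PR's code: `dbToLinear` and `applySoftwareGain` are hypothetical names, the 0–24 dB range is taken from the summary, and the parallel un-boosted detector is not shown.

```kotlin
import kotlin.math.pow
import kotlin.math.roundToInt

/** Converts a gain in decibels to a linear amplitude factor. */
fun dbToLinear(gainDb: Double): Double = 10.0.pow(gainDb / 20.0)

/**
 * Returns a boosted copy of 16-bit PCM samples (hypothetical helper).
 * Values are clamped to the Short range so loud input saturates
 * instead of wrapping around.
 */
fun applySoftwareGain(samples: ShortArray, gainDb: Double): ShortArray {
    // Assumption: the gain is limited to the 0-24 dB range from the PR summary.
    val factor = dbToLinear(gainDb.coerceIn(0.0, 24.0))
    return ShortArray(samples.size) { i ->
        (samples[i] * factor).roundToInt()
            .coerceIn(Short.MIN_VALUE.toInt(), Short.MAX_VALUE.toInt())
            .toShort()
    }
}
```

Clamping only prevents wrap-around. It still distorts loud input, which is presumably why the summary pairs the gain with an un-boosted detection path.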

Media player

  • Full media player entity exposed to Home Assistant
  • Album artwork (via Coil), play/pause, skip prev/next, track title/artist, seek progress bar

Timer improvements

  • Cancel and +1 minute buttons on each timer card

Screen management

  • Removed the 60-second wake lock timeout — screen stays on for the full duration of every voice interaction
  • Configurable idle screen timeout (0–300 s, default 30 s) before the lock is released after an interaction ends
  • Screen rotation option (FULL_SENSOR) so the device works upside-down when charging
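
The wake lock behaviour described above can be modelled as a small decision function. This is a hedged sketch under stated assumptions, not the PR's implementation: `ScreenHoldPolicy` and its methods are hypothetical names, timestamps are plain milliseconds supplied by the caller, and the actual Android wake lock calls are left out.

```kotlin
/**
 * Hypothetical model of the screen-hold decision.
 * The screen is held during an interaction and for idleTimeoutSeconds afterwards.
 */
class ScreenHoldPolicy(idleTimeoutSeconds: Int) {
    // Assumption: the setting is limited to 0-300 s, as in the PR summary.
    private val idleTimeoutMs = idleTimeoutSeconds.coerceIn(0, 300) * 1000L
    private var interactionActive = false
    private var lastInteractionEndMs: Long? = null

    fun onInteractionStarted() {
        interactionActive = true
    }

    fun onInteractionEnded(nowMs: Long) {
        interactionActive = false
        lastInteractionEndMs = nowMs
    }

    /** True while an interaction is running or the idle timeout has not yet elapsed. */
    fun shouldKeepScreenOn(nowMs: Long): Boolean {
        if (interactionActive) return true
        val endedAt = lastInteractionEndMs ?: return false
        return nowMs - endedAt < idleTimeoutMs
    }
}
```

With a timeout of 0 the hold is dropped as soon as the interaction ends, which matches the lower bound of the configurable range.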

Voice transcript display

  • STT result from VOICE_ASSISTANT_STT_END, TTS text from VOICE_ASSISTANT_TTS_START and incremental streaming via VOICE_ASSISTANT_INTENT_PROGRESS
  • Transcript card on the home screen shows "You: …" / "Ava: …", updating live during LLM streaming
  • Eye-icon toggle in the top bar to show/hide the card
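
One way to picture the transcript flow above is as a reducer over the three event types. This is only a sketch: the type and function names are hypothetical, and it assumes that progress events carry text chunks to append, that the TTS text replaces the streamed text, and that a new STT result starts a fresh exchange. The PR's real event handling may differ.

```kotlin
/** Hypothetical transcript state shown on the card. */
data class Transcript(val userText: String = "", val assistantText: String = "")

/** Hypothetical stand-ins for the three voice assistant events named above. */
sealed interface TranscriptEvent {
    data class SttEnd(val text: String) : TranscriptEvent
    data class IntentProgress(val chunk: String) : TranscriptEvent
    data class TtsStart(val text: String) : TranscriptEvent
}

fun applyEvent(state: Transcript, event: TranscriptEvent): Transcript = when (event) {
    // Assumption: a new STT result starts a fresh exchange and clears the old reply.
    is TranscriptEvent.SttEnd -> Transcript(userText = event.text)
    // Assumption: each progress event carries a chunk that is appended.
    is TranscriptEvent.IntentProgress -> state.copy(assistantText = state.assistantText + event.chunk)
    // Assumption: the TTS text is the complete reply and replaces the streamed text.
    is TranscriptEvent.TtsStart -> state.copy(assistantText = event.text)
}

/** Renders the card lines, skipping whichever side is still empty. */
fun render(t: Transcript): List<String> = listOfNotNull(
    t.userText.takeIf { it.isNotEmpty() }?.let { "You: $it" },
    t.assistantText.takeIf { it.isNotEmpty() }?.let { "Ava: $it" },
)
```

Keeping the state immutable makes it straightforward to publish each intermediate transcript to the UI while the response is still streaming.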

This was mostly AI-assisted development. These changes have made Ava wife-approved in my home, especially the improvements to the mic pick-up. I am using this branch at home now, as it has a much improved feel overall. Feel free to critique, ask for changes, or close the PR if you prefer. Cheers.
I have attached some screenshots of how the app looks with these changes:
Screenshot_20260408-091607
Screenshot_20260408-091611
Screenshot_20260408-091640
Screenshot_20260408-091518
Screenshot_20260408-091534
Screenshot_20260408-091539

This may close #48

Gustav and others added 2 commits April 8, 2026 09:09
…r improvements

- Microphone: add noise suppressor, automatic gain control, and acoustic echo
  canceler support; configurable audio source selection and software gain (0-24 dB)
  for far-field pick-up; probability cutoff and sliding window size overrides for
  wake word detection sensitivity
- Media player: expose a full media player entity to Home Assistant with album
  artwork (via Coil), playback controls (play/pause, skip prev/next), track title/
  artist, and a seek progress indicator
- Timers: add cancel and +1 minute action buttons directly on timer cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Screen management:
- Remove the 60-second wake lock timeout so the screen stays on for the
  entire duration of a voice interaction (Listening/Processing/Responding)
- Add a configurable idle timeout (0-300 s, default 30 s): after an
  interaction ends the screen stays on for this long before being released
- Add a screen rotation option (FULL_SENSOR) so the device can be used
  upside down when charging with the cable at the top

Voice transcript:
- Extract STT result text from VOICE_ASSISTANT_STT_END events
- Extract TTS response text from VOICE_ASSISTANT_TTS_START events;
  also handle incremental streaming text via VOICE_ASSISTANT_INTENT_PROGRESS
- Display a transcript card on the home screen showing "You: …" and "Ava: …"
  updating in real time as the LLM streams its response
- Add an eye-icon toggle in the top bar to show/hide the transcript card

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

@brownard brownard left a comment

Hi. Thanks for this, I appreciate it. There's some good stuff in here, but it's a bit too large and wholesale for me to consider in one go. I have added some initial comments but had to stop, as there's just too much. If you can break the PR into separate PRs for individual features, I can review them properly, e.g. a separate PR for each of:

  • Media playback display
  • Microphone processing (but see feedback)
  • Timer improvements
  • etc

Comment on lines +69 to +81
val sessionId = audioRecord.audioSessionId
if (enableNoiseSuppressor && NoiseSuppressor.isAvailable()) {
    noiseSuppressor = NoiseSuppressor.create(sessionId)?.also { it.enabled = true }
    Timber.d("NoiseSuppressor enabled: ${noiseSuppressor != null}")
}
if (enableAutomaticGainControl && AutomaticGainControl.isAvailable()) {
    automaticGainControl = AutomaticGainControl.create(sessionId)?.also { it.enabled = true }
    Timber.d("AutomaticGainControl enabled: ${automaticGainControl != null}")
}
if (enableAcousticEchoCanceler && AcousticEchoCanceler.isAvailable()) {
    acousticEchoCanceler = AcousticEchoCanceler.create(sessionId)?.also { it.enabled = true }
    Timber.d("AcousticEchoCanceler enabled: ${acousticEchoCanceler != null}")
}

Have you confirmed this actually works on any devices? I tried implementing something similar, but it had no effect on the two devices (Samsung and Lenovo) I tested. I could only get the effects to work by changing the microphone's audio source to VOICE_COMMUNICATION and setting the AudioManager mode to MODE_IN_COMMUNICATION. However, this had the caveat of reducing audio playback quality, a known issue on Android devices.

Comment on lines +196 to +215
private fun fetchArtworkFromItunes(artist: String?, title: String?) {
    if (artist == null || title == null) return
    artworkJob?.cancel()
    artworkJob = artworkScope.launch {
        runCatching {
            val query = URLEncoder.encode("$artist $title", "UTF-8")
            val json = JSONObject(URL("https://itunes.apple.com/search?term=$query&entity=song&limit=1&media=music").readText())
            val results = json.getJSONArray("results")
            if (results.length() > 0) {
                val artwork = results.getJSONObject(0)
                    .optString("artworkUrl100")
                    .replace("100x100bb", "600x600bb") // Upscale to 600px
                    .takeIf { it.isNotEmpty() }
                if (artwork != null) {
                    Timber.d("iTunes artwork: $artwork")
                    _artworkUri.value = artwork
                }
            }
        }.onFailure { Timber.w(it, "iTunes artwork lookup failed") }
    }

This is outside the scope of this project. I want to avoid 'phoning' out to external services; everything should be fully local to fit the ethos of Home Assistant.

Comment on lines +203 to +210
override fun startWakeWordListening() {
    _isWakeWordListening = true
    if (!_muted.value && !_isDucked) mediaPlayer.volume = effectiveMediaVolume()
}

override fun stopWakeWordListening() {
    _isWakeWordListening = false
    if (!_muted.value && !_isDucked) mediaPlayer.volume = effectiveMediaVolume()

@brownard brownard Apr 13, 2026

Can you explain the logic for this? I can't seem to find where it's actually called. The comment on the interface definition appears to indicate it will be called when the satellite is connected. However, the satellite only ever plays audio while connected (it must be connected to play media at all), so this would always be enabled (if it's actually called anywhere), which makes it superfluous.

Edit: I've seen that the call was added in the follow-on commit. Ideally, changes like this need to be self-contained in their own PRs; otherwise it's too easy to miss things when they are thrown in with lots of other changes. The above comment still stands regarding it seemingly being superfluous.

@brownard
Owner

I am closing this now given there has been no further response. Please feel free to open other pull requests in future.

@brownard brownard closed this May 24, 2026

Development

Successfully merging this pull request may close these issues.

Add microphone audio preprocessing

2 participants