Problem
Smaller findings from the review that didn't make the bug-fix PR. Listed here so they don't get lost.
1. Double-trigger race in manual summary (app.py → handle_trigger_summary)
The handler checks if summary_pending: return but never sets summary_pending = True before spawning its background thread (it only sets it in the is_processing branch). Two quick clicks on "Generate summary" therefore run two generations concurrently, and both contend for the GPU/model-rotation lock.
Fix: set summary_pending = True before starting the thread, and clear it in a finally block inside the thread so a failed generation cannot leave the flag stuck.
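A minimal sketch of that fix follows. It assumes the handler may be called from more than one worker thread, so the check and the set happen under one lock. _summary_lock, _summary_worker and _generate_summary are hypothetical names, and the bool return value exists only for illustration; the real handler presumably returns UI updates.

```python
import threading
import time

# Module-level state; in app.py the flag is summary_pending.
summary_pending = False
_summary_lock = threading.Lock()  # hypothetical: makes check-and-set atomic


def _generate_summary():
    """Hypothetical stand-in for the real (slow) summary generation."""
    time.sleep(0.2)


def _summary_worker():
    global summary_pending
    try:
        _generate_summary()
    finally:
        # Always clear the flag, even if generation raises.
        with _summary_lock:
            summary_pending = False


def handle_trigger_summary():
    """Start a summary unless one is already pending. Returns True if started."""
    global summary_pending
    with _summary_lock:
        if summary_pending:
            return False
        summary_pending = True  # set BEFORE spawning the thread
    threading.Thread(target=_summary_worker, daemon=True).start()
    return True
```

A second click while the first generation is running now returns immediately instead of starting a competing generation.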
2. Misleading log window (app.py → run_pending_summary)
Logs say "last 10 minutes" / "10min" but the actual window is current_time - 300 (5 minutes). Cosmetic, but confusing when debugging summary content.
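One way to keep the log text and the cutoff from drifting apart again is to derive both from a single constant. This is a sketch: SUMMARY_WINDOW_S, window_cutoff, window_label and select_recent are hypothetical names, and the (timestamp, text) entry shape is assumed.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical constant replacing the hardcoded 300 in the cutoff.
SUMMARY_WINDOW_S = 300


def window_cutoff(current_time: float) -> float:
    """Start of the summary window, in the same clock as current_time."""
    return current_time - SUMMARY_WINDOW_S


def window_label() -> str:
    """Human-readable window length for log lines, derived from the constant."""
    return f"last {SUMMARY_WINDOW_S / 60:g} min"


def select_recent(entries, current_time):
    """Keep (timestamp, text) entries inside the window and log the real span."""
    cutoff = window_cutoff(current_time)
    recent = [e for e in entries if e[0] >= cutoff]
    logger.info("Summarizing %d entries from the %s", len(recent), window_label())
    return recent
```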
3. Hardcoded "Speaker 1" when diarization is disabled (app.py → transcribe_and_translate)
With ENABLE_DIARIZATION=False (or when Whisper returns no word chunks), the whole batch is labelled Speaker 1 even in bot mode where caption-based speaker hints exist (_current_bot_speaker / the caption timeline). The partial-transcription path already uses the hint; the final path could fall back to it too.
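A possible shape for that fallback in the final path is sketched below: prefer the caption timeline (the speaker with the most caption overlap with the batch), then the current bot speaker, and only then the hardcoded label. fallback_speaker is a hypothetical helper, and the (start_s, end_s, speaker) interval format for the caption timeline is an assumption.

```python
from typing import Optional, Sequence, Tuple

# Assumed caption-timeline entry: (start_s, end_s, speaker).
Interval = Tuple[float, float, str]


def fallback_speaker(
    batch_start: float,
    batch_end: float,
    caption_timeline: Sequence[Interval],
    current_bot_speaker: Optional[str],
    default: str = "Speaker 1",
) -> str:
    """Label a whole batch when diarization produced no per-word speakers."""
    overlap = {}
    for start, end, speaker in caption_timeline:
        secs = min(end, batch_end) - max(start, batch_start)
        if secs > 0:
            overlap[speaker] = overlap.get(speaker, 0.0) + secs
    if overlap:
        # Speaker whose captions cover most of the batch.
        return max(overlap, key=overlap.get)
    if current_bot_speaker:
        return current_bot_speaker
    return default
```

Outside bot mode both hint sources are empty, so the behaviour there stays the current "Speaker 1".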
4. Caption signal can't split overlapping speech (meet-bot/speaker.js) — known limitation
During genuine cross-talk, both speakers' caption blocks update simultaneously and both stay "active", so caption intervals fully overlap. The overlap vote in resolve_speaker_identity then decides on accumulated milliseconds, which is weak evidence here: both candidates accrue nearly the same overlap, so small timing differences pick the winner. Possible improvement: emit periodic per-speaker caption-activity samples (heartbeats with text-growth deltas) instead of just start/end intervals, and weight the vote by activity density.
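On the consuming side, the density-weighted vote could look roughly like this. It is a sketch under several assumptions: the CaptionHeartbeat fields are a hypothetical format for what speaker.js would emit, the vote is assumed to run in the Python backend, and the min_share threshold (returning None so the caller can fall back to the existing overlap vote) is an illustrative choice, not a tuned value.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CaptionHeartbeat:
    """Hypothetical periodic sample for one active caption block."""
    t_ms: int         # sample timestamp
    speaker: str      # caption block's speaker name
    chars_added: int  # text growth since this speaker's previous heartbeat


def vote_by_activity(
    samples: Iterable[CaptionHeartbeat],
    seg_start_ms: int,
    seg_end_ms: int,
    min_share: float = 0.6,
) -> Optional[str]:
    """Return the speaker whose captions grew most inside the segment.

    Returns None when no speaker holds at least min_share of the growth,
    i.e. the evidence is too balanced to call.
    """
    growth = defaultdict(int)
    for s in samples:
        if seg_start_ms <= s.t_ms < seg_end_ms and s.chars_added > 0:
            growth[s.speaker] += s.chars_added
    total = sum(growth.values())
    if total == 0:
        return None
    best = max(growth, key=growth.get)
    return best if growth[best] / total >= min_share else None
```

Weighting by text growth rather than interval length means a block that stays "active" but barely changes contributes little to the vote.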
5. Already fixed in passing (PR #16)
The Llama-3 summarization prompt ended with <|end_header_id|} (a }} typo in the f-string, which renders as a single }) instead of <|end_header_id|>, leaving a malformed special token at the generation boundary.
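For reference, a correctly terminated Llama-3 prompt plus a cheap tail check is sketched below. build_llama3_prompt and assert_wellformed_tail are hypothetical helpers, not the code from PR #16. If the app loads the model through Hugging Face transformers, tokenizer.apply_chat_template(..., add_generation_prompt=True) builds the same structure without hand-written tokens.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def build_llama3_prompt(system: str, user: str) -> str:
    """Llama-3 chat prompt ending at the assistant header, ready for generation."""
    # The special tokens contain no braces, so no {{ }} escaping is needed;
    # a stray }} renders as } and corrupts the token.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        + ASSISTANT_HEADER
    )


def assert_wellformed_tail(prompt: str) -> None:
    """Guard against the original typo: the prompt must end with the assistant header."""
    if not prompt.endswith(ASSISTANT_HEADER):
        raise ValueError("prompt does not end with a well-formed assistant header")
```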
Status
Items 1–4 open; item 5 fixed in PR #16.