Screen Commentator captures your screen on a schedule, sends frames to a local vision‑capable LLM, and reads a short persona‑based reaction aloud.
- Scheduled screen capture (single shot or multi‑frame burst)
- Multiple narration personas (Gen Z hype, noir detective, nature doc, etc.)
- Optional random TTS voices
- Global hotkey to pause/resume (even when unfocused)
- Anti‑repetition retry and sampling penalties
- Language: Python
- Capture: mss, Pillow
- LLM: LM Studio (OpenAI‑compatible endpoint)
- TTS: pocket_tts + sounddevice
- Hotkeys: keyboard (optional)
Enable audio
demo.mp4
- Python ≥3.9
- LM Studio installed and running a vision‑capable model with the OpenAI‑compatible API enabled
pip install --upgrade pillow mss requests pocket-tts sounddevice keyboard- Download and install LM Studio from https://lmstudio.ai/
- LM Studio is used to fetch models and run a local API server
- Recommended: Gemma 3 12B QAT (requires ~7GB VRAM)
- Lower VRAM alternatives: Qwen3 VL 4B or 8B
Edit the config at the top of ScreenCommentator.py:
API_URLMODELINTERVAL_SECFRAME_COUNTandFRAME_INTERVAL_SEC(how many screenshots to capture and how they are spaced)MAX_IMAGES_PER_REQUEST(caps how many frames are sent to the LLM)MAX_SCREEN_W/MAX_SCREEN_HPERSONASandDEFAULT_PERSONA
How multi‑frame capture works:
- The script captures
FRAME_COUNTscreenshots per cycle. - It waits
FRAME_INTERVAL_SECseconds between captures. - If more frames are captured than allowed, it sends up to
MAX_IMAGES_PER_REQUESTevenly‑spaced frames.
Example: 10 screenshots over ~10 seconds:
FRAME_COUNT = 10
FRAME_INTERVAL_SEC = 10.0 / 9.0
python ScreenCommentator.pyExample:
python ScreenCommentator.py --persona gen_z_hype
[2026-02-04 18:43:30] Hotkey 'f6' toggles pause/resume
[2026-02-04 18:43:30] Loading TTS model (first run may download weights)...
[2026-02-04 18:43:30] Capturing screenshot
[2026-02-04 18:43:31] TTS model loaded
...
[2026-02-04 18:43:34] Capturing screenshot
PowerShell? Bet, that's cooked!
[2026-02-04 18:43:35] Capturing screenshot
[2026-02-04 18:43:36] Starting audio playback
...
[2026-02-04 18:44:32] Capturing screenshot
[2026-02-04 18:44:33] Capturing screenshot
BBC news? No cap!
PS C:\dev\LiveChatLLM> python ScreenCommentator.py --persona witty
...
The GitHub repository's README file is overflowing with details—apparently, this "Screen Commentator" has more features than my dating profile.- Open LM Studio
- Navigate to the Developer tab
- Toggle Start server (top left)
python ScreenCommentator.py --persona gen_z_hype
python ScreenCommentator.py --voice-prompt jean
python ScreenCommentator.py --random-voice --voice-prompts alba,jean,azelma
python ScreenCommentator.py --hotkey f6
python ScreenCommentator.py --single-shot--personaSelects a narration persona.--voice-promptSets the base TTS voice prompt.--random-voiceRandomizes the voice per iteration.--voice-promptsComma‑separated list used for random voice.--hotkeyGlobal hotkey to pause/resume.--single-shotCapture one screenshot per interval instead of a burst.
- Screenshots are not saved to disk; they are encoded in memory and sent as data URLs.
- If you hit LM Studio image decode errors, reduce
MAX_SCREEN_W/Hor the number of images sent.
If you run into LM Studio errors like "failed to process image" or "no memory slot," try these:
- In LM Studio Advanced settings, lower Evaluation Batch Size (e.g., 512 → 128 or 64).
- Set Max Concurrent Predictions to
1to avoid parallel VRAM spikes. - Reduce GPU Offload layers a bit if you’re still running out of memory.
- In this app, lower
MAX_IMAGES_PER_REQUEST, use--single-shot, and/or reduceMAX_SCREEN_W/H.
- Fork it
- Create a branch (
git checkout -b feature/XYZ) - Commit (
git commit -m "feat: add XYZ") - Push (
git push origin feature/XYZ) - Open a MR
- Thanks to kyutai-labs/pocket-tts for TTS.
- Thanks to EposNix/TwitchChatLLM for inspiration and starting point.
MIT © Robert McElhinney