
Screen Commentator

Screen Commentator captures your screen on a schedule, sends frames to a local vision‑capable LLM, and reads a short persona‑based reaction aloud.

🚀 Features

  • Scheduled screen capture (single shot or multi‑frame burst)
  • Multiple narration personas (Gen Z hype, noir detective, nature doc, etc.)
  • Optional random TTS voices
  • Global hotkey to pause/resume (even when unfocused)
  • Anti‑repetition retry and sampling penalties

🔧 Tech Stack

  • Language: Python
  • Capture: mss, Pillow
  • LLM: LM Studio (OpenAI‑compatible endpoint)
  • TTS: pocket_tts + sounddevice
  • Hotkeys: keyboard (optional)

Demo

demo.mp4 (enable audio)

⚙️ Getting Started

Prerequisites

  • Python ≥3.9
  • LM Studio installed and running a vision‑capable model with the OpenAI‑compatible API enabled

Install

pip install --upgrade pillow mss requests pocket-tts sounddevice keyboard

Install LM Studio

  • Download and install LM Studio from https://lmstudio.ai/
  • LM Studio is used to fetch models and run a local API server

Choose and Download a VLM Model

  • Recommended: Gemma 3 12B QAT (requires ~7GB VRAM)
  • Lower VRAM alternatives: Qwen3 VL 4B or 8B

Configure

Edit the config at the top of ScreenCommentator.py:

  • API_URL
  • MODEL
  • INTERVAL_SEC
  • FRAME_COUNT and FRAME_INTERVAL_SEC (how many screenshots to capture and how they are spaced)
  • MAX_IMAGES_PER_REQUEST (caps how many frames are sent to the LLM)
  • MAX_SCREEN_W / MAX_SCREEN_H
  • PERSONAS and DEFAULT_PERSONA
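For orientation, the config block at the top of ScreenCommentator.py might look roughly like this. All values and persona strings below are illustrative assumptions, not the shipped defaults (only the setting names come from this README; port 1234 is LM Studio's usual default):

```python
# Illustrative configuration sketch -- adjust values to your machine.
API_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio OpenAI-compatible endpoint
MODEL = "gemma-3-12b-it-qat"       # model identifier as shown in LM Studio (assumed name)
INTERVAL_SEC = 60                  # seconds between commentary cycles
FRAME_COUNT = 10                   # screenshots captured per cycle
FRAME_INTERVAL_SEC = 10.0 / 9.0    # spacing between captures within a burst
MAX_IMAGES_PER_REQUEST = 4         # cap on frames actually sent to the LLM
MAX_SCREEN_W = 1280                # downscale bound (width)
MAX_SCREEN_H = 720                 # downscale bound (height)
DEFAULT_PERSONA = "gen_z_hype"
PERSONAS = {
    "gen_z_hype": "React to the screen like an over-caffeinated streamer.",
    "witty": "Deliver one dry, witty remark about what is on screen.",
}
```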

How multi‑frame capture works:

  • The script captures FRAME_COUNT screenshots per cycle.
  • It waits FRAME_INTERVAL_SEC seconds between captures.
  • If more frames are captured than allowed, it sends up to MAX_IMAGES_PER_REQUEST evenly‑spaced frames.
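The even-spacing step can be sketched as a small helper (the function name `pick_frames` is hypothetical; the real script's implementation may differ):

```python
def pick_frames(frames, max_images):
    """Return up to max_images frames, evenly spaced across the burst."""
    if len(frames) <= max_images:
        return list(frames)
    if max_images <= 1:
        return [frames[0]]
    # Spread indices over [0, len(frames) - 1], always keeping first and last.
    step = (len(frames) - 1) / (max_images - 1)
    indices = [round(i * step) for i in range(max_images)]
    return [frames[i] for i in indices]
```

With a burst of 10 frames and MAX_IMAGES_PER_REQUEST = 4, this keeps frames 0, 3, 6, and 9.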

Example: 10 screenshots spread over ~10 seconds. Ten captures have nine gaps between them, so each gap is 10/9 ≈ 1.11 s:

FRAME_COUNT = 10
FRAME_INTERVAL_SEC = 10.0 / 9.0

Run

python ScreenCommentator.py

Example:

python ScreenCommentator.py --persona gen_z_hype
[2026-02-04 18:43:30] Hotkey 'f6' toggles pause/resume
[2026-02-04 18:43:30] Loading TTS model (first run may download weights)...
[2026-02-04 18:43:30] Capturing screenshot
[2026-02-04 18:43:31] TTS model loaded
...
[2026-02-04 18:43:34] Capturing screenshot
PowerShell? Bet, that's cooked!
[2026-02-04 18:43:35] Capturing screenshot
[2026-02-04 18:43:36] Starting audio playback
...
[2026-02-04 18:44:32] Capturing screenshot
[2026-02-04 18:44:33] Capturing screenshot
BBC news? No cap!

PS C:\dev\LiveChatLLM> python ScreenCommentator.py --persona witty
...
The GitHub repository's README file is overflowing with details—apparently, this "Screen Commentator" has more features than my dating profile.

Start the LM Studio Server

  1. Open LM Studio
  2. Navigate to the Developer tab
  3. Toggle Start server (top left)

Useful Flags

python ScreenCommentator.py --persona gen_z_hype
python ScreenCommentator.py --voice-prompt jean
python ScreenCommentator.py --random-voice --voice-prompts alba,jean,azelma
python ScreenCommentator.py --hotkey f6
python ScreenCommentator.py --single-shot

Command Line Options

  • --persona Selects a narration persona.
  • --voice-prompt Sets the base TTS voice prompt.
  • --random-voice Randomizes the voice per iteration.
  • --voice-prompts Comma‑separated list used for random voice.
  • --hotkey Global hotkey to pause/resume.
  • --single-shot Capture one screenshot per interval instead of a burst.
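A minimal argparse setup matching these flags could look like the sketch below (defaults here are assumptions; the script's actual parser may differ):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Narrate the screen with a local VLM.")
    p.add_argument("--persona", default="gen_z_hype", help="narration persona key")
    p.add_argument("--voice-prompt", default=None, help="base TTS voice prompt")
    p.add_argument("--random-voice", action="store_true", help="randomize voice each iteration")
    p.add_argument("--voice-prompts", default="", help="comma-separated pool for --random-voice")
    p.add_argument("--hotkey", default="f6", help="global pause/resume hotkey")
    p.add_argument("--single-shot", action="store_true", help="one screenshot per interval")
    return p

args = build_parser().parse_args(["--persona", "witty", "--single-shot"])
```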

🛠️ Notes

  • Screenshots are not saved to disk; they are encoded in memory and sent as data URLs.
  • If you hit LM Studio image decode errors, reduce MAX_SCREEN_W/H or the number of images sent.
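The in-memory encoding step amounts to base64-encoding the PNG bytes into a data URL. A self-contained sketch (in the real script the bytes would come from mss plus Pillow's PNG export; a stub payload stands in for the screenshot here):

```python
import base64

def to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL so nothing touches disk."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{b64}"

# Stub payload: the 8-byte PNG signature in place of a real screenshot.
url = to_data_url(b"\x89PNG\r\n\x1a\n")
```

The resulting string can be placed directly in an OpenAI-style image message for the LM Studio endpoint.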

🧠 Low VRAM Tips

If you run into LM Studio errors like "failed to process image" or "no memory slot," try these:

  • In LM Studio Advanced settings, lower Evaluation Batch Size (e.g., 512 → 128 or 64).
  • Set Max Concurrent Predictions to 1 to avoid parallel VRAM spikes.
  • Reduce GPU Offload layers a bit if you’re still running out of memory.
  • In this app, lower MAX_IMAGES_PER_REQUEST, use --single-shot, and/or reduce MAX_SCREEN_W/H.

🤝 Contributing

  1. Fork it
  2. Create a branch (git checkout -b feature/XYZ)
  3. Commit (git commit -m "feat: add XYZ")
  4. Push (git push origin feature/XYZ)
  5. Open a PR

🙏 Acknowledgements

📜 License

MIT © Robert McElhinney
