
Screen Commentator

Screen Commentator captures your screen on a schedule, sends frames to a local vision‑capable LLM, and reads a short persona‑based reaction aloud.

🚀 Features

  • Scheduled screen capture (single shot or multi‑frame burst)
  • Multiple narration personas (Gen Z hype, noir detective, nature doc, etc.)
  • Optional random TTS voices
  • Global hotkey to pause/resume (even when unfocused)
  • Anti‑repetition retry and sampling penalties

🔧 Tech Stack

  • Language: Python
  • Capture: mss, Pillow
  • LLM: LM Studio (OpenAI‑compatible endpoint)
  • TTS: pocket_tts + sounddevice
  • Hotkeys: keyboard (optional)

Demo

demo.mp4 (enable audio)

⚙️ Getting Started

Prerequisites

  • Python ≥3.9
  • LM Studio installed and running a vision‑capable model with the OpenAI‑compatible API enabled

Install

pip install --upgrade pillow mss requests pocket-tts sounddevice keyboard

Install LM Studio

  • Download and install LM Studio from https://lmstudio.ai/
  • LM Studio is used to fetch models and run a local API server

Choose and Download a VLM Model

  • Recommended: Gemma 3 12B QAT (requires ~7GB VRAM)
  • Lower VRAM alternatives: Qwen3 VL 4B or 8B

Configure

Edit the config at the top of ScreenCommentator.py:

  • API_URL
  • MODEL
  • INTERVAL_SEC
  • FRAME_COUNT and FRAME_INTERVAL_SEC (how many screenshots to capture and how they are spaced)
  • MAX_IMAGES_PER_REQUEST (caps how many frames are sent to the LLM)
  • MAX_SCREEN_W / MAX_SCREEN_H
  • PERSONAS and DEFAULT_PERSONA
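For orientation, the config block at the top of ScreenCommentator.py might look roughly like this. All values and persona strings below are illustrative assumptions, not the shipped defaults (only the setting names come from this README; port 1234 is LM Studio's usual default):

```python
# Illustrative configuration sketch -- adjust values to your machine.
API_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio OpenAI-compatible endpoint
MODEL = "gemma-3-12b-it-qat"       # model identifier as shown in LM Studio (assumed name)
INTERVAL_SEC = 60                  # seconds between commentary cycles
FRAME_COUNT = 10                   # screenshots captured per cycle
FRAME_INTERVAL_SEC = 10.0 / 9.0    # spacing between captures within a burst
MAX_IMAGES_PER_REQUEST = 4         # cap on frames actually sent to the LLM
MAX_SCREEN_W = 1280                # downscale bound (width)
MAX_SCREEN_H = 720                 # downscale bound (height)
DEFAULT_PERSONA = "gen_z_hype"
PERSONAS = {
    "gen_z_hype": "React to the screen like an over-caffeinated streamer.",
    "witty": "Deliver one dry, witty remark about what is on screen.",
}
```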

How multi‑frame capture works:

  • The script captures FRAME_COUNT screenshots per cycle.
  • It waits FRAME_INTERVAL_SEC seconds between captures.
  • If more frames are captured than allowed, it sends up to MAX_IMAGES_PER_REQUEST evenly‑spaced frames.
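The even-spacing step can be sketched as a small helper (the function name `pick_frames` is hypothetical; the real script's implementation may differ):

```python
def pick_frames(frames, max_images):
    """Return up to max_images frames, evenly spaced across the burst."""
    if len(frames) <= max_images:
        return list(frames)
    if max_images <= 1:
        return [frames[0]]
    # Spread indices over [0, len(frames) - 1], always keeping first and last.
    step = (len(frames) - 1) / (max_images - 1)
    indices = [round(i * step) for i in range(max_images)]
    return [frames[i] for i in indices]
```

With a burst of 10 frames and MAX_IMAGES_PER_REQUEST = 4, this keeps frames 0, 3, 6, and 9.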

Example: 10 screenshots spread over ~10 seconds. Ten captures have nine gaps between them, so each gap is 10/9 ≈ 1.11 s:

FRAME_COUNT = 10
FRAME_INTERVAL_SEC = 10.0 / 9.0

Run

python ScreenCommentator.py

Example:

python ScreenCommentator.py --persona gen_z_hype
[2026-02-04 18:43:30] Hotkey 'f6' toggles pause/resume
[2026-02-04 18:43:30] Loading TTS model (first run may download weights)...
[2026-02-04 18:43:30] Capturing screenshot
[2026-02-04 18:43:31] TTS model loaded
...
[2026-02-04 18:43:34] Capturing screenshot
PowerShell? Bet, that's cooked!
[2026-02-04 18:43:35] Capturing screenshot
[2026-02-04 18:43:36] Starting audio playback
...
[2026-02-04 18:44:32] Capturing screenshot
[2026-02-04 18:44:33] Capturing screenshot
BBC news? No cap!

PS C:\dev\LiveChatLLM> python ScreenCommentator.py --persona witty
...
The GitHub repository's README file is overflowing with details—apparently, this "Screen Commentator" has more features than my dating profile.

Start the LM Studio Server

  1. Open LM Studio
  2. Navigate to the Developer tab
  3. Toggle Start server (top left)

Useful Flags

python ScreenCommentator.py --persona gen_z_hype
python ScreenCommentator.py --voice-prompt jean
python ScreenCommentator.py --random-voice --voice-prompts alba,jean,azelma
python ScreenCommentator.py --hotkey f6
python ScreenCommentator.py --single-shot

Command Line Options

  • --persona Selects a narration persona.
  • --voice-prompt Sets the base TTS voice prompt.
  • --random-voice Randomizes the voice per iteration.
  • --voice-prompts Comma‑separated list used for random voice.
  • --hotkey Global hotkey to pause/resume.
  • --single-shot Capture one screenshot per interval instead of a burst.
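A minimal argparse setup matching these flags could look like the sketch below (defaults here are assumptions; the script's actual parser may differ):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Narrate the screen with a local VLM.")
    p.add_argument("--persona", default="gen_z_hype", help="narration persona key")
    p.add_argument("--voice-prompt", default=None, help="base TTS voice prompt")
    p.add_argument("--random-voice", action="store_true", help="randomize voice each iteration")
    p.add_argument("--voice-prompts", default="", help="comma-separated pool for --random-voice")
    p.add_argument("--hotkey", default="f6", help="global pause/resume hotkey")
    p.add_argument("--single-shot", action="store_true", help="one screenshot per interval")
    return p

args = build_parser().parse_args(["--persona", "witty", "--single-shot"])
```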

🛠️ Notes

  • Screenshots are not saved to disk; they are encoded in memory and sent as data URLs.
  • If you hit LM Studio image decode errors, reduce MAX_SCREEN_W/H or the number of images sent.
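The in-memory encoding step amounts to base64-encoding the PNG bytes into a data URL. A self-contained sketch (in the real script the bytes would come from mss plus Pillow's PNG export; a stub payload stands in for the screenshot here):

```python
import base64

def to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL so nothing touches disk."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{b64}"

# Stub payload: the 8-byte PNG signature in place of a real screenshot.
url = to_data_url(b"\x89PNG\r\n\x1a\n")
```

The resulting string can be placed directly in an OpenAI-style image message for the LM Studio endpoint.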

🧠 Low VRAM Tips

If you run into LM Studio errors like "failed to process image" or "no memory slot," try these:

  • In LM Studio Advanced settings, lower Evaluation Batch Size (e.g., 512 → 128 or 64).
  • Set Max Concurrent Predictions to 1 to avoid parallel VRAM spikes.
  • Reduce GPU Offload layers a bit if you’re still running out of memory.
  • In this app, lower MAX_IMAGES_PER_REQUEST, use --single-shot, and/or reduce MAX_SCREEN_W/H.

🤝 Contributing

  1. Fork it
  2. Create a branch (git checkout -b feature/XYZ)
  3. Commit (git commit -m "feat: add XYZ")
  4. Push (git push origin feature/XYZ)
  5. Open a PR

🙏 Acknowledgements

📜 License

MIT © Robert McElhinney
