Inspired by the original code
A small desktop utility to select a region of the screen, run OCR via OpenRouter vision models, and copy the result to the clipboard.
For the latest release details, see docs/releases/2.6.0.md.
Architecture overview, directory structure, prerequisites, and Linux CLI details are documented in docs/README.md.
- Create a `.env` file in the same directory as the executable with the following required keys:
  - `OPENROUTER_API_KEY=`
  - `MODEL=` (vision-capable model, e.g., `qwen/qwen3-vl-235b-a22b-instruct`)
  - Optional: `OPENROUTER_API_KEY_FILE=` (default key-file path is `/run/secrets/api_keys/openrouter`)

  Alternatively, you can set each of these as an environment variable.
- Alternatively, you can point the app to a config file via an environment variable:
  - Set `SCREEN_OCR_LLM` to the full path of a `.env`-format file. If `.env` is not found in the executable directory, the app will load configuration from this path.
- You can also add these optional keys to your `.env` file to customize behavior:
  - `HOTKEY=Ctrl+Alt+q`
    - Supported modifiers: `Ctrl`, `Alt`, `Shift`, `Win`/`Cmd`/`Super`
    - Supported keys: `A-Z`, `0-9`, `F1-F24`, and common special keys
    - Example: `HOTKEY=F13`
  - `ENABLE_FILE_LOGGING=true`
  - `PROVIDERS=providerA,providerB`
  - `OCR_DEADLINE_SEC=20` (default is 20 seconds if unset)
  - `DEFAULT_MODE=rectangle` (accepted: `rect`, `rectangle`, `lasso`; default is rectangle)
  - `SINGLEINSTANCE_PORT_START=49500`
  - `SINGLEINSTANCE_PORT_END=49550`
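To illustrate the `HOTKEY` format described above, here is a minimal Go sketch of how a value such as `Ctrl+Alt+q` could be split into modifiers and a key. The `Hotkey` type and `parseHotkey` function are hypothetical names for this example, not the application's actual code, and the sketch does not check the key against the supported key list.

```go
package main

import (
	"fmt"
	"strings"
)

// Hotkey is a hypothetical parsed form of a HOTKEY value.
type Hotkey struct {
	Ctrl, Alt, Shift, Super bool
	Key                     string // upper-cased key name, e.g. "Q" or "F13"
}

// parseHotkey splits a value such as "Ctrl+Alt+q" on "+" and sorts each
// token into a modifier flag or the single non-modifier key.
func parseHotkey(s string) (Hotkey, error) {
	var hk Hotkey
	for _, tok := range strings.Split(s, "+") {
		tok = strings.TrimSpace(tok)
		switch strings.ToLower(tok) {
		case "":
			return Hotkey{}, fmt.Errorf("empty token in hotkey %q", s)
		case "ctrl":
			hk.Ctrl = true
		case "alt":
			hk.Alt = true
		case "shift":
			hk.Shift = true
		case "win", "cmd", "super":
			hk.Super = true
		default:
			if hk.Key != "" {
				return Hotkey{}, fmt.Errorf("more than one key in hotkey %q", s)
			}
			hk.Key = strings.ToUpper(tok)
		}
	}
	if hk.Key == "" {
		return Hotkey{}, fmt.Errorf("hotkey %q has no key", s)
	}
	return hk, nil
}

func main() {
	for _, v := range []string{"Ctrl+Alt+q", "F13"} {
		hk, err := parseHotkey(v)
		fmt.Printf("%s -> %+v (err: %v)\n", v, hk, err)
	}
}
```

Treating `Win`, `Cmd`, and `Super` as one flag mirrors the way the list above groups them as a single modifier.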
Detailed source resolution and precedence rules are documented in docs/README.md under Runtime Configuration Sources and Precedence.
- Using Go directly:
  - On Windows (no console window): `go build -ldflags "-H=windowsgui" -o screen-ocr-llm.exe ./src/main`
  - On Linux/macOS: `go build -o screen-ocr-llm ./src/main`
- Using the Makefile (for a Windows GUI binary): `make build-windows`

  This creates a `screen-ocr-llm.exe` file that runs without a console window.
The application offers two primary modes of operation: resident mode and run-once mode.
Resident mode is the standard mode for continuous, everyday use. The application runs quietly in the background, accessible via a system tray icon and a global hotkey.
- How to run: Execute the binary without any command-line flags: `./screen-ocr-llm.exe`
- Functionality:
  - Manages a system tray icon with "About" and "Exit" options.
  - Listens for a global hotkey (default: `Ctrl+Alt+q`) to start a screen capture.
  - In the selection overlay, drag for rectangle mode, press `Space` to toggle lasso mode, and press `Esc` to cancel.
  - In lasso mode, complete the selection by releasing the mouse near the start point to close the loop.
  - After a region is selected, the extracted text is automatically copied to your clipboard and shown in a brief popup notification. Lasso captures are still sent as rectangular images, with pixels outside the lasso filled solid white.
  - Ensures that only one instance of the application is running at any time.
Run-once mode is intended for single, on-demand captures initiated from the command line or within scripts.
- How to run: Execute the binary using the `--run-once` flag: `./screen-ocr-llm.exe --run-once`
- Supported arguments:
  - `--run-once`
  - `--api-key-path <path>`
  - `--default-mode <rect|rectangle|lasso>`
  - Legacy compatibility: single-dash long forms (`-run-once`, `-api-key-path`, `-default-mode`)
- Optional key path override: `./screen-ocr-llm.exe --run-once --api-key-path /run/secrets/api_keys/openrouter_key`
- Optional initial selection mode override: `./screen-ocr-llm.exe --run-once --default-mode lasso`
- Functionality:
  - Bypasses the system tray and immediately prompts you to select a region on the screen (same rectangle/lasso controls as resident mode).
  - Copies the resulting text to the clipboard.
  - Exits silently as soon as the capture and OCR process is finished.
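The arguments listed above could be parsed along the following lines. This sketch assumes Go's standard `flag` package, which treats one and two leading dashes as equivalent and so accepts both the double-dash and legacy single-dash forms; whether the application actually uses `flag` is not stated here. The `options` struct, `normalizeMode`, and `parseArgs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// options holds the three documented flags; the struct is illustrative.
type options struct {
	runOnce     bool
	apiKeyPath  string
	defaultMode string // "" when the flag was not given
}

// normalizeMode maps the accepted spellings to a canonical mode name.
func normalizeMode(s string) (string, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "rect", "rectangle":
		return "rectangle", nil
	case "lasso":
		return "lasso", nil
	}
	return "", fmt.Errorf("unsupported mode %q", s)
}

// parseArgs parses the command line without touching global flag state.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("screen-ocr-llm", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet; a real CLI would print usage
	fs.BoolVar(&o.runOnce, "run-once", false, "capture once and exit")
	fs.StringVar(&o.apiKeyPath, "api-key-path", "", "path to a file holding the API key")
	fs.StringVar(&o.defaultMode, "default-mode", "", "initial mode: rect, rectangle, or lasso")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.defaultMode != "" {
		mode, err := normalizeMode(o.defaultMode)
		if err != nil {
			return options{}, err
		}
		o.defaultMode = mode
	}
	return o, nil
}

func main() {
	for _, args := range [][]string{
		{"--run-once", "--default-mode", "lasso"},
		{"-run-once", "-api-key-path", "/run/secrets/api_keys/openrouter_key"},
	} {
		o, err := parseArgs(args)
		fmt.Printf("%v -> %+v (err: %v)\n", args, o, err)
	}
}
```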
The two modes are designed to work together so that captures never conflict.
- When you start a new capture with `--run-once`, the application first checks if a resident instance is already running.
- If a resident instance is found, the `--run-once` process delegates the capture request to the running instance and exits. The resident application then takes over, presenting the screen selection UI.
- If `--api-key-path` is provided on a delegated `--run-once` client, the client still delegates and the resident instance configuration remains authoritative.
- If `--default-mode` is provided on a delegated `--run-once` client, the client still delegates and the resident instance configuration remains authoritative.
- If no resident instance is active, the `--run-once` process will handle the capture itself in a temporary standalone mode before exiting.
- Startup validation: On launch, the app performs a minimal LLM connectivity check (1-token ping). If it fails, a blocking error dialog is shown and the app exits. In `--run-once`, if a resident is detected and the request is delegated, the client does not ping.
- High-DPI: The app enables DPI awareness and uses the full virtual screen for overlays and screenshots to work correctly on scaled multi-monitor setups.
- Logging: Controlled by `ENABLE_FILE_LOGGING`. When `false`, logs are suppressed; when `true`, logs are written to `screen_ocr_debug.log` (size-rotated). In GUI builds, stdout/stderr are hidden, so enable file logging for diagnostics.
This delegation mechanism ensures a stable and predictable user experience by guaranteeing that only one screen selection process can be active at a time.
- Single Instance: The tool uses a loopback TCP port to enforce a single resident instance and to manage delegation from `--run-once` clients.
- Configuration precedence: See `Configuration and Precedence` above for `.env`, CLI, and delegation behavior.