Vcza5/opencode-auto-ocr-image

opencode-auto-ocr-image

Intelligent image/PDF handler for OpenCode. When you paste an image or PDF into chat, this plugin picks a path based on the active model:

  • Multimodal models (GPT-5, Claude 4, Gemini 3, etc.): keeps the image as a native FilePart so the model sees it directly
  • Text-only models: runs built-in OCR via tesseract.js and replaces the image with the extracted text
  • PDFs: passes the file through to PDF-capable models, or provides the file path as a fallback

Zero system dependencies. tesseract.js runs in pure JS/WASM, so no Tesseract binary, Python, or Docker is required. The OCR language data (~5 MB) downloads automatically from a CDN on the first OCR call and is cached for subsequent use.

How it works

User pastes image
  ↓
chat.message hook fires
  ├─ Multimodal model? → Keep FilePart (model sees the image directly)
  └─ Text-only model?  → OCR via tesseract.js → replace with text
  ↓
messages.transform hook fires (before LLM send)
  ├─ Multimodal model? → Re-inject FilePart from saved temp file
  └─ Text-only model?  → Clean up any stray FileParts
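
The routing decision in the flow above can be sketched as a small pure function. This is a minimal illustration, not the plugin's actual code: the `Capabilities` and `Action` types and the `decideAction` name are hypothetical, and the real hooks operate on OpenCode's own part objects.

```typescript
// Hypothetical types for illustration; OpenCode's real part and model types may differ.
type Capabilities = { image: boolean; pdf: boolean };
type Action = "keep-file-part" | "replace-with-ocr-text" | "replace-with-file-path";

// Decide what to do with an attachment, given its MIME type and the model's capabilities.
function decideAction(mime: string, caps: Capabilities): Action {
  if (mime === "application/pdf") {
    // PDFs are never OCR'd: PDF-capable models get the file, others get its path.
    return caps.pdf ? "keep-file-part" : "replace-with-file-path";
  }
  if (mime.startsWith("image/")) {
    // Multimodal models see the image directly; text-only models get extracted text.
    return caps.image ? "keep-file-part" : "replace-with-ocr-text";
  }
  // Anything else is outside the plugin's scope and is left untouched.
  return "keep-file-part";
}
```

Keeping the decision separate from the OCR call itself makes the branching easy to test without loading tesseract.js.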

Installation

Add to your opencode.json:

{
  "plugin": ["opencode-auto-ocr-image"]
}

Restart OpenCode. The plugin activates automatically — no configuration needed.

First run notes

  • On the first image paste with a text-only model, tesseract.js will download the OCR language data (~5 MB for Chinese + English). This takes a few seconds and happens once.
  • Model capabilities are auto-detected from OpenCode's provider API. A built-in fallback list covers known multimodal models (Claude, GPT-5, Gemini, etc.).

Usage

Just paste images as you normally would. The plugin handles everything transparently:

You: [paste screenshot.png]
  → Multimodal model: image sent as FilePart (native vision)
  → Text-only model:  image OCR'd, text replaces the image

Diagnostic command

Type !test-ocr in chat to see current status:

=== [auto-ocr-image diagnostic] ===
Model: claude-sonnet-4-5
Capabilities: image=true, pdf=false
OCR engine: tesseract.js (chi_sim+eng)
OCR status: ready
Cached models: 47
Temp dir: /tmp/opencode/ocr-data
=== done ===

Configuration

Set the OCR_LANG environment variable to override OCR languages:

# English only
OCR_LANG=eng

# Japanese
OCR_LANG=jpn

# Multiple languages (improves accuracy for mixed content)
OCR_LANG=chi_sim+eng+jpn

Default: chi_sim+eng (Chinese Simplified + English)
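
For reference, a `+`-separated value like the ones above could be turned into a list of language codes roughly as follows. This is a sketch under assumptions: `resolveOcrLangs` and `DEFAULT_OCR_LANG` are hypothetical names, and the plugin's own parsing is not shown in this README.

```typescript
// Default taken from this README; the helper names below are hypothetical.
const DEFAULT_OCR_LANG = "chi_sim+eng";

// Split a "+"-separated language string into trimmed, non-empty codes.
function parseLangs(value: string): string[] {
  return value
    .split("+")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);
}

// Resolve the effective language list, falling back to the default
// when the variable is unset, empty, or contains no usable codes.
function resolveOcrLangs(raw: string | undefined): string[] {
  const langs = parseLangs(raw ?? "");
  return langs.length > 0 ? langs : parseLangs(DEFAULT_OCR_LANG);
}
```

In a Node-based plugin this would typically be called as `resolveOcrLangs(process.env.OCR_LANG)`.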

Data files

| Path | Purpose |
| --- | --- |
| $TMPDIR/opencode/ocr-data/ | Saved pasted images for re-injection |
| $TMPDIR/opencode/auto-ocr-image.log | Debug log (auto-trimmed at 50 KB) |

All files are stored in the OS temp directory, so nothing is kept permanently; how soon they are cleared (on reboot or periodically) depends on the operating system.
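
One way the log's size cap could work is to keep only the tail of the file once it grows past the limit. The sketch below is an assumption about the approach, not the plugin's implementation: `trimLog` and `MAX_LOG_CHARS` are hypothetical names, and the limit is counted in string characters (UTF-16 code units) rather than bytes for simplicity.

```typescript
// 50 KB cap from this README, approximated here as a character count.
const MAX_LOG_CHARS = 50 * 1024;

// Keep the newest part of the log, dropping a partial first line if the cut lands mid-line.
function trimLog(content: string, maxChars: number = MAX_LOG_CHARS): string {
  if (content.length <= maxChars) return content;

  const cut = content.length - maxChars;
  const tail = content.slice(cut);

  // If the character just before the cut is a newline, the tail starts on a line boundary.
  if (content[cut - 1] === "\n") return tail;

  // Otherwise discard everything up to and including the first newline in the tail.
  const firstNewline = tail.indexOf("\n");
  return firstNewline === -1 ? tail : tail.slice(firstNewline + 1);
}
```

Trimming from the front keeps the most recent entries, which are usually the ones needed when debugging.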

Limitations

  • PDFs are not OCR'd (models without PDF support receive only the file path)
  • First OCR call is slower (~5-10s) because the OCR language data has to be downloaded
  • OCR accuracy depends on image quality and tesseract.js capabilities

How model detection works

  1. Provider API: queries OpenCode's client.config.providers() for per-model capability flags
  2. Override list: known multimodal models are hardcoded as fallback when the API doesn't report capabilities
  3. Graceful degradation: text-only models always get OCR'd text instead of raw images
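
The three steps above can be sketched as a single lookup with fallbacks. This is a minimal illustration under assumptions: `resolveCapabilities` and `FALLBACK_MULTIMODAL_PREFIXES` are hypothetical names, the prefix entries are examples rather than the plugin's real override list, and the shape of the data returned by the provider API is not shown in this README.

```typescript
type Capabilities = { image: boolean; pdf: boolean };

// Hypothetical fallback entries; the plugin's actual override list may differ.
const FALLBACK_MULTIMODAL_PREFIXES = ["claude", "gpt-5", "gemini"];

function resolveCapabilities(
  modelId: string,
  reported: Partial<Capabilities> | undefined,
): Capabilities {
  // 1. Prefer capability flags reported by the provider API, when any are present.
  if (reported && (reported.image !== undefined || reported.pdf !== undefined)) {
    return { image: reported.image ?? false, pdf: reported.pdf ?? false };
  }

  // 2. Fall back to a hardcoded list of known multimodal model families.
  //    Assumption: fallback entries enable images only, not PDFs.
  const id = modelId.toLowerCase();
  if (FALLBACK_MULTIMODAL_PREFIXES.some((prefix) => id.startsWith(prefix))) {
    return { image: true, pdf: false };
  }

  // 3. Unknown models are treated as text-only, so images go through OCR.
  return { image: false, pdf: false };
}
```

Defaulting unknown models to text-only is the conservative choice: the worst case is OCR'd text for a model that could have handled the image, rather than an image sent to a model that cannot read it.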

License

MIT
