Intelligent image/PDF handler for OpenCode. When you paste an image into chat, this plugin automatically:
- ✅ Multimodal models (GPT-5, Claude 4, Gemini 3, etc.): keeps the image as a native FilePart so the model sees it directly
- ✅ Text-only models: runs built-in OCR via tesseract.js and replaces the image with extracted text
- ✅ PDFs: passes through to PDF-capable models or provides the file path as fallback
Zero system dependencies. tesseract.js runs in pure JS/WASM: no Tesseract binary, no Python, no Docker. The OCR language data (~5 MB) downloads automatically from a CDN on the first OCR call and is cached for subsequent use.
User pastes image
↓
chat.message hook fires
├─ Multimodal model? → Keep FilePart (model sees the image directly)
└─ Text-only model? → OCR via tesseract.js → replace with text
↓
messages.transform hook fires (before LLM send)
├─ Multimodal model? → Re-inject FilePart from saved temp file
└─ Text-only model? → Clean up any stray FileParts
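The two-stage flow above can be sketched as a pair of small functions. This is a minimal illustration and not the plugin's actual code: the `Part` shapes, the `Capabilities` interface, and the names `chatMessageAction` and `onMessagesTransform` are hypothetical stand-ins for whatever OpenCode passes to the hooks.

```typescript
// Hypothetical part shapes; OpenCode's real types may differ.
type FilePart = { type: "file"; mime: string; path: string };
type TextPart = { type: "text"; text: string };
type Part = FilePart | TextPart;

// Hypothetical capability flags for the active model.
interface Capabilities {
  image: boolean;
}

// True only for file parts whose MIME type marks them as images.
const isImage = (p: Part): p is FilePart =>
  p.type === "file" && p.mime.startsWith("image/");

// Stage 1 (chat.message): decide what to do with a single pasted part.
// Images are OCR'd only when the model cannot see them; everything else is kept.
function chatMessageAction(part: Part, caps: Capabilities): "keep" | "ocr" {
  return isImage(part) && !caps.image ? "ocr" : "keep";
}

// Stage 2 (messages.transform): runs just before the request is sent.
// `saved` stands for the images previously written to the temp directory.
function onMessagesTransform(
  parts: Part[],
  caps: Capabilities,
  saved: FilePart[],
): Part[] {
  if (!caps.image) {
    // Text-only model: strip any image parts that slipped through.
    return parts.filter((p) => !isImage(p));
  }
  // Multimodal model: re-add saved images that are missing from the message.
  const present = new Set(parts.filter(isImage).map((p) => p.path));
  const missing = saved.filter((p) => !present.has(p.path));
  return [...parts, ...missing];
}
```

Keeping the stage 1 decision separate from the OCR call itself makes the branching easy to test without running tesseract.js.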
Add to your opencode.json:
{
  "plugin": ["opencode-auto-ocr-image"]
}

Restart OpenCode. The plugin activates automatically; no configuration is needed.
- On the first image paste with a text-only model, tesseract.js will download the OCR language data (~5 MB for Chinese + English). This takes a few seconds and happens once.
- Model capabilities are auto-detected from OpenCode's provider API. A built-in fallback list covers known multimodal models (Claude, GPT-5, Gemini, etc.).
Just paste images as you normally would. The plugin handles everything transparently:
You: [paste screenshot.png]
→ Multimodal model: image sent as FilePart (native vision)
→ Text-only model: image OCR'd, text replaces the image
Type !test-ocr in chat to see current status:
=== [auto-ocr-image diagnostic] ===
Model: claude-sonnet-4-5
Capabilities: image=true, pdf=false
OCR engine: tesseract.js (chi_sim+eng)
OCR status: ready
Cached models: 47
Temp dir: /tmp/opencode/ocr-data
=== done ===
Set the OCR_LANG environment variable to override OCR languages:
# English only
OCR_LANG=eng
# Japanese
OCR_LANG=jpn
# Multiple languages (improves accuracy for mixed content)
OCR_LANG=chi_sim+eng+jpn

Default: chi_sim+eng (Chinese Simplified + English)
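
As a rough sketch of how an OCR_LANG value could be turned into a list of language codes with the documented default, consider the helper below. The function name `resolveOcrLangs` and the handling of empty or malformed values are assumptions; the README only states the variable name, the `+` separator, and the default.

```typescript
// Documented default: Chinese Simplified + English.
const DEFAULT_OCR_LANG = "chi_sim+eng";

// Split a "+"-separated value into trimmed, non-empty language codes.
function parseLangs(value: string): string[] {
  return value
    .split("+")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);
}

// Hypothetical helper: falls back to the default when the variable is
// unset, empty, or contains no usable codes (assumed behaviour).
function resolveOcrLangs(raw: string | undefined): string[] {
  const langs = parseLangs(raw ?? "");
  return langs.length > 0 ? langs : parseLangs(DEFAULT_OCR_LANG);
}
```

In a Node.js process this would typically be called as `resolveOcrLangs(process.env.OCR_LANG)`, and the result joined back with `+` or passed on in whichever form the installed tesseract.js version expects.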
| Path | Purpose |
|---|---|
| $TMPDIR/opencode/ocr-data/ | Saved pasted images for re-injection |
| $TMPDIR/opencode/auto-ocr-image.log | Debug log (auto-trimmed at 50 KB) |
All files are stored in the OS temp directory, which most systems clear on reboot, so nothing is kept permanently.
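
The README does not say how the debug log is trimmed. One plausible approach, shown here purely as an assumption, is to keep only the most recent part of the log once it exceeds the limit. The name `trimLog` is hypothetical, and the 50 KB limit is approximated as a character count for simplicity.

```typescript
// Assumption: the 50 KB limit is approximated in characters, not bytes.
const MAX_LOG_CHARS = 50 * 1024;

// Hypothetical helper: keep the tail of the log when it grows past `max`.
function trimLog(content: string, max: number = MAX_LOG_CHARS): string {
  if (content.length <= max) {
    return content;
  }
  const tail = content.slice(content.length - max);
  // Drop everything up to the first newline so the log starts on a line boundary.
  const firstNewline = tail.indexOf("\n");
  return firstNewline === -1 ? tail : tail.slice(firstNewline + 1);
}
```

Trimming from the front keeps the newest entries, which are usually the ones needed when debugging a recent paste.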
- PDFs are not OCR'd (only the file path is passed through for text-only models)
- First OCR call is slower (~5-10s) because the OCR language data must be downloaded
- OCR accuracy depends on image quality and tesseract.js capabilities
- Provider API: queries OpenCode's client.config.providers() for per-model capability flags
- Override list: known multimodal models are hardcoded as fallback when the API doesn't report capabilities
- Graceful degradation: text-only models always get OCR'd text instead of raw images
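
The three points above can be combined into a single lookup, sketched below under stated assumptions. The `ReportedCaps` shape, the `resolveCaps` name, and the fallback patterns are illustrative only; the plugin's real override list and the exact response of the provider API are not shown in this README.

```typescript
interface ModelCaps {
  image: boolean;
  pdf: boolean;
}

// Hypothetical shape of what the provider API reports for one model;
// either flag (or the whole entry) may be missing.
type ReportedCaps = Partial<ModelCaps> | undefined;

// Illustrative patterns only, based on the model families named in this README.
const FALLBACK_MULTIMODAL: RegExp[] = [/^claude/i, /^gpt-5/i, /^gemini/i];

function resolveCaps(modelId: string, reported: ReportedCaps): ModelCaps {
  // Override list: consulted only when the API has no answer.
  const fallbackImage = FALLBACK_MULTIMODAL.some((re) => re.test(modelId));
  return {
    // Provider API flags take precedence when present.
    image: reported?.image ?? fallbackImage,
    // Graceful degradation: unknown capabilities default to "not supported".
    pdf: reported?.pdf ?? false,
  };
}
```

With this ordering, a model that is unknown to both the API and the fallback list resolves to `image: false`, so it receives OCR'd text instead of a raw image.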
MIT