(2026-04-21) Strategy restructured into Track A/B/C/D: Track A = investigation (3 reproductions), Track B = GLM-OCR-specific fix, Track C = MLX upstream tracking, Track D = close-path decision depending on A.
(2026-04-21) Key finding in code: MacDoc+OCR.swift:60-65 already silently routes `config.ocrDefaultModel == "glm-ocr"` to Qwen3-VL — a compatibility shim is already in place, effectively side-stepping #66 (OCR: GLM-OCR model produces garbage output via MLXVLM) for the default path.
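The shim described in that finding can be sketched as follows. This is a hypothetical reconstruction, not the actual MacDoc+OCR.swift code: only the `ocrDefaultModel == "glm-ocr"` check and the Qwen3-VL fallback come from the notes; `Config`, `resolveOCRModel`, and the `"qwen3-vl"` spelling are assumed.

```swift
// Hypothetical reconstruction of the shim at MacDoc+OCR.swift:60-65.
// Names other than ocrDefaultModel are illustrative.
struct Config {
    var ocrDefaultModel: String
}

func resolveOCRModel(_ config: Config) -> String {
    if config.ocrDefaultModel == "glm-ocr" {
        // GLM-OCR emits garbage via MLXVLM (#66); silently fall back to Qwen3-VL.
        return "qwen3-vl"
    }
    return config.ocrDefaultModel
}
```

Because the fallback is silent, a user who explicitly configures `glm-ocr` never sees the broken model — which is why the default path appears to work despite #66.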
(2026-04-21) Complexity = SDD-warranted (6 open questions: Track A ordering, GLM-OCR deprecation vs fix, MLX-vs-Ollama default, test coverage, pdf-ocr-vs-top-ocr sharing, issue scope). User chose /spectra-discuss.
(2026-03-28 R1) Original diagnosis recommended Steps 1-4: test Qwen2.5-VL first, then isolate whether the fault is GLM-OCR-specific or in the pipeline. Never actioned.
(2026-04-21 R2) Investigation gate added — Track A 3 reproductions must complete before Track B/C/D can proceed.
Blocking
Waiting for /spectra-discuss — 6 open questions, the primary one being whether the scope is narrow (GLM-OCR only) or broad (the whole MLX VLM path). Cannot proceed to /spectra-propose without this alignment plus Track A evidence.
Track A requires local Ollama server + multi-GB model downloads; may need separate session with that setup.
Problem
The `macdoc ocr` pipeline runs end-to-end without crashes, but the OCR output is meaningless token repetitions instead of recognized text.

Type
bug
Expected
Running `macdoc ocr /tmp/ocr-test.png` on an image containing "Hello OCR Test 你好世界" should produce readable Markdown text.

Actual
Output is repeated garbage tokens:
Or with longer max-tokens:
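Output like this can be caught mechanically. Below is a hypothetical helper — not in the codebase, relevant to the open test-coverage question — that flags degenerate output by checking whether one token dominates the text; the threshold is an assumption.

```swift
// Hypothetical heuristic: output is "degenerate" when the most frequent
// whitespace-separated token accounts for >= threshold of all tokens.
func looksDegenerate(_ text: String, threshold: Double = 0.5) -> Bool {
    let tokens = text.split(whereSeparator: { $0.isWhitespace })
    guard !tokens.isEmpty else { return true }  // empty output also counts as failure
    var counts: [Substring: Int] = [:]
    for t in tokens { counts[t, default: 0] += 1 }
    let top = counts.values.max() ?? 0
    return Double(top) / Double(tokens.count) >= threshold
}
```

A regression test built on this could guard the OCR path once #66 is resolved, without asserting on exact model output.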
Context
- Model: `EZCon/GLM-OCR-8bit-mlx` (8-bit quantized GLM-OCR for MLX)
- Pipeline: `OCRPipeline` → `VLMModelFactory.shared.loadContainer()` → `MLXLMCommon.generate()`
- A `GlmOcr.swift` implementation exists, so the model architecture should be correct
- The quantized model (`EZCon/GLM-OCR-8bit-mlx`) itself may be broken
- An alternative model is available for comparison (`mlx-community/Qwen2.5-VL-3B-Instruct-4bit`)

Impact
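The load-then-generate chain above can be sketched to show where a model bug versus a pipeline bug would live. This is a sketch only: the real MLXVLM/MLXLMCommon types are replaced by a stand-in protocol so it compiles standalone, and `VLMContainer`, `StubContainer`, `runOCR`, and the prompt are all illustrative names, not actual OCRPipeline code.

```swift
// Stand-in for a loaded VLM container; a real one would wrap
// VLMModelFactory.shared.loadContainer() and run MLXLMCommon.generate().
protocol VLMContainer {
    func generate(prompt: String, imagePath: String) throws -> String
}

struct StubContainer: VLMContainer {
    func generate(prompt: String, imagePath: String) throws -> String {
        "recognized text from \(imagePath)"
    }
}

// Pipeline-level step: if this path is sound, swapping only the container's
// model (Qwen2.5-VL instead of GLM-OCR) isolates the garbage output to the model.
func runOCR(using container: VLMContainer, imagePath: String) throws -> String {
    let prompt = "Transcribe the text in this image as Markdown."  // assumed prompt
    return try container.generate(prompt: prompt, imagePath: imagePath)
}
```

Keeping the container behind an abstraction like this is exactly what makes the Track A model-swap comparison cheap.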
`macdoc ocr` is unusable until this is resolved — the entire OCR feature depends on correct VLM output.

Next Steps
- Test `mlx-community/Qwen2.5-VL-3B-Instruct-4bit` to verify the pipeline works with a different model
- Inspect `OCRPipeline`'s image/prompt handling

Current Status
Phase: diagnosed (round 2)
Last updated: 2026-04-21 by idd-diagnose
Key Decisions
- `mlx-swift-lm#191` — broadcast_shapes upstream bug (per the MLXBackend.swift docstring)
- `MacDoc+OCR.swift:60-65` already silently routes `config.ocrDefaultModel == "glm-ocr"` to Qwen3-VL — the compatibility shim is already in place, effectively side-stepping #66 (OCR: GLM-OCR model produces garbage output via MLXVLM) for the default path
- User chose `/spectra-discuss`

Scope Changes
Commits (relevant — since issue opened 2026-03-28)
- `1a95969` (#84 — fix: PageOCRRunner uses OCRCore.backend API (unblocks main build)) — wip: swap default OCR model to Qwen3-VL + OCRCore backend abstraction; side-stepped #66 (OCR: GLM-OCR model produces garbage output via MLXVLM) at the default level
- `51e8a07` — feat(config): add `config ocr` subcommand with host profile support — Ollama host management
- `f1b975f` — refactor: simplify PDF pipeline — replace block segmentation with page-level GLM-OCR; `pdf ocr` still defaults to GLM-OCR
- `692dc17` (#79 — refactor: extract pdf-to-latex-swift + ocr-swift to PsychQuant repos; commit Tests/WordToMDTests fixtures (continues #78 pattern)) — ocr-swift package extracted from macdoc; code now lives in PsychQuant/ocr-swift

Related Issues
- #68 (CLOSED) — hf xet download produced corrupted safetensors (referenced in original diagnosis Step 4)
- #78 — note packages extraction (unrelated, same era)
- #79 — ocr-swift extraction (relocated the code)
- ml-explore/mlx-swift-lm#191 — broadcast_shapes crash in VLM inference (actively affects #66, OCR: GLM-OCR model produces garbage output via MLXVLM)