
feat: Qwen3-ASR stateful CoreML decoder conversion #18

Draft: Alex-Wengg wants to merge 4 commits into main from feature/qwen3-asr-coreml


@Alex-Wengg (Contributor)

Summary

  • Add convert_stateful_decoder.py for converting the Qwen3-ASR-0.6B decoder to a stateful CoreML model with a GPU-resident KV cache
  • Include the converted qwen3_asr_decoder_stateful.mlpackage (LFS-tracked)
  • Add QWEN3_ASR_COREML.md reference doc covering architecture analysis, conversion approach, and integration notes
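A quick size estimate helps explain why keeping the KV cache resident on the GPU matters. The sketch below computes the cache footprint for an *assumed* Qwen3-0.6B-style decoder (28 layers, 8 KV heads, head_dim 128), a *hypothetical* 512-token context, and fp16 storage. None of these numbers come from this PR, so substitute the real values from the model's config.json. If a stateless decoder had to take the cache as an input and return it as an output on every step, each generated token would shuttle roughly this much data in and out of the model. A stateful model keeps the cache inside the model between calls. This is an illustrative reading of the speedup, not a measurement.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   max_seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size of a decoder KV cache: one K and one V tensor per layer."""
    # Each layer stores K and V, each shaped [num_kv_heads, max_seq_len, head_dim]
    per_layer = 2 * num_kv_heads * max_seq_len * head_dim * bytes_per_elem
    return num_layers * per_layer


# ASSUMED Qwen3-0.6B-style config and HYPOTHETICAL 512-token context;
# check the actual values in the model's config.json
size = kv_cache_bytes(num_layers=28, num_kv_heads=8, head_dim=128,
                      max_seq_len=512, bytes_per_elem=2)  # fp16 = 2 bytes
print(f"{size / 2**20:.1f} MiB")  # prints "56.0 MiB"
```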

Test plan

  • Verify that the stateful decoder loads and runs in FluidAudio (qwen3-asr branch)
  • Confirm 2.9x RTFx and 0.8% WER on LibriSpeech test-clean
  • Validate that the pre-compiled .mlmodelc from the Hugging Face repo matches the local conversion
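For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Below is a minimal sketch of that definition. The `normalize` helper is a *hypothetical*, simplified stand-in for whatever text normalization the actual benchmark applies (lowercasing, stripping punctuation). It is not the scorer used to produce the 0.8% figure.

```python
import re


def normalize(text: str) -> list[str]:
    # Hypothetical normalization: lowercase, keep letters/digits/apostrophes
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("The cat sat on the mat.", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving a WER of 1/3.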

🤖 Generated with Claude Code

Stateful decoder with GPU-resident KV cache for Qwen3-ASR-0.6B.
Includes the conversion script and the .mlpackage output.
- convert-qwen3-asr.py: main CLI for exporting all components
  (audio_encoder, embedding, lm_head, decoder variants)
- convert_decoder_fused.py: fused decoder with lmHead for faster inference
- individual_components.py: wrapper modules for each component
- pyproject.toml: dependency spec for uv
- README.md: basic usage
- Stateful CoreML decoder with GPU-resident KV cache (1.1x → 2.9x RTFx)
- Fused lmHead into the decoder (~8 ms/token saved)
- Benchmark warm-up for consistent timing
- Int8 quantization revisited (899 MB, same quality)
- WER outlier analysis (proper noun limitations)
- MLX comparison benchmarks
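The warm-up item can be illustrated with a small timing helper. RTFx is audio duration divided by processing time, so 2.9x means about 2.9 seconds of audio are transcribed per second of compute. The sketch assumes a *hypothetical* zero-argument `transcribe` callable. The untimed warm-up calls absorb one-time costs, such as the model specialization CoreML may perform on a first prediction, so that these costs do not skew the measured mean. It is not the benchmark harness used in this PR.

```python
import time
from typing import Callable


def measure_rtfx(transcribe: Callable[[], object], audio_seconds: float,
                 warmup_runs: int = 2, timed_runs: int = 5) -> float:
    """Return RTFx = audio duration / mean wall-clock time per run."""
    if audio_seconds <= 0 or timed_runs <= 0:
        raise ValueError("audio_seconds and timed_runs must be positive")
    # Untimed warm-up calls absorb one-time setup costs
    for _ in range(warmup_runs):
        transcribe()
    start = time.perf_counter()
    for _ in range(timed_runs):
        transcribe()
    mean_elapsed = (time.perf_counter() - start) / timed_runs
    return audio_seconds / mean_elapsed
```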
Alex-Wengg force-pushed the feature/qwen3-asr-coreml branch from db0bc45 to 42ea7c9 on February 3, 2026, 14:07
