chore: merge upstream MNN 3.4.1 #1

Merged
pruthvikar merged 358 commits into cleanup/remove-unused-apps-and-projects from chore/update-to-3.4.1 on Mar 26, 2026
@pruthvikar
Member

What

  • Merges alibaba/MNN tag 3.4.1 (357 upstream commits since fork point) into the cleanup branch
  • Resolves merge conflicts by preserving the cleanup deletion of apps/ and project/ directories

Why

Upstream MNN 3.4.0 and 3.4.1 contain significant performance and stability improvements relevant to our inference workloads:

Performance (3.4.0)

  • Metal TensorAPI support — significant perf boost on M-series chips
  • CPU MatMul/LayerNorm/broadcast optimization + ThreadPool overhead reduction
  • KleidiAI fp32 depthwise conv kernels — faster ARM CPU inference (iOS/Android)
  • Loop Op GPU optimization — pure Metal/OpenCL path, no CPU fallback
  • RISC-V Vector (RVV) optimization — comprehensive intrinsic optimization

Stability (3.4.1)

  • Metal INT8/INT4 Conv2D fix — correct quantized inference on Metal
  • MetalConvolutionDepthwise Clone support
  • 7 memory safety fixes in shape/execution operators (OOB access, zero-stride, duplicate index)

How

  • Shallow-cloned the cleanup branch, fetched upstream 3.4.1 tag
  • git merge 3.4.1 — all 333 conflicts were modify/delete in apps/ and project/ (files deleted by cleanup, modified by upstream)
  • Resolved by git rm on all conflicted files + removed any new upstream additions to those dirs
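
The resolution step above can be sketched as a small filter over `git status --porcelain` output. This is only an illustration, not the script used for this PR: the helper name `paths_to_remove` and the sample paths are hypothetical, and it assumes porcelain v1 lines of the form `XY PATH`, where `DU` marks a file deleted on our side but modified upstream and `A ` marks a file newly added by upstream. Paths containing spaces (which git quotes) are not handled.

```python
from typing import List

# Directories deleted by the cleanup branch (taken from the PR description).
REMOVED_DIRS = ("apps/", "project/")


def paths_to_remove(porcelain_output: str) -> List[str]:
    """Pick paths under the removed directories that should be dropped again.

    Selects modify/delete conflicts ("DU") and files newly added by the
    merged-in side ("A ") whose path starts with one of REMOVED_DIRS.
    """
    selected = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY PATH"
        status, path = line[:2], line[3:]
        if status in ("DU", "A ") and path.startswith(REMOVED_DIRS):
            selected.append(path)
    return selected


# Hypothetical status output during the merge; the paths are made up.
sample_status = "\n".join([
    "DU apps/iOS/Example/README.md",
    "A  project/android/new_file.txt",
    "M  source/core/Tensor.cpp",
    "UU CMakeLists.txt",
])
to_remove = paths_to_remove(sample_status)
```

The resulting list could then be passed to `git rm --` in one call; anything outside `apps/` and `project/` is left for normal review.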

Testing

  • Merge conflict resolution verified (0 remaining conflicts)
  • Full build verification will be done downstream in mnn-sys and engine-rust PRs

🤖 Generated with Claude Code

wangzhaode and others added 30 commits December 23, 2025 16:37
[BugFix] fix a bug in compute mGroupWithComputeRate

GitOrigin-RevId: 0a30b5c040bc34aff1de94e7fa571ebb8f2c20fa
Feature/smallmodel opt

GitOrigin-RevId: 5610add6e64c6d49f8b984d0d744c85f206f2be7
Title: [Metal Feature] check UI Status for metal command commit

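This code review mainly adds checks on the execution status and error handling, and introduces a new logging method to improve debugging and monitoring.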
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/24965986
GitOrigin-RevId: b7ad051c324c1b7d4aa231fc062f2f5d8e7f7a0f
Title: [Bugfix:CI] Fix duplicate msg when sync to github.

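This change adds a feature to the `copybara_sync.sh` script that detects and skips commits imported from GitHub: it identifies commits containing `GitOrigin-RevId` to determine the last sync point, then starts syncing from the first non-imported commit after that point.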
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25060238
GitOrigin-RevId: ca65f11f52c1b76a826cbdc260a063d1467a8f35
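
The sync-point rule described in this commit can be illustrated with a short sketch. It is not the logic of `copybara_sync.sh` itself, which is not shown here; the function name `first_commit_to_sync` and the sample history are hypothetical, and commit messages are assumed to be listed oldest first.

```python
from typing import List, Optional

MARKER = "GitOrigin-RevId"


def first_commit_to_sync(messages: List[str]) -> Optional[int]:
    """Return the index of the first commit after the last imported one.

    A commit counts as imported when its message contains MARKER.
    Returns None when nothing newer than the last import exists.
    """
    last_imported = -1
    for index, message in enumerate(messages):
        if MARKER in message:
            last_imported = index
    next_index = last_imported + 1
    return next_index if next_index < len(messages) else None


# Hypothetical history, oldest first.
history = [
    "initial commit",
    "fix A\n\nGitOrigin-RevId: abc",
    "import B\n\nGitOrigin-RevId: def",
    "local change C",
    "local change D",
]
start = first_commit_to_sync(history)
```

Scanning for the last marker, rather than the first, means earlier imports in the history never cause already-synced commits to be sent again.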
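Title: [feature:opencl] OpenCL supports storing weights in a single file

This code review mainly optimizes the OpenCL backend: it introduces `MmapPool` to support memory-mapped pool management, adds checks and handling for memory-mapping errors in several execution units, and adjusts some data transfer and conversion logic to improve performance and stability.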
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/24702294
GitOrigin-RevId: 1a83d5da23cbb011d0cf522cdc6d49f5778c0999
GitOrigin-RevId: 59c693e6995609611fb7197f2288cc929370bdd6
opt(RVV): Optimize blitter functions with intrinsics

GitOrigin-RevId: 880fb7a3a8e93edb188ef4804f24cd88ea29c76c
opt(RVV): Optimize resize functions with intrinsics

GitOrigin-RevId: c9e9ac1362e1613acb11e924268b4e1284c9f142
opt(RVV): Optimize top1 functions with intrinsics

GitOrigin-RevId: fc3cad1eae2ea3c93fe34b8bfec58a2f7201de9c
opt(RVV): Optimize Softmax and ReluWithSlopeChannel with intrinsics

GitOrigin-RevId: bb4fb7cd6ac13a67582556277c91d38a958f2da8
opt(RVV): Optimize conv and strassen functions with intrinsics

GitOrigin-RevId: 4c8794c50d00acf88baee977d20694a3f9b8b1cf
opt(RVV): Optimize max and min float functions with intrinsics

GitOrigin-RevId: d246089d9de5602aeb58e91d1169923d58ed9712
opt(RVV): Optimize core math and stride functions with intrinsics

GitOrigin-RevId: 767fdd24db8ead2a04086edab37f3785dd0e80df
…nctions

opt(RVV): Optimize transpose functions with intrinsics

GitOrigin-RevId: e643e7c3e1cda978161cc3921355cb4d1d3eec69
opt(RVV): Optimize pack and unpack functions with intrinsics

GitOrigin-RevId: d786e4f5f353fa1e29319783f0bf7c3d2df00eb7
fix(diffusion): simplify export logic and fix dynamic axes

GitOrigin-RevId: 24b3e6fb92a32c193260fc39d82e70e70abba762
Automated build script for the MNN library

GitOrigin-RevId: cb0a6d77c72cf6c04cd256355dd5989460821ceb
Add a compile option and macro to default enable kleidiAI

GitOrigin-RevId: 96323077925a4788927649b4d262dc3d8288a66d
Title: [Doc:Update] update dingtalk in README.

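The main change in this code review updates the DingTalk group information in the README, including group numbers and status, and removes some outdated information.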
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25029869
GitOrigin-RevId: da3eed28af8d3cf35cd2578f76ad40d75f00158b
[BugFix] fix a bug in compute mGroupWithComputeRate

GitOrigin-RevId: 5d5b47cfb2c6278818dde17c4efc8ffbbb9b779a
Feature/smallmodel opt

GitOrigin-RevId: 99ed7ba5bc1eefac17236785e3fadde5d0f372e8
Title: [Metal Feature] check UI Status for metal command commit

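This code review mainly adds checks on the execution status and error handling, and introduces a new logging method to improve debugging and monitoring.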
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/24965986
GitOrigin-RevId: 7b85c75bae9cab7744a249ff10510e581c808e94
Title: [Bugfix:CI] Fix duplicate msg when sync to github.

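This change adds a feature to the `copybara_sync.sh` script that detects and skips commits imported from GitHub: it identifies commits containing `GitOrigin-RevId` to determine the last sync point, then starts syncing from the first non-imported commit after that point.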
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25060238
GitOrigin-RevId: 9caa49c6127112c4dc317584143e8ef041bab77d
GitOrigin-RevId: 804774d5836618d85384e4f0ce815ec94dec02de
wangzhaode and others added 26 commits February 2, 2026 16:02
Title: [Attention Feature] Support metal flash attention with lower memory and speedup

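The main changes in this code review introduce the new `flash_softmax`, `flash_matmul_qkv`, `flash_scale`, and `flash_attention_fused` kernel functions to optimize the attention computation, adjust the related parameters and buffer management logic, and update mask handling to support more flexible control of the key/value sequence length.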
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25100824
GitOrigin-RevId: 4bec555e71af031749770b66a29c9fb5f28f438e
ORIGINAL_AUTHOR=MNNSyncBot <hi@zhaode.wang>
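
For readers unfamiliar with why flash-style attention needs less memory, the sketch below shows the block-wise ("online") softmax that such kernels are commonly built on. It is a generic illustration in plain Python and an assumption about the general technique; it is not taken from the Metal kernels in this commit, and the function names are hypothetical.

```python
import math
from typing import List


def online_softmax(scores: List[float], block_size: int) -> List[float]:
    """Softmax computed from a single pass over fixed-size blocks.

    Only a running maximum and a running sum are carried between blocks,
    so the normalizer never needs all scores at once.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the previous sum to the new maximum, then add this block.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(s - new_max) for s in block)
        running_max = new_max
    return [math.exp(s - running_max) / running_sum for s in scores]


def reference_softmax(scores: List[float]) -> List[float]:
    """Standard two-pass softmax used only for comparison."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


sample_scores = [0.5, 2.0, -1.0, 3.5, 0.0, 1.25, -2.5]
blockwise = online_softmax(sample_scores, block_size=3)
reference = reference_softmax(sample_scores)
```

Both functions agree up to floating-point rounding; the block-wise version simply avoids holding the full row of exponentials while accumulating the normalizer.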
…gfix mask shape.

1. The LLM's mask is a scalar when the mask is lower triangular and the CPU backend is used.
2. Bugfix: the CPU LLM now supports masks of any shape.

Signed-off-by: jingbang.yjb <jingbang.yjb@alibaba-inc.com>
Discussed-in: Merge-Request 25279138 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25279138
GitOrigin-RevId: ce5d6d2fa7fbebd151b68c021dce9af98b383cdb
Discussed-in: Merge-Request 25804622 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25804622
GitOrigin-RevId: 7ebc965b93a9fd1e8a54f53c05bc6bdb37a432b4
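This code review mainly refactors the diffusion model engine, introduces the new Sana Diffusion model, and optimizes the existing Stable Diffusion model. It includes a unified generation interface definition, new factory methods for creating different types of diffusion model instances, and the corresponding demo programs and documentation.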
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25760822
* feat: add sana diffusion, refactor code

* chore: keep the run interface, keep diffusion_demo unchanged

* docs: update diffusion usage

* chore: drop tokenizer.cpp

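* fix: sana_llm.hpp now includes llm.hpp via a package-style include, otherwise downstream builds report errors

* fix: fix the sana_llm.hpp include issue

* fix: include sana_llm.hpp when packaging the framework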

GitOrigin-RevId: 00b1fe902ce163a6a567169b1f964f541aeba2ee
GitOrigin-RevId: dbfc4a7db7d3ad4fea182aac413fe5f0bf2db031
Discussed-in: Merge-Request 25729546 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25729546
GitOrigin-RevId: 6bd567d39b386132e094db11e44d8434b6681fa0
Discussed-in: Merge-Request 25913990 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25913990
GitOrigin-RevId: 290df25642c6fc47bb49f7facdf42e81b5e0cd41
Discussed-in: Merge-Request 25939666 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25939666
GitOrigin-RevId: fdc2f722daa7e156757279bad607bdb946de416c
Discussed-in: Merge-Request 25961499 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25961499
GitOrigin-RevId: af16a87d90931191281da636738dccff171c7f3e
[VULKAN] Support configuring coopMat when creating VkDevice.

Optimize codes related to creating VkDevice.

[VULKAN] Support setting extra spec consts when creating compute pipelines.

[VULKAN][BUFFER] Support using coopMat in Conv1x1.

[VULKAN][BUFFER] Add device check to coopMat branch conditions.

Modify local size setting for shaders in VulkanConv1x1Coop.

[VULKAN][BUFFER] Support onClone in <VulkanConv1x1Coop>. And check subgroupSize before creating Conv Op.

[VULKAN][BUFFER] Update compile results.

Discussed-in: Merge-Request 25276376 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25276376
GitOrigin-RevId: 1e4249e3b8d3b7a81057d91717cdabd2509d6c73
Discussed-in: Merge-Request 25982500 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25982500
GitOrigin-RevId: b0b36b349c52bf8f1fcd2bd5172c4afab9a4f810
Title: [LLM:Bugfix] Fix HQQ OOM and Embedding overflow.

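Fixes two issues encountered while adding support for Qwen3.5-27B:

1. HQQ quantization optimization:
   - Fixed running out of GPU memory (OOM) when quantizing the `lm_head` weights on a single 3090 card.
   - Solution: use a chunk-based quantization strategy to lower peak GPU memory usage.

2. DiskEmbedding overflow fix:
   - Fixed an integer overflow caused by storing the offset in the default `int` type when the vocabulary is large (e.g. 240,000 entries).
   - Solution: change the types of the related index and offset variables to `size_t`.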
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/26078666
GitOrigin-RevId: aa8545496f2c96913afc3217d5949de0930b8f3d
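
The DiskEmbedding overflow above is easy to reproduce with plain arithmetic. The sketch below is an illustration only: the hidden size of 5120 and the 2 bytes per value are assumptions chosen for the example, not figures from the commit, and `wrap_int32` merely simulates the wraparound typically seen in practice (signed overflow is formally undefined behavior in C++).

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Simulate two's-complement wraparound of a 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31


def embedding_offset(token_id: int, hidden_size: int, bytes_per_value: int) -> int:
    """Byte offset of one token's embedding row in a flat file."""
    return token_id * hidden_size * bytes_per_value


# Assumed sizes for illustration; only the vocabulary size comes from the commit text.
vocab_size = 240_000
hidden_size = 5120
bytes_per_value = 2

true_offset = embedding_offset(vocab_size - 1, hidden_size, bytes_per_value)
as_int32 = wrap_int32(true_offset)
overflows = true_offset > INT32_MAX
```

With these assumed sizes the offset of the last row exceeds `INT32_MAX`, so a 32-bit `int` would hold a negative value, while a 64-bit `size_t` holds the offset exactly.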
…rators.

Discussed-in: Merge-Request 26095381 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/26095381
GitOrigin-RevId: 024a6654879090799368764bd2b2e0a23c9cb428
…libaba#4189)

* [feat] use new markdown view and streaming message

* [feat] support useMarkdown

* [update] preview message

* [reformat] format with swift format

* [refactor]: change DispatchQueue.main.async to MainActor.run

* [fix] send sequence

* [feat] MTL config, batch file test and support local models

- change config to support MTL
- support more local models
- support batch file test

* [update] gitignore

* [add] local model json

* [add] local batch test

* [feat] support text, image and audio batch test

* [refactor] batch test view and model

* [update] localizations

* [feat] support video input

* [feat] support backend, precision and thread config

* feat: Add multimodal processing support and configuration options

* update: support switch use multimodal prompt API

* update: support video with imgs

* feat: support audio output

* delete: unused code and comments

* feat: support sana diffusion

* feat: add support for Sana Diffusion style transfer model

# Conflicts:
#	apps/iOS/MNNLLMChat/MNNLLMiOS/Chat/ViewModels/LLMChatViewModel.swift
#	apps/iOS/MNNLLMChat/MNNLLMiOS/Chat/Views/LLMChatView.swift
#	apps/iOS/MNNLLMChat/MNNLLMiOS/Chat/Views/ModelSettingsView.swift
#	apps/iOS/MNNLLMChat/MNNLLMiOS/Localizable.xcstrings
#	apps/iOS/MNNLLMChat/MNNLLMiOS/MainTab/ModelList/Models/ModelListViewModel.swift
#	apps/iOS/MNNLLMChat/MNNLLMiOS/Service/Util/AssetExtractor.swift

* refactor: remove cfgPrompt from style transfer and update prompt processing

- Removed cfgPrompt parameter from runStyleTransfer and related methods.
- Updated processPrompt to processSinglePrompt for improved clarity and functionality.
- Added benchmark result saving functionality to track performance metrics.
- Cleaned up unused code and comments related to cfgPrompt.

* feat: use sana llm as engine

* update: readme version

* update: iOS backend

* update: Chat Package

* update: set iterations to 10

* update: set default seed to 42 for sana diffusion

* update: load diffusion on background thread

* update: sana diffusion api

* feat: rotate the input image according to its EXIF information

* feat: show diffusion progress

* add: diffusion total cost time

* update: Chat version

* update: change image and text position

---------

Co-authored-by: 游薪渝(揽清) <azure.yxy@alibaba-inc.com>
Co-authored-by: 蔚山 <weishan.wyf@alibaba-inc.com>
…_bench

Discussed-in: Merge-Request 26143023 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/26143023
GitOrigin-RevId: cfe809ec16e10ff1ff6bf16a75daede9cd6bd50c
Title: [LLM:Fix] Fix JSON merge issue in merge_and_clear for jinja config

Moves the `merge_json` function from `llmconfig.cpp` to `llmconfig.hpp`, defines it as a `static inline` function, and updates its implementation in `llmconfig.hpp` to support recursively merging JSON objects.
Link: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/26158563
GitOrigin-RevId: 583824991d06804eaf75e37471918fbf1215eb70
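
A recursive merge of JSON objects, as described in this commit, might behave like the sketch below. It reuses the name `merge_json` for readability, but the exact semantics of the C++ function are an assumption: here nested objects are merged key by key and every other value is overwritten by the source. The config keys in the sample are hypothetical.

```python
def merge_json(destination: dict, source: dict) -> None:
    """Merge `source` into `destination` in place.

    When both sides hold an object for the same key, merge recursively;
    otherwise the value from `source` replaces the existing one.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(destination.get(key), dict):
            merge_json(destination[key], value)
        else:
            destination[key] = value


# Hypothetical config fragments for illustration.
base_config = {
    "jinja": {"chat_template": "base", "bos": "<s>"},
    "max_new_tokens": 128,
}
override = {
    "jinja": {"eos": "</s>"},
    "max_new_tokens": 256,
}
merge_json(base_config, override)
```

A shallow merge would have replaced the whole `jinja` object and lost `chat_template` and `bos`; the recursive version keeps them and only adds `eos`.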
Discussed-in: Merge-Request 25973150 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/25973150
GitOrigin-RevId: 83adccd7e28417c68280e8b4c8d0392816cee994
Discussed-in: Merge-Request 26168056 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/26168056
GitOrigin-RevId: 0506c76c610e1b297c38e8fc9e237e31ea7a36ac
Discussed-in: Merge-Request 26193851 , URL: https://code.alibaba-inc.com/AliNN/AliNNPrivate/codereview/26193851
GitOrigin-RevId: bec038578d1f788fea87657c01f660e4282eee69
Merge alibaba/MNN tag 3.4.1 into fork, preserving the cleanup
of removed apps/ and project/ directories.

Key improvements from upstream:
- Metal TensorAPI support (M-series perf boost)
- CPU MatMul/LayerNorm/broadcast optimization + ThreadPool overhead reduction
- KleidiAI fp32 depthwise conv kernels (ARM)
- Loop Op GPU optimization (Metal/OpenCL) - pure GPU path
- Metal INT8/INT4 Conv2D fix
- 7 memory safety fixes in shape/execution operators
- Vulkan CoopMat Conv1x1 acceleration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

It seems you are using me but have not set OPENAI_API_KEY in Variables/Secrets for this repo. You can follow the README for more information.

pruthvikar added a commit to getcarv/mnn-sys that referenced this pull request Mar 26, 2026
Update MNN submodule from post-3.3.0 (a5d3b04) to 3.4.1 merge (a1803f7).

Depends on: getcarv/MNN#1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pruthvikar pruthvikar merged commit 8f9264c into cleanup/remove-unused-apps-and-projects Mar 26, 2026
1 check passed