⚡ open-xdna

Running AI on the AMD NPU that AMD says you can't.

The first-generation XDNA1 NPU ("Phoenix" / "Hawk Point"), found in millions of Ryzen laptops, brought up on Linux with a 100% open-source stack. Models running on the silicon. Custom vector kernels hand-authored with the AIE vector API.

FastFlowLM and AMD Lemonade — the stacks everyone points to for "LLMs on the Ryzen AI NPU" — support XDNA2 only. If you own a Ryzen 7040 / 8040 laptop, AMD's answer for your NPU on Linux is "not available — use the GPU."

open-xdna is the answer that says "here's how."

```
$ xrt-smi examine
  [0000:06:00.1]   RyzenAI-npu1   aie2   6x5      ←  the chip they left behind, alive on Linux
```

🏆 Proven on silicon (every row measured, on a real Ryzen 7 8845HS)

| Milestone | Result |
| --- | --- |
| 🔌 Gen-1 NPU driven by fully open-source XRT + driver + firmware | ✅ |
| ♻️ Survives reboot + kernel updates (DKMS) | ✅ |
| ⚙️ `512³` int16 matmul on the NPU | ✅ ~68 GFLOPS |
| 🧠 A real model's forward pass (2-layer MLP) on the NPU | ✅ bit-exact |
| ✂️ Non-bijunctive collapse (keep-strong / prune-weak) | ✅ prune 50→92% |
| 🛠️ Hand-authored AIE vector kernel (`aie::ge`+`aie::select`) | ✅ first compile |
| 🔁 Fused `reduce_max`→dynamic-τ collapse in one kernel | ✅ data-adaptive |
| 🔀 `aie::reverse` (vec_perm / shuffle) on the NPU | ✅ |
| 🎯 Full collapse (reduce+threshold+compact) in one AIE kernel | ✅ |
| 📈 Measured FFN net-win (NPU prune → dense down_proj) | ✅ 1.6× @ cos 0.998 |
| 🎲 Attention KV-prune (NPU evicts low-mass keys) | ✅ 4× @ cos 0.976 |
| 🖼️ Vision pipeline (rgba2gray→3×3 conv→threshold) on the NPU | ✅ PASS (the NPU's native CNN strength) |
| 🧩 SigLIP/ViT patch-embed wired on the NPU | ✅ bit-exact (offload play; honestly ~3.2× slower than CPU at base size) |
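
The `cos` figures in the table read most naturally as the cosine similarity between a dense reference output and the pruned output. That reading is an assumption, and the function below is a generic host-side sketch of the metric, not code from this repo:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    dense = [0.9, 0.1, 0.7, 0.05]   # hypothetical dense output
    pruned = [0.9, 0.0, 0.7, 0.0]   # same output with the weak entries zeroed
    print(f"cos = {cosine_similarity(dense, pruned):.4f}")
```

A value near 1.0 means pruning barely changed the direction of the output vector; it says nothing about end-task accuracy on its own.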

😈 Why this exists

Hardware vendors retire silicon with software, not screwdrivers. AMD moved on to XDNA2 and told gen-1 owners the NPU "isn't available" for LLMs on Linux. The chip is fine. The driver's in the kernel. The compiler exists. What was missing was a recipe — so here's one, end to end, with receipts.

🔧 The four fixes nobody documented together

Getting a kernel onto an XDNA1 NPU on a current Linux needs four non-obvious moves (full detail: docs/BRINGUP.md, error-string fixes in the FAQ):

1. Install libxrt_driver_xdna.so (not just libvxdna.so); otherwise xrt-smi reports "0 devices found".
2. Load the matched staging amdxdna.ko; mainline (≤6.17) lacks ioctls the SHIM needs.
3. Put llvm-objcopy on PATH; GNU objcopy can't parse the AIE2 ELF.
4. Match the NPU firmware; stale firmware aborts commands (ERT_CMD_STATE_ABORT).

Upstream gap writeup: docs/UPSTREAM_amdxdna_ioctls.md — mainline amdxdna (≤6.17) is missing the GET_ARRAY/telemetry ioctls the current XRT SHIM expects (why fix #2 is needed).
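
A quick preflight for the first three fixes can be sketched in Python. This is a hedged illustration, not a script shipped in this repo: the library search directories are guesses at common XRT install locations, and a loaded `amdxdna` module does not prove it is the matched staging build. Firmware matching (fix 4) is not checked.

```python
import shutil
from pathlib import Path

# Assumed search locations; adjust to wherever your XRT install puts its libraries.
XRT_LIB_DIRS = ["/opt/xilinx/xrt/lib", "/usr/lib", "/usr/lib64", "/usr/lib/x86_64-linux-gnu"]


def module_loaded(name, proc_modules_text):
    """True if `name` is the first field of some line of /proc/modules text."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def find_library(filename, dirs):
    """Return the first file in `dirs` whose name starts with `filename`, or None."""
    for d in dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue
        hits = sorted(directory.glob(filename + "*"))  # also matches versioned .so.N
        if hits:
            return hits[0]
    return None


def preflight(lib_dirs=XRT_LIB_DIRS, proc_modules="/proc/modules"):
    """Map each check description to True/False."""
    try:
        modules_text = Path(proc_modules).read_text()
    except OSError:
        modules_text = ""
    return {
        "libxrt_driver_xdna.so present (fix 1)":
            find_library("libxrt_driver_xdna.so", lib_dirs) is not None,
        "amdxdna module loaded (fix 2)": module_loaded("amdxdna", modules_text),
        "llvm-objcopy on PATH (fix 3)": shutil.which("llvm-objcopy") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"[{'ok' if ok else 'MISSING'}] {check}")
```

Any `MISSING` line points at the fix to revisit in docs/BRINGUP.md; an all-`ok` result only means the prerequisites are in place, not that a kernel will run.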

🚀 Quick start

```bash
bash scripts/setup_iron.sh                      # IRON/mlir-aie + Peano
sudo bash scripts/swap_driver.sh                # matched staging driver
sudo bash scripts/install_firmware.sh 1.5.5.391 # matching NPU firmware
sudo bash scripts/run_example.sh basic/matrix_multiplication/single_core   # ~68 GFLOPS on the NPU
```

Run the hand-authored kernels:

```bash
python3 examples/npu_tiny_mlp.py            # a model's matmuls on the NPU
python3 examples/npu_collapse_fused.py      # fused reduce_max -> dynamic-tau collapse
python3 examples/npu_collapse_runtime.py    # runtime-tau collapse (on-NPU aie::sub shift)
python3 examples/npu_shuffle_demo.py        # aie::reverse (vec_perm) on the NPU
python3 examples/npu_pse_collapse.py        # FULL collapse (reduce+threshold+compact) in ONE kernel
python3 examples/npu_ffn_prune.py           # MEASURED FFN net-win + accuracy tradeoff
python3 examples/npu_attention_prune.py     # MEASURED attention KV-prune (both GEMMs shrink)
python3 mlir-aie/programming_examples/vision/edge_detect/edge_detect.py -W 512 -H 512  # 2D conv vision pipeline on NPU
python3 examples/npu_patch_embed.py         # SigLIP/ViT patch-embed on the NPU (measured, honest)
```

🧬 The flex: you can hand-write AIE kernels like AltiVec

The AIE2 vector ISA offers the same class of primitives as PowerPC AltiVec/VSX (vector load, compare-to-mask, select, store), just under new mnemonics. The non-bijunctive collapse, hand-authored (examples/kernels/collapse.cc):

```cpp
aie::vector<bfloat16,32> x    = aie::load_v<32>(a + i);        // vec_ld
aie::mask<32>            keep = aie::ge(x, tau_v);             // vec_cmpge -> mask
aie::vector<bfloat16,32> out  = aie::select(zero_v, x, keep);  // vec_sel: keep ? x : 0
aie::store_v(c + i, out);                                      // vec_st
```

Porting your own SIMD kernels? docs/ALTIVEC_TO_AIE.md maps the whole vocabulary.
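
When checking a hand-written kernel, a scalar host-side reference is handy. The sketch below mirrors the keep-strong / prune-weak semantics of the snippet above in plain Python. The function names are hypothetical, and the `frac * max(x)` rule for the dynamic threshold is an assumption for illustration; the repo's fused kernel may derive τ differently. It also uses Python floats, so an exact comparison against a bfloat16 kernel needs inputs rounded to bfloat16 first.

```python
def collapse(x, tau):
    """Keep-strong / prune-weak: x[i] if x[i] >= tau, else 0.0."""
    return [v if v >= tau else 0.0 for v in x]


def collapse_dynamic(x, frac):
    """Data-adaptive variant: tau = frac * max(x) (assumed rule), then collapse."""
    tau = frac * max(x)
    return collapse(x, tau), tau


def compact(y):
    """Pack the surviving (non-zero) values to the front, with their indices."""
    kept = [(i, v) for i, v in enumerate(y) if v != 0.0]
    return [i for i, _ in kept], [v for _, v in kept]


def pruned_fraction(y):
    """Fraction of entries that are zero after collapse."""
    return sum(1 for v in y if v == 0.0) / len(y)


if __name__ == "__main__":
    x = [0.1, 0.9, 0.4, 0.05, 0.7, 0.2, 0.8, 0.3]
    y, tau = collapse_dynamic(x, 0.5)
    idx, vals = compact(y)
    print("tau      :", tau)
    print("collapsed:", y)
    print("indices  :", idx)
    print("values   :", vals)
    print("pruned   :", pruned_fraction(y))
```

Comparing the kernel's output buffer element by element against `collapse` is a cheap bit-exactness check before moving on to the fused or compacting variants.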

📒 Full benchmark log (by model)

Every measurement, organized by model/architecture — what was tested, the result, measured-vs-projected: BENCHMARKS.md. Includes the honest negatives (gemma4 layer-prune net loss, vision offload-not-speed).

📊 The honest part

This is not a turnkey LLM server, and the NPU is not a fast matmul engine: measured, it is ~6× slower than the integrated Radeon 780M at dense GEMM and loses on energy per GFLOP for dense work (RESULTS.md). Its real value is a ~6.6 W power floor and cheap pruning/selection that lets a stronger device do less work. A prune-then-shrink FFN reduces total work by ~1.3–1.6× (not the naive 4×, because you still pay for the producer projection), gated on accuracy. We measure before we claim, and we mark every frontier.
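
Why the net win is far below the naive 4× can be seen with a back-of-envelope count. The sketch below is an illustrative model, not the repo's measurement method: it counts work in units of one `d_model × d_hidden` projection, assumes the producer projections always run densely, and ignores the cost of the pruning step itself. The function name and parameters are hypothetical.

```python
def ffn_work_ratio(n_producer, keep):
    """Dense work divided by pruned work for one FFN block.

    n_producer: projections that must still run densely to produce the
                activations being pruned (1 for a plain up_proj,
                2 for a gated FFN with gate_proj + up_proj).
    keep:       fraction of hidden units kept, so down_proj costs `keep` units.
    """
    if n_producer < 1:
        raise ValueError("need at least one producer projection")
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must be in (0, 1]")
    dense = n_producer + 1.0    # producers + full down_proj
    pruned = n_producer + keep  # producers + shrunken down_proj
    return dense / pruned


if __name__ == "__main__":
    keep = 0.25  # down_proj alone shrinks 4x
    print(f"plain FFN : {ffn_work_ratio(1, keep):.2f}x")  # 2 / 1.25
    print(f"gated FFN : {ffn_work_ratio(2, keep):.2f}x")  # 3 / 2.25
```

Under these assumptions, keeping a quarter of the hidden units gives 1.60× for a plain FFN and 1.33× for a gated one, which is in the same range as the figure quoted above. Whether the measured numbers arise exactly this way is not established here; RESULTS.md and BENCHMARKS.md hold the actual measurements.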

🗺️ Does my chip have an XDNA1 NPU?

Run lspci -d 1022:1502. If it lists an "AMD IPU Device", the answer is yes. Known parts: 7840HS/U · 8840HS/U · 8845HS · 7940HS · 8945HS · 7640HS/U · 8640HS · 8645HS · 8600G/8700G. (Desktop 8500G/8300G have no NPU; they use the Zen4c die.)
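
The same check can be scripted without lspci by reading the `vendor` and `device` files that Linux exposes for each PCI device under sysfs. This is a minimal sketch, with a hypothetical function name; the IDs are the ones from the lspci filter above.

```python
from pathlib import Path

XDNA1_VENDOR = 0x1022  # AMD
XDNA1_DEVICE = 0x1502  # device ID from the lspci filter above


def find_pci_devices(vendor, device, sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses under sysfs_root whose vendor/device IDs match."""
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        try:
            # sysfs stores the IDs as hex text such as "0x1022\n"
            v = int((dev / "vendor").read_text().strip(), 16)
            d = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        if (v, d) == (vendor, device):
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    found = find_pci_devices(XDNA1_VENDOR, XDNA1_DEVICE)
    if found:
        print("XDNA1 NPU found at:", ", ".join(found))
    else:
        print("No XDNA1 NPU found.")
```

An empty result on a listed chip usually means the device is disabled in firmware settings or the script is running somewhere sysfs is not visible, such as a restricted container.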

🔗 Related work — Elyan Labs

Heterogeneous-compute research (PSE non-bijunctive collapse, RAM coffers / NUMA weight banking, neuromorphic device routing) by Elyan Labs. See also ram-coffers · pse-vcipher-collapse.

🖼️ Multimodal: vision front-end on the NPU

The NPU was built for vision/CNN inference, so a multimodal model's image tower (patch-embed convs, preprocessing) is a far better NPU fit than text decode. A full conv pipeline already runs on the NPU; see docs/VISION_OFFLOAD.md for the offload, and docs/MULTIMODAL_3WAY.md for the full 3-processor picture (including gemma4:26b at 16.5 tok/s on the 780M with 96 GB of shared system memory, a 17 GB model that the 8 GB discrete GPU can't hold).

📜 License

AGPLv3 (this repo's original work); see LICENSE. A commercial / proprietary license (no copyleft) is available for closed-source or SaaS use: COMMERCIAL.md. AMD's XDNA/XRT and IRON/mlir-aie toolchains are licensed under Apache-2.0 WITH LLVM-exception; they are installed, not redistributed.
