I have the Foundry Local CLI installed, and it behaves as expected: it is easy to use and serves NPU models on my Intel OpenVINO-accelerated device.
The issue I run into is that the SDK does not list the same models in its catalog, even when I set the WebUrl to point at the local endpoint for my Foundry instance. This prevents me from using NPU models through the C# SDK in my application.
I am using the Foundry Local Windows SDK for C#.
The list of models from the Foundry CLI:
```
foundry --version
0.8.113+70167233c5

foundry model list
Alias                 Device  Task         File Size  License     Model ID
--------------------------------------------------------------------------------------------------------------
phi-4                 GPU     chat         8.83 GB    MIT         phi-4-openvino-gpu:1
                      GPU     chat         8.37 GB    MIT         Phi-4-generic-gpu:1
                      CPU     chat         10.16 GB   MIT         Phi-4-generic-cpu:1
--------------------------------------------------------------------------------------------------------------
phi-3.5-mini          GPU     chat         1.95 GB    MIT         Phi-3.5-mini-instruct-openvino-gpu:1
                      GPU     chat         2.16 GB    MIT         Phi-3.5-mini-instruct-generic-gpu:1
                      CPU     chat         2.53 GB    MIT         Phi-3.5-mini-instruct-generic-cpu:1
--------------------------------------------------------------------------------------------------------------
phi-3-mini-128k       GPU     chat         2.27 GB    MIT         Phi-3-mini-128k-instruct-openvino-gpu:1
                      GPU     chat         2.13 GB    MIT         Phi-3-mini-128k-instruct-generic-gpu:1
                      CPU     chat         2.54 GB    MIT         Phi-3-mini-128k-instruct-generic-cpu:2
--------------------------------------------------------------------------------------------------------------
phi-3-mini-4k         NPU     chat         2.13 GB    MIT         Phi-3-mini-4k-instruct-openvino-npu:1
                      GPU     chat         2.01 GB    MIT         Phi-3-mini-4k-instruct-openvino-gpu:1
                      GPU     chat         2.13 GB    MIT         Phi-3-mini-4k-instruct-generic-gpu:1
                      CPU     chat         2.53 GB    MIT         Phi-3-mini-4k-instruct-generic-cpu:2
--------------------------------------------------------------------------------------------------------------
mistral-7b-v0.2       NPU     chat         4.07 GB    apache-2.0  Mistral-7B-Instruct-v0-2-openvino-npu:1
                      GPU     chat         4.27 GB    apache-2.0  Mistral-7B-Instruct-v0-2-openvino-gpu:1
                      GPU     chat         4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-gpu:1
                      CPU     chat         4.07 GB    apache-2.0  mistralai-Mistral-7B-Instruct-v0-2-generic-cpu:2
--------------------------------------------------------------------------------------------------------------
deepseek-r1-14b       GPU     chat         7.87 GB    MIT         DeepSeek-R1-Distill-Qwen-14B-openvino-gpu:1
                      GPU     chat         10.27 GB   MIT         deepseek-r1-distill-qwen-14b-generic-gpu:3
                      CPU     chat         11.51 GB   MIT         deepseek-r1-distill-qwen-14b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------
deepseek-r1-7b        NPU     chat         5.58 GB    MIT         DeepSeek-R1-Distill-Qwen-7B-openvino-npu:1
                      GPU     chat         4.19 GB    MIT         DeepSeek-R1-Distill-Qwen-7B-openvino-gpu:1
                      GPU     chat         5.58 GB    MIT         deepseek-r1-distill-qwen-7b-generic-gpu:3
                      CPU     chat         6.43 GB    MIT         deepseek-r1-distill-qwen-7b-generic-cpu:3
--------------------------------------------------------------------------------------------------------------
qwen2.5-coder-0.5b    NPU     chat, tools  0.52 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-openvino-npu:3
                      GPU     chat, tools  0.36 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-openvino-gpu:2
                      GPU     chat, tools  0.52 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-gpu:4
                      CPU     chat, tools  0.80 GB    apache-2.0  qwen2.5-coder-0.5b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
phi-4-mini-reasoning  NPU     chat         2.78 GB    MIT         Phi-4-mini-reasoning-openvino-npu:2
                      GPU     chat         2.47 GB    MIT         Phi-4-mini-reasoning-openvino-gpu:2
                      GPU     chat         3.15 GB    MIT         Phi-4-mini-reasoning-generic-gpu:3
                      CPU     chat         4.52 GB    MIT         Phi-4-mini-reasoning-generic-cpu:3
--------------------------------------------------------------------------------------------------------------
qwen2.5-0.5b          NPU     chat, tools  0.52 GB    apache-2.0  qwen2.5-0.5b-instruct-openvino-npu:3
                      GPU     chat, tools  0.36 GB    apache-2.0  qwen2.5-0.5b-instruct-openvino-gpu:2
                      GPU     chat, tools  0.68 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-gpu:4
                      CPU     chat, tools  0.80 GB    apache-2.0  qwen2.5-0.5b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
qwen2.5-1.5b          NPU     chat, tools  1.51 GB    apache-2.0  qwen2.5-1.5b-instruct-openvino-npu:3
                      GPU     chat, tools  1.00 GB    apache-2.0  qwen2.5-1.5b-instruct-openvino-gpu:2
                      GPU     chat, tools  1.51 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-gpu:4
                      CPU     chat, tools  1.78 GB    apache-2.0  qwen2.5-1.5b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
qwen2.5-coder-1.5b    NPU     chat, tools  0.52 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-openvino-npu:3
                      GPU     chat, tools  0.99 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-openvino-gpu:2
                      GPU     chat, tools  1.25 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-gpu:4
                      CPU     chat, tools  1.78 GB    apache-2.0  qwen2.5-coder-1.5b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
phi-4-mini            NPU     chat, tools  3.60 GB    MIT         phi-4-mini-instruct-openvino-npu:2
                      GPU     chat, tools  2.15 GB    MIT         phi-4-mini-instruct-openvino-gpu:2
                      GPU     chat, tools  3.72 GB    MIT         Phi-4-mini-instruct-generic-gpu:5
                      CPU     chat, tools  4.80 GB    MIT         Phi-4-mini-instruct-generic-cpu:5
--------------------------------------------------------------------------------------------------------------
qwen2.5-14b           GPU     chat, tools  4.79 GB    apache-2.0  qwen2.5-14b-instruct-openvino-gpu:2
                      GPU     chat, tools  9.30 GB    apache-2.0  qwen2.5-14b-instruct-generic-gpu:4
                      CPU     chat, tools  11.06 GB   apache-2.0  qwen2.5-14b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
qwen2.5-coder-14b     GPU     chat, tools  9.08 GB    apache-2.0  qwen2.5-coder-14b-instruct-openvino-gpu:2
                      GPU     chat, tools  8.79 GB    apache-2.0  qwen2.5-coder-14b-instruct-generic-gpu:4
                      CPU     chat, tools  11.06 GB   apache-2.0  qwen2.5-coder-14b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
qwen2.5-coder-7b      NPU     chat, tools  4.73 GB    apache-2.0  qwen2.5-coder-7b-instruct-openvino-npu:2
                      GPU     chat, tools  4.80 GB    apache-2.0  qwen2.5-coder-7b-instruct-openvino-gpu:2
                      GPU     chat, tools  4.73 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-gpu:4
                      CPU     chat, tools  6.16 GB    apache-2.0  qwen2.5-coder-7b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
qwen2.5-7b            NPU     chat, tools  5.20 GB    apache-2.0  qwen2.5-7b-instruct-openvino-npu:2
                      GPU     chat, tools  4.79 GB    apache-2.0  qwen2.5-7b-instruct-openvino-gpu:2
                      GPU     chat, tools  5.20 GB    apache-2.0  qwen2.5-7b-instruct-generic-gpu:4
                      CPU     chat, tools  6.16 GB    apache-2.0  qwen2.5-7b-instruct-generic-cpu:4
--------------------------------------------------------------------------------------------------------------
gpt-oss-20b           CPU     chat         12.26 GB   MIT         gpt-oss-20b-generic-cpu:1
```
Versus the models the demo code reports as available in the catalog:
```
Available models for your hardware:
- Alias: deepseek-r1-14b (Id: deepseek-r1-distill-qwen-14b-generic-gpu:3)
- Alias: deepseek-r1-7b (Id: deepseek-r1-distill-qwen-7b-generic-gpu:3)
- Alias: gpt-oss-20b (Id: gpt-oss-20b-generic-cpu:1)
- Alias: mistral-7b-v0.2 (Id: mistralai-Mistral-7B-Instruct-v0-2-generic-gpu:1)
- Alias: phi-3-mini-128k (Id: Phi-3-mini-128k-instruct-generic-gpu:1)
- Alias: phi-3-mini-4k (Id: Phi-3-mini-4k-instruct-generic-gpu:1)
- Alias: phi-3.5-mini (Id: Phi-3.5-mini-instruct-generic-gpu:1)
- Alias: phi-4 (Id: Phi-4-generic-gpu:1)
- Alias: phi-4-mini (Id: Phi-4-mini-instruct-generic-gpu:5)
- Alias: phi-4-mini-reasoning (Id: Phi-4-mini-reasoning-generic-gpu:3)
- Alias: qwen2.5-0.5b (Id: qwen2.5-0.5b-instruct-generic-gpu:4)
- Alias: qwen2.5-1.5b (Id: qwen2.5-1.5b-instruct-generic-gpu:4)
- Alias: qwen2.5-14b (Id: qwen2.5-14b-instruct-generic-gpu:4)
- Alias: qwen2.5-7b (Id: qwen2.5-7b-instruct-generic-gpu:4)
- Alias: qwen2.5-coder-0.5b (Id: qwen2.5-coder-0.5b-instruct-generic-gpu:4)
- Alias: qwen2.5-coder-1.5b (Id: qwen2.5-coder-1.5b-instruct-generic-gpu:4)
- Alias: qwen2.5-coder-14b (Id: qwen2.5-coder-14b-instruct-generic-gpu:4)
- Alias: qwen2.5-coder-7b (Id: qwen2.5-coder-7b-instruct-generic-gpu:4)
- Alias: whisper-base (Id: openai-whisper-base-generic-cpu:1)
- Alias: whisper-large-v3-turbo (Id: openai-whisper-large-v3-turbo-generic-cpu:2)
- Alias: whisper-medium (Id: openai-whisper-medium-generic-cpu:1)
- Alias: whisper-small (Id: openai-whisper-small-generic-cpu:1)
- Alias: whisper-tiny (Id: openai-whisper-tiny-generic-cpu:2)
```
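For context, the listing code I am running is essentially the following minimal sketch (not my exact demo code; it assumes the FoundryLocalManager API and the Alias/ModelId properties on the returned model info as I understand them from the SDK docs):

```csharp
using System;
using System.Linq;
using Microsoft.AI.Foundry.Local;

// Minimal sketch: start (or attach to) the Foundry Local service and
// print every model the SDK's catalog exposes for this machine.
var manager = new FoundryLocalManager();
await manager.StartServiceAsync();

var catalog = await manager.ListCatalogModelsAsync();
Console.WriteLine("Available models for your hardware:");
foreach (var model in catalog.OrderBy(m => m.Alias))
{
    Console.WriteLine($" - Alias: {model.Alias} (Id: {model.ModelId})");
}
```

Note that none of the openvino NPU (or GPU) variants from the CLI listing show up here, only the generic GPU/CPU IDs plus the whisper models.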
Even when I pass in the exact ID of the NPU model I want to use, it does not work.
I am unsure whether I am doing something wrong or whether there is a discrepancy between the API and the SDK.
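For completeness, the failing path looks roughly like this (again a sketch rather than my exact code; the NPU model ID is copied from the CLI listing above, and I am assuming GetModelInfoAsync behaves as documented):

```csharp
using System;
using Microsoft.AI.Foundry.Local;

// Sketch of the failing path: asking the SDK for an NPU model by the
// explicit ID that `foundry model list` reports.
var manager = new FoundryLocalManager();
await manager.StartServiceAsync();

const string npuModelId = "Phi-3-mini-4k-instruct-openvino-npu:1";

// For the NPU IDs above this lookup does not resolve for me the way the
// CLI does, so I cannot proceed to download/load the model by this ID.
var info = await manager.GetModelInfoAsync(npuModelId);
Console.WriteLine(info is null
    ? $"'{npuModelId}' not found in SDK catalog"
    : $"Found: {info.ModelId}");
```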