Commit 4e2e7ff
workflows: Add vLLM workflow for LLM inference and production deployment
Add support for deploying and testing the vLLM inference engine and the
vLLM Production Stack. The workflow enables automated testing of both
vLLM as a single-node inference server and the production stack's
cluster-wide orchestration capabilities, including routing, scaling,
and distributed caching. We start off with CPU support for both.

For the production stack two replicas are requested, which means two
engines, each requiring 16 GiB of memory. Given other requirements we
ask for at least 64 GiB of RAM for the production stack vLLM CPU test.
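
The sizing above can be checked with a little arithmetic. This is only a
sketch using the figures quoted in this message; how the remaining memory
is split among the "other requirements" is not stated here, so it is
reported as plain headroom rather than broken down.

```python
# Memory sizing for the production stack CPU test, using the figures
# quoted in this commit message.
GIB_PER_ENGINE = 16   # each vLLM engine replica
REPLICAS = 2          # replicas requested for the production stack
GUEST_RAM_GIB = 64    # minimum RAM asked for the test guest


def engine_memory_gib(replicas: int = REPLICAS,
                      per_engine: int = GIB_PER_ENGINE) -> int:
    """Memory consumed by the engines alone, in GiB."""
    return replicas * per_engine


def headroom_gib(total: int = GUEST_RAM_GIB) -> int:
    """Memory left for everything that is not an engine, in GiB."""
    return total - engine_memory_gib()


if __name__ == "__main__":
    print(f"engines: {engine_memory_gib()} GiB, headroom: {headroom_gib()} GiB")
```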

To get the production stack up and running you just use:

    make defconfig-vllm-production-stack-cpu KDEVOPS_HOSTS_PREFIX="demo"
    make
    make bringup
    make vllm AV=2

At this point you end up with two replicas serving through the
vLLM production stack router.

vLLM is a high-performance inference engine for large language models,
optimized for throughput and memory efficiency through PagedAttention
and continuous batching. The vLLM Production Stack builds on top of this
engine to provide cluster-wide serving with intelligent request routing,
distributed KV cache sharing via LMCache, unified observability, and
autoscaling across multiple model replicas.
The implementation supports three deployment methods: simple Docker
containers for development, Kubernetes with the official Production
Stack Helm chart for cluster deployments
(https://github.com/vllm-project/production-stack), and bare metal with
systemd for direct hardware access. Each method shares common
configuration through Kconfig while maintaining deployment-specific
optimizations.
Testing can be performed with either CPU-only or GPU-accelerated
inference. CPU testing uses openeuler/vllm-cpu images to validate the
vLLM API and the production stack's orchestration layer without
requiring GPU hardware, making it suitable for CI/CD pipelines and
development workflows. This enables testing of the router's routing
algorithms (round-robin, session affinity, prefix-aware), service
discovery, load balancing, and API compatibility. GPU testing validates
full production scenarios including LMCache distributed cache sharing,
tensor parallelism, and autoscaling behavior.
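
To make the three routing policies named above concrete, here is a
minimal sketch of each. It is not the production stack router's code:
the class names, the request dictionary keys ("session_id", "prompt")
and the character-level prefix matching are all assumptions made for
illustration.

```python
import hashlib


class RoundRobinRouter:
    """Hand requests to backends in a fixed rotating order."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def route(self, request):
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


class SessionAffinityRouter:
    """Always send the same session to the same backend (stable hash)."""

    def __init__(self, backends):
        self.backends = list(backends)

    def route(self, request):
        digest = hashlib.sha256(request["session_id"].encode()).digest()
        return self.backends[int.from_bytes(digest[:8], "big") % len(self.backends)]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAwareRouter:
    """Prefer the backend that served the prompt sharing the longest prefix."""

    def __init__(self, backends):
        self.seen = {}  # previously served prompt -> backend
        self._fallback = RoundRobinRouter(backends)

    def route(self, request):
        prompt = request["prompt"]
        best, best_len = None, 0
        for old_prompt, backend in self.seen.items():
            n = common_prefix_len(prompt, old_prompt)
            if n > best_len:
                best, best_len = backend, n
        if best is None:
            # No shared prefix with anything seen so far: fall back.
            best = self._fallback.route(request)
        self.seen[prompt] = best
        return best
```

The point of the prefix-aware variant is that a backend which already
processed a similar prompt is more likely to hold reusable KV cache
entries, whereas round-robin only balances request counts.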
The workflow integrates Docker registry mirror support with automatic
detection via 9P mounts. When /mirror/docker is available, the system
automatically configures Docker daemon registry-mirrors for transparent
pull-through caching, reducing deployment time without requiring manual
configuration. The detection uses the libvirt gateway IP to ensure
proper routing from containers and minikube pods.
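
The mirror detection described above could look roughly like the
following sketch. Only the /mirror/docker path and Docker's
"registry-mirrors" daemon.json key come from this message and from
Docker itself; the function name, the gateway address 192.168.122.1 and
port 5000 are assumptions for illustration, not what the playbooks
actually use.

```python
import json
import os


def docker_daemon_config(mirror_path="/mirror/docker",
                         gateway_ip="192.168.122.1",  # assumed libvirt gateway
                         port=5000,                   # assumed registry port
                         existing=None):
    """Return a daemon.json dict, adding a mirror only if the mount exists."""
    config = dict(existing or {})
    if os.path.isdir(mirror_path):
        # Image names stay untouched; Docker consults the mirror first
        # and falls back to the upstream registry on a miss.
        config["registry-mirrors"] = [f"http://{gateway_ip}:{port}"]
    return config


if __name__ == "__main__":
    print(json.dumps(docker_daemon_config(), indent=2))
```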
Image configuration follows Docker's native registry-mirrors pattern
rather than rewriting image names. This preserves the original
repository paths like 'openeuler/vllm-cpu:latest' and
'ghcr.io/vllm-project/production-stack/router:latest' while still
benefiting from mirror caching when available.

Status monitoring is provided through:

    make vllm-status
    make vllm-status-simplified

which parse deployment state and present it with context-aware guidance
about next steps. The vllm-quick-test target provides rapid smoke
testing across all configured nodes with timing measurements and
proper exit codes for CI integration.

To test an LLM query:

    make vllm-quick-test

We provide basic documentation to help clarify the distinction between
vLLM (the inference engine) and the Production Stack (the orchestration
layer). For more details refer to the official release announcement at:
https://blog.lmcache.ai/2025-01-21-stack-release/
The long-term plan is to scale with mocked engines, and then to add
real GPU support both on bare metal and in the cloud, leveraging
kdevops's cloud-agnostic support for any workflow.
Here's an example quick test:
mcgrof@beefy-server /xfs1/mcgrof/vllm/kdevops (git::vllm-v2)$ make vllm-quick-test
========================================
vLLM Quick Test
========================================
Prompt: "kdevops is"
Max tokens: 30
Nodes to test: 1
Testing Baseline node: lpc-vllm
----------------------------------------
Node IP: 192.168.122.170
Starting kubectl port-forward...
Sending request: "kdevops is"
✓ Success!
Duration: 15.747292458s
Full response: "kdevops iseasily a higher level doctor than your list.
really it depends on as on what doc is what 15 less ifmay its just personal preferences."
Full JSON response:
{
  "id": "cmpl-2f031a35c5364d3aaf2b9f0007d46ae5",
  "object": "text_completion",
  "created": 1759424719,
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "text": " easily a higher level doctor than your list.\nreally it depends on as on what doc is what 15 less ifmay its just personal preferences.\n",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 35,
    "completion_tokens": 30,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}
========================================
All tests passed!
========================================
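
For readers who want to inspect such a response themselves, the sketch
below parses a completion payload shaped like the one above and checks
its token accounting. It is not how vllm-quick-test is implemented; the
function name and the returned keys are assumptions, and the sample is
a trimmed copy of the JSON shown in the quick test output.

```python
import json


def summarize_completion(prompt: str, raw_json: str) -> dict:
    """Extract the interesting fields from a text_completion response."""
    response = json.loads(raw_json)
    choice = response["choices"][0]
    usage = response["usage"]
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("token accounting does not add up")
    return {
        "model": response["model"],
        "full_text": prompt + choice["text"],
        # finish_reason "length" means the max token limit cut generation off.
        "truncated": choice["finish_reason"] == "length",
        "completion_tokens": usage["completion_tokens"],
    }


# Trimmed copy of the response from the quick test output.
SAMPLE = json.dumps({
    "object": "text_completion",
    "model": "facebook/opt-125m",
    "choices": [{
        "index": 0,
        "text": " easily a higher level doctor than your list.\n",
        "finish_reason": "length",
    }],
    "usage": {"prompt_tokens": 5, "total_tokens": 35, "completion_tokens": 30},
})

if __name__ == "__main__":
    print(summarize_completion("kdevops is", SAMPLE))
```

In the run above, 5 prompt tokens plus 30 completion tokens give the
reported total of 35, and the "length" finish reason matches the
30-token limit printed at the start of the test.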

Then for a synthetic benchmark:

    make vllm-benchmark

You should end up with results in workflows/vllm/results/html/

I have put demo results of a synthetic run and also of a real workload
on a virtual machine with 64 vCPUs and 64 GiB of DRAM here:

https://github.com/mcgrof/demo-vllm-benchmark

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

1 parent 343fbdf commit 4e2e7ff
File tree (40 files changed, +5076 -2 lines changed):
- defconfigs
- kconfigs
- workflows
- playbooks
- roles
- gen_hosts
- defaults
- tasks
- templates/workflows
- gen_nodes
- defaults
- tasks
- linux-mirror/tasks
- vllm
- defaults
- tasks
- install-deps
- debian
- redhat
- suse
- templates
- scripts
- workflows
- vllm