
Commit 7435487

vllm: Add DECLARE_HOSTS support for bare metal and existing infrastructure
Following the pattern established by the MinIO workflow in commit 533be4c
("minio: add MinIO Warp S3 benchmarking with declared hosts support"), add
DECLARE_HOSTS support to the vLLM workflow to enable testing on pre-existing
infrastructure, including bare metal servers with GPUs. This lets users
leverage existing GPU infrastructure without requiring kdevops to provision
new systems. Two new defconfigs are provided for different deployment
scenarios.

New defconfigs:

1. defconfig-vllm-declared-hosts
   - Bare metal deployment using Docker containers
   - Targets single-node GPU servers
   - Uses systemd service management for vLLM
   - Configurable GPU type (nvidia-a100, etc.) and count
   - Direct port 8000 access without Kubernetes overhead
   - Suitable for direct hardware access scenarios

2. defconfig-vllm-production-stack-declared-hosts
   - Production Stack deployment on existing Kubernetes clusters
   - Full Production Stack with router and monitoring
   - Autoscaling support (2-5 replicas)
   - Grafana/Prometheus observability stack
   - Suitable for production GPU clusters

Both configurations automatically:
- Set CONFIG_SKIP_BRINGUP=y to skip infrastructure provisioning
- Set CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y for pre-existing systems
- Enable benchmarking for performance validation
- Support HuggingFace model deployment (default: facebook/opt-125m)

Implementation changes:

Bare-metal deployment improvements (deploy-bare-metal.yml):
- Remove legacy kubectl/minikube/helm installation for bare metal
- Implement Docker image mirror fallback (try the mirror, fall back to the
  public registry)
- Replace handler-based service restart with direct systemd management
- Make config file creation optional (the template may not exist)
- Support both Docker and Podman container runtimes
- Automatic GPU detection and container image selection (CPU vs GPU)

Main task routing (main.yml):
- Remove a 260+ line legacy deployment block that caused unwanted Kubernetes
  installation even with "when: false", because tags overrode the condition
- Change configure-docker-data.yml to only run for Kubernetes deployments
- Convert deployment method routing from import_tasks to include_tasks with
  the apply parameter to properly respect when conditions and tags
- Add bare-metal specific conditions to benchmark tasks (skip kubectl
  port-forward, connect directly to port 8000)
- Add bare-metal specific conditions to monitoring tasks (skip Kubernetes
  service queries, show systemd journal instructions instead)
- Add a new bare-metal monitoring info task with journalctl instructions

Cleanup support (cleanup-bare-metal.yml - NEW):
- Add vllm-cleanup target: remove containers and systemd services
- Add vllm-cleanup-full target: also remove kubectl/helm/minikube binaries
- Add vllm-cleanup-purge target: complete purge including data directories
- Essential for declared hosts, since 'make destroy' doesn't apply

Testing improvements (vllm-quick-test.sh):
- Detect CONFIG_KDEVOPS_USE_DECLARED_HOSTS for declared hosts mode
- Detect CONFIG_VLLM_BARE_METAL for deployment type
- Read actual hostnames from kdevops_declared_hosts in extra_vars.yaml
- Support comma-separated host lists for multiple declared hosts
- Skip kubectl port-forward setup for bare-metal deployments
- Connect directly to port 8000 for bare-metal API access
- Maintain backward compatibility with provisioned VMs

Makefile additions (workflows/vllm/Makefile):
- Add vllm-cleanup target for basic cleanup
- Add vllm-cleanup-full target for complete cleanup, including binaries
- Add vllm-cleanup-purge target for purging all data
- Update help text for the new cleanup targets

Dependency fixes (install-deps/debian/main.yml):
- Make kubectl/minikube installation conditional on deployment type
- Skip Kubernetes tools for bare-metal deployments

Example usage for a bare metal GPU server:

  make defconfig-vllm-declared-hosts DECLARE_HOSTS=gpu-server-01
  make
  make vllm              # Deploy vLLM as a systemd service
  make vllm-quick-test   # Verify the API endpoint
  make vllm-benchmark    # Run performance benchmarks
  make vllm-cleanup      # Clean up when done

Example usage for an existing Kubernetes cluster:

  make defconfig-vllm-production-stack-declared-hosts DECLARE_HOSTS=k8s-cluster
  make
  make vllm              # Deploy via Helm
  make vllm-status       # Check deployment status
  make vllm-monitor      # Access Grafana/Prometheus
  make vllm-cleanup      # Clean up the namespace

Key architectural decisions:

1. Avoid fragile hostvars access patterns - use configuration variables that
   are globally accessible across execution contexts (localhost vs target
   nodes)
2. Use include_tasks instead of import_tasks for conditional execution, since
   import_tasks is static and evaluated at parse time, while include_tasks is
   dynamic and respects when conditions
3. Apply tags to included tasks using the apply parameter; otherwise tags
   only apply to the include statement itself
4. Implement graceful fallbacks for infrastructure dependencies (Docker
   mirror → public registry, Kubernetes → bare metal)
5. Provide cleanup targets for declared hosts, since the standard
   'make destroy' only applies to provisioned infrastructure

This implementation mirrors the approach used for MinIO declared hosts
support and enables vLLM testing on any infrastructure where GPUs are
available, whether bare metal servers or existing Kubernetes clusters.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
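The include_tasks/apply decision described above can be sketched roughly as follows. This is a hedged illustration of the routing pattern, not the actual main.yml from this commit; deploy-bare-metal.yml is real, but the production-stack file name and the variable names used here are assumptions:

```yaml
# import_tasks is static: it is resolved at parse time, so tags on the
# imported tasks can still be selected even when the surrounding
# 'when: false' would skip them. include_tasks is dynamic and honors
# 'when'; the 'apply' keyword propagates tags to the included tasks
# themselves rather than only to the include statement.
- name: Deploy vLLM on bare metal
  ansible.builtin.include_tasks:
    file: deploy-bare-metal.yml
    apply:
      tags: ['deploy']
  when: vllm_bare_metal | default(false) | bool
  tags: ['deploy']

- name: Deploy vLLM Production Stack on Kubernetes  # file name hypothetical
  ansible.builtin.include_tasks:
    file: deploy-production-stack.yml
    apply:
      tags: ['deploy']
  when: vllm_production_stack | default(false) | bool
  tags: ['deploy']
```

With this shape, `ansible-playbook --tags deploy` runs only the branch whose `when` condition holds, which is the behavior the commit message says import_tasks failed to provide.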
1 parent 0cbd4f5 commit 7435487

File tree

9 files changed: 421 additions, 324 deletions

defconfigs/vllm-declared-hosts

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+#
+# vLLM with declared hosts (bare metal or pre-existing infrastructure)
+#
+# Automatically generated file; DO NOT EDIT.
+# kdevops 5.0.2 Configuration
+#
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# Skip bringup for declared hosts
+CONFIG_SKIP_BRINGUP=y
+CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y
+
+# vLLM specific configuration - Using bare metal deployment for declared hosts
+CONFIG_VLLM_BARE_METAL=y
+CONFIG_VLLM_BARE_METAL_USE_CONTAINER=y
+CONFIG_VLLM_BARE_METAL_DOCKER=y
+CONFIG_VLLM_BARE_METAL_SERVICE_NAME="vllm"
+CONFIG_VLLM_BARE_METAL_DATA_DIR="/var/lib/vllm"
+CONFIG_VLLM_BARE_METAL_LOG_DIR="/var/log/vllm"
+
+# GPU configuration for declared hosts
+CONFIG_VLLM_BARE_METAL_DECLARE_HOST_GPU_TYPE="nvidia-a100"
+CONFIG_VLLM_BARE_METAL_DECLARE_HOST_GPU_COUNT=1
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# Engine configuration
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
defconfigs/vllm-production-stack-declared-hosts

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+#
+# vLLM Production Stack with declared hosts (bare metal with GPU)
+#
+# Automatically generated file; DO NOT EDIT.
+# kdevops 5.0.2 Configuration
+#
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# Skip bringup for declared hosts
+CONFIG_SKIP_BRINGUP=y
+CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y
+
+# vLLM Production Stack with Kubernetes on declared hosts
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_EXISTING=y
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm-prod"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+
+# Production Stack components
+CONFIG_VLLM_PROD_STACK_REPO="https://vllm-project.github.io/production-stack"
+CONFIG_VLLM_PROD_STACK_CHART_VERSION="latest"
+CONFIG_VLLM_PROD_STACK_ROUTER_IMAGE="ghcr.io/vllm-project/production-stack/router"
+CONFIG_VLLM_PROD_STACK_ROUTER_TAG="latest"
+CONFIG_VLLM_PROD_STACK_ENABLE_MONITORING=y
+CONFIG_VLLM_PROD_STACK_ENABLE_AUTOSCALING=y
+CONFIG_VLLM_PROD_STACK_MIN_REPLICAS=2
+CONFIG_VLLM_PROD_STACK_MAX_REPLICAS=5
+CONFIG_VLLM_PROD_STACK_TARGET_GPU_UTILIZATION=80
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# Engine configuration for GPU
+CONFIG_VLLM_REPLICA_COUNT=2
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# Router and observability
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
playbooks/roles/vllm/tasks/cleanup-bare-metal.yml

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+---
+# Cleanup tasks for bare metal vLLM deployment
+# Removes all installed components and data
+
+- name: Stop and remove vLLM systemd service
+  ansible.builtin.systemd:
+    name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
+    state: stopped
+    enabled: no
+  become: yes
+  ignore_errors: yes
+
+- name: Remove vLLM systemd service file
+  ansible.builtin.file:
+    path: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
+    state: absent
+  become: yes
+
+- name: Reload systemd daemon
+  ansible.builtin.systemd:
+    daemon_reload: yes
+  become: yes
+
+- name: Stop all vLLM Docker containers
+  ansible.builtin.shell:
+    cmd: docker stop $(docker ps -a -q --filter ancestor={{ vllm_bare_metal_image_final }})
+  ignore_errors: yes
+  changed_when: false
+
+- name: Remove all vLLM Docker containers
+  ansible.builtin.shell:
+    cmd: docker rm $(docker ps -a -q --filter ancestor={{ vllm_bare_metal_image_final }})
+  ignore_errors: yes
+  changed_when: false
+
+- name: Remove vLLM Docker images
+  ansible.builtin.command:
+    cmd: docker rmi {{ vllm_bare_metal_image_final }}
+  ignore_errors: yes
+  changed_when: false
+
+- name: Stop minikube if running
+  ansible.builtin.command:
+    cmd: minikube stop
+  ignore_errors: yes
+  changed_when: false
+  become: no
+
+- name: Delete minikube cluster
+  ansible.builtin.command:
+    cmd: minikube delete
+  ignore_errors: yes
+  changed_when: false
+  become: no
+
+- name: Remove kubectl binary
+  ansible.builtin.file:
+    path: /usr/local/bin/kubectl
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove minikube binary
+  ansible.builtin.file:
+    path: /usr/local/bin/minikube
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove helm binary
+  ansible.builtin.file:
+    path: /usr/local/bin/helm
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove vLLM data directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: absent
+  become: yes
+  loop:
+    - "{{ vllm_bare_metal_data_dir | default('/var/lib/vllm') }}"
+    - "{{ vllm_bare_metal_log_dir | default('/var/log/vllm') }}"
+    - "{{ vllm_local_path | default('/data/vllm') }}"
+    - "{{ vllm_results_dir | default('/data/vllm/results') }}"
+  when: vllm_cleanup_remove_data | default(false)
+
+- name: Remove /data/minikube directory
+  ansible.builtin.file:
+    path: /data/minikube
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_data | default(false)
+
+- name: Display cleanup completion message
+  debug:
+    msg: |
+      vLLM bare metal cleanup completed.
+
+      Removed:
+      - vLLM systemd service
+      - vLLM Docker containers and images
+      - Minikube cluster
+
+      To also remove binaries (kubectl, minikube, helm), run:
+        make vllm-cleanup-full
+
+      To remove all data directories, run:
+        make vllm-cleanup-purge
playbooks/roles/vllm/tasks/deploy-bare-metal.yml

Lines changed: 85 additions & 31 deletions
@@ -47,11 +47,24 @@
   set_fact:
     container_runtime: "{{ 'docker' if vllm_bare_metal_docker | default(true) else 'podman' }}"

-- name: Ensure container runtime is installed
-  package:
-    name: "{{ container_runtime }}"
-    state: present
+- name: Ensure Docker service is started and enabled
+  ansible.builtin.systemd:
+    name: docker
+    state: started
+    enabled: yes
   become: yes
+  when: container_runtime == 'docker'
+
+- name: Add current user to docker group
+  ansible.builtin.user:
+    name: "{{ ansible_user_id }}"
+    groups: docker
+    append: yes
+  become: yes
+  when: container_runtime == 'docker'
+
+- name: Reset connection to apply docker group membership
+  meta: reset_connection

 - name: Install nvidia-container-toolkit for GPU support
   when: has_nvidia_gpu
@@ -75,35 +88,65 @@
     state: restarted
   become: yes

-- name: Set vLLM bare metal container image with Docker mirror if enabled
+- name: Set vLLM bare metal container images
   ansible.builtin.set_fact:
-    vllm_bare_metal_image_final: >-
-      {%- if use_docker_mirror | default(false) | bool -%}
-      {%- if not has_nvidia_gpu -%}
-      localhost:{{ docker_mirror_port | default(5000) }}/vllm:v0.6.3-cpu
-      {%- else -%}
-      localhost:{{ docker_mirror_port | default(5000) }}/vllm-openai:latest
-      {%- endif -%}
+    vllm_bare_metal_image_mirror: >-
+      {%- if not has_nvidia_gpu -%}
+      localhost:{{ docker_mirror_port | default(5000) }}/vllm:v0.6.3-cpu
       {%- else -%}
-      {%- if not has_nvidia_gpu -%}
-      substratusai/vllm:v0.6.3-cpu
-      {%- else -%}
-      vllm/vllm-openai:latest
-      {%- endif -%}
+      localhost:{{ docker_mirror_port | default(5000) }}/vllm-openai:latest
       {%- endif -%}
+    vllm_bare_metal_image_public: >-
+      {%- if not has_nvidia_gpu -%}
+      substratusai/vllm:v0.6.3-cpu
+      {%- else -%}
+      vllm/vllm-openai:latest
+      {%- endif -%}
+
+- name: Set initial image to try (mirror if enabled, otherwise public)
+  ansible.builtin.set_fact:
+    vllm_bare_metal_image_final: "{{ vllm_bare_metal_image_mirror if (use_docker_mirror | default(false) | bool) else vllm_bare_metal_image_public }}"
+
+- name: Check if vLLM container image already exists
+  ansible.builtin.command:
+    cmd: "docker images -q {{ vllm_bare_metal_image_final }}"
+  register: image_exists
+  changed_when: false
+  failed_when: false

-- name: Pull vLLM container image
-  community.docker.docker_image:
-    name: "{{ vllm_bare_metal_image_final }}"
-    source: pull
+- name: Try pulling from Docker mirror first (if configured)
+  ansible.builtin.command:
+    cmd: "docker pull {{ vllm_bare_metal_image_mirror }}"
+  register: docker_pull_mirror
+  when:
+    - use_docker_mirror | default(false) | bool
+    - image_exists.stdout == ""
+  failed_when: false
+  changed_when: "'Downloaded' in docker_pull_mirror.stdout or 'Pull complete' in docker_pull_mirror.stdout"
+
+- name: Fall back to public registry if mirror failed
+  ansible.builtin.command:
+    cmd: "docker pull {{ vllm_bare_metal_image_public }}"
+  register: docker_pull_public
+  when:
+    - image_exists.stdout == ""
+    - (not (use_docker_mirror | default(false) | bool)) or (docker_pull_mirror is defined and docker_pull_mirror.rc != 0)
+  changed_when: "'Downloaded' in docker_pull_public.stdout or 'Pull complete' in docker_pull_public.stdout"
+
+- name: Update final image name if we used public registry
+  ansible.builtin.set_fact:
+    vllm_bare_metal_image_final: "{{ vllm_bare_metal_image_public }}"
+  when:
+    - docker_pull_public is defined
+    - docker_pull_public.rc == 0

 - name: Create vLLM systemd service for container
   template:
     src: vllm-container.service.j2
     dest: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
     mode: '0644'
   become: yes
-  notify: restart vllm
+  register: systemd_service_container

 # Direct installation (pip/source)
 - name: Deploy vLLM with direct installation
@@ -155,21 +198,39 @@
     dest: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
     mode: '0644'
   become: yes
-  notify: restart vllm
+  register: systemd_service_direct
+
+- name: Check if vLLM configuration template exists
+  stat:
+    path: "{{ role_path }}/templates/vllm.conf.j2"
+  register: vllm_conf_template
+  delegate_to: localhost

 - name: Create vLLM configuration file
   template:
     src: vllm.conf.j2
     dest: /etc/vllm/vllm.conf
     mode: '0644'
   become: yes
-  notify: restart vllm
+  register: vllm_config
+  when: vllm_conf_template.stat.exists

 - name: Reload systemd daemon
   systemd:
     daemon_reload: yes
   become: yes

+- name: Restart vLLM service if configuration changed
+  systemd:
+    name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
+    state: restarted
+    daemon_reload: yes
+  become: yes
+  when: >-
+    (systemd_service_container is defined and systemd_service_container.changed) or
+    (systemd_service_direct is defined and systemd_service_direct.changed) or
+    (vllm_config is defined and vllm_config.changed)
+
 - name: Start and enable vLLM service
   systemd:
     name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
@@ -218,10 +279,3 @@
       - Stop: sudo systemctl stop {{ vllm_bare_metal_service_name | default('vllm') }}
       - Status: sudo systemctl status {{ vllm_bare_metal_service_name | default('vllm') }}
       - Logs: sudo journalctl -u {{ vllm_bare_metal_service_name | default('vllm') }} -f
-
-# Handler for restarting vLLM
-- name: restart vllm
-  systemd:
-    name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
-    state: restarted
-  become: yes
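The quick-test changes above describe reading declared hosts from a comma-separated list and connecting straight to port 8000 instead of setting up a kubectl port-forward. A minimal sketch of that host-resolution step, assuming illustrative variable names (the real vllm-quick-test.sh reads kdevops_declared_hosts from extra_vars.yaml):

```shell
#!/bin/bash
# Hypothetical sketch: split a comma-separated declared-hosts list and
# build the direct bare-metal API endpoint for each host.
declared_hosts="gpu-server-01,gpu-server-02"
api_port=8000

# Split on commas into an array, one entry per declared host.
IFS=',' read -r -a hosts <<< "$declared_hosts"

for host in "${hosts[@]}"; do
    # In bare-metal mode the vLLM API is reachable directly, e.g.:
    #   curl "http://$host:$api_port/v1/models"
    echo "http://$host:$api_port/v1/models"
done
```

Running this prints one endpoint URL per declared host; a real script would replace the echo with a curl health check against /v1/models.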
