Vllm support #17

Open
IRENEKO wants to merge 5 commits into IBM:main from IRENEKO:vllm-support

IRENEKO commented Apr 7, 2026
Contributor

AISteer360 steers models by registering PyTorch hooks (forward-pre, forward, and backward) on specific layers. This works great when you own the model object. But vLLM spawns worker subprocesses that each load their own copy of the model, so you can't just pass a hooked model across process boundaries. This PR instead serializes the steering setup and has each worker rebuild it and attach the hooks itself.

Main components:

  1. backend.py
  • Serializes the StateControl into a JSON "recipe" (via recipe.py)
  • Writes it to a temp file and sets the VLLM_AISTEER360_CONFIG env var
  • Registers AISteer360Worker in vLLM-Hook's plugin registry
  • Boots a HookLLM engine that will use that worker
  2. recipe.py
  • serialize_state_control(): Extracts everything needed to rebuild the StateControl, including:
    • The class path (e.g. aisteer360.algorithms.state_control.caa.control.CAAControl)
    • Constructor args from the dataclass
    • Pre-computed steering vectors saved as .svec files
    • Resolved layer IDs/names (so the worker doesn't re-compute them)
  • reconstruct_state_control(): Rebuilds it by dynamically importing the class, loading vectors from disk, and swapping prompt args with pre-computed vectors to skip extraction.

  3. worker.py

AISteer360Worker extends vLLM's V1Worker. After load_model():

  • Reads the recipe JSON from the env var path
  • Reconstructs the StateControl
  • Calls state_control.steer(model) on the real model
  • Gets hooks and wraps them with _gated_hook() which does two things:
    • Flag-file gating: HookLLM can toggle steering on/off by creating/removing a flag file
    • Tensor shape normalization: AISteer360 hooks expect [B, T, H] (3-D) but vLLM passes [N, H] (2-D), so it unsqueezes before the hook and squeezes after
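
The handoff between backend.py and worker.py described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the recipe schema (`class_path`, `kwargs`), the helper names `write_recipe` and `reconstruct_control`, and the use of `types.SimpleNamespace` as a stand-in for a real StateControl class are all assumptions. Only the `VLLM_AISTEER360_CONFIG` variable name comes from the description, and steering vectors (.svec files) are left out.

```python
import importlib
import json
import os
import tempfile

ENV_VAR = "VLLM_AISTEER360_CONFIG"  # name taken from the PR description


def write_recipe(class_path: str, kwargs: dict) -> str:
    """Parent side: dump a recipe to a temp file and publish its path."""
    recipe = {"class_path": class_path, "kwargs": kwargs}  # hypothetical schema
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(recipe, f)
    # Subprocesses started after this point inherit the variable.
    os.environ[ENV_VAR] = path
    return path


def reconstruct_control():
    """Worker side: read the recipe and rebuild the object by class path."""
    with open(os.environ[ENV_VAR]) as f:
        recipe = json.load(f)
    module_name, _, class_name = recipe["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**recipe["kwargs"])


if __name__ == "__main__":
    # SimpleNamespace stands in for a real control class (assumption).
    recipe_path = write_recipe("types.SimpleNamespace", {"layer_id": 12, "scale": 4.0})
    control = reconstruct_control()
    print(type(control).__name__, control.layer_id, control.scale)
    os.remove(recipe_path)
```

Passing only a file path through the environment keeps the payload out of the process arguments and lets every worker read the same recipe independently.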

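The two jobs of the gating wrapper described under worker.py (flag-file gating and 2-D/3-D shape normalization) might look roughly like the sketch below. It is written under several assumptions and is not the PR's `_gated_hook()`: the name `gated_hook`, the `flag_path` parameter, the choice of dimension 0 for the added batch axis, and the assumption that the layer output is a single tensor are all hypothetical. The wrapper relies only on the `dim()`, `unsqueeze()`, and `squeeze()` methods that `torch.Tensor` provides, so it has no import-time dependency on PyTorch.

```python
import os
from typing import Any, Callable


def gated_hook(hook: Callable[..., Any], flag_path: str) -> Callable[..., Any]:
    """Wrap a forward hook so it only fires while `flag_path` exists."""

    def wrapper(module, args, output):
        if not os.path.exists(flag_path):
            # Steering is off. A forward hook that returns None leaves
            # the module's output unchanged.
            return None
        is_2d = output.dim() == 2
        # [N, H] -> [1, N, H] so the wrapped hook sees a 3-D tensor.
        hidden = output.unsqueeze(0) if is_2d else output
        result = hook(module, args, hidden)
        if result is None:
            # The hook edited in place or made no change.
            result = hidden
        # Restore the 2-D layout expected by the caller.
        return result.squeeze(0) if is_2d else result

    return wrapper
```

With a wrapper like this, toggling steering is just creating or deleting the flag file; no message has to be sent to the worker process.
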
Additional revisions:

  1. Minor helpers for vLLM I/O
  2. Backend rerouting in aisteer360/algorithms/core/steering_pipeline.py to support backend="vllm"
  3. A small example, test_vllm_state.py, showing how to use the vLLM backend

@emiehling Please advise

cyko@ibm.com;6J3007897;Irene Ko added 5 commits April 7, 2026 15:54
Signed-off-by: cyko@ibm.com;6J3007897;Irene Ko <cyko1@ccc-login5.pok.ibm.com>
…r vllm

@emiehling

Collaborator

Thanks @IRENEKO, I'll dig into this shortly.
