
fix(inference): save merged LoRA adapter in flat layout for vLLM#58

Open
Manuscrit wants to merge 2 commits into longtermrisk:v0.9 from slacki-ai:fix/multi-lora-save-flat

Conversation

@Manuscrit
Collaborator

Summary

 - When merging multiple LoRA adapters via PEFT, `save_pretrained` creates a subdirectory per adapter (e.g. `/tmp/merged_lora/combined/adapter_config.json`), but vLLM expects a flat layout (`/tmp/merged_lora/adapter_config.json`)
 - Fix: delete the source adapters before saving so that only the "combined" adapter remains, producing the flat layout vLLM expects
 - Without this fix, multi-LoRA inference jobs crash immediately with `FileNotFoundError: No such file or directory: /tmp/merged_lora/adapter_config.json`

 ## Test plan
 - [ ] Run a multi-LoRA inference job (2+ adapters) and verify it completes without the `FileNotFoundError`
 - [ ] Verify `/tmp/merged_lora/adapter_config.json` exists at the top level after the merge
 - [ ] Verify single-adapter inference (no merge path) still works unchanged
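The layout mismatch in the summary can be illustrated with a small stdlib-only sketch. This is not the PR's code: the `flatten_adapter_dir` helper is hypothetical, and the adapter name `combined` is just the name used in this PR's description. It shows a post-save alternative to the delete-before-save fix, moving whatever PEFT wrote under `save_dir/<adapter_name>/` up to `save_dir/`, where vLLM looks for `adapter_config.json`:

```python
import os
import shutil


def flatten_adapter_dir(save_dir: str, adapter_name: str = "combined") -> None:
    """Move files saved under save_dir/<adapter_name>/ up into save_dir.

    Hypothetical post-save alternative to the PR's delete-before-save fix:
    vLLM expects adapter_config.json at the top level of save_dir, while
    PEFT's save_pretrained writes one subdirectory per loaded adapter.
    """
    subdir = os.path.join(save_dir, adapter_name)
    if not os.path.isdir(subdir):
        return  # already flat -- nothing to do
    for fname in os.listdir(subdir):
        shutil.move(os.path.join(subdir, fname), os.path.join(save_dir, fname))
    os.rmdir(subdir)
```

The PR takes the delete-before-save route instead, which avoids touching the filesystem twice; the sketch above only demonstrates what "flat layout" means in terms of paths.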

slacki-ai and others added 2 commits April 4, 2026 10:10
…nference

When `lora_adapters` (List[str]) is supplied, the job merges all adapters
into a single combined adapter via PEFT linear combination on CPU before
vLLM is initialised. This keeps the merged rank identical to the input
rank so vLLM's max_lora_rank constraint is never violated.

Key changes:
- `InferenceConfig`: new `lora_adapters` field; validated to require ≥ 2
  entries (single adapter stays in `model` as before, preserving compat).
- `InferenceJobs.create()`: client-side rank-equality assertion across all
  adapters, with a clear error before any GPU time is spent.
- `cli.py`: new `download_adapter()` helper (handles org/repo/subfolder
  paths); new `merge_lora_adapters()` runs PEFT `add_weighted_adapter`
  (combination_type="linear") on CPU, saves the combined adapter to
  /tmp/merged_lora/, then frees memory before vLLM loads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When multiple LoRA adapters are loaded, PEFT's save_pretrained creates
subdirectories per adapter (e.g. /tmp/merged_lora/combined/). vLLM
expects adapter_config.json at the top level. Delete the source adapters
before saving so only "combined" remains, producing a flat layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
