Releases: LegionIO/legion-llm

v0.8.28

25 Apr 20:42
fcd4973


Fixed

  • Model/provider mismatch when clients send a model name (e.g., qwen3.5:latest) without an explicit provider. The fallback paths blindly paired it with default_provider (typically bedrock), causing RubyLLM::ModelNotFoundError. Now infers the correct provider from model naming patterns before falling back to the global default.
  • arbitrage_fallback hardcoded :cloud tier and :bedrock provider when inference failed. Now uses PROVIDER_TIER to resolve the correct tier for the inferred provider.

Added

  • Router.infer_provider_for_model(model) — public method that maps model naming patterns to providers. Recognizes Ollama-style models (: or / in name), Bedrock (us.*), OpenAI (gpt-*, o1-*/o3-*/o4-*), Anthropic (claude-*), and Gemini (gemini-*).
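
A minimal sketch of the inference, assuming simple first-match regexes; only the method name and the recognized patterns come from this note:

    module Router
      module_function

      # Sketch only: regex details and ordering are assumptions.
      def infer_provider_for_model(model)
        case model
        when %r{[:/]}           then :ollama    # Ollama-style: qwen3.5:latest, org/model
        when /\Aus\./           then :bedrock   # Bedrock cross-region IDs: us.*
        when /\A(gpt-|o[134]-)/ then :openai    # gpt-*, o1-*/o3-*/o4-*
        when /\Aclaude-/        then :anthropic # claude-*
        when /\Agemini-/        then :gemini    # gemini-*
        end                                     # nil: caller falls back to default_provider
      end
    end

    Router.infer_provider_for_model('qwen3.5:latest') # => :ollama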

v0.8.27

24 Apr 07:58
2067bbf


Fixed

  • vLLM provider sent the developer message role (an OpenAI convention), which Qwen's chat template rejects. Added a Vllm::Chat module that overrides format_messages and format_role to always send system (see the sketch after this list).
  • vLLM provider called OpenAI::Chat.render_payload as a module function without provider instance context, causing NoMethodError on openai_use_system_role. Rewrote to use super with instance method overrides.
  • Audit events included the full conversation history in every message, causing quadratic payload growth. Now caps at the last 20 messages (configurable via compliance.audit_max_messages); the full conversation remains reconstructable via conversation_id.
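
A rough sketch of the role override from the first item above, assuming RubyLLM's chat formatting exposes these hooks; everything beyond the module and method names is guessed:

    module Vllm
      module Chat
        # Qwen's chat template rejects OpenAI's 'developer' role, so system
        # prompts are always sent as role 'system'. Signature is assumed.
        def format_role(role)
          role == :system ? 'system' : super
        end

        def format_messages(messages)
          messages.map { |m| { role: format_role(m.role), content: m.content } }
        end
      end
    end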

Added

  • vLLM chat_template_kwargs with enable_thinking is now sent on every request so vLLM separates reasoning into the reasoning response field instead of emitting inline <think> tags (see the sketch after this list).
  • providers.vllm.enable_thinking setting (default: true). Controls whether thinking is enabled for vLLM requests; a per-request thinking param overrides the setting.
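
Roughly, each vLLM request body gains one extra field; chat_template_kwargs and enable_thinking follow vLLM's OpenAI-compatible API, while the surrounding payload is illustrative:

    payload = {
      model: 'qwen3.5',                               # hypothetical model id
      messages: [{ role: 'user', content: 'hello' }],
      chat_template_kwargs: {
        enable_thinking: true  # mirrors providers.vllm.enable_thinking (default: true)
      }
    }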

v0.8.26

24 Apr 06:55
f772d7f


Added

  • First-class vLLM provider support. vLLM exposes an OpenAI-compatible API and is registered as a new RubyLLM provider (:vllm). Configured via providers.vllm.base_url in settings. Mapped to :fleet tier in the router.
  • vLLM discovery via the /v1/models endpoint. Caches the model list with max_model_len (context window size) using the same TTL as Ollama discovery. Health checks via the /health endpoint (see the sketch after this list).
  • Context overflow escalation: when vLLM rejects a request due to context length limits (32k on V100 hardware), the executor automatically falls back to cloud/frontier providers.
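
A minimal sketch of the two probes, assuming providers.vllm.base_url points at a local vLLM server; the gem's actual discovery code is not shown here:

    require 'net/http'
    require 'json'

    base = URI('http://localhost:8000') # hypothetical base_url

    # Health check: vLLM answers /health with 200 when ready.
    health = Net::HTTP.get_response(base.host, '/health', base.port)
    puts "vLLM healthy: #{health.is_a?(Net::HTTPSuccess)}"

    # Discovery: /v1/models lists each model with its max_model_len.
    models = JSON.parse(Net::HTTP.get(base.host, '/v1/models', base.port))
    models.fetch('data', []).each do |m|
      puts "#{m['id']}: context window #{m['max_model_len']} tokens"
    end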

Changed

  • find_fallback_provider in Executor now skips all local providers (:ollama and :vllm) when searching for fallbacks, not just :ollama, ensuring context overflow escalates to cloud/frontier providers (see the sketch after this list).
  • Router::PROVIDER_ORDER updated: :vllm inserted after :ollama and before :bedrock.
  • default_provider_for_tier(:fleet) returns :vllm when vLLM is enabled and falls back to :ollama otherwise.
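
The fallback search from the first two items, condensed into a sketch; provider symbols beyond :ollama, :vllm, and :bedrock are assumptions:

    PROVIDER_ORDER  = [:ollama, :vllm, :bedrock, :openai, :anthropic].freeze
    LOCAL_PROVIDERS = [:ollama, :vllm].freeze

    def find_fallback_provider(failed_provider)
      start = PROVIDER_ORDER.index(failed_provider) || -1
      # Skip every local provider so context overflow escalates off-box.
      PROVIDER_ORDER.drop(start + 1).find { |p| !LOCAL_PROVIDERS.include?(p) }
    end

    find_fallback_provider(:vllm) # => :bedrock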

v0.8.25

24 Apr 06:01
2c36211


Fixed

  • StructuredOutput.generate, handle_parse_error, and retry_with_instruction used hash-style access (result[:content], result[:model]) on the return value of chat_single, but chat_single returns a RubyLLM::Message object, which only supports method access (.content, .model_id). All four access sites now use respond_to? duck-typing so both Hash and Message objects work (see the sketch after this list). Visible as undefined method '[]' for an instance of RubyLLM::Message in Apollo's llm_detects_conflict? and in any structured output caller using non-schema-capable models (e.g. ollama/qwen).
  • Call::Embeddings.generate crashed with NoMethodError on .size when response.vectors was a flat array ([0.007, ...]) instead of nested ([[0.007, ...]]). RubyLLM's OpenAI provider unwraps single-input embedding responses. Added normalize_vectors_first to detect and handle both flat and nested vector formats before dimension enforcement.
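
Sketches of the two fixes; result stands in for chat_single's return value, and the body of normalize_vectors_first is an assumption:

    # Fix 1: duck-typed access that handles both Hash and RubyLLM::Message.
    result  = { content: 'hi', model: 'qwen' } # or a RubyLLM::Message
    content = result.respond_to?(:content)  ? result.content  : result[:content]
    model   = result.respond_to?(:model_id) ? result.model_id : result[:model]

    # Fix 2: wrap a flat single-embedding response so dimension checks
    # always see a nested array.
    def normalize_vectors_first(vectors)
      vectors.first.is_a?(Numeric) ? [vectors] : vectors
    end

    normalize_vectors_first([0.007, 0.01])   # => [[0.007, 0.01]]
    normalize_vectors_first([[0.007, 0.01]]) # => [[0.007, 0.01]]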

v0.8.24

23 Apr 15:06
adb01e1


Fixed

  • All AMQP transport messages (audit, metering, tool, escalation) now include identity headers (x-legion-identity, x-legion-credential, x-legion-hostname) extracted from the caller field (see the sketch after this list). Previously only prompt audit events carried identity in the body; tool audit and metering messages had no identity at all.
  • Embedding metering events now include caller context.
  • Non-pipeline chat_single metering events now include caller context from kwargs.
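
Illustrative shape of the headers; the names come from the note above, while the structure of the caller field and the extraction logic are assumptions:

    def identity_headers(caller_info)
      {
        'x-legion-identity'   => caller_info[:identity],
        'x-legion-credential' => caller_info[:credential],
        'x-legion-hostname'   => caller_info[:hostname]
      }.compact # drop headers the caller did not supply
    end

    identity_headers(identity: 'alice', hostname: 'node-7')
    # => {"x-legion-identity"=>"alice", "x-legion-hostname"=>"node-7"}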

v0.8.23

23 Apr 05:55
94e6425


Fixed

  • Call::StructuredOutput prompt-fallback path passed messages: (plural) to chat_single, which only accepts message: (singular), leaking the unknown kwarg into RubyLLM::Chat.new. Visible as repeated "unknown keyword: :messages" warnings during dream cycle contradiction detection. Flattened instruction + messages into a single string via extract_user_content; see the sketch below.
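
A hypothetical sketch of extract_user_content; the name comes from the note, the flattening logic is assumed:

    def extract_user_content(instruction, messages)
      user_text = Array(messages)
                  .select { |m| m[:role].to_s == 'user' }
                  .map { |m| m[:content] }
                  .join("\n\n")
      [instruction, user_text].reject { |s| s.to_s.empty? }.join("\n\n")
    end

    extract_user_content('Find contradictions.',
                         [{ role: 'user', content: 'A. Also not A.' }])
    # => "Find contradictions.\n\nA. Also not A."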

v0.8.22

23 Apr 05:20
1ebb572


Fixed

  • Error paths in Executor#run_provider_call_single and #step_provider_call_stream now emit audit events (Audit.emit_prompt) before re-raising RateLimitError, ProviderError, and ProviderDown (see the sketch after this list). Previously these errors produced no audit trail.
  • Escalation exhaustion (EscalationExhausted) in the pipeline executor now emits an audit event with status: 'escalation_exhausted' before raising.
  • assert_external_allowed! in the Inference module now emits an audit event with status: 'privacy_blocked' before raising PrivacyModeError, so enterprise privacy blocks are observable in the audit trail.
  • step_metering in Executor now passes request_id: and caller: to Steps::Metering.build_event so every metering event carries caller identity and request correlation.
  • Steps::Metering.identity_fields updated to include request_id and caller fields in the emitted metering event payload.
  • Call::Embeddings.generate now emits a metering event via Metering.emit after each successful RubyLLM.embed call, covering the previously unmetered embedding path.
  • chat_single in Inference now calls emit_non_pipeline_metering after a direct (non-pipeline) session.ask so token usage is recorded when the pipeline is disabled.
  • Call::StructuredOutput.generate now logs info on successful parse and warn on JSON::ParserError for observability.
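
The first three items share one pattern: write the audit record, then re-raise. A condensed sketch with the error classes and Audit stubbed; emit_prompt's arguments here are assumptions:

    class RateLimitError < StandardError; end
    class ProviderError  < StandardError; end
    class ProviderDown   < StandardError; end

    module Audit
      def self.emit_prompt(request, **fields)
        # stand-in: the real method publishes to the llm.audit exchange
      end
    end

    def run_provider_call(request)
      yield
    rescue RateLimitError, ProviderError, ProviderDown => e
      # Write the audit trail first, then propagate the failure unchanged.
      Audit.emit_prompt(request, status: 'error', error: e.class.name)
      raise
    end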

v0.8.21

23 Apr 05:06
1f8b746


Fixed

  • Tool audit events are now published to llm.audit exchange via Audit.emit_tools after each tool execution completes. Previously emit_tools was defined but never called — the llm.audit.tools queue was always empty.
  • Metering events now include request_type: 'chat' so the routing key is metering.chat instead of the malformed metering. (empty suffix); see the sketch below.
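
Illustrative only; the exact key construction in the gem is assumed:

    event = { request_type: 'chat' }
    routing_key = "metering.#{event[:request_type]}" # => "metering.chat"
    # Without request_type the same interpolation yielded "metering."
    # with nothing after the dot.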

v0.8.19

23 Apr 04:10
497fda9


Fixed

  • Skills::Base#emit_event passed a positional Hash to Legion::Events.emit, whose signature takes keywords (**payload), causing ArgumentError on every skill activation. Now splats the payload as keywords (see the sketch after this list).
  • file_edit client tool crashed with TypeError: no implicit conversion of nil into String when the LLM passed nil old_text/new_text. Now returns an error message to the LLM instead of crashing.
  • tool_trigger_defaults[:tool_limit] reduced from 50 to 10 to prevent trigger word matching from injecting dozens of unrelated extension tools on normal user messages.
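
A minimal reproduction of the first item's splat fix, with Legion::Events.emit stubbed as keyword-only per the note:

    module Legion
      module Events
        def self.emit(**payload)
          payload
        end
      end
    end

    payload = { skill: 'summarize', status: 'activated' }
    # Legion::Events.emit(payload)  # before: positional Hash raises ArgumentError
    Legion::Events.emit(**payload)  # after: keyword splat works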

v0.8.18

23 Apr 02:50
e8013a9


Fixed

  • API caller identity is no longer hardcoded as api:inference. The inference route now resolves the actual user via env['legion.principal'] (from Identity::Middleware), Legion::Identity::Process (LDAP/Kerberos), or the OS username (with the email domain stripped), as sketched below. Adds username and hostname to the requested_by hash in audit trails.
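
A rough sketch of the resolution chain, assuming the three sources are tried in order; everything beyond env['legion.principal'] and the class names in the note is guessed (including the .current call):

    require 'etc'
    require 'socket'

    def resolve_caller(env)
      principal   = env['legion.principal']                        # Identity::Middleware
      principal ||= (Legion::Identity::Process.current rescue nil) # LDAP/Kerberos (method name assumed)
      principal ||= Etc.getlogin.to_s.sub(/@.*\z/, '')             # OS user, email domain stripped
      { username: principal, hostname: Socket.gethostname }
    end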