Releases: LegionIO/legion-llm

v0.8.28

25 Apr 20:42
fcd4973


Fixed

  • Model/provider mismatch when clients send a model name (e.g., qwen3.5:latest) without an explicit provider. The fallback paths blindly paired it with default_provider (typically bedrock), causing RubyLLM::ModelNotFoundError. Now infers the correct provider from model naming patterns before falling back to the global default.
  • arbitrage_fallback hardcoded :cloud tier and :bedrock provider when inference failed. Now uses PROVIDER_TIER to resolve the correct tier for the inferred provider.

Added

  • Router.infer_provider_for_model(model) — public method that maps model naming patterns to providers. Recognizes Ollama-style models (: or / in name), Bedrock (us.*), OpenAI (gpt-*, o1-*/o3-*/o4-*), Anthropic (claude-*), and Gemini (gemini-*).
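
A minimal sketch of the inference, assuming simple first-match regexes; only the method name and the recognized patterns come from this note:

    module Router
      module_function

      # Sketch only: regex details and ordering are assumptions.
      def infer_provider_for_model(model)
        case model
        when %r{[:/]}           then :ollama    # Ollama-style: qwen3.5:latest, org/model
        when /\Aus\./           then :bedrock   # Bedrock cross-region IDs: us.*
        when /\A(gpt-|o[134]-)/ then :openai    # gpt-*, o1-*/o3-*/o4-*
        when /\Aclaude-/        then :anthropic # claude-*
        when /\Agemini-/        then :gemini    # gemini-*
        end                                     # nil: caller falls back to default_provider
      end
    end

    Router.infer_provider_for_model('qwen3.5:latest') # => :ollama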

v0.8.27

24 Apr 07:58
2067bbf


Fixed

  • vLLM provider sent the developer message role (an OpenAI convention), which Qwen's chat template rejects. Added a Vllm::Chat module that overrides format_messages and format_role to always send system (see the sketch after this list).
  • vLLM provider called OpenAI::Chat.render_payload as a module function without provider instance context, causing NoMethodError on openai_use_system_role. Rewrote to use super with instance method overrides.
  • Audit events included the full conversation history in every message, causing quadratic payload growth. Now caps at the last 20 messages (configurable via compliance.audit_max_messages); the full conversation remains reconstructable via conversation_id.
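
A rough sketch of the role override from the first item above, assuming RubyLLM's chat formatting exposes these hooks; everything beyond the module and method names is guessed:

    module Vllm
      module Chat
        # Qwen's chat template rejects OpenAI's 'developer' role, so system
        # prompts are always sent as role 'system'. Signature is assumed.
        def format_role(role)
          role == :system ? 'system' : super
        end

        def format_messages(messages)
          messages.map { |m| { role: format_role(m.role), content: m.content } }
        end
      end
    end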

Added

  • vLLM chat_template_kwargs with enable_thinking is now sent on every request so vLLM separates reasoning into the reasoning response field instead of emitting inline <think> tags (see the sketch after this list).
  • providers.vllm.enable_thinking setting (default: true). Controls whether thinking is enabled for vLLM requests; a per-request thinking param overrides the setting.
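
Roughly, each vLLM request body gains one extra field; chat_template_kwargs and enable_thinking follow vLLM's OpenAI-compatible API, while the surrounding payload is illustrative:

    payload = {
      model: 'qwen3.5',                               # hypothetical model id
      messages: [{ role: 'user', content: 'hello' }],
      chat_template_kwargs: {
        enable_thinking: true  # mirrors providers.vllm.enable_thinking (default: true)
      }
    }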

v0.8.26

24 Apr 06:55
f772d7f


Added

  • First-class vLLM provider support. vLLM exposes an OpenAI-compatible API and is registered as a new RubyLLM provider (:vllm). Configured via providers.vllm.base_url in settings. Mapped to :fleet tier in the router.
  • vLLM discovery via the /v1/models endpoint. Caches the model list with max_model_len (context window size) using the same TTL as Ollama discovery. Health checks via the /health endpoint (see the sketch after this list).
  • Context overflow escalation: when vLLM rejects a request due to context length limits (32k on V100 hardware), the executor automatically falls back to cloud/frontier providers.
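
A minimal sketch of the two probes, assuming providers.vllm.base_url points at a local vLLM server; the gem's actual discovery code is not shown here:

    require 'net/http'
    require 'json'

    base = URI('http://localhost:8000') # hypothetical base_url

    # Health check: vLLM answers /health with 200 when ready.
    health = Net::HTTP.get_response(base.host, '/health', base.port)
    puts "vLLM healthy: #{health.is_a?(Net::HTTPSuccess)}"

    # Discovery: /v1/models lists each model with its max_model_len.
    models = JSON.parse(Net::HTTP.get(base.host, '/v1/models', base.port))
    models.fetch('data', []).each do |m|
      puts "#{m['id']}: context window #{m['max_model_len']} tokens"
    end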

Changed

  • find_fallback_provider in Executor now skips all local providers (:ollama and :vllm) when searching for fallbacks, not just :ollama, ensuring context overflow escalates to cloud/frontier providers (see the sketch after this list).
  • Router::PROVIDER_ORDER updated: :vllm inserted after :ollama and before :bedrock.
  • default_provider_for_tier(:fleet) returns :vllm when vLLM is enabled and falls back to :ollama otherwise.
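
The fallback search from the first two items, condensed into a sketch; provider symbols beyond :ollama, :vllm, and :bedrock are assumptions:

    PROVIDER_ORDER  = [:ollama, :vllm, :bedrock, :openai, :anthropic].freeze
    LOCAL_PROVIDERS = [:ollama, :vllm].freeze

    def find_fallback_provider(failed_provider)
      start = PROVIDER_ORDER.index(failed_provider) || -1
      # Skip every local provider so context overflow escalates off-box.
      PROVIDER_ORDER.drop(start + 1).find { |p| !LOCAL_PROVIDERS.include?(p) }
    end

    find_fallback_provider(:vllm) # => :bedrock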

v0.8.25

24 Apr 06:01
2c36211


Fixed

  • StructuredOutput.generate, handle_parse_error, and retry_with_instruction used hash-style access (result[:content], result[:model]) on the return value of chat_single, but chat_single returns a RubyLLM::Message object, which only supports method access (.content, .model_id). All four access sites now use respond_to? duck-typing so both Hash and Message objects work (see the sketch after this list). Visible as undefined method '[]' for an instance of RubyLLM::Message in Apollo's llm_detects_conflict? and in any structured output caller using non-schema-capable models (e.g. ollama/qwen).
  • Call::Embeddings.generate crashed with NoMethodError on .size when response.vectors was a flat array ([0.007, ...]) instead of nested ([[0.007, ...]]). RubyLLM's OpenAI provider unwraps single-input embedding responses. Added normalize_vectors_first to detect and handle both flat and nested vector formats before dimension enforcement.
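
Sketches of the two fixes; result stands in for chat_single's return value, and the body of normalize_vectors_first is an assumption:

    # Fix 1: duck-typed access that handles both Hash and RubyLLM::Message.
    result  = { content: 'hi', model: 'qwen' } # or a RubyLLM::Message
    content = result.respond_to?(:content)  ? result.content  : result[:content]
    model   = result.respond_to?(:model_id) ? result.model_id : result[:model]

    # Fix 2: wrap a flat single-embedding response so dimension checks
    # always see a nested array.
    def normalize_vectors_first(vectors)
      vectors.first.is_a?(Numeric) ? [vectors] : vectors
    end

    normalize_vectors_first([0.007, 0.01])   # => [[0.007, 0.01]]
    normalize_vectors_first([[0.007, 0.01]]) # => [[0.007, 0.01]]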

v0.8.24

23 Apr 15:06
adb01e1


Fixed

  • All AMQP transport messages (audit, metering, tool, escalation) now include identity headers (x-legion-identity, x-legion-credential, x-legion-hostname) extracted from the caller field (see the sketch after this list). Previously only prompt audit events carried identity in the body; tool audit and metering messages had no identity at all.
  • Embedding metering events now include caller context.
  • Non-pipeline chat_single metering events now include caller context from kwargs.
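
Illustrative shape of the headers; the names come from the note above, while the structure of the caller field and the extraction logic are assumptions:

    def identity_headers(caller_info)
      {
        'x-legion-identity'   => caller_info[:identity],
        'x-legion-credential' => caller_info[:credential],
        'x-legion-hostname'   => caller_info[:hostname]
      }.compact # drop headers the caller did not supply
    end

    identity_headers(identity: 'alice', hostname: 'node-7')
    # => {"x-legion-identity"=>"alice", "x-legion-hostname"=>"node-7"}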

v0.8.23

23 Apr 05:55
94e6425


Fixed

  • Call::StructuredOutput prompt-fallback path passed messages: (plural) to chat_single, which only accepts message: (singular), leaking the unknown kwarg into RubyLLM::Chat.new. Visible as repeated "unknown keyword: :messages" warnings during dream cycle contradiction detection. Flattened instruction + messages into a single string via extract_user_content; see the sketch below.
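
A hypothetical sketch of extract_user_content; the name comes from the note, the flattening logic is assumed:

    def extract_user_content(instruction, messages)
      user_text = Array(messages)
                  .select { |m| m[:role].to_s == 'user' }
                  .map { |m| m[:content] }
                  .join("\n\n")
      [instruction, user_text].reject { |s| s.to_s.empty? }.join("\n\n")
    end

    extract_user_content('Find contradictions.',
                         [{ role: 'user', content: 'A. Also not A.' }])
    # => "Find contradictions.\n\nA. Also not A."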

v0.8.22

23 Apr 05:20
1ebb572


Fixed

  • Error paths in Executor#run_provider_call_single and #step_provider_call_stream now emit audit events (Audit.emit_prompt) before re-raising RateLimitError, ProviderError, and ProviderDown (see the sketch after this list). Previously these errors produced no audit trail.
  • Escalation exhaustion (EscalationExhausted) in the pipeline executor now emits an audit event with status: 'escalation_exhausted' before raising.
  • assert_external_allowed! in the Inference module now emits an audit event with status: 'privacy_blocked' before raising PrivacyModeError, so enterprise privacy blocks are observable in the audit trail.
  • step_metering in Executor now passes request_id: and caller: to Steps::Metering.build_event so every metering event carries caller identity and request correlation.
  • Steps::Metering.identity_fields updated to include request_id and caller fields in the emitted metering event payload.
  • Call::Embeddings.generate now emits a metering event via Metering.emit after each successful RubyLLM.embed call, covering the previously unmetered embedding path.
  • chat_single in Inference now calls emit_non_pipeline_metering after a direct (non-pipeline) session.ask so token usage is recorded when the pipeline is disabled.
  • Call::StructuredOutput.generate now logs info on successful parse and warn on JSON::ParserError for observability.
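
The first three items share one pattern: write the audit record, then re-raise. A condensed sketch with the error classes and Audit stubbed; emit_prompt's arguments here are assumptions:

    class RateLimitError < StandardError; end
    class ProviderError  < StandardError; end
    class ProviderDown   < StandardError; end

    module Audit
      def self.emit_prompt(request, **fields)
        # stand-in: the real method publishes to the llm.audit exchange
      end
    end

    def run_provider_call(request)
      yield
    rescue RateLimitError, ProviderError, ProviderDown => e
      # Write the audit trail first, then propagate the failure unchanged.
      Audit.emit_prompt(request, status: 'error', error: e.class.name)
      raise
    end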

v0.8.21

23 Apr 05:06
1f8b746


Fixed

  • Tool audit events are now published to llm.audit exchange via Audit.emit_tools after each tool execution completes. Previously emit_tools was defined but never called — the llm.audit.tools queue was always empty.
  • Metering events now include request_type: 'chat' so the routing key is metering.chat instead of the malformed metering. (empty suffix); see the sketch below.
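
Illustrative only; the exact key construction in the gem is assumed:

    event = { request_type: 'chat' }
    routing_key = "metering.#{event[:request_type]}" # => "metering.chat"
    # Without request_type the same interpolation yielded "metering."
    # with nothing after the dot.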

v0.8.19

23 Apr 04:10
497fda9


Fixed

  • Skills::Base#emit_event passed a positional Hash to Legion::Events.emit, whose signature takes keywords (**payload), causing ArgumentError on every skill activation. Now splats the payload as keywords (see the sketch after this list).
  • file_edit client tool crashed with TypeError: no implicit conversion of nil into String when the LLM passed nil old_text/new_text. Now returns an error message to the LLM instead of crashing.
  • tool_trigger_defaults[:tool_limit] reduced from 50 to 10 to prevent trigger word matching from injecting dozens of unrelated extension tools on normal user messages.
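
A minimal reproduction of the first item's splat fix, with Legion::Events.emit stubbed as keyword-only per the note:

    module Legion
      module Events
        def self.emit(**payload)
          payload
        end
      end
    end

    payload = { skill: 'summarize', status: 'activated' }
    # Legion::Events.emit(payload)  # before: positional Hash raises ArgumentError
    Legion::Events.emit(**payload)  # after: keyword splat works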

v0.8.18

23 Apr 02:50
e8013a9


Fixed

  • API caller identity is no longer hardcoded as api:inference. The inference route now resolves the actual user via env['legion.principal'] (from Identity::Middleware), Legion::Identity::Process (LDAP/Kerberos), or the OS username (with the email domain stripped), as sketched below. Adds username and hostname to the requested_by hash in audit trails.
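
A rough sketch of the resolution chain, assuming the three sources are tried in order; everything beyond env['legion.principal'] and the class names in the note is guessed (including the .current call):

    require 'etc'
    require 'socket'

    def resolve_caller(env)
      principal   = env['legion.principal']                        # Identity::Middleware
      principal ||= (Legion::Identity::Process.current rescue nil) # LDAP/Kerberos (method name assumed)
      principal ||= Etc.getlogin.to_s.sub(/@.*\z/, '')             # OS user, email domain stripped
      { username: principal, hostname: Socket.gethostname }
    end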