Releases: LegionIO/legion-llm
v0.8.28
Fixed
- Model/provider mismatch when clients send a model name (e.g., `qwen3.5:latest`) without an explicit provider. The fallback paths blindly paired it with `default_provider` (typically `bedrock`), causing `RubyLLM::ModelNotFoundError`. Now infers the correct provider from model naming patterns before falling back to the global default.
- `arbitrage_fallback` hardcoded the `:cloud` tier and `:bedrock` provider when inference failed. Now uses `PROVIDER_TIER` to resolve the correct tier for the inferred provider.
Added
- `Router.infer_provider_for_model(model)` — public method that maps model naming patterns to providers. Recognizes Ollama-style models (`:` or `/` in the name), Bedrock (`us.*`), OpenAI (`gpt-*`, `o1-*`/`o3-*`/`o4-*`), Anthropic (`claude-*`), and Gemini (`gemini-*`).
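The pattern mapping above can be sketched as a simple `case` over the model name. This is a hedged reconstruction from the release notes, not the gem's actual implementation; the order of checks is an assumption.

```ruby
# Hedged sketch of Router.infer_provider_for_model based on the patterns
# listed above; the real implementation may differ.
module Router
  def self.infer_provider_for_model(model)
    case model.to_s
    when /\Aus\./              then :bedrock   # Bedrock cross-region IDs
    when /\Agpt-/, /\Ao[134]-/ then :openai    # gpt-*, o1-*/o3-*/o4-*
    when /\Aclaude-/           then :anthropic
    when /\Agemini-/           then :gemini
    when %r{[:/]}              then :ollama    # e.g. "qwen3.5:latest"
    end
  end
end
```

Returning `nil` for an unrecognized name lets the caller fall back to the global default, matching the behavior described above.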
v0.8.27
Fixed
- vLLM provider sent the `developer` message role (OpenAI convention), which Qwen's chat template rejects. Added a `Vllm::Chat` module that overrides `format_messages` and `format_role` to always send `system`.
- vLLM provider called `OpenAI::Chat.render_payload` as a module function without provider instance context, causing `NoMethodError` on `openai_use_system_role`. Rewrote to use `super` with instance method overrides.
- Audit events included the full conversation history in every message — quadratic payload growth. Now caps at the last 20 messages (configurable via `compliance.audit_max_messages`). Full conversation reconstructable via `conversation_id`.
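The message cap amounts to keeping only the conversation tail; a minimal sketch, where the helper name is an assumption and only the default of 20 comes from the notes above:

```ruby
# Keep only the tail of the conversation for audit payloads; the full
# history remains reconstructable via conversation_id.
def capped_audit_messages(messages, max: 20)
  messages.last(max)
end
```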
Added
- vLLM `chat_template_kwargs` with `enable_thinking` sent on every request so vLLM separates reasoning into the `reasoning` response field instead of inline `<think>` tags.
- `providers.vllm.enable_thinking` setting (default: `true`). Controls whether thinking is enabled for vLLM requests. Per-request `thinking` param overrides.
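The extra request body likely takes the shape below; only the `chat_template_kwargs`/`enable_thinking` keys come from the notes above, the helper name is illustrative.

```ruby
# Assumed shape of the extra payload merged into each vLLM chat request.
def vllm_chat_template_kwargs(enable_thinking: true)
  { chat_template_kwargs: { enable_thinking: enable_thinking } }
end
```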
v0.8.26
Added
- First-class vLLM provider support. vLLM exposes an OpenAI-compatible API and is registered as a new RubyLLM provider (`:vllm`). Configured via `providers.vllm.base_url` in settings. Mapped to the `:fleet` tier in the router.
- vLLM discovery via the `/v1/models` endpoint. Caches the model list with `max_model_len` (context window size) using the same TTL as Ollama discovery. Health checks via the `/health` endpoint.
- Context overflow escalation: when vLLM rejects a request due to context length limits (32k on V100 hardware), the executor automatically falls back to cloud/frontier providers.
Changed
- `find_fallback_provider` in `Executor` now skips all local providers (`:ollama` and `:vllm`) when searching for fallbacks, not just `:ollama`. Ensures context overflow escalates to cloud/frontier.
- `Router::PROVIDER_ORDER` updated: `:vllm` inserted after `:ollama` and before `:bedrock`.
- `default_provider_for_tier(:fleet)` returns `:vllm` when vLLM is enabled, falls back to `:ollama`.
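The fallback search described above can be sketched as follows. The `:vllm`-after-`:ollama`, before-`:bedrock` ordering is from the notes; the tail of `PROVIDER_ORDER` and the `LOCAL_PROVIDERS` constant name are assumptions.

```ruby
# Hedged sketch: local providers are skipped when looking for a fallback,
# so a context-overflow failure escalates to cloud/frontier providers.
PROVIDER_ORDER  = [:ollama, :vllm, :bedrock, :openai, :anthropic].freeze
LOCAL_PROVIDERS = [:ollama, :vllm].freeze

def find_fallback_provider(failed_provider)
  PROVIDER_ORDER.find do |provider|
    provider != failed_provider && !LOCAL_PROVIDERS.include?(provider)
  end
end
```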
v0.8.25
Fixed
- `StructuredOutput.generate`, `handle_parse_error`, and `retry_with_instruction` used hash-style access (`result[:content]`, `result[:model]`) on the return value of `chat_single`, but `chat_single` returns a `RubyLLM::Message` object which only supports method access (`.content`, `.model_id`). All four access sites now use `respond_to?` duck-typing so both hash and Message objects work. Visible as `undefined method '[]' for an instance of RubyLLM::Message` in Apollo's `llm_detects_conflict?` and any structured output caller using non-schema-capable models (e.g. ollama/qwen).
- `Call::Embeddings.generate` crashed with `NoMethodError` on `.size` when `response.vectors` was a flat array (`[0.007, ...]`) instead of nested (`[[0.007, ...]]`). RubyLLM's OpenAI provider unwraps single-input embedding responses. Added `normalize_vectors_first` to detect and handle both flat and nested vector formats before dimension enforcement.
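Both fixes reduce to small, testable helpers. This is a sketch of the two techniques named above (`respond_to?` duck-typing and flat/nested vector normalization); the helper names here are illustrative, not the gem's.

```ruby
# Duck-type the result: RubyLLM::Message-style objects expose .content,
# plain Hashes use [:content].
def result_content(result)
  result.respond_to?(:content) ? result.content : result[:content]
end

# Wrap a flat single-embedding response so callers always see a nested
# array of vectors before dimension enforcement.
def normalize_vectors(vectors)
  vectors.first.is_a?(Array) ? vectors : [vectors]
end
```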
v0.8.24
Fixed
- All AMQP transport messages (audit, metering, tool, escalation) now include identity headers (`x-legion-identity`, `x-legion-credential`, `x-legion-hostname`) extracted from the `caller` field. Previously only prompt audit events carried identity in the body — tool audit and metering messages had no identity at all.
- Embedding metering events now include `caller` context.
- Non-pipeline `chat_single` metering events now include `caller` context from kwargs.
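The header extraction might look like the sketch below; the three header names come from the notes above, while the field names inside the caller context are assumptions.

```ruby
# Map the caller context onto AMQP identity headers, dropping any that
# are absent so messages without credentials stay clean.
def identity_headers(caller_ctx)
  {
    'x-legion-identity'   => caller_ctx[:identity],
    'x-legion-credential' => caller_ctx[:credential],
    'x-legion-hostname'   => caller_ctx[:hostname]
  }.compact
end
```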
v0.8.23
Fixed
- `Call::StructuredOutput` prompt-fallback path passed `messages:` (plural) to `chat_single`, which only accepts `message:` (singular), leaking the unknown kwarg into `RubyLLM::Chat.new`. Visible as repeated "unknown keyword: :messages" warnings during dream cycle contradiction detection. Flattened instruction + messages into a single string via `extract_user_content`.
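A hypothetical sketch of the flattening step: fold the instruction and the message contents into the single string that `chat_single`'s `message:` kwarg accepts. The message shape (`{ content: ... }` hashes) and joining with blank lines are assumptions.

```ruby
# Collapse instruction + prior messages into one prompt string.
def extract_user_content(instruction, messages)
  [instruction, *messages.map { |m| m[:content] }].compact.join("\n\n")
end
```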
v0.8.22
Fixed
- Error paths in `Executor#run_provider_call_single` and `#step_provider_call_stream` now emit audit events (`Audit.emit_prompt`) before re-raising `RateLimitError`, `ProviderError`, and `ProviderDown`. Previously these errors produced no audit trail.
- Escalation exhaustion (`EscalationExhausted`) in the pipeline executor now emits an audit event with `status: 'escalation_exhausted'` before raising.
- `assert_external_allowed!` in the Inference module now emits an audit event with `status: 'privacy_blocked'` before raising `PrivacyModeError`, so enterprise privacy blocks are observable in the audit trail.
- `step_metering` in `Executor` now passes `request_id:` and `caller:` to `Steps::Metering.build_event`, so every metering event carries caller identity and request correlation.
- `Steps::Metering.identity_fields` updated to include `request_id` and `caller` fields in the emitted metering event payload.
- `Call::Embeddings.generate` now emits a metering event via `Metering.emit` after each successful `RubyLLM.embed` call, covering the previously unmetered embedding path.
- `chat_single` in Inference now calls `emit_non_pipeline_metering` after a direct (non-pipeline) `session.ask` so token usage is recorded when the pipeline is disabled.
- `Call::StructuredOutput.generate` now logs `info` on successful parse and `warn` on `JSON::ParserError` for observability.
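The audit-before-raise changes all follow one pattern: record the event, then re-raise so control flow is unchanged. An illustrative sketch, where `emit_audit` stands in for `Audit.emit_prompt` (whose real signature is not shown in the notes):

```ruby
AUDIT_LOG = [] # stand-in sink for emitted audit events

def emit_audit(event)
  AUDIT_LOG << event
end

# Run a block; on failure, audit with the given status and re-raise so the
# caller still sees the original exception.
def with_audit_on_error(status)
  yield
rescue StandardError => e
  emit_audit(status: status, error: e.class.name)
  raise
end
```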
v0.8.21
Fixed
- Tool audit events are now published to the `llm.audit` exchange via `Audit.emit_tools` after each tool execution completes. Previously `emit_tools` was defined but never called — the `llm.audit.tools` queue was always empty.
- Metering events now include `request_type: 'chat'` so the routing key is `metering.chat` instead of `metering.` (empty suffix).
v0.8.19
Fixed
- `Skills::Base#emit_event` passed a positional Hash to `Legion::Events.emit(**payload)`, causing `ArgumentError` on every skill activation. Now uses the keyword splat correctly.
- The `file_edit` client tool crashed with `TypeError: no implicit conversion of nil into String` when the LLM passed nil `old_text`/`new_text`. Now returns an error message to the LLM instead of crashing.
- `tool_trigger_defaults[:tool_limit]` reduced from 50 to 10 to prevent trigger word matching from injecting dozens of unrelated extension tools on normal user messages.
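The splat fix is a one-liner; a minimal sketch where `emit` stands in for `Legion::Events.emit` (which, per the note above, accepts keywords):

```ruby
# Stand-in for Legion::Events.emit: keywords only, no positional Hash.
def emit(**payload)
  payload
end

def emit_event(payload)
  emit(**payload) # previously emit(payload), which raises ArgumentError
end
```

Under Ruby 3's keyword-argument separation, a positional `Hash` is no longer silently converted to keywords, which is why the original call failed on every activation.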
v0.8.18
Fixed
- API caller identity no longer hardcoded as `api:inference`. The inference route now resolves the actual user via `env['legion.principal']` (from Identity::Middleware), `Legion::Identity::Process` (LDAP/Kerberos), or the OS username (with email domain stripped). Adds `username` and `hostname` to the `requested_by` hash in audit trails.
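The resolution chain reads naturally as a series of `||` fallbacks. A hedged sketch: `ldap_principal` stands in for the `Legion::Identity::Process` lookup, and the `os_user` kwarg replaces a real `ENV` lookup for illustration.

```ruby
def ldap_principal
  nil # stand-in for the Legion::Identity::Process (LDAP/Kerberos) lookup
end

# Resolve caller identity: Rack env principal, then LDAP/Kerberos, then
# the OS username with any email domain stripped.
def resolve_caller(env, os_user: ENV['USER'])
  env['legion.principal'] ||
    ldap_principal ||
    os_user.to_s.sub(/@.*\z/, '')
end
```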