Summary
OpenAIFilterer.Filter builds its request with three consecutive system messages followed by the user message. Many chat templates only allow a single, leading system message and raise on anything else. When the backend enforces this (e.g. Qwen3 served by llama.cpp with --jinja), every request returns HTTP 400 before the model runs. Because the filter returns FilterOnFailure on any error — and FilterOnFailure defaults to true — every message is silently filtered out. For an OpenAI-filter → Discord pipeline, this looks like the integration has simply stopped forwarding anything.
Version: v1.3.2 (latest). Backend: llama.cpp server (ghcr.io/ggml-org/llama.cpp:server-*) with --jinja, Qwen3-family GGUF.
Error returned by the backend
POST ".../chat/completions": 400 Bad Request
{"error":{"code":400,"message":"Unable to generate parser for this template. Automatic parser generation failed:
------------
While executing CallExpression at line 85, column 32 in source:
...first %}\n {{- raise_exception('System message must be at the beginnin...
^
Error: Jinja Exception: System message must be at the beginning.","type":"invalid_request_error"}}
Followed by:
WARN error filtering with OpenAIFilterer in step number 5: ... 400 Bad Request ...
INFO message ... was filtered in step 5 by OpenAIFilterer
Root cause
In filter_openai.go, the request is assembled as:
    chatCompletion, err := client.Chat.Completions.New(context.TODO(),
        openai.ChatCompletionNewParams{
            Messages: openai.F([]openai.ChatCompletionMessageParamUnion{
                openai.SystemMessage(OpenAISystemPrompt),      // system #1
                openai.SystemMessage(o.UserPrompt),            // system #2
                openai.SystemMessage(OpenAIFinalInstructions), // system #3
                openai.UserMessage(ms),
            }),
            ...
This sends three system messages in a row. Strict templates (Qwen3 and others) reject any system message that is not the first message by calling raise_exception(...), so the request fails with HTTP 400. On that error the function returns o.FilterOnFailure, which defaults to true, so the message is dropped.
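To make the rejection concrete, here is a minimal Go sketch of the rule the error text implies: a system message is accepted only as the first message. It is an approximation for illustration, not the template's actual logic, and the `message` type and `validateSystemPlacement` function are hypothetical names that exist neither in this project nor in llama.cpp.

```go
package main

import "fmt"

// message is a hypothetical stand-in for one chat message.
type message struct {
	Role    string
	Content string
}

// validateSystemPlacement approximates the strict-template rule:
// a system message is only accepted at index 0.
func validateSystemPlacement(msgs []message) error {
	for i, m := range msgs {
		if m.Role == "system" && i != 0 {
			return fmt.Errorf("system message at index %d: must be at the beginning", i)
		}
	}
	return nil
}

func main() {
	// Shape of the request the filter currently sends (placeholder contents).
	current := []message{
		{Role: "system", Content: "system prompt"},
		{Role: "system", Content: "user-configured prompt"},
		{Role: "system", Content: "final instructions"},
		{Role: "user", Content: "message text"},
	}
	fmt.Println(validateSystemPlacement(current))
}
```

Under this rule the second system message is already enough to fail the request, which matches the 400 being returned before the model runs.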
Suggested fix
The sibling annotator already does this correctly — it concatenates everything into a single system message, see annotator_openai.go:125:
    openai.SystemMessage(OpenAIAnnotatorFirstInstructions + a.UserPrompt + OpenAIAnnotatorFinalInstructions),
    openai.UserMessage("Here is the message to evaluate:\n" + msg),
The filter should mirror this: collapse OpenAISystemPrompt + o.UserPrompt + OpenAIFinalInstructions into one SystemMessage (or move the instructions into the user turn). That keeps a single leading system message and works across both lenient and strict templates.
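A minimal, self-contained sketch of the collapsed message list might look like the following. It uses a hypothetical local `message` type and a hypothetical `buildFilterMessages` helper so that it runs without the OpenAI client; in filter_openai.go the same shape would presumably be expressed with openai.SystemMessage and openai.UserMessage. The three parts are joined by plain concatenation, as the annotator does. Whether the real constants need a separator between them is not checked here.

```go
package main

import "fmt"

// message is a hypothetical stand-in for one chat message.
type message struct {
	Role    string
	Content string
}

// buildFilterMessages (hypothetical helper) merges the three instruction
// parts into a single leading system message, followed by the user message.
func buildFilterMessages(systemPrompt, userPrompt, finalInstructions, text string) []message {
	return []message{
		{Role: "system", Content: systemPrompt + userPrompt + finalInstructions},
		{Role: "user", Content: text},
	}
}

func main() {
	// Placeholder prompt texts, not the project's real constants.
	msgs := buildFilterMessages(
		"You are a message filter.\n",
		"Keep only alerts.\n",
		"Answer yes or no.",
		"disk is full",
	)
	for _, m := range msgs {
		fmt.Printf("%s: %q\n", m.Role, m.Content)
	}
}
```

The result always contains exactly one system message at index 0, so it should pass both lenient templates and those that reject repeated or non-leading system messages.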
Impact / severity
- Silent: with FilterOnFailure: true (the default), there is no user-facing error; messages just stop flowing, which is easily mistaken for "no matching traffic."
- Affects any OpenAI-compatible backend that enforces single-leading-system templates, notably llama.cpp --jinja with Qwen3 models, a common self-hosted setup that the README's http://llama-server:8080/v1 examples point at.
Repro
- Run llama.cpp server with a Qwen3 GGUF and --jinja.
- Configure an OpenAI filter step pointing at it (URL: http://.../v1, any UserPrompt).
- Send any message with text. The backend returns 400 (System message must be at the beginning); the message is filtered out.
Workaround (until fixed)
Override the model's chat template to a lenient one (e.g. --chat-template chatml) so repeated system messages are accepted — at the cost of the model's native (thinking/tool-call) template — or set FilterOnFailure: false to fail open. Neither is a real fix; the message construction above is the bug.