12 changes: 12 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,18 @@ AGENT_FALLBACK_MODELS=
AGENT_PROVIDER_ORDER=
AGENT_PROVIDER_ALLOW_FALLBACKS=true
AGENT_SERVER_TOOLS=
# Tool-call approval. A gated tool call posts an Approve / Reject prompt in
# the channel and waits for the person who asked; a rejected call never runs
# and the model is told it was declined. AGENT_APPROVAL_TOOLS is a
# comma-separated list of tool names to gate (e.g.
# "files.write,files.delete"); AGENT_APPROVAL_RISKS a comma-separated list of
# risk tiers to gate (read, safe, mutate). Both default empty, which leaves
# approval off and changes nothing. AGENT_APPROVAL_TIMEOUT_S bounds the wait
# for a decision; keep it well below AI_REPLY_TIMEOUT_S, since the wait counts
# against the turn.
AGENT_APPROVAL_TOOLS=
AGENT_APPROVAL_RISKS=
AGENT_APPROVAL_TIMEOUT_S=45

# ── Memory ────────────────────────────────────────────────────────────────────
# Hours before a stored user memory is considered stale and refreshed.
Expand Down
20 changes: 16 additions & 4 deletions README.md
Expand Up @@ -36,10 +36,10 @@ runs the test suite.
- **File workspace** -- the `files.*` and `shell.run` tools give the model a
private scratch directory, one per server (per user in a DM). It is
sandboxed: paths cannot escape it, files and total size are capped, and the
shell runs only an allowlist of read-only commands -- it searches with
ripgrep, falling back to grep only when ripgrep cannot serve. Configured
with the `WORKSPACE_*` variables; turn it off entirely with
`WORKSPACE_ENABLED`.
shell runs only an allowlist of read-only commands -- it searches a
directory or the whole workspace with ripgrep and a single named file with
grep. Configured with the `WORKSPACE_*` variables; turn it off entirely
with `WORKSPACE_ENABLED`.
- **Lua plugins** -- a full plugin system. A plugin is one `.lua` file that
can register prefix commands, agent tools, background loops and event
handlers, and reach out through an HTTP client, a Discord read/write API,
Expand Down Expand Up @@ -178,6 +178,18 @@ feature is safe to leave on. Stop conditions are tunable: `AGENT_MAX_STEPS`
caps model steps per turn and `AGENT_MAX_COST` sets an optional per-turn
dollar ceiling.

Two within-turn controls run on top of that loop. A tool may return a
`next_turn` block in its result to steer the following model turn -- a
different model, a lower temperature, a tighter token budget, or extra
instructions -- which both the sidecar (through the Agent SDK's
`nextTurnParams`) and the in-process loop apply before the model is asked
again. And a tool call can be gated on human approval: list tool names in
`AGENT_APPROVAL_TOOLS` or risk tiers in `AGENT_APPROVAL_RISKS`, and a gated
call posts an Approve / Reject prompt in the channel before it runs, with a
rejected call handed back to the model as a declined result. Approval is
resolved entirely on the bot side within the turn, so the sidecar stays
stateless per turn and conversation state stays in the bot.
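
As a rough illustration of the `next_turn` control described above, the sketch below shows a tool result carrying a directive and a loop folding it into its working parameters. Everything named here is an assumption made for the example: `search_docs`, `peel_next_turn`, the `params` dict and the `example/model` id are hypothetical stand-ins, not the bot's actual handlers or loop, and the sketch skips the validation a real directive would go through.

```python
def search_docs(query: str) -> dict:
    """Hypothetical tool handler: returns its output plus a next_turn block."""
    return {
        "matches": [f"doc about {query}"],
        "next_turn": {
            "temperature": 0.2,
            "max_output_tokens": 512,
            "instructions": "Answer only from the matches above.",
        },
    }


def peel_next_turn(result):
    """Separate tool output from the directive (illustrative stand-in)."""
    if not isinstance(result, dict) or "next_turn" not in result:
        return result, None
    output = {k: v for k, v in result.items() if k != "next_turn"}
    directive = result["next_turn"]
    return output, directive if isinstance(directive, dict) else None


# Working parameters for the turn, as an in-process loop might hold them.
params = {"model": "example/model", "temperature": 0.8, "max_tokens": 2048}
convo = [{"role": "user", "content": "What does the guide say about caching?"}]

output, directive = peel_next_turn(search_docs("caching"))
convo.append({"role": "tool", "content": str(output)})  # directive never reaches the model

if directive:
    # Keys the tool left out keep their current value.
    params["model"] = directive.get("model", params["model"])
    params["temperature"] = directive.get("temperature", params["temperature"])
    params["max_tokens"] = directive.get("max_output_tokens", params["max_tokens"])
    if directive.get("instructions"):
        convo.append({"role": "system", "content": directive["instructions"]})
```

The point of the shape is that the directive is turn-control data: it is stripped from the result before the model sees it, and any parameter the tool does not mention is left as it was.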

## Tool execution pipeline

A tool result never goes straight from a handler to the model. It travels a
Expand Down
102 changes: 94 additions & 8 deletions agent-sidecar/src/server.ts
Expand Up @@ -10,7 +10,7 @@
* models?, provider?, tools, server_tools?, ... }
* sidecar -> bot : { type: "delta", text } streamed model text
* sidecar -> bot : { type: "tool_call", call_id, name, arguments }
* bot -> sidecar : { type: "tool_result", call_id, result }
* bot -> sidecar : { type: "tool_result", call_id, result, next_turn? }
* sidecar -> bot : { type: "done", text, finish_reason, model, usage,
* tool_names }
* sidecar -> bot : { type: "error", error }
Expand All @@ -27,6 +27,9 @@
* over the same socket: the SDK calls execute(), the sidecar emits a
* tool_call, and the bot answers with a tool_result. That keeps the tool
* registry, the execution pipeline and the Lua plugins exactly where they are.
* A tool_result may also carry a `next_turn` directive -- the bot's way of
* steering the following model turn (model, temperature, token budget,
* instructions) -- which the sidecar feeds to the SDK's nextTurnParams.
*/
import { createServer } from 'node:http';
import { readFileSync } from 'node:fs';
Expand All @@ -53,7 +56,7 @@ const API_KEY = process.env.OPENROUTER_API_KEY || '';
* frame; either side treats a mismatch as a reason to fall back rather than
* risk talking past a half-deployed peer.
*/
const PROTOCOL_VERSION = 1;
const PROTOCOL_VERSION = 2;

/** Best-effort version of the bundled Agent SDK, surfaced for logs only. */
function readSdkVersion(): string {
Expand Down Expand Up @@ -96,6 +99,7 @@ interface StartMessage {
}

interface PendingToolCall {
name: string;
resolve: (value: unknown) => void;
reject: (reason: unknown) => void;
}
Expand All @@ -107,23 +111,66 @@ function errorMessage(err: unknown): string {
return String(err);
}

/**
* Translate a bot-supplied `next_turn` directive into the SDK's
* nextTurnParams shape. The bridge speaks snake_case; the SDK speaks
* camelCase. Only the four parameters both the sidecar and the bot's
* in-process loop can honour identically are carried; an unknown or
* mistyped key is dropped, so a malformed directive can never break a turn.
*/
function translateNextTurn(raw: unknown): Record<string, unknown> {
const out: Record<string, unknown> = {};
if (!raw || typeof raw !== 'object') {
return out;
}
const directive = raw as Record<string, unknown>;
if (typeof directive.model === 'string' && directive.model) {
out.model = directive.model;
}
if (typeof directive.instructions === 'string' && directive.instructions) {
out.instructions = directive.instructions;
}
if (typeof directive.temperature === 'number') {
out.temperature = directive.temperature;
}
if (typeof directive.max_output_tokens === 'number') {
out.maxOutputTokens = directive.max_output_tokens;
}
return out;
}

/**
* One chat turn: drives callModel and bridges tool calls back to the bot.
*
* The sidecar is deliberately stateless per turn -- it opens one WebSocket,
* runs one turn and closes. Conversation history, memory and traits all live
* in the Python bot. The Agent SDK also offers a stateful mode (a persistent
* turn-state accessor and approval-gated tool pausing); the bridge does NOT
* use it, because that would split one turn's state across two runtimes.
* Anything stateful belongs on the Python side. A build guard
* (tests/test_sidecar_guards.py) fails CI if that surface appears here -- if a
* future change genuinely needs it, that guard must be revisited deliberately.
* turn-state accessor and approval-gated tool pausing across separate
* request/response cycles); the bridge does NOT use it, because that would
* split one turn's state across two runtimes. Anything stateful belongs on
* the Python side. A build guard (tests/test_sidecar_guards.py) fails CI if
* that surface appears here -- if a future change genuinely needs it, that
* guard must be revisited deliberately.
*
* Two within-turn controls do ride the bridge, because neither needs
* persistent state. A tool may return a next_turn directive on its
* tool_result frame to steer the following model turn (model, temperature,
* token budget, instructions); the sidecar feeds it to the SDK's
* nextTurnParams. Tool-call approval is resolved entirely on the Python side
* within the turn -- a gated tool simply makes the bot withhold its
* tool_result until a human decides -- so the gate never reaches this
* stateless surface.
*/
class Session {
private readonly ws: WebSocket;
private readonly client: OpenRouter;
private readonly pending = new Map<string, PendingToolCall>();
private readonly toolNames: string[] = [];
// The latest next_turn directive each tool returned, already translated to
// the SDK's camelCase keys and indexed by tool name. A tool's nextTurnParams
// functions read it after the tool runs; the next call of that tool
// overwrites it, so a stale directive is never applied twice.
private readonly nextTurnByName = new Map<string, Record<string, unknown>>();
private callCounter = 0;
private started = false;
private turnId = '-';
Expand Down Expand Up @@ -203,6 +250,9 @@ class Session {
const entry = this.pending.get(String(msg.call_id));
if (entry) {
this.pending.delete(String(msg.call_id));
// Record (or clear) this tool's next-turn directive before resolving,
// so the SDK sees it when it runs nextTurnParams after execute().
this.nextTurnByName.set(entry.name, translateNextTurn(msg.next_turn));
entry.resolve(msg.result);
}
}
Expand All @@ -213,7 +263,7 @@ class Session {
const callId = `t${++this.callCounter}`;
this.toolNames.push(name);
const promise = new Promise<unknown>((resolve, reject) => {
this.pending.set(callId, { resolve, reject });
this.pending.set(callId, { name, resolve, reject });
});
this.send({
type: 'tool_call',
Expand All @@ -228,11 +278,47 @@ class Session {
return schemas.map((entry) => {
const fn = (entry.function ?? entry) as Record<string, any>;
const name = String(fn.name || '');
// Each nextTurnParams function returns this tool's last directive value
// for that parameter, or the request's current value when it set none.
// The SDK threads the current value through every called tool, so a
// tool that sets no directive returns the value unchanged and never
// clobbers a directive an earlier tool in the round did set.
return tool({
name,
description: String(fn.description || ''),
inputSchema: toZodObject(fn.parameters),
execute: async (args: unknown) => this.runTool(name, args),
nextTurnParams: {
model: (_params, context): string => {
const d = this.nextTurnByName.get(name);
if (d && typeof d.model === 'string' && d.model) {
return d.model;
}
// No directive: keep the current model. When the request uses a
// `models` fallback array, context.model is empty; returning null
// there leaves the model parameter unset rather than pinning it
// to an empty string the API would reject.
return (context.model || null) as string;
},
instructions: (_params, context) => {
const d = this.nextTurnByName.get(name);
return d && typeof d.instructions === 'string'
? d.instructions
: context.instructions;
},
temperature: (_params, context) => {
const d = this.nextTurnByName.get(name);
return d && typeof d.temperature === 'number'
? d.temperature
: context.temperature;
},
maxOutputTokens: (_params, context) => {
const d = this.nextTurnByName.get(name);
return d && typeof d.maxOutputTokens === 'number'
? d.maxOutputTokens
: context.maxOutputTokens;
},
},
});
});
}
Expand Down
163 changes: 163 additions & 0 deletions ai/agent_control.py
@@ -0,0 +1,163 @@
"""ai/agent_control.py -- per-turn agent controls: next-turn parameters and
tool-call approval.

Two cross-cutting agent features live here, kept out of :mod:`ai.tools` and
:mod:`ai.agent_sidecar` so the in-process loop and the sidecar bridge share
one implementation:

* **Next-turn parameters.** A tool may steer the next model turn by
returning a ``next_turn`` block in its result -- a different model, a new
temperature, a tighter token budget, or extra instructions. This is the
Python-native shape of the OpenRouter Agent SDK's ``nextTurnParams``: the
tool decides at run time, the orchestrator applies the change before the
model is asked again. :func:`split_next_turn` peels the directive off a
raw tool return; :func:`apply_next_turn` folds it into the in-process
loop's working parameters. The sidecar forwards the same directive over
the bridge for the SDK's ``nextTurnParams`` to consume.

* **Tool-call approval.** A tool call may be gated on a human yes/no before
it runs. :func:`needs_approval` decides whether a given tool is gated --
by an explicit ``requires_approval`` flag on the tool, by tool name, or
by risk tier, the latter two driven by config. :func:`request_tool_approval`
asks the turn's approver (a button prompt in the channel, supplied by the
chat cog through :class:`ai.tools.ToolContext`) and returns the decision.
When no approver is reachable a gated call is denied, never silently run.

Neither feature reaches for the Agent SDK's persistent state surface: a
next-turn directive rides the existing tool-result frame within one turn, and
approval is resolved entirely on the Python side before the turn ends. The
sidecar stays stateless per turn.
"""
from __future__ import annotations

import logging

from config import Config

log = logging.getLogger(__name__)

# Parameters a tool may change for the next model turn. Kept deliberately
# small: every key here is one both the sidecar and the in-process loop can
# honour identically, so a directive behaves the same on either path.
NEXT_TURN_KEYS = ("model", "instructions", "temperature", "max_output_tokens")

# Hard ceilings on a next-turn directive. Bot tool handlers are trusted, but a
# Lua plugin tool is less so -- these bound a directive to sane values.
_MAX_INSTRUCTION_CHARS = 8000
_MAX_OUTPUT_TOKENS = 8000


def sanitize_next_turn(raw) -> dict | None:
"""Validate a raw ``next_turn`` directive into a clean, bounded dict.

Returns the subset of :data:`NEXT_TURN_KEYS` that carried a well-typed,
in-range value, or ``None`` when nothing usable remains. An unknown key, a
wrong type or an out-of-range number is dropped silently -- a malformed
directive must never break the turn.
"""
if not isinstance(raw, dict):
return None
out: dict = {}
for key in NEXT_TURN_KEYS:
if key not in raw:
continue
value = raw[key]
if key in ("model", "instructions"):
if isinstance(value, str) and value.strip():
text = value.strip()
if key == "instructions":
text = text[:_MAX_INSTRUCTION_CHARS]
out[key] = text
elif key == "temperature":
if isinstance(value, (int, float)) and not isinstance(value, bool):
out[key] = max(0.0, min(2.0, float(value)))
elif key == "max_output_tokens":
if (isinstance(value, int) and not isinstance(value, bool)
and value > 0):
out[key] = min(int(value), _MAX_OUTPUT_TOKENS)
return out or None


def split_next_turn(result):
"""Peel a ``next_turn`` directive off a raw tool return.

Returns ``(result_without_next_turn, directive_or_None)``. The directive
is removed from the result so it never reaches the execution pipeline or
the model -- it is turn-control data, not tool output.
"""
if not isinstance(result, dict) or "next_turn" not in result:
return result, None
cleaned = {key: value for key, value in result.items()
if key != "next_turn"}
return cleaned, sanitize_next_turn(result.get("next_turn"))


def apply_next_turn(
directive: dict | None,
*,
convo: list[dict],
model: str | None,
temperature: float,
max_tokens: int,
) -> tuple[str | None, float, int]:
"""Fold a next-turn directive into the in-process loop's parameters.

``convo`` is mutated in place when the directive carries ``instructions``
(appended as a system message, mirroring how the sidecar forwards
``instructions`` to the SDK). The model, temperature and token budget are
returned updated so the caller can rebind its working values.
"""
if not directive:
return model, temperature, max_tokens
if directive.get("model"):
model = directive["model"]
if directive.get("temperature") is not None:
temperature = directive["temperature"]
if directive.get("max_output_tokens"):
max_tokens = directive["max_output_tokens"]
instructions = directive.get("instructions")
if instructions:
convo.append({"role": "system", "content": instructions})
return model, temperature, max_tokens


def needs_approval(spec) -> bool:
"""Is a tool call gated on a human yes/no before it may run?

A tool is gated when its :class:`~ai.tools.ToolSpec` sets
``requires_approval``, when its name is listed in ``AGENT_APPROVAL_TOOLS``,
or when its risk tier is listed in ``AGENT_APPROVAL_RISKS``. With both
config lists empty -- the default -- approval is off and no call is gated.
"""
if spec is None:
return False
if getattr(spec, "requires_approval", False):
return True
name = getattr(spec, "name", "")
if name and name in Config.AGENT_APPROVAL_TOOLS:
return True
risk = getattr(spec, "risk", "")
if risk and risk in Config.AGENT_APPROVAL_RISKS:
return True
return False


async def request_tool_approval(name: str, args: dict, ctx) -> bool:
"""Ask the turn's approver to clear one gated tool call.

The approver -- set on :class:`~ai.tools.ToolContext` by the chat cog --
surfaces a human prompt and resolves to the decision. When no approver is
reachable the call is denied: a gated tool is never run unreviewed.
"""
approver = getattr(ctx, "approver", None)
if approver is None:
log.info("tool approval: %s denied (no approver available)", name)
return False
try:
decided = bool(await approver(name, args))
except Exception as exc: # noqa: BLE001
log.warning("tool approval for %s failed: %s", name, exc)
return False
log.info("tool approval: %s %s", name,
"approved" if decided else "rejected")
return decided
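
A self-contained sketch of how the approval gate might wrap a tool call, including the wait bounded by `AGENT_APPROVAL_TIMEOUT_S`. It is not the bot's implementation: `parse_list`, `is_gated`, `run_gated`, the demo approvers and the shape of the declined result are all hypothetical, and treating a timed-out wait as a rejection is an assumption the diff does not spell out.

```python
import asyncio

# Hypothetical stand-ins for the parsed AGENT_APPROVAL_* settings.
APPROVAL_TOOLS = {"files.write", "files.delete"}
APPROVAL_RISKS = {"mutate"}
APPROVAL_TIMEOUT_S = 45


def parse_list(raw: str) -> set[str]:
    """Split a comma-separated setting into a set, ignoring blanks."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def is_gated(name: str, risk: str) -> bool:
    """A call is gated when its name or its risk tier is listed."""
    return name in APPROVAL_TOOLS or risk in APPROVAL_RISKS


async def run_gated(name, risk, handler, approver, timeout_s=APPROVAL_TIMEOUT_S):
    """Run handler() only after approval; otherwise return a declined result."""
    if is_gated(name, risk):
        try:
            approved = bool(await asyncio.wait_for(approver(name), timeout_s))
        except asyncio.TimeoutError:
            approved = False  # assumption: no decision in time counts as a rejection
        if not approved:
            # Hypothetical result shape; the handler is never called.
            return {"ok": False, "error": "declined by user"}
    return await handler()


# Demo approvers and a demo handler, all hypothetical.
async def approve(name):
    return True


async def reject(name):
    return False


async def never_answers(name):
    await asyncio.sleep(3600)
    return True


async def write_file():
    return {"ok": True}
```

For example, `asyncio.run(run_gated("files.write", "mutate", write_file, reject))` returns the declined result without ever calling `write_file`, while an ungated call such as `run_gated("notes.read", "read", write_file, reject)` runs the handler without consulting the approver at all.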