Skip to content

Add DSPy RLM based research agent#13

Open
wallyqs wants to merge 1 commit into
mainfrom
research-agent
Open

Add DSPy RLM based research agent#13
wallyqs wants to merge 1 commit into
mainfrom
research-agent

Conversation

@wallyqs
Copy link
Copy Markdown
Contributor

@wallyqs wallyqs commented Apr 24, 2026

Add example agent of using web search and RLMs for research:

RESEARCH_DEBUG=1 RESEARCH_PROVIDER=exa RESEARCH_MODEL="openai/gpt-oss-120b" ./scripts/run.sh
image

@synadia-claude-reviewer
Copy link
Copy Markdown

synadia-claude-reviewer Bot commented Apr 24, 2026

Claude finished @wallyqs's task in 4m 5s —— View job


PR Review: Add DSPy RLM based research agent

  • Gather context (read PR diff, changed files, CLAUDE.md)
  • Review src/index.ts — correctness, NATS protocol compliance, error handling
  • Review src/search.ts — security, correctness
  • Review scripts/run.sh and scripts/ask.ts — sandbox soundness
  • Review package.json / tsconfig.json — dependency hygiene
  • Post inline comments on specific issues
  • Write overall review summary

Summary

The RLM wiring is solid — AxJSRuntime with only TIMING permission keeps the sandbox tight, the checkpointed/balanced context policy is appropriate for long research runs, and the stream terminator (msg.respond("") in a finally block) is correctly spec-compliant. The NVIDIA-vLLM scrub workarounds (parallel_tool_calls, reasoning_effort rewrite) are well-isolated in scrubbedFetch.

A few real issues to address before merging:

Security / correctness (should fix):

  • URL scheme not validatedweb.fetch and web.findSimilar pass LLM-generated URLs directly to Tavily/Exa without checking for http(s)://. A prompt-injection-compromised actor could direct those providers to attempt arbitrary URLs. Fix → (see inline comment on search.ts:67)
  • USER env var as NATS subject token — if USER contains . (e.g. john.doe), the subject becomes agents.research.john.doe.rlm — five tokens instead of four, breaking discovery. Sanitize or validate on startup. (see inline comment on index.ts:198)
  • Module-level turn counter — shared across concurrent requests; concurrent debug files collide. Scope the counter per-request or use a random prefix. (see inline comment on index.ts:39)

Protocol compliance:

  • No error chunk type — mid-stream errors arrive as {"type":"response","data":"[agent error] …"}, which clients cannot distinguish from valid response text. Define and emit a {"type":"error","data":"…"} chunk type and document it in the README. (see inline comment on index.ts:293)

Minor (good to fix, not blocking):

  • Stale Content-Length on scrubbed responseres.headers is forwarded verbatim onto the re-serialised body; strip content-length from the rebuilt Response. (see inline comment on index.ts:85)
  • DEBUG files world-readable in /tmp — use { mode: 0o600 } in fs.writeFile or write to a per-process private directory. (see inline comment on index.ts:65)
  • run.sh always warns about TAVILY_API_KEY even when RESEARCH_PROVIDER=exa. (see inline comment on run.sh:22)

}

async fetch(url: string): Promise<FetchedPage> {
const res = await fetch("https://api.tavily.com/extract", {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL scheme not validated before delegating to Tavily/Exa

url here originates from the LLM-generated REPL code. There's no guard to reject non-http(s):// schemes (e.g. file://, ftp://, or an attacker-injected URL via a compromised search result). The same applies to ExaProvider.fetch() (line 125) and ExaProvider.findSimilar() (line 146).

Even though Tavily/Exa make the actual outbound request (not this process), arbitrary URLs are still forwarded to those third-party services — potential for prompt-injection-driven SSRF via the provider.

Suggested fix — add a guard in each provider's fetch/findSimilar, or centrally in the index.ts tool handlers:

if (!/^https?:\/\//i.test(url)) throw new Error(`fetch: only http/https URLs are supported, got: ${url.slice(0, 80)}`);

// non-reasoning models (ax rejects non-string `thought`) and refuses parallel
// tool calls on some llama variants. Same scrub as examples/dspy/src/index.ts.
const DEBUG_TRACE = process.env["RESEARCH_DEBUG"] === "1";
let turn = 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-level turn counter races across concurrent requests

turn is a global, so two simultaneous promptHandler invocations will interleave increments. In debug mode this produces wrong pairings of research-req-N.jsonresearch-res-N.json, and in principle the scrub-fetch wrapper reads a file it didn't write. Move the counter inside scrubbedFetch as a closure variable scoped to the NATS handler invocation, or use a per-request unique prefix (e.g. a short uuid) for the debug file names.

// non-JSON body
}
}
if (DEBUG_TRACE && finalInit?.body) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEBUG traces written to world-readable /tmp with predictable filenames

/tmp/research-req-N.json contains the full request body sent to the NVIDIA endpoint — including the bearer token in the Authorization header if ax places it there, plus the user's research question and any accumulated REPL context. /tmp/research-res-N.json contains raw search results that may include scraped page content.

On a multi-user machine these files are readable by any local user (/tmp has sticky bit but files are created with the process's default umask, typically 0644). Suggestions:

  • Use fs.writeFile(path, data, { mode: 0o600 }) to restrict permissions, or
  • Write to a private directory under $XDG_RUNTIME_DIR / os.tmpdir() with a random suffix.

const agentService = new ReferenceAgent({
nc,
agent: "research",
owner: process.env["USER"] ?? "anon",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USER env var used unsanitized as a NATS subject token

NATS subjects use . as a delimiter; * and > are wildcards. If USER contains any of those characters (e.g. john.doe, or is empty and falls through to "anon" on a headless runner), the resulting subject agents.research.<USER>.rlm becomes malformed or ambiguous.

Consider validating/sanitizing the value:

const rawOwner = process.env["USER"] ?? "anon";
const owner = rawOwner.replace(/[.*>]/g, "_");

Or reject at startup with a clear error message if the value is invalid.

citations.forEach((u, i) => chunk("response", `\n[${i + 1}] ${u}`));
}
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mid-stream errors sent as response type — no error envelope type defined

When researcher.forward() rejects, the error string is wrapped as {"type":"response","data":"[agent error] …"}. Clients have no reliable way to distinguish a genuine partial response from a terminal error.

The spec calls for a 500 error envelope for server errors (msg.respondError(500, ...)), but here the stream has already started ("ack" was sent on line 222), so respondError can no longer be used exclusively.

Consider adding an error chunk type to the streaming protocol and documenting it in the README:

// Instead of chunk("response", `\n[agent error] ${message}`)
msg.respond(encoder.encode(JSON.stringify({ type: "error", data: message })));

This lets SDK clients (and future typed wrappers) surface errors cleanly without text-scanning response content.

}
return out;
};
return new Response(JSON.stringify(scrub(body)), {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing res.headers on a reconstructed Response carries a stale content-length

JSON.stringify(scrub(body)) will almost always differ in byte-length from the original response body (the scrubber removes at least the reasoning_content keys). Forwarding res.headers as-is means the new Response carries a Content-Length that no longer matches the body. Most HTTP clients ignore Content-Length when they already have the full body, but it's technically incorrect and could surprise consumers that inspect headers.

Fix: strip or override Content-Length on the rebuilt response:

const newHeaders = new Headers(res.headers);
newHeaders.delete("content-length");
return new Response(JSON.stringify(scrub(body)), {
  status: res.status,
  statusText: res.statusText,
  headers: newHeaders,
});

exit 1
fi

if [[ -z "${TAVILY_API_KEY:-}" ]]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning always checks TAVILY_API_KEY regardless of selected provider

If the user sets RESEARCH_PROVIDER=exa and EXA_API_KEY, this warning fires anyway and may confuse them. The check should be conditioned on the selected provider, or warn generically that at least one of TAVILY_API_KEY/EXA_API_KEY is needed:

if [[ -z "${TAVILY_API_KEY:-}" && -z "${EXA_API_KEY:-}" ]]; then
  echo "note: neither TAVILY_API_KEY nor EXA_API_KEY is set — web search will be disabled" >&2
fi

Signed-off-by: Waldemar Quevedo <wally@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant