27 changes: 6 additions & 21 deletions CLAUDE.md
@@ -2,7 +2,7 @@

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Chloei is a **Next.js 16 / React 19** authenticated AI chat app backed by the **Vercel AI Gateway**. A single streaming agent endpoint (`/api/agent`) runs a multi-step tool-using loop (web search, code execution) and streams NDJSON to the browser. Auth, sessions, threads, and rate limiting are **PostgreSQL-backed** (Better Auth + Kysely/`pg`).
Chloei is a **Next.js 16 / React 19** authenticated AI chat app backed by the **Vercel AI Gateway**. A single streaming agent endpoint (`/api/agent`) runs a multi-step tool-using loop (Tavily web search) and streams NDJSON to the browser. Auth, sessions, threads, and rate limiting are **PostgreSQL-backed** (Better Auth + Kysely/`pg`).

> **Maintaining this file:** keep it accurate and concise — stale or bloated guidance makes Claude ignore the parts that matter. Cite durable anchors (file paths + symbol names), not line numbers. When you change a subsystem, update the matching section here in the same PR.

@@ -76,20 +76,15 @@ Before the model sees them, messages pass through `toModelMessages` (`agent-runt

This boundary is **enforced by Next.js bundling at build time** (importing `pg`/`better-auth`/server modules into a client bundle is a build error), **not** by an ESLint rule. Keep it in mind when adding imports.

### Task Modes

**Task mode** (`inferPromptTaskMode` in `agent-prompt-steering.ts`) is inferred from message content and drives only the **prompt overlay text** (`TASK MODE OVERLAY: <MODE>`). Modes: `general`, `coding`, `debugging`, `writing`, `research`, `high_stakes`, `closed_answer`, `instruction_following`. The tool-step budget is a single fixed constant (`AGENT_TOOL_MAX_STEPS`) — there is no per-request runtime-profile selection. (The `research` task mode is an automatic content-based overlay, not a user-facing toggle.)

### System Prompt Composition

`buildAgentSystemInstruction` (`src/lib/server/agent-context.ts`) assembles the prompt per-request from labeled blocks delimited by `--- BEGIN <LABEL> ---` / `--- END <LABEL> ---`, in this order:

1. `OPERATING INSTRUCTIONS` — `DEFAULT_OPERATING_INSTRUCTION` (`src/lib/shared/llm/system-instructions.ts`)
2. `RUNTIME DATE CONTEXT` — current UTC timestamp + user timezone (from `X-User-Timezone` header)
3. **Provider overlay** (`PROVIDER OVERLAY: ALIBABA|MOONSHOTAI`) — keyed by the model's **provider org**, not its nickname (alibaba=Qwen, moonshotai=Kimi). Always applied for a supported model.
4. **Task mode overlay** (`TASK MODE OVERLAY: <MODE>`)
5. `IDENTITY AND TONE CONTEXT` — `DEFAULT_SOUL_FALLBACK_INSTRUCTION` (`src/lib/shared/llm/system-instructions.ts`)
6. `AUTH USER CONTEXT` — authenticated user id, name, email
4. `IDENTITY AND TONE CONTEXT` — `DEFAULT_SOUL_FALLBACK_INSTRUCTION` (`src/lib/shared/llm/system-instructions.ts`)
5. `AUTH USER CONTEXT` — authenticated user id, name, email

Inline-citation rules are appended **later**, by `withAiSdkInlineCitationInstruction` (`system-instruction-augmentations.ts`), inside `createAgentStreamResponse` — not by `buildAgentSystemInstruction`.
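
The labeled-block format described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `formatPromptBlock` signature matches the one in `agent-context.ts`, but its body, the `composeBlocks` helper, and the sample block bodies are assumptions.

```typescript
// Hypothetical sketch: the delimiter strings come from the doc, the bodies are assumed.
interface PromptBlock {
  label: string
  body: string
}

// Wrap one block in its BEGIN/END delimiters.
function formatPromptBlock(label: string, body: string): string {
  return `--- BEGIN ${label} ---\n${body.trim()}\n--- END ${label} ---`
}

// Join blocks in the order given; the order is what the numbered list documents.
function composeBlocks(blocks: readonly PromptBlock[]): string {
  return blocks.map((block) => formatPromptBlock(block.label, block.body)).join("\n\n")
}

const example = composeBlocks([
  { label: "OPERATING INSTRUCTIONS", body: "Be concise." },
  { label: "RUNTIME DATE CONTEXT", body: "Current UTC time: 2025-01-01T00:00:00Z" },
  { label: "PROVIDER OVERLAY: ALIBABA", body: "Use Qwen reasoning mode efficiently." },
])
```

Keeping the delimiters in one helper means a removed block type (such as the task mode overlay) disappears from the prompt without touching the others.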

@@ -126,27 +121,18 @@ Streamed reasoning is filtered through three layers so the hidden prompt never l

1. Literal `[REDACTED]` reasoning chunks are dropped (`initial-reasoning-chunk-sanitizer.ts`).
2. Leading `thinking:` / `reasoning:` labels are stripped (buffered across chunk boundaries).
3. `sanitizeReasoningForDisplay` (`src/lib/shared/agent/reasoning-privacy.ts`) redacts prompt-internal terms (`soul.md`, "system prompt", "operating instructions", "provider overlay", "task mode overlay", "auth user context", etc.). `getPrivateReasoningCarryLength` holds back a trailing partial token across chunk boundaries so a split secret can't slip through. Applied both at stream time and on thread persistence.
3. `sanitizeReasoningForDisplay` (`src/lib/shared/agent/reasoning-privacy.ts`) redacts prompt-internal terms (`soul.md`, "system prompt", "operating instructions", "provider overlay", "auth user context", etc.). `getPrivateReasoningCarryLength` holds back a trailing partial token across chunk boundaries so a split secret can't slip through. Applied both at stream time and on thread persistence.
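
The carry-over idea in layer 3 can be sketched roughly as below. Everything here is an assumption made for illustration: the term list is a subset, `[redacted]` is a placeholder token, and `carryLength`, `redact`, and `createStreamSanitizer` are hypothetical names, not the signatures of `getPrivateReasoningCarryLength` or `sanitizeReasoningForDisplay`.

```typescript
// Assumed term list; the real one lives in reasoning-privacy.ts.
const PRIVATE_TERMS = ["soul.md", "system prompt", "operating instructions", "provider overlay"]

function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

// Replace every complete private term, case-insensitively.
function redact(text: string, terms: readonly string[] = PRIVATE_TERMS): string {
  let out = text
  for (const term of terms) {
    out = out.replace(new RegExp(escapeRegExp(term), "gi"), "[redacted]")
  }
  return out
}

// Length of the longest suffix of `text` that is a proper prefix of any term.
function carryLength(text: string, terms: readonly string[] = PRIVATE_TERMS): number {
  const lower = text.toLowerCase()
  let longest = 0
  for (const term of terms) {
    const max = Math.min(term.length - 1, lower.length)
    for (let len = max; len > longest; len--) {
      if (lower.endsWith(term.slice(0, len))) {
        longest = len
        break
      }
    }
  }
  return longest
}

// Hold back the trailing partial match until the next chunk resolves it.
function createStreamSanitizer(terms: readonly string[] = PRIVATE_TERMS) {
  let pending = ""
  return {
    push(chunk: string): string {
      const text = redact(pending + chunk, terms)
      const keep = carryLength(text, terms)
      pending = text.slice(text.length - keep)
      return text.slice(0, text.length - keep)
    },
    flush(): string {
      const rest = pending
      pending = ""
      return rest
    },
  }
}
```

With this sketch, pushing `"the system pro"` emits only `"the "`, and the held-back `"system pro"` is redacted once `"mpt leaked"` arrives.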

User ids are never logged in the clear: `hashUserId` (`src/lib/server/privacy.ts`) produces a `sha256:<hex>` digest used for telemetry user-hashing.
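
A digest of the `sha256:<hex>` shape can be produced with Node's built-in `crypto` module. The sketch below is an assumption about how such a helper might look (unsalted, UTF-8 input); the actual `hashUserId` in `privacy.ts` may differ, for example by adding a salt.

```typescript
import { createHash } from "node:crypto"

// Assumed implementation: plain SHA-256 over the UTF-8 bytes of the user id.
function hashUserId(userId: string): string {
  const hex = createHash("sha256").update(userId, "utf8").digest("hex")
  return `sha256:${hex}`
}

// "user_123" is an illustrative id, not taken from the repository.
const digest = hashUserId("user_123")
```

The digest is deterministic, so telemetry events for the same user can still be correlated without the raw id appearing in logs.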

### Agent Tools

Each tool is only registered when its requirements are met.
The only tools are the two Tavily web-search tools, and both are registered together when `TAVILY_API_KEY` is set. There are no other internal or external tools.

| Tool | Requirement | Tool id / operations |
| ---------------- | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `tavily_search` | `TAVILY_API_KEY` | Web search. `topic: general\|news\|finance`, `timeRange`, `includeDomains`, `excludeDomains`, `country`; up to **8** results (default 5) |
| `tavily_extract` | `TAVILY_API_KEY` | Extract content from up to **5** URLs; `extractDepth: basic\|advanced` (default advanced), `format: markdown\|text` |
| `code_execution` | always on | `language: javascript\|python`; runs on the restricted computation-only backend |

**Code execution** (`src/lib/server/llm/code-execution-tools.ts`):

- Runs in a temp directory; network/filesystem/subprocess access blocked.
- JavaScript: runs via the Node binary (`process.execPath`) with `--input-type=module --eval`.
- Python: runs via `python3 -I -c` (the `python3` binary on `PATH`).
- **Restricted backend** (the only backend): computation-only Python imports (`math`, `collections`, `itertools`, …).
- **Limits**: timeout default **10 s**, max **60 s**; code input and output each capped at **12,000 chars**.

**Max tool steps** (`AGENT_TOOL_MAX_STEPS` in `agent-runtime-config.ts`): 12.
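
The "registered together" rule for the Tavily tools can be illustrated with a small gate on the environment. This is a hypothetical sketch: `registerAgentTools`, the stub type, and the sample key value are assumptions, and the real tools in `ai-sdk-tavily-tools.ts` carry full schemas rather than descriptions.

```typescript
// Hypothetical sketch: tool shapes and registerAgentTools are not the repository's API.
type AgentToolName = "tavily_search" | "tavily_extract"

interface AgentToolStub {
  description: string
}

function registerAgentTools(
  env: Readonly<Record<string, string | undefined>>
): Partial<Record<AgentToolName, AgentToolStub>> {
  // Both tools share one requirement, so they appear or vanish together.
  if (!env.TAVILY_API_KEY) {
    return {}
  }
  return {
    tavily_search: { description: "Web search, up to 8 results (default 5)" },
    tavily_extract: { description: "Extract content from up to 5 URLs" },
  }
}

const withoutKey = Object.keys(registerAgentTools({}))
const withKey = Object.keys(registerAgentTools({ TAVILY_API_KEY: "tvly-example" }))
```

Returning an empty map when the key is missing leaves the agent with no tools at all, which matches the statement that there are no other internal or external tools.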

@@ -279,7 +265,7 @@ src/
editor/highlighter.ts
server/
agent-context.ts # buildAgentSystemInstruction
agent-prompt-steering.ts # Task-mode inference + provider/task overlays
agent-prompt-steering.ts # Provider overlays (Qwen/Kimi tuning)
agent-route.ts # parseAgentStreamRequest, createAgentStreamResponse
agent-runtime-config.ts # Runtime constants + a few operational env switches
auth.ts / auth-session.ts # isAuthConfigured, getRequestSession
@@ -296,7 +282,6 @@ src/
gateway-responses.ts # startGatewayResponseStream
gateway-client.ts # undici dispatcher for AI Gateway
ai-sdk-tavily-tools.ts # tavily_search / tavily_extract
code-execution-tools.ts # Sandboxed JS/Python execution
initial-reasoning-chunk-sanitizer.ts # Redacted-reasoning filtering
system-instruction-augmentations.ts # Citation rules appended to prompt
shared/
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# Chloei

Chloei is a Next.js 16 chat app backed by Vercel AI Gateway. It currently exposes a curated model selector that defaults to Qwen 3.7 Max and also includes Kimi K2.6, and offers local code execution, optional Tavily retrieval, and Better Auth email/password authentication with PostgreSQL-backed users and sessions.
Chloei is a Next.js 16 chat app backed by Vercel AI Gateway. It currently exposes a curated model selector that defaults to Qwen 3.7 Max and also includes Kimi K2.6, and offers optional Tavily web search and Better Auth email/password authentication with PostgreSQL-backed users and sessions.

## Documentation

7 changes: 1 addition & 6 deletions src/app/api/agent/route.ts
@@ -3,10 +3,7 @@ import { type NextRequest } from "next/server"
import { getModels } from "@/lib/actions/api-keys"
import { createLogger } from "@/lib/logger"
import { buildAgentSystemInstruction } from "@/lib/server/agent-context"
import {
inferPromptTaskMode,
resolvePromptProvider,
} from "@/lib/server/agent-prompt-steering"
import { resolvePromptProvider } from "@/lib/server/agent-prompt-steering"
import {
createAgentStreamResponse,
createJsonErrorResponse,
@@ -157,7 +154,6 @@ export async function POST(request: NextRequest) {
const userTimeZone = resolveUserTimeZone(request)
const featureFlags = await resolveAgentFeatureFlags()
const promptProvider = resolvePromptProvider(selectedModel)
const promptTaskMode = inferPromptTaskMode(parsedRequest.messages)
const systemInstruction = buildAgentSystemInstruction(
{
id: session.user.id,
@@ -168,7 +164,6 @@ export async function POST(request: NextRequest) {
now: requestNow,
userTimeZone,
provider: promptProvider,
taskMode: promptTaskMode,
}
)

7 changes: 0 additions & 7 deletions src/lib/server/agent-context.ts
@@ -7,20 +7,17 @@ import {
import {
createPromptSteeringBlocks,
type PromptProvider,
type PromptTaskMode,
} from "./agent-prompt-steering"

interface RuntimePromptContext {
now: Date
userTimeZone?: string
provider?: PromptProvider
taskMode?: PromptTaskMode
}

interface AgentContextOverrides {
operatingInstruction?: string
providerOverlaysEnabled?: boolean
taskModeOverlaysEnabled?: boolean
}

function formatPromptBlock(label: string, body: string): string {
@@ -101,7 +98,6 @@ function composeSystemInstruction(params: {
runtimeContext: RuntimePromptContext
operatingInstruction?: string
providerOverlaysEnabled?: boolean
taskModeOverlaysEnabled?: boolean
}): string {
const blocks = [
formatPromptBlock(
@@ -116,9 +112,7 @@ function composeSystemInstruction(params: {

const promptSteeringBlocks = createPromptSteeringBlocks({
provider: params.runtimeContext.provider,
taskMode: params.runtimeContext.taskMode,
providerOverlaysEnabled: params.providerOverlaysEnabled,
taskModeOverlaysEnabled: params.taskModeOverlaysEnabled,
})

for (const block of promptSteeringBlocks) {
@@ -147,6 +141,5 @@ export function buildAgentSystemInstruction(
runtimeContext,
operatingInstruction: overrides.operatingInstruction,
providerOverlaysEnabled: overrides.providerOverlaysEnabled,
taskModeOverlaysEnabled: overrides.taskModeOverlaysEnabled,
})
}
196 changes: 0 additions & 196 deletions src/lib/server/agent-prompt-steering.ts
@@ -1,61 +1,17 @@
import type { ModelType } from "@/lib/shared"

import {
getLastUserMessage,
hasPersonalFinancialAdviceIntent,
normalizeUserText,
type PromptTextMessage,
} from "./prompt-message-utils"

export type PromptProvider = "alibaba" | "moonshotai"

export type PromptTaskMode =
| "general"
| "instruction_following"
| "closed_answer"
| "coding"
| "debugging"
| "writing"
| "research"
| "high_stakes"

type UserExpertiseHint = "engineering" | "writing" | "research"

type PromptSteeringMessage = PromptTextMessage

interface PromptSteeringBlock {
label: string
body: string
}

interface CreatePromptSteeringBlocksParams {
provider?: PromptProvider
taskMode?: PromptTaskMode
providerOverlaysEnabled?: boolean
taskModeOverlaysEnabled?: boolean
}

interface InferPromptTaskModeOptions {
userExpertise?: UserExpertiseHint
}

const CODING_PATTERN =
/\b(code|coding|function|class|script|algorithm|typescript|javascript|python|sql|regex|unit test|implement|write a program|refactor|module|library|api endpoint|compile|build error)\b/i
// Debugging is a distinct category: triage/diagnosis work that benefits from
// extra reasoning even when no explicit code is requested.
const DEBUGGING_PATTERN =
/\b(stack trace|traceback|error message|exception|reproduce|repro|why does .{1,40}\s(fail|crash|break|hang|throw)|not working|broken|crashes?|hangs?|deadlock|memory leak|segfault|panic|enoent|undefined is not|cannot read propert(?:y|ies)|null pointer|race condition|flaky)\b/i
const WRITING_PATTERN =
/\b(draft|rewrite|edit|proofread|proofreading|tone|copy|copywrit|essay|blog post|newsletter|paragraph|prose|grammar|punctuation|tighten|polish|shorter version|longer version|press release|release notes?|changelog entry|cover letter|outline this|outline for)\b/i
const RESEARCH_PATTERN =
/\b(latest|current|today|recent|as of|sources?|cite|citation|link|look up|lookup|verify|check the web|news|price right now|right now|breaking|trending|happening (?:now|today))\b/i
const HIGH_STAKES_PATTERN =
/\b(bank|password|phish(?:ed|ing)?|security|medical|doctor|symptom|symptoms|dose|dosage|prescription|pregnant|lawsuit|legal|tax|suicid|self-harm|chest pain|emergency|infection|overdose)\b/i
const CLOSED_ANSWER_PATTERN =
/\b(multiple choice|choose one|which option|final answer|exact answer|boxed|answer:|confidence:|A\)|B\)|C\)|D\))\b/i
const STRICT_OUTPUT_PATTERN =
/\b(return only|exactly|exact format|valid json|minified json|last line|single word|one word|single line|one line|two sentences|one sentence|one paragraph|no more than|under \d+ words|no surrounding prose|only one ```|schema|yaml|xml|csv)\b/i

const PROVIDER_OVERLAYS: Record<PromptProvider, string> = {
alibaba: `
Use Qwen reasoning mode efficiently.
@@ -75,67 +31,6 @@ Use Kimi reasoning mode efficiently.
`.trim(),
}

const TASK_MODE_OVERLAYS: Record<Exclude<PromptTaskMode, "general">, string> = {
instruction_following: `
This request is parser-sensitive or format-sensitive.
- Exact compliance is mandatory.
- Return only the requested structure, wording, and delimiters.
- If a final line or key order is specified, check it literally before finishing.
- Treat word, sentence, paragraph, and line caps as hard limits. Count before finishing when close to the boundary.
- Cut any extra commentary that would reduce extractability.
`.trim(),
closed_answer: `
This request expects one clear answer.
- Resolve ambiguity, choose the best answer, and commit.
- Keep explanation brief and keep the final answer unambiguous.
- If the task implies a required final-answer line, end with that exact line.
- If the required answer form is numeric, boxed, or one-line, return that form exactly without extra prose.
- Do not leave the answer buried in exploratory prose.
`.trim(),
coding: `
This request is code-centric.
- Prefer runnable code and correct I/O behavior over explanation.
- If the user requests code only or one code block, obey that literally.
- Use the code_execution tool for arithmetic, spot checks, or quick validation when it reduces error risk.
- Do not add prose that would break copy-paste or grading.
`.trim(),
debugging: `
This request is a diagnosis/debugging task.
- Form a specific hypothesis about the root cause before recommending a fix; avoid generic "try restarting" advice.
- Ask for the missing signal (exact error, repro steps, environment) only when it would materially change the diagnosis; otherwise reason from what is provided.
- When the user shares a stack trace or error, identify the originating call site and explain *why* it fails, not just *that* it fails.
- Prefer code_execution to validate the fix when arithmetic, parsing, or small reproductions can be checked locally.
- End with a concrete next action: the patch, the command to run, or the data to collect.
`.trim(),
writing: `
This request is a writing/editing task.
- Match the requested voice, length, and audience; do not impose a default Chloei voice when the user has specified one.
- Preserve the user's factual claims and proper nouns verbatim unless asked to fact-check.
- If asked to edit, return the edited text in the requested form. If asked to rewrite, do not paste back the original.
- Length caps are hard caps; count before finishing when close to the limit.
- Skip preambles like "Sure, here is your draft" — return the deliverable.
`.trim(),
research: `
This request needs deep research, freshness, sources, or verification.
- Clarify missing scope only when the missing detail would materially change the research plan; otherwise proceed with stated assumptions.
- Decide what claims need verification before answering, and search before answering freshness-sensitive, source-heavy, or contested claims.
- Extract or read primary pages when details, dates, numbers, methodology, or quotes matter.
- Cross-check important claims across sources, especially when sources conflict or one source is promotional.
- Use explicit calendar dates when recency matters.
- Use code execution for calculations, tabular analysis, transformations, and arithmetic checks that could change the conclusion.
- Produce a structured, citation-forward final report with clear findings, evidence, limitations, and source gaps.
- If live retrieval tools are unavailable or evidence is missing or conflicting, say that plainly instead of guessing.
`.trim(),
high_stakes: `
This request is high-stakes.
- Optimize for correctness, concrete next actions, and low hallucination risk.
- If current or external facts matter, verify them when tools are available.
- Be direct and practical, not verbose or vague.
- In compromised-account, phishing, or financial-security scenarios, include immediate containment and stronger login protection such as 2FA/MFA when applicable.
- If something cannot be verified, say so explicitly rather than filling the gap.
`.trim(),
}

export function resolvePromptProvider(model: ModelType): PromptProvider {
if (model.startsWith("alibaba/")) {
return "alibaba"
@@ -148,86 +43,6 @@ export function resolvePromptProvider(model: ModelType): PromptProvider {
throw new Error(`Unsupported model provider for model: ${model}`)
}

export function inferPromptTaskMode(
messages: readonly PromptSteeringMessage[],
options: InferPromptTaskModeOptions = {}
): PromptTaskMode {
const lastUserMessage = getLastUserMessage(messages)
if (!lastUserMessage) {
return "general"
}

const fullUserText = normalizeUserText(messages)
const coding = CODING_PATTERN.test(lastUserMessage)
const debugging =
DEBUGGING_PATTERN.test(lastUserMessage) ||
DEBUGGING_PATTERN.test(fullUserText)
const writing =
WRITING_PATTERN.test(lastUserMessage) || WRITING_PATTERN.test(fullUserText)
const strictOutput =
STRICT_OUTPUT_PATTERN.test(lastUserMessage) ||
STRICT_OUTPUT_PATTERN.test(fullUserText)
const highStakes = HIGH_STAKES_PATTERN.test(lastUserMessage)
const financialAdvice = hasPersonalFinancialAdviceIntent(lastUserMessage)
const research =
RESEARCH_PATTERN.test(lastUserMessage) ||
RESEARCH_PATTERN.test(fullUserText)
const closedAnswer =
CLOSED_ANSWER_PATTERN.test(lastUserMessage) ||
CLOSED_ANSWER_PATTERN.test(fullUserText)

// High-stakes always wins over expertise hints so personalization can never
// downgrade a safety-relevant routing decision.
if (financialAdvice) {
return "high_stakes"
}

if (highStakes) {
return "high_stakes"
}

if (debugging) {
return "debugging"
}

if (coding) {
return "coding"
}

if (research) {
return "research"
}

if (writing) {
return "writing"
}

if (closedAnswer) {
return "closed_answer"
}

if (strictOutput) {
return "instruction_following"
}

if (options.userExpertise === "research") {
return "research"
}

if (options.userExpertise === "writing") {
return "writing"
}

if (
options.userExpertise === "engineering" &&
lastUserMessage.includes("?")
) {
return "coding"
}

return "general"
}

export function createPromptSteeringBlocks(
params: CreatePromptSteeringBlocksParams
): PromptSteeringBlock[] {
@@ -240,16 +55,5 @@ export function createPromptSteeringBlocks(
})
}

if (
params.taskMode &&
params.taskMode !== "general" &&
params.taskModeOverlaysEnabled !== false
) {
blocks.push({
label: `TASK MODE OVERLAY: ${params.taskMode.toUpperCase()}`,
body: TASK_MODE_OVERLAYS[params.taskMode],
})
}

return blocks
}