chloeilabs · chloeilabs · Jun 16, 2026 · Jun 16, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -2,7 +2,7 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
-Chloei is a **Next.js 16 / React 19** authenticated AI chat app backed by the **Vercel AI Gateway**. A single streaming agent endpoint (`/api/agent`) runs a multi-step tool-using loop (web search, code execution) and streams NDJSON to the browser. Auth, sessions, threads, and rate limiting are **PostgreSQL-backed** (Better Auth + Kysely/`pg`).
+Chloei is a **Next.js 16 / React 19** authenticated AI chat app backed by the **Vercel AI Gateway**. A single streaming agent endpoint (`/api/agent`) runs a multi-step tool-using loop (Tavily web search) and streams NDJSON to the browser. Auth, sessions, threads, and rate limiting are **PostgreSQL-backed** (Better Auth + Kysely/`pg`).
 
 > **Maintaining this file:** keep it accurate and concise — stale or bloated guidance makes Claude ignore the parts that matter. Cite durable anchors (file paths + symbol names), not line numbers. When you change a subsystem, update the matching section here in the same PR.
 
@@ -76,20 +76,15 @@ Before the model sees them, messages pass through `toModelMessages` (`agent-runt
 
 This boundary is **enforced by Next.js bundling at build time** (importing `pg`/`better-auth`/server modules into a client bundle is a build error), **not** by an ESLint rule. Keep it in mind when adding imports.
 
-### Task Modes
-
-**Task mode** (`inferPromptTaskMode` in `agent-prompt-steering.ts`) is inferred from message content and drives only the **prompt overlay text** (`TASK MODE OVERLAY: <MODE>`). Modes: `general`, `coding`, `debugging`, `writing`, `research`, `high_stakes`, `closed_answer`, `instruction_following`. The tool-step budget is a single fixed constant (`AGENT_TOOL_MAX_STEPS`) — there is no per-request runtime-profile selection. (The `research` task mode is an automatic content-based overlay, not a user-facing toggle.)
-
 ### System Prompt Composition
 
 `buildAgentSystemInstruction` (`src/lib/server/agent-context.ts`) assembles the prompt per-request from labeled blocks delimited by `--- BEGIN <LABEL> ---` / `--- END <LABEL> ---`, in this order:
 
 1. `OPERATING INSTRUCTIONS` — `DEFAULT_OPERATING_INSTRUCTION` (`src/lib/shared/llm/system-instructions.ts`)
 2. `RUNTIME DATE CONTEXT` — current UTC timestamp + user timezone (from `X-User-Timezone` header)
 3. **Provider overlay** (`PROVIDER OVERLAY: ALIBABA|MOONSHOTAI`) — keyed by the model's **provider org**, not its nickname (alibaba=Qwen, moonshotai=Kimi). Always applied for a supported model.
-4. **Task mode overlay** (`TASK MODE OVERLAY: <MODE>`)
-5. `IDENTITY AND TONE CONTEXT` — `DEFAULT_SOUL_FALLBACK_INSTRUCTION` (`src/lib/shared/llm/system-instructions.ts`)
-6. `AUTH USER CONTEXT` — authenticated user id, name, email
+4. `IDENTITY AND TONE CONTEXT` — `DEFAULT_SOUL_FALLBACK_INSTRUCTION` (`src/lib/shared/llm/system-instructions.ts`)
+5. `AUTH USER CONTEXT` — authenticated user id, name, email
 
 Inline-citation rules are appended **later**, by `withAiSdkInlineCitationInstruction` (`system-instruction-augmentations.ts`), inside `createAgentStreamResponse` — not by `buildAgentSystemInstruction`.
 
@@ -126,27 +121,18 @@ Streamed reasoning is filtered through three layers so the hidden prompt never l
 
 1. Literal `[REDACTED]` reasoning chunks are dropped (`initial-reasoning-chunk-sanitizer.ts`).
 2. Leading `thinking:` / `reasoning:` labels are stripped (buffered across chunk boundaries).
-3. `sanitizeReasoningForDisplay` (`src/lib/shared/agent/reasoning-privacy.ts`) redacts prompt-internal terms (`soul.md`, "system prompt", "operating instructions", "provider overlay", "task mode overlay", "auth user context", etc.). `getPrivateReasoningCarryLength` holds back a trailing partial token across chunk boundaries so a split secret can't slip through. Applied both at stream time and on thread persistence.
+3. `sanitizeReasoningForDisplay` (`src/lib/shared/agent/reasoning-privacy.ts`) redacts prompt-internal terms (`soul.md`, "system prompt", "operating instructions", "provider overlay", "auth user context", etc.). `getPrivateReasoningCarryLength` holds back a trailing partial token across chunk boundaries so a split secret can't slip through. Applied both at stream time and on thread persistence.
 
 User ids are never logged in the clear: `hashUserId` (`src/lib/server/privacy.ts`) produces a `sha256:<hex>` digest used for telemetry user-hashing.
 
 ### Agent Tools
 
-Each tool is only registered when its requirements are met.
+The only tools are the two Tavily web-search tools, and both are registered together when `TAVILY_API_KEY` is set. There are no other internal or external tools.
 
 | Tool             | Requirement      | Tool id / operations                                                                                                                     |
 | ---------------- | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
 | `tavily_search`  | `TAVILY_API_KEY` | Web search. `topic: general\|news\|finance`, `timeRange`, `includeDomains`, `excludeDomains`, `country`; up to **8** results (default 5) |
 | `tavily_extract` | `TAVILY_API_KEY` | Extract content from up to **5** URLs; `extractDepth: basic\|advanced` (default advanced), `format: markdown\|text`                      |
-| `code_execution` | always on        | `language: javascript\|python`; runs on the restricted computation-only backend                                                          |
-
-**Code execution** (`src/lib/server/llm/code-execution-tools.ts`):
-
-- Runs in a temp directory; network/filesystem/subprocess access blocked.
-- JavaScript: runs via the Node binary (`process.execPath`) with `--input-type=module --eval`.
-- Python: runs via `python3 -I -c` (the `python3` binary on `PATH`).
-- **Restricted backend** (the only backend): computation-only Python imports (`math`, `collections`, `itertools`, …).
-- **Limits**: timeout default **10 s**, max **60 s**; code input and output each capped at **12,000 chars**.
 
 **Max tool steps** (`AGENT_TOOL_MAX_STEPS` in `agent-runtime-config.ts`): 12.
 
@@ -279,7 +265,7 @@ src/
     editor/highlighter.ts
     server/
       agent-context.ts          # buildAgentSystemInstruction
-      agent-prompt-steering.ts  # Task-mode inference + provider/task overlays
+      agent-prompt-steering.ts  # Provider overlays (Qwen/Kimi tuning)
       agent-route.ts            # parseAgentStreamRequest, createAgentStreamResponse
       agent-runtime-config.ts   # Runtime constants + a few operational env switches
       auth.ts / auth-session.ts # isAuthConfigured, getRequestSession
@@ -296,7 +282,6 @@ src/
         gateway-responses.ts              # startGatewayResponseStream
         gateway-client.ts                 # undici dispatcher for AI Gateway
         ai-sdk-tavily-tools.ts            # tavily_search / tavily_extract
-        code-execution-tools.ts           # Sandboxed JS/Python execution
         initial-reasoning-chunk-sanitizer.ts # Redacted-reasoning filtering
         system-instruction-augmentations.ts  # Citation rules appended to prompt
     shared/

diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Chloei
 
-Chloei is a Next.js 16 chat app backed by Vercel AI Gateway. It currently exposes a curated model selector that defaults to Qwen 3.7 Max and also includes Kimi K2.6, and offers local code execution, optional Tavily retrieval, and Better Auth email/password authentication with PostgreSQL-backed users and sessions.
+Chloei is a Next.js 16 chat app backed by Vercel AI Gateway. It currently exposes a curated model selector that defaults to Qwen 3.7 Max and also includes Kimi K2.6, and offers optional Tavily web search and Better Auth email/password authentication with PostgreSQL-backed users and sessions.
 
 ## Documentation
 

diff --git a/src/app/api/agent/route.ts b/src/app/api/agent/route.ts
@@ -3,10 +3,7 @@ import { type NextRequest } from "next/server"
 import { getModels } from "@/lib/actions/api-keys"
 import { createLogger } from "@/lib/logger"
 import { buildAgentSystemInstruction } from "@/lib/server/agent-context"
-import {
-  inferPromptTaskMode,
-  resolvePromptProvider,
-} from "@/lib/server/agent-prompt-steering"
+import { resolvePromptProvider } from "@/lib/server/agent-prompt-steering"
 import {
   createAgentStreamResponse,
   createJsonErrorResponse,
@@ -157,7 +154,6 @@ export async function POST(request: NextRequest) {
     const userTimeZone = resolveUserTimeZone(request)
     const featureFlags = await resolveAgentFeatureFlags()
     const promptProvider = resolvePromptProvider(selectedModel)
-    const promptTaskMode = inferPromptTaskMode(parsedRequest.messages)
     const systemInstruction = buildAgentSystemInstruction(
       {
         id: session.user.id,
@@ -168,7 +164,6 @@ export async function POST(request: NextRequest) {
         now: requestNow,
         userTimeZone,
         provider: promptProvider,
-        taskMode: promptTaskMode,
       }
     )
 

diff --git a/src/lib/server/agent-context.ts b/src/lib/server/agent-context.ts
@@ -7,20 +7,17 @@ import {
 import {
   createPromptSteeringBlocks,
   type PromptProvider,
-  type PromptTaskMode,
 } from "./agent-prompt-steering"
 
 interface RuntimePromptContext {
   now: Date
   userTimeZone?: string
   provider?: PromptProvider
-  taskMode?: PromptTaskMode
 }
 
 interface AgentContextOverrides {
   operatingInstruction?: string
   providerOverlaysEnabled?: boolean
-  taskModeOverlaysEnabled?: boolean
 }
 
 function formatPromptBlock(label: string, body: string): string {
@@ -101,7 +98,6 @@ function composeSystemInstruction(params: {
   runtimeContext: RuntimePromptContext
   operatingInstruction?: string
   providerOverlaysEnabled?: boolean
-  taskModeOverlaysEnabled?: boolean
 }): string {
   const blocks = [
     formatPromptBlock(
@@ -116,9 +112,7 @@ function composeSystemInstruction(params: {
 
   const promptSteeringBlocks = createPromptSteeringBlocks({
     provider: params.runtimeContext.provider,
-    taskMode: params.runtimeContext.taskMode,
     providerOverlaysEnabled: params.providerOverlaysEnabled,
-    taskModeOverlaysEnabled: params.taskModeOverlaysEnabled,
   })
 
   for (const block of promptSteeringBlocks) {
@@ -147,6 +141,5 @@ export function buildAgentSystemInstruction(
     runtimeContext,
     operatingInstruction: overrides.operatingInstruction,
     providerOverlaysEnabled: overrides.providerOverlaysEnabled,
-    taskModeOverlaysEnabled: overrides.taskModeOverlaysEnabled,
   })
 }
diff --git a/src/lib/server/agent-prompt-steering.ts b/src/lib/server/agent-prompt-steering.ts
@@ -1,61 +1,17 @@
 import type { ModelType } from "@/lib/shared"
 
-import {
-  getLastUserMessage,
-  hasPersonalFinancialAdviceIntent,
-  normalizeUserText,
-  type PromptTextMessage,
-} from "./prompt-message-utils"
-
 export type PromptProvider = "alibaba" | "moonshotai"
 
-export type PromptTaskMode =
-  | "general"
-  | "instruction_following"
-  | "closed_answer"
-  | "coding"
-  | "debugging"
-  | "writing"
-  | "research"
-  | "high_stakes"
-
-type UserExpertiseHint = "engineering" | "writing" | "research"
-
-type PromptSteeringMessage = PromptTextMessage
-
 interface PromptSteeringBlock {
   label: string
   body: string
 }
 
 interface CreatePromptSteeringBlocksParams {
   provider?: PromptProvider
-  taskMode?: PromptTaskMode
   providerOverlaysEnabled?: boolean
-  taskModeOverlaysEnabled?: boolean
-}
-
-interface InferPromptTaskModeOptions {
-  userExpertise?: UserExpertiseHint
 }
 
-const CODING_PATTERN =
-  /\b(code|coding|function|class|script|algorithm|typescript|javascript|python|sql|regex|unit test|implement|write a program|refactor|module|library|api endpoint|compile|build error)\b/i
-// Debugging is a distinct category: triage/diagnosis work that benefits from
-// extra reasoning even when no explicit code is requested.
-const DEBUGGING_PATTERN =
-  /\b(stack trace|traceback|error message|exception|reproduce|repro|why does .{1,40}\s(fail|crash|break|hang|throw)|not working|broken|crashes?|hangs?|deadlock|memory leak|segfault|panic|enoent|undefined is not|cannot read propert(?:y|ies)|null pointer|race condition|flaky)\b/i
-const WRITING_PATTERN =
-  /\b(draft|rewrite|edit|proofread|proofreading|tone|copy|copywrit|essay|blog post|newsletter|paragraph|prose|grammar|punctuation|tighten|polish|shorter version|longer version|press release|release notes?|changelog entry|cover letter|outline this|outline for)\b/i
-const RESEARCH_PATTERN =
-  /\b(latest|current|today|recent|as of|sources?|cite|citation|link|look up|lookup|verify|check the web|news|price right now|right now|breaking|trending|happening (?:now|today))\b/i
-const HIGH_STAKES_PATTERN =
-  /\b(bank|password|phish(?:ed|ing)?|security|medical|doctor|symptom|symptoms|dose|dosage|prescription|pregnant|lawsuit|legal|tax|suicid|self-harm|chest pain|emergency|infection|overdose)\b/i
-const CLOSED_ANSWER_PATTERN =
-  /\b(multiple choice|choose one|which option|final answer|exact answer|boxed|answer:|confidence:|A\)|B\)|C\)|D\))\b/i
-const STRICT_OUTPUT_PATTERN =
-  /\b(return only|exactly|exact format|valid json|minified json|last line|single word|one word|single line|one line|two sentences|one sentence|one paragraph|no more than|under \d+ words|no surrounding prose|only one ```|schema|yaml|xml|csv)\b/i
-
 const PROVIDER_OVERLAYS: Record<PromptProvider, string> = {
   alibaba: `
 Use Qwen reasoning mode efficiently.
@@ -75,67 +31,6 @@ Use Kimi reasoning mode efficiently.
 `.trim(),
 }
 
-const TASK_MODE_OVERLAYS: Record<Exclude<PromptTaskMode, "general">, string> = {
-  instruction_following: `
-This request is parser-sensitive or format-sensitive.
-- Exact compliance is mandatory.
-- Return only the requested structure, wording, and delimiters.
-- If a final line or key order is specified, check it literally before finishing.
-- Treat word, sentence, paragraph, and line caps as hard limits. Count before finishing when close to the boundary.
-- Cut any extra commentary that would reduce extractability.
-`.trim(),
-  closed_answer: `
-This request expects one clear answer.
-- Resolve ambiguity, choose the best answer, and commit.
-- Keep explanation brief and keep the final answer unambiguous.
-- If the task implies a required final-answer line, end with that exact line.
-- If the required answer form is numeric, boxed, or one-line, return that form exactly without extra prose.
-- Do not leave the answer buried in exploratory prose.
-`.trim(),
-  coding: `
-This request is code-centric.
-- Prefer runnable code and correct I/O behavior over explanation.
-- If the user requests code only or one code block, obey that literally.
-- Use the code_execution tool for arithmetic, spot checks, or quick validation when it reduces error risk.
-- Do not add prose that would break copy-paste or grading.
-`.trim(),
-  debugging: `
-This request is a diagnosis/debugging task.
-- Form a specific hypothesis about the root cause before recommending a fix; avoid generic "try restarting" advice.
-- Ask for the missing signal (exact error, repro steps, environment) only when it would materially change the diagnosis; otherwise reason from what is provided.
-- When the user shares a stack trace or error, identify the originating call site and explain *why* it fails, not just *that* it fails.
-- Prefer code_execution to validate the fix when arithmetic, parsing, or small reproductions can be checked locally.
-- End with a concrete next action: the patch, the command to run, or the data to collect.
-`.trim(),
-  writing: `
-This request is a writing/editing task.
-- Match the requested voice, length, and audience; do not impose a default Chloei voice when the user has specified one.
-- Preserve the user's factual claims and proper nouns verbatim unless asked to fact-check.
-- If asked to edit, return the edited text in the requested form. If asked to rewrite, do not paste back the original.
-- Length caps are hard caps; count before finishing when close to the limit.
-- Skip preambles like "Sure, here is your draft" — return the deliverable.
-`.trim(),
-  research: `
-This request needs deep research, freshness, sources, or verification.
-- Clarify missing scope only when the missing detail would materially change the research plan; otherwise proceed with stated assumptions.
-- Decide what claims need verification before answering, and search before answering freshness-sensitive, source-heavy, or contested claims.
-- Extract or read primary pages when details, dates, numbers, methodology, or quotes matter.
-- Cross-check important claims across sources, especially when sources conflict or one source is promotional.
-- Use explicit calendar dates when recency matters.
-- Use code execution for calculations, tabular analysis, transformations, and arithmetic checks that could change the conclusion.
-- Produce a structured, citation-forward final report with clear findings, evidence, limitations, and source gaps.
-- If live retrieval tools are unavailable or evidence is missing or conflicting, say that plainly instead of guessing.
-`.trim(),
-  high_stakes: `
-This request is high-stakes.
-- Optimize for correctness, concrete next actions, and low hallucination risk.
-- If current or external facts matter, verify them when tools are available.
-- Be direct and practical, not verbose or vague.
-- In compromised-account, phishing, or financial-security scenarios, include immediate containment and stronger login protection such as 2FA/MFA when applicable.
-- If something cannot be verified, say so explicitly rather than filling the gap.
-`.trim(),
-}
-
 export function resolvePromptProvider(model: ModelType): PromptProvider {
   if (model.startsWith("alibaba/")) {
     return "alibaba"
@@ -148,86 +43,6 @@ export function resolvePromptProvider(model: ModelType): PromptProvider {
   throw new Error(`Unsupported model provider for model: ${model}`)
 }
 
-export function inferPromptTaskMode(
-  messages: readonly PromptSteeringMessage[],
-  options: InferPromptTaskModeOptions = {}
-): PromptTaskMode {
-  const lastUserMessage = getLastUserMessage(messages)
-  if (!lastUserMessage) {
-    return "general"
-  }
-
-  const fullUserText = normalizeUserText(messages)
-  const coding = CODING_PATTERN.test(lastUserMessage)
-  const debugging =
-    DEBUGGING_PATTERN.test(lastUserMessage) ||
-    DEBUGGING_PATTERN.test(fullUserText)
-  const writing =
-    WRITING_PATTERN.test(lastUserMessage) || WRITING_PATTERN.test(fullUserText)
-  const strictOutput =
-    STRICT_OUTPUT_PATTERN.test(lastUserMessage) ||
-    STRICT_OUTPUT_PATTERN.test(fullUserText)
-  const highStakes = HIGH_STAKES_PATTERN.test(lastUserMessage)
-  const financialAdvice = hasPersonalFinancialAdviceIntent(lastUserMessage)
-  const research =
-    RESEARCH_PATTERN.test(lastUserMessage) ||
-    RESEARCH_PATTERN.test(fullUserText)
-  const closedAnswer =
-    CLOSED_ANSWER_PATTERN.test(lastUserMessage) ||
-    CLOSED_ANSWER_PATTERN.test(fullUserText)
-
-  // High-stakes always wins over expertise hints so personalization can never
-  // downgrade a safety-relevant routing decision.
-  if (financialAdvice) {
-    return "high_stakes"
-  }
-
-  if (highStakes) {
-    return "high_stakes"
-  }
-
-  if (debugging) {
-    return "debugging"
-  }
-
-  if (coding) {
-    return "coding"
-  }
-
-  if (research) {
-    return "research"
-  }
-
-  if (writing) {
-    return "writing"
-  }
-
-  if (closedAnswer) {
-    return "closed_answer"
-  }
-
-  if (strictOutput) {
-    return "instruction_following"
-  }
-
-  if (options.userExpertise === "research") {
-    return "research"
-  }
-
-  if (options.userExpertise === "writing") {
-    return "writing"
-  }
-
-  if (
-    options.userExpertise === "engineering" &&
-    lastUserMessage.includes("?")
-  ) {
-    return "coding"
-  }
-
-  return "general"
-}
-
 export function createPromptSteeringBlocks(
   params: CreatePromptSteeringBlocksParams
 ): PromptSteeringBlock[] {
@@ -240,16 +55,5 @@ export function createPromptSteeringBlocks(
     })
   }
 
-  if (
-    params.taskMode &&
-    params.taskMode !== "general" &&
-    params.taskModeOverlaysEnabled !== false
-  ) {
-    blocks.push({
-      label: `TASK MODE OVERLAY: ${params.taskMode.toUpperCase()}`,
-      body: TASK_MODE_OVERLAYS[params.taskMode],
-    })
-  }
-
   return blocks
 }