auth0 · frederikprijck · Jun 16, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -344,6 +344,8 @@ All agent runners have access to file/shell tools in their respective environmen
 
 When MCP tools are enabled (`--tools mcp`), MCP server tool definitions are appended to the tool list.
 
+Authenticated HTTP MCP servers are configured with an `auth` block (`tokenUrl`, `clientId`, `clientSecret`, `audience`). The framework mints a Management API token per agent job via a client-credentials exchange and forwards it to the MCP server. All four runners support this: claude-code, copilot, and gemini-cli forward it as an `Authorization: Bearer` header in their server config; codex passes it via a `bearer_token_env_var` reference in `config.toml` (Codex rejects an inline token, so the secret stays out of the file). A failed token mint skips the server with a warning rather than registering it unauthenticated. Full setup guide: [docs/PROTECTED_MCP.md](docs/PROTECTED_MCP.md).
+
 ---
 
 ## Models

diff --git a/apps/auth0-evals/eval.config.js b/apps/auth0-evals/eval.config.js
@@ -44,6 +44,22 @@ export default {
         type: 'http',
         url: 'https://auth0.com/docs/mcp',
       },
+      ...(process.env.MCP_TENANT_DOMAIN &&
+      process.env.MCP_CLIENT_ID &&
+      process.env.MCP_CLIENT_SECRET
+        ? {
+            'auth0-hosted-mcp': {
+              type: 'http',
+              url: `https://${process.env.MCP_TENANT_DOMAIN}/v1/mcp`,
+              auth: {
+                tokenUrl: `https://${process.env.MCP_TENANT_DOMAIN}/oauth/token`,
+                clientId: process.env.MCP_CLIENT_ID,
+                clientSecret: process.env.MCP_CLIENT_SECRET,
+                audience: `https://${process.env.MCP_TENANT_DOMAIN}/api/v2/`,
+              },
+            },
+          }
+        : {}),
     },
   },
 
@@ -81,6 +97,13 @@ export default {
   },
 
 
+  sandbox: {
+    // Host env vars forwarded into the Docker sandbox (names only; values resolved
+    // from process.env at launch). Needed so the authenticated auth0-hosted-mcp
+    // server can mint its token inside the container.
+    passthroughEnv: ['MCP_TENANT_DOMAIN', 'MCP_CLIENT_ID', 'MCP_CLIENT_SECRET'],
+  },
+
   braintrust: {
     projectId: '38395851-dd41-46ec-a971-a30402db6921',
     datasetName: 'auth0-evals',

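Reviewer sketch for the `eval.config.js` hunk above: the env-var gate can be expressed as a small pure function, which makes the "omit the server when any variable is missing" behaviour easy to check in isolation. This is only an illustration under stated assumptions. `hostedMcpServers`, `Env`, and `HttpServer` are hypothetical names and are not part of this PR.

```typescript
// Hypothetical helper (not in the PR): mirrors the conditional spread in eval.config.js.
type Env = Record<string, string | undefined>;

interface HttpServer {
  type: 'http';
  url: string;
  auth?: { tokenUrl: string; clientId: string; clientSecret: string; audience: string };
}

function hostedMcpServers(env: Env): Record<string, HttpServer> {
  const domain = env.MCP_TENANT_DOMAIN;
  const clientId = env.MCP_CLIENT_ID;
  const clientSecret = env.MCP_CLIENT_SECRET;

  // Any unset or empty variable omits the entry entirely, as in the config.
  if (!domain || !clientId || !clientSecret) return {};

  return {
    'auth0-hosted-mcp': {
      type: 'http',
      url: `https://${domain}/v1/mcp`,
      auth: {
        tokenUrl: `https://${domain}/oauth/token`,
        clientId,
        clientSecret,
        // Management API audience, not the MCP URL.
        audience: `https://${domain}/api/v2/`,
      },
    },
  };
}
```

Spreading the result into `mcp.servers` would give the same shape as the inline ternary, while keeping the gate testable without touching `process.env`.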
diff --git a/docs/PROTECTED_MCP.md b/docs/PROTECTED_MCP.md
@@ -0,0 +1,126 @@
+# Protected MCP Servers
+
+This guide covers how to wire up a **protected HTTP MCP server** — one that requires an `Authorization: Bearer` token, such as the Auth0 hosted MCP server which authenticates with a Management API token. It explains the credentials you need, the config to add, and how the framework mints and forwards the token to every runner.
+
+---
+
+## When to use this
+
+- You want an agent to use an MCP server that requires an `Authorization: Bearer` token rather than being publicly reachable.
+- The credentials come from a Machine-to-Machine (client-credentials) application, and you want a fresh token minted per job rather than a long-lived secret baked into config.
+
+> **Runner support:** token forwarding is implemented for **all runners** — claude-code, copilot, gemini-cli, and codex. The first three forward the token as an `Authorization: Bearer` header in their MCP server config; codex passes it via a `bearer_token_env_var` reference in `config.toml` (Codex rejects an inline token, so the secret never lands in the file).
+
+---
+
+## Prerequisites
+
+You need an Auth0 tenant with a **Machine-to-Machine application** authorized for the **Management API**:
+
+1. In the Auth0 Dashboard, create (or reuse) an M2M application.
+2. Authorize it for the **Auth0 Management API** (`https://YOUR_TENANT/api/v2/`) with the scopes the task needs — e.g. `read:clients` to list applications.
+3. Note the application's **Client ID** and **Client Secret**.
+
+> **Audience matters.** The hosted MCP server authenticates with a **Management API** token (`/api/v2/` audience). The `/v1/mcp` audience is reserved by Auth0 and returns `access_denied` for client credentials — so the `audience` field below points at `/api/v2/`, not at the MCP URL.
+
+---
+
+## Step 1 — Set the environment variables
+
+The server entry in `eval.config.js` is gated on three env vars. If any is missing, the server is omitted (see [Troubleshooting](#troubleshooting)).
+
+```bash
+export MCP_TENANT_DOMAIN="your-tenant.us.auth0.com"   # no scheme, no trailing slash
+export MCP_CLIENT_ID="your-m2m-client-id"
+export MCP_CLIENT_SECRET="your-m2m-client-secret"
+```
+
+Choose the model with `--model` as usual; the proxy/model setup is unchanged from any other eval.
+
+---
+
+## Step 2 — Register the MCP server in `eval.config.js`
+
+The Auth0 hosted MCP server is already registered in `apps/auth0-evals/eval.config.js`, gated on the env vars above:
+
+```js
+mcp: {
+  servers: {
+    'auth0-docs': { type: 'http', url: 'https://auth0.com/docs/mcp' },
+
+    ...(process.env.MCP_TENANT_DOMAIN &&
+    process.env.MCP_CLIENT_ID &&
+    process.env.MCP_CLIENT_SECRET
+      ? {
+          'auth0-hosted-mcp': {
+            type: 'http',
+            url: `https://${process.env.MCP_TENANT_DOMAIN}/v1/mcp`,
+            auth: {
+              tokenUrl: `https://${process.env.MCP_TENANT_DOMAIN}/oauth/token`,
+              clientId: process.env.MCP_CLIENT_ID,
+              clientSecret: process.env.MCP_CLIENT_SECRET,
+              audience: `https://${process.env.MCP_TENANT_DOMAIN}/api/v2/`,
+            },
+          },
+        }
+      : {}),
+  },
+},
+```
+
+To wire up **a different** protected HTTP MCP server, add another entry with an `auth` block. The `auth` field is typed as `MCPOAuthConfig`:
+
+| Field | Meaning |
+|---|---|
+| `tokenUrl` | OAuth token endpoint, e.g. `https://TENANT/oauth/token` |
+| `clientId` | Client ID for the client-credentials grant |
+| `clientSecret` | Client secret for the client-credentials grant |
+| `audience` | API audience the token is minted for, e.g. `https://TENANT/api/v2/` |
+
+Servers **without** an `auth` block (like `auth0-docs`) continue to work unauthenticated.
+
+---
+
+## Step 3 — How the token is minted and forwarded
+
+You don't write any token code — the framework does it per job:
+
+1. When a job starts with `--tools mcp`, the active runner walks the configured MCP servers.
+2. For each HTTP server with an `auth` block, it calls `mintMcpToken(auth)` — a **client-credentials** exchange (`grant_type=client_credentials`) against `tokenUrl` for the given `audience`.
+3. The resulting token is forwarded to the MCP server. claude-code, copilot, and gemini-cli set it as an `Authorization: Bearer <token>` header in the server config; codex writes a `bearer_token_env_var` reference into `config.toml` and injects the token into the Codex process env under that name (Codex rejects an inline `bearer_token`, so the secret stays out of the config file).
+
+The token is minted **per job**, not at config-load time, so a long `--model all --mode all` matrix never reuses an expired token.
+
+**Loud failure:** if the token mint fails (bad creds, network error, missing field), the server is **skipped with a `logger.warn`** rather than registered unauthenticated. This makes a misconfigured run look like "MCP wasn't available" — not a silent "the agent chose not to use MCP."
+
+---
+
+## Sandbox credential passthrough
+
+When evals run in the Docker sandbox (the default), the framework can only mint a token inside the container if the credentials reach it. The three `MCP_*` vars are forwarded via `sandbox.passthroughEnv` in `eval.config.js`:
+
+```js
+sandbox: {
+  passthroughEnv: ['MCP_TENANT_DOMAIN', 'MCP_CLIENT_ID', 'MCP_CLIENT_SECRET'],
+},
+```
+
+Only the **names** are listed here; values are resolved from `process.env` at job launch and never stored in config. Vars that aren't currently set on the host are skipped.
+
+---
+
+## Troubleshooting
+
+| Symptom | Likely cause |
+|---|---|
+| Log: `MCP server 'auth0-hosted-mcp' skipped — token mint failed or creds missing` | One of `MCP_TENANT_DOMAIN` / `MCP_CLIENT_ID` / `MCP_CLIENT_SECRET` is unset, or the token endpoint rejected the credentials. |
+| `auth0-hosted-mcp` not registered at all (only `auth0-docs`) | The env-var gate in `eval.config.js` evaluated false — at least one of the three vars is empty. |
+| Token mint returns `access_denied` | `audience` points at `/v1/mcp` instead of `/api/v2/`, or the M2M app isn't authorized for the Management API. |
+| `401` late in a very long job | The minted token's TTL expired mid-job. Management API tokens are typically long-lived (hours) vs. the 30-min job timeout, so this is rare. |
+
+---
+
+## Related docs
+
+- [docs/ADDING_EVALS.md](ADDING_EVALS.md) — grader primitives and how evals are structured.
+- [AGENTS.md](../AGENTS.md) — framework overview, runner details, and the MCP auth summary.
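Reviewer sketch for the codex path described in Step 3 of the guide above: the config file carries only a `bearer_token_env_var` reference, and the token itself travels in the process environment. The helper name `codexServerConfig`, the `MCP_TOKEN_*` naming scheme, and the exact `[mcp_servers.<name>]` table layout are assumptions made for illustration; the PR's actual codex runner code is not shown in this diff.

```typescript
// Hypothetical helper; the env-var naming scheme is an assumption, not taken from the PR.
function codexServerConfig(
  name: string,
  url: string,
  token: string,
): { toml: string; env: Record<string, string> } {
  // Derive an env-var name from the server name, e.g. auth0-hosted-mcp -> MCP_TOKEN_AUTH0_HOSTED_MCP.
  const envVar = `MCP_TOKEN_${name.toUpperCase().replace(/[^A-Z0-9]/g, '_')}`;

  // JSON.stringify produces double-quoted strings that are also valid TOML basic strings
  // for the plain ASCII names and URLs used here.
  const toml = [
    `[mcp_servers.${JSON.stringify(name)}]`,
    `url = ${JSON.stringify(url)}`,
    `bearer_token_env_var = ${JSON.stringify(envVar)}`,
  ].join('\n');

  // The token is returned separately so the caller can inject it into the child process env.
  return { toml, env: { [envVar]: token } };
}
```

The point of the split return value is that the TOML fragment can be written to disk while the secret is only ever handed to the spawned process.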
diff --git a/packages/eval-core/src/config/defaults.ts b/packages/eval-core/src/config/defaults.ts
@@ -50,4 +50,6 @@ export const DEFAULT_FRAMEWORK_CONFIG: Required<FrameworkConfig> = {
   },
 
   scoring: {},
+
+  sandbox: {},
 };
diff --git a/packages/eval-core/src/config/framework.ts b/packages/eval-core/src/config/framework.ts
@@ -37,11 +37,28 @@ export interface MCPStdioServerConfig {
   env?: Record<string, string>;
 }
 
+export interface MCPOAuthConfig {
+  /** OAuth token endpoint, e.g. https://TENANT/oauth/token */
+  tokenUrl: string;
+  /** OAuth client ID for the client-credentials grant. */
+  clientId: string;
+  /** OAuth client secret for the client-credentials grant. */
+  clientSecret: string;
+  /** API audience the token is requested for, e.g. https://TENANT/api/v2/ */
+  audience: string;
+}
+
 export interface MCPHttpServerConfig {
   /** URL-based MCP server. */
   type: 'http';
   /** HTTP URL for the remote MCP server. */
   url: string;
+  /**
+   * Optional OAuth config. When present, the framework mints a fresh Bearer
+   * token per agent job and forwards it to the server. If any field is empty
+   * (e.g. a missing env var), the server is omitted with a warning.
+   */
+  auth?: MCPOAuthConfig;
 }
 
 /** Discriminated union — either a stdio (command-based) or http (URL-based) MCP server. */
@@ -119,6 +136,16 @@ export interface ScoringConfig {
 
 // ── Root config ──────────────────────────────────────────────────────────────
 
+export interface SandboxConfig {
+  /**
+   * Names of host environment variables to forward into the Docker sandbox.
+   * Each name is resolved from `process.env` at job launch; only currently-set
+   * vars are forwarded. Use for app-specific secrets the framework can't know
+   * about (e.g. MCP server credentials). Names only — values are never stored here.
+   */
+  passthroughEnv?: string[];
+}
+
 export interface FrameworkConfig {
   /** Directory containing evaluation definitions (required). */
   evalsDir: string;
@@ -140,4 +167,6 @@ export interface FrameworkConfig {
   braintrust?: BraintrustConfig;
   /** Scoring behaviour overrides (e.g. custom doc URL allowlist). */
   scoring?: ScoringConfig;
+  /** Docker sandbox settings (e.g. env vars to forward into the container). */
+  sandbox?: SandboxConfig;
 }
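Reviewer sketch for `SandboxConfig.passthroughEnv` above: the doc comment says names are resolved from `process.env` at job launch and only currently-set variables are forwarded. A minimal version of that resolution step might look like the following. `resolvePassthroughEnv` is a hypothetical name, and treating only `undefined` as "not set" is an assumption; the real sandbox launcher is not part of this diff.

```typescript
// Hypothetical helper: resolves passthroughEnv names against a host environment.
function resolvePassthroughEnv(
  names: string[] | undefined,
  hostEnv: Record<string, string | undefined>,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const name of names ?? []) {
    const value = hostEnv[name];
    // Assumption: a variable counts as "set" unless it is undefined on the host.
    if (value === undefined) continue;
    resolved[name] = value;
  }
  return resolved;
}
```

Because the config holds names only, the values exist solely in the returned map, which the launcher can pass to the container without writing them anywhere.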
diff --git a/packages/eval-core/src/config/mcp-auth.ts b/packages/eval-core/src/config/mcp-auth.ts
@@ -0,0 +1,41 @@
+/**
+ * OAuth token minting for authenticated HTTP MCP servers.
+ *
+ * Performs a client-credentials exchange to obtain a Bearer token.
+ * Called once per agent job so a long matrix run never reuses an expired token.
+ */
+
+import type { MCPOAuthConfig } from './framework.js';
+import { logger } from '../utils/logger.js';
+
+export async function mintMcpToken(auth: MCPOAuthConfig): Promise<string | undefined> {
+  if (!auth.tokenUrl || !auth.clientId || !auth.clientSecret || !auth.audience) {
+    logger.warn('[mcp-auth] Incomplete OAuth config — skipping token mint');
+    return undefined;
+  }
+  try {
+    const res = await fetch(auth.tokenUrl, {
+      method: 'POST',
+      headers: { 'content-type': 'application/json' },
+      body: JSON.stringify({
+        grant_type: 'client_credentials',
+        client_id: auth.clientId,
+        client_secret: auth.clientSecret,
+        audience: auth.audience,
+      }),
+    });
+    if (!res.ok) {
+      logger.warn(`[mcp-auth] Token request failed: ${res.status}`);
+      return undefined;
+    }
+    const { access_token } = (await res.json()) as { access_token?: string };
+    if (!access_token) {
+      logger.warn('[mcp-auth] Token response missing access_token');
+      return undefined;
+    }
+    return access_token;
+  } catch (err) {
+    logger.warn(`[mcp-auth] Token request error: ${err instanceof Error ? err.message : String(err)}`);
+    return undefined;
+  }
+}
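Reviewer sketch of how a header-based runner (claude-code, copilot, gemini-cli) could consume the `string | undefined` result of `mintMcpToken`: servers without an `auth` block are registered as-is, authenticated servers get an `Authorization: Bearer` header, and a failed mint skips the server instead of registering it unauthenticated. The types and the `toRunnerEntry` function are hypothetical; the runners' real config shapes are not shown in this diff.

```typescript
// Hypothetical shapes for illustration; not the PR's actual runner types.
interface ServerInput {
  url: string;
  /** True when the server entry carries an `auth` block. */
  requiresAuth: boolean;
}

interface RunnerEntry {
  type: 'http';
  url: string;
  headers?: Record<string, string>;
}

/**
 * Decide how to register one HTTP MCP server, given the result of a token mint
 * (a token string, or undefined when the mint failed or was not attempted).
 */
function toRunnerEntry(server: ServerInput, token: string | undefined): RunnerEntry | undefined {
  // Unauthenticated servers are registered unchanged.
  if (!server.requiresAuth) {
    return { type: 'http', url: server.url };
  }
  // Auth required but no token: skip rather than register unauthenticated.
  if (!token) {
    return undefined;
  }
  return { type: 'http', url: server.url, headers: { Authorization: `Bearer ${token}` } };
}
```

A caller would log a warning when the result is `undefined`, so a misconfigured run shows up as a missing server in the logs.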
diff --git a/packages/eval-core/src/index.ts b/packages/eval-core/src/index.ts
@@ -62,17 +62,20 @@ export type {
   MCPServerConfig,
   MCPStdioServerConfig,
   MCPHttpServerConfig,
+  MCPOAuthConfig,
   SkillsConfig,
   RemoteSkillRepo,
   JudgeConfig,
   ModelsConfig,
   WorkspaceConfig,
   BraintrustConfig,
   ScoringConfig,
+  SandboxConfig,
 } from './config/framework.js';
 export { DEFAULT_FRAMEWORK_CONFIG } from './config/defaults.js';
 export { defineConfig, loadConfig, deepMerge } from './config/loader.js';
 export type { LoadConfigOptions } from './config/loader.js';
+export { mintMcpToken } from './config/mcp-auth.js';
 
 // Workspace
 export {

diff --git a/packages/eval-core/tests/config/mcp-auth.test.ts b/packages/eval-core/tests/config/mcp-auth.test.ts
@@ -0,0 +1,56 @@
+import { describe, it, expect, vi, afterEach } from 'vitest';
+import { mintMcpToken } from '../../src/config/mcp-auth.js';
+import type { MCPOAuthConfig } from '../../src/config/framework.js';
+
+const validAuth: MCPOAuthConfig = {
+  tokenUrl: 'https://tenant.us.auth0.com/oauth/token',
+  clientId: 'client-id',
+  clientSecret: 'client-secret',
+  audience: 'https://tenant.us.auth0.com/api/v2/',
+};
+
+afterEach(() => {
+  vi.restoreAllMocks();
+});
+
+describe('mintMcpToken', () => {
+  it('returns the access_token on a successful exchange', async () => {
+    const fetchMock = vi.fn().mockResolvedValue({
+      ok: true,
+      json: async () => ({ access_token: 'tok-123' }),
+    });
+    vi.stubGlobal('fetch', fetchMock);
+
+    const token = await mintMcpToken(validAuth);
+
+    expect(token).toBe('tok-123');
+    expect(fetchMock).toHaveBeenCalledOnce();
+    const [url, init] = fetchMock.mock.calls[0]!;
+    expect(url).toBe(validAuth.tokenUrl);
+    const body = JSON.parse((init as RequestInit).body as string);
+    expect(body).toMatchObject({
+      grant_type: 'client_credentials',
+      client_id: 'client-id',
+      client_secret: 'client-secret',
+      audience: validAuth.audience,
+    });
+  });
+
+  it('returns undefined when the response is not ok', async () => {
+    vi.stubGlobal('fetch', vi.fn().mockResolvedValue({ ok: false, status: 401, json: async () => ({}) }));
+    expect(await mintMcpToken(validAuth)).toBeUndefined();
+  });
+
+  it('returns undefined without calling fetch when a credential is missing', async () => {
+    const fetchMock = vi.fn();
+    vi.stubGlobal('fetch', fetchMock);
+    const token = await mintMcpToken({ ...validAuth, clientSecret: '' });
+    expect(token).toBeUndefined();
+    expect(fetchMock).not.toHaveBeenCalled();
+  });
+
+  it('returns undefined when the body has no access_token', async () => {
+    vi.stubGlobal('fetch', vi.fn().mockResolvedValue({ ok: true, json: async () => ({}) }));
+    expect(await mintMcpToken(validAuth)).toBeUndefined();
+  });
+});
diff --git a/packages/eval/src/cli/run.ts b/packages/eval/src/cli/run.ts
@@ -153,6 +153,7 @@ async function runAgentJob(
         agentType,
         apiKey,
         ghToken: process.env.GH_TOKEN,
+        passthroughEnv: getFrameworkConfig().sandbox.passthroughEnv,
       });
     }