2 changes: 2 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -344,6 +344,8 @@ All agent runners have access to file/shell tools in their respective environmen

When MCP tools are enabled (`--tools mcp`), MCP server tool definitions are appended to the tool list.

Authenticated HTTP MCP servers are configured with an `auth` block (`tokenUrl`, `clientId`, `clientSecret`, `audience`). The framework mints a Management API token per agent job via a client-credentials exchange and forwards it to the MCP server. All four runners support this: claude-code, copilot, and gemini-cli forward it as an `Authorization: Bearer` header in their server config; codex passes it via a `bearer_token_env_var` reference in `config.toml` (Codex rejects an inline token, so the secret stays out of the file). A failed token mint skips the server with a warning rather than registering it unauthenticated. Full setup guide: [docs/PROTECTED_MCP.md](docs/PROTECTED_MCP.md).

---

## Models
23 changes: 23 additions & 0 deletions apps/auth0-evals/eval.config.js
@@ -44,6 +44,22 @@ export default {
        type: 'http',
        url: 'https://auth0.com/docs/mcp',
      },
      ...(process.env.MCP_TENANT_DOMAIN &&
      process.env.MCP_CLIENT_ID &&
      process.env.MCP_CLIENT_SECRET
        ? {
            'auth0-hosted-mcp': {
              type: 'http',
              url: `https://${process.env.MCP_TENANT_DOMAIN}/v1/mcp`,
              auth: {
                tokenUrl: `https://${process.env.MCP_TENANT_DOMAIN}/oauth/token`,
                clientId: process.env.MCP_CLIENT_ID,
                clientSecret: process.env.MCP_CLIENT_SECRET,
                audience: `https://${process.env.MCP_TENANT_DOMAIN}/api/v2/`,
              },
            },
          }
        : {}),
    },
  },

@@ -81,6 +97,13 @@ export default {
  },

  sandbox: {
    // Host env vars forwarded into the Docker sandbox (names only; values resolved
    // from process.env at launch). Needed so the authenticated auth0-hosted-mcp
    // server can mint its token inside the container.
    passthroughEnv: ['MCP_TENANT_DOMAIN', 'MCP_CLIENT_ID', 'MCP_CLIENT_SECRET'],
  },

  braintrust: {
    projectId: '38395851-dd41-46ec-a971-a30402db6921',
    datasetName: 'auth0-evals',
126 changes: 126 additions & 0 deletions docs/PROTECTED_MCP.md
@@ -0,0 +1,126 @@
# Protected MCP Servers

This guide covers how to wire up a **protected HTTP MCP server**: one that requires an `Authorization: Bearer` token, such as the Auth0 hosted MCP server, which authenticates with a Management API token. It explains the credentials you need, the config to add, and how the framework mints the token and forwards it to every runner.

---

## When to use this

- You want an agent to use an MCP server that requires an `Authorization: Bearer` token rather than being publicly reachable.
- The credentials come from a Machine-to-Machine (client-credentials) application, and you want a fresh token minted per job rather than a long-lived secret baked into config.

> **Runner support:** token forwarding is implemented for **all runners** — claude-code, copilot, gemini-cli, and codex. The first three forward the token as an `Authorization: Bearer` header in their MCP server config; codex passes it via a `bearer_token_env_var` reference in `config.toml` (Codex rejects an inline token, so the secret never lands in the file).

---

## Prerequisites

You need an Auth0 tenant with a **Machine-to-Machine application** authorized for the **Management API**:

1. In the Auth0 Dashboard, create (or reuse) an M2M application.
2. Authorize it for the **Auth0 Management API** (`https://YOUR_TENANT/api/v2/`) with the scopes the task needs — e.g. `read:clients` to list applications.
3. Note the application's **Client ID** and **Client Secret**.

> **Audience matters.** The hosted MCP server authenticates with a **Management API** token (`/api/v2/` audience). The `/v1/mcp` audience is reserved by Auth0 and returns `access_denied` for client credentials — so the `audience` field below points at `/api/v2/`, not at the MCP URL.

---

## Step 1 — Set the environment variables

The server entry in `eval.config.js` is gated on three env vars. If any is missing, the server is omitted (see [Troubleshooting](#troubleshooting)).

```bash
export MCP_TENANT_DOMAIN="your-tenant.us.auth0.com" # no scheme, no trailing slash
export MCP_CLIENT_ID="your-m2m-client-id"
export MCP_CLIENT_SECRET="your-m2m-client-secret"
```

Pass `--model` as you would for any other eval; the proxy and model setup is unchanged.
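
A minimal sketch of a pre-flight check for these variables, assuming you want to catch a stray scheme or trailing slash before a run. `normalizeTenantDomain`, `deriveMcpUrls`, and `missingMcpVars` are hypothetical helpers for illustration and are not part of the framework; the three URL shapes match the ones used in `eval.config.js`.

```typescript
// Hypothetical helpers for illustration; not part of the framework.

/** Strip an accidental scheme or trailing slash from a tenant domain. */
function normalizeTenantDomain(raw: string): string {
  return raw.trim().replace(/^https?:\/\//i, "").replace(/\/+$/, "");
}

/** Build the three URLs that eval.config.js derives from the tenant domain. */
function deriveMcpUrls(domain: string): { url: string; tokenUrl: string; audience: string } {
  const host = normalizeTenantDomain(domain);
  return {
    url: `https://${host}/v1/mcp`,
    tokenUrl: `https://${host}/oauth/token`,
    // The audience is the Management API, not the MCP URL.
    audience: `https://${host}/api/v2/`,
  };
}

/** Return the names of required variables that are unset or empty. */
function missingMcpVars(env: Record<string, string | undefined>): string[] {
  return ["MCP_TENANT_DOMAIN", "MCP_CLIENT_ID", "MCP_CLIENT_SECRET"].filter(
    (name) => !env[name],
  );
}
```

Calling `missingMcpVars(process.env)` before launching a run gives an explicit list of what is missing, rather than relying on the server being silently omitted.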

---

## Step 2 — Register the MCP server in `eval.config.js`

The Auth0 hosted MCP server is already registered in `apps/auth0-evals/eval.config.js`, gated on the env vars above:

```js
mcp: {
  servers: {
    'auth0-docs': { type: 'http', url: 'https://auth0.com/docs/mcp' },

    ...(process.env.MCP_TENANT_DOMAIN &&
    process.env.MCP_CLIENT_ID &&
    process.env.MCP_CLIENT_SECRET
      ? {
          'auth0-hosted-mcp': {
            type: 'http',
            url: `https://${process.env.MCP_TENANT_DOMAIN}/v1/mcp`,
            auth: {
              tokenUrl: `https://${process.env.MCP_TENANT_DOMAIN}/oauth/token`,
              clientId: process.env.MCP_CLIENT_ID,
              clientSecret: process.env.MCP_CLIENT_SECRET,
              audience: `https://${process.env.MCP_TENANT_DOMAIN}/api/v2/`,
            },
          },
        }
      : {}),
  },
},
```

To wire up **a different** protected HTTP MCP server, add another entry with an `auth` block. The `auth` field is typed as `MCPOAuthConfig`:

| Field | Meaning |
|---|---|
| `tokenUrl` | OAuth token endpoint, e.g. `https://TENANT/oauth/token` |
| `clientId` | Client ID for the client-credentials grant |
| `clientSecret` | Client secret for the client-credentials grant |
| `audience` | API audience the token is minted for, e.g. `https://TENANT/api/v2/` |

Servers **without** an `auth` block (like `auth0-docs`) continue to work unauthenticated.
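
As a sketch of what a second protected entry might look like: the server name `internal-tools`, every `example.com` URL, and the `TOOLS_CLIENT_ID` / `TOOLS_CLIENT_SECRET` variable names below are placeholders invented for illustration. The interfaces are local copies of the shapes described above so the snippet stands alone.

```typescript
// Local copies of the config shapes described above, so the sketch is self-contained.
interface MCPOAuthConfig {
  tokenUrl: string;
  clientId: string;
  clientSecret: string;
  audience: string;
}
interface MCPHttpServerConfig {
  type: "http";
  url: string;
  auth?: MCPOAuthConfig;
}

type Env = Record<string, string | undefined>;

/**
 * Hypothetical example: an internal MCP server protected by its own OAuth
 * issuer. All URLs and env var names here are placeholders.
 */
function internalToolsServer(env: Env): Record<string, MCPHttpServerConfig> {
  const clientId = env["TOOLS_CLIENT_ID"];
  const clientSecret = env["TOOLS_CLIENT_SECRET"];
  // Same gating idea as auth0-hosted-mcp: omit the entry when a credential is missing.
  if (!clientId || !clientSecret) return {};
  return {
    "internal-tools": {
      type: "http",
      url: "https://mcp.example.com/mcp",
      auth: {
        tokenUrl: "https://issuer.example.com/oauth/token",
        clientId,
        clientSecret,
        audience: "https://api.example.com/",
      },
    },
  };
}

// Spread the gated entry into the `servers` map next to unauthenticated entries.
const servers: Record<string, MCPHttpServerConfig> = {
  "auth0-docs": { type: "http", url: "https://auth0.com/docs/mcp" },
  ...internalToolsServer({ TOOLS_CLIENT_ID: "id", TOOLS_CLIENT_SECRET: "secret" }),
};
```

If the run uses the Docker sandbox, any new credential variable names would presumably also need to be listed in `sandbox.passthroughEnv` so that they reach the container.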

---

## Step 3 — How the token is minted and forwarded

You don't write any token code — the framework does it per job:

1. When a job starts with `--tools mcp`, the active runner walks the configured MCP servers.
2. For each HTTP server with an `auth` block, it calls `mintMcpToken(auth)` — a **client-credentials** exchange (`grant_type=client_credentials`) against `tokenUrl` for the given `audience`.
3. The resulting token is forwarded to the MCP server. claude-code, copilot, and gemini-cli set it as an `Authorization: Bearer <token>` header in the server config; codex writes a `bearer_token_env_var` reference into `config.toml` and injects the token into the Codex process env under that name (Codex rejects an inline `bearer_token`, so the secret stays out of the config file).
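
The two forwarding styles in step 3 can be sketched roughly as follows. This is illustrative only: the runner code itself is not shown in this guide, and the `headers` field name, the `[mcp_servers.<name>]` table layout, and the `MCP_TOKEN_...` env var naming scheme are all assumptions.

```typescript
// Illustrative only; field names and the env var naming scheme are assumptions.

/** Header-based runners (claude-code, copilot, gemini-cli): the token goes in a header. */
function headerServerEntry(url: string, token: string) {
  return {
    type: "http" as const,
    url,
    headers: { Authorization: `Bearer ${token}` },
  };
}

/**
 * Codex: config.toml only names an env var; the token itself travels in the
 * process environment, so the secret never appears in the file.
 */
function codexServerEntry(name: string, url: string, token: string) {
  const envVar = `MCP_TOKEN_${name.toUpperCase().replace(/[^A-Z0-9]/g, "_")}`;
  const toml = [
    `[mcp_servers.${name}]`,
    `url = "${url}"`,
    `bearer_token_env_var = "${envVar}"`,
  ].join("\n");
  return { toml, env: { [envVar]: token } };
}
```

The point of the second form is that the generated TOML can be written to disk or logged without exposing the token.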

The token is minted **per job**, not at config-load time, so a long `--model all --mode all` matrix never reuses an expired token.

**Loud failure:** if the token mint fails (bad credentials, network error, missing field), the server is **skipped with a `logger.warn`** rather than registered unauthenticated. A misconfigured run therefore shows up in the logs as "MCP wasn't available" instead of looking as if the agent silently chose not to use MCP.
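
A minimal sketch of the skip-on-failure behaviour, under the assumption that a runner resolves servers one at a time. `resolveServers` is a hypothetical name; the `mint` parameter stands in for `mintMcpToken` and `warn` for `logger.warn`, so the sketch runs without the framework.

```typescript
interface OAuth {
  tokenUrl: string;
  clientId: string;
  clientSecret: string;
  audience: string;
}
interface HttpServer {
  type: "http";
  url: string;
  auth?: OAuth;
}
type Minter = (auth: OAuth) => Promise<string | undefined>;
type Resolved = Record<string, { url: string; token?: string }>;

/** Resolve the servers to register for one job (hypothetical helper). */
async function resolveServers(
  servers: Record<string, HttpServer>,
  mint: Minter,
  warn: (msg: string) => void,
): Promise<Resolved> {
  const resolved: Resolved = {};
  for (const [name, server] of Object.entries(servers)) {
    if (!server.auth) {
      // No auth block: register the server as-is.
      resolved[name] = { url: server.url };
      continue;
    }
    const token = await mint(server.auth);
    if (!token) {
      // Never fall back to registering a protected server unauthenticated.
      warn(`MCP server '${name}' skipped: token mint failed`);
      continue;
    }
    resolved[name] = { url: server.url, token };
  }
  return resolved;
}
```

Because the minter is called inside the per-job resolution step, each job gets its own token rather than one shared at config-load time.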

---

## Sandbox credential passthrough

When evals run in the Docker sandbox (the default), the framework can only mint a token inside the container if the credentials reach it. The three `MCP_*` vars are forwarded via `sandbox.passthroughEnv` in `eval.config.js`:

```js
sandbox: {
  passthroughEnv: ['MCP_TENANT_DOMAIN', 'MCP_CLIENT_ID', 'MCP_CLIENT_SECRET'],
},
```

Only the **names** are listed here; values are resolved from `process.env` at job launch and never stored in config. Vars that aren't currently set on the host are skipped.
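
The name-only resolution described above can be sketched like this. `resolvePassthroughEnv` is a hypothetical helper that mirrors the documented behaviour; it is not the framework's actual code, and how the resolved values are handed to Docker is not covered here.

```typescript
/**
 * Resolve passthrough names against a host environment, skipping names that
 * are not set. A sketch of the documented behaviour only.
 */
function resolvePassthroughEnv(
  names: string[],
  hostEnv: Record<string, string | undefined>,
): Record<string, string> {
  const forwarded: Record<string, string> = {};
  for (const name of names) {
    const value = hostEnv[name];
    // Unset on the host: skip it rather than forwarding an empty value.
    if (value !== undefined) forwarded[name] = value;
  }
  return forwarded;
}
```

With `MCP_CLIENT_SECRET` unset on the host, the result contains only the other two names, which is why the token mint inside the container would then fail with an incomplete config.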

---

## Troubleshooting

| Symptom | Likely cause |
|---|---|
| Log: `MCP server 'auth0-hosted-mcp' skipped — token mint failed or creds missing` | One of `MCP_TENANT_DOMAIN` / `MCP_CLIENT_ID` / `MCP_CLIENT_SECRET` is unset, or the token endpoint rejected the credentials. |
| `auth0-hosted-mcp` not registered at all (only `auth0-docs`) | The env-var gate in `eval.config.js` evaluated false — at least one of the three vars is empty. |
| Token mint returns `access_denied` | `audience` points at `/v1/mcp` instead of `/api/v2/`, or the M2M app isn't authorized for the Management API. |
| `401` late in a very long job | The minted token expired mid-job. Management API tokens typically last for hours, well beyond the 30-minute job timeout, so this is rare. |

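For the last row, one way to check a token's lifetime by hand is to look at the `expires_in` field (in seconds) that OAuth token endpoints normally return alongside `access_token`. The sketch below is a hypothetical manual check; `tokenOutlivesJob` is an invented name, and `mintMcpToken` as shown in this PR only reads `access_token`.

```typescript
/** Fields of a client-credentials token response that matter for this check. */
interface TokenResponse {
  access_token: string;
  expires_in?: number; // lifetime in seconds
}

/** True when the token's lifetime covers the whole job plus a safety margin. */
function tokenOutlivesJob(
  res: TokenResponse,
  jobTimeoutMinutes = 30,
  marginSeconds = 60,
): boolean {
  // Unknown lifetime: treat it as unsafe rather than guessing.
  if (res.expires_in === undefined) return false;
  return res.expires_in >= jobTimeoutMinutes * 60 + marginSeconds;
}
```
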
---

## Related docs

- [docs/ADDING_EVALS.md](ADDING_EVALS.md) — grader primitives and how evals are structured.
- [AGENTS.md](../AGENTS.md) — framework overview, runner details, and the MCP auth summary.
2 changes: 2 additions & 0 deletions packages/eval-core/src/config/defaults.ts
@@ -50,4 +50,6 @@ export const DEFAULT_FRAMEWORK_CONFIG: Required<FrameworkConfig> = {
  },

  scoring: {},

  sandbox: {},
};
29 changes: 29 additions & 0 deletions packages/eval-core/src/config/framework.ts
@@ -37,11 +37,28 @@ export interface MCPStdioServerConfig {
  env?: Record<string, string>;
}

export interface MCPOAuthConfig {
  /** OAuth token endpoint, e.g. https://TENANT/oauth/token */
  tokenUrl: string;
  /** OAuth client ID for the client-credentials grant. */
  clientId: string;
  /** OAuth client secret for the client-credentials grant. */
  clientSecret: string;
  /** API audience the token is requested for, e.g. https://TENANT/api/v2/ */
  audience: string;
}

export interface MCPHttpServerConfig {
  /** URL-based MCP server. */
  type: 'http';
  /** HTTP URL for the remote MCP server. */
  url: string;
  /**
   * Optional OAuth config. When present, the framework mints a fresh Bearer
   * token per agent job and forwards it to the server (as an Authorization
   * header, or via an env var reference for codex). If any field is empty
   * (e.g. a missing env var), the server is omitted with a warning.
   */
  auth?: MCPOAuthConfig;
}

/** Discriminated union — either a stdio (command-based) or http (URL-based) MCP server. */
@@ -119,6 +136,16 @@ export interface ScoringConfig {

// ── Root config ──────────────────────────────────────────────────────────────

export interface SandboxConfig {
  /**
   * Names of host environment variables to forward into the Docker sandbox.
   * Each name is resolved from `process.env` at job launch; only currently-set
   * vars are forwarded. Use for app-specific secrets the framework can't know
   * about (e.g. MCP server credentials). Names only; values are never stored here.
   */
  passthroughEnv?: string[];
}

export interface FrameworkConfig {
  /** Directory containing evaluation definitions (required). */
  evalsDir: string;
@@ -140,4 +167,6 @@ export interface FrameworkConfig {
  braintrust?: BraintrustConfig;
  /** Scoring behaviour overrides (e.g. custom doc URL allowlist). */
  scoring?: ScoringConfig;
  /** Docker sandbox settings (e.g. env vars to forward into the container). */
  sandbox?: SandboxConfig;
}
41 changes: 41 additions & 0 deletions packages/eval-core/src/config/mcp-auth.ts
@@ -0,0 +1,41 @@
/**
 * OAuth token minting for authenticated HTTP MCP servers.
 *
 * Performs a client-credentials exchange to obtain a short-lived Bearer token.
 * Called once per agent job so a long matrix run never reuses an expired token.
 */

import type { MCPOAuthConfig } from './framework.js';
import { logger } from '../utils/logger.js';

export async function mintMcpToken(auth: MCPOAuthConfig): Promise<string | undefined> {
  // Bail out before any network call if a field is empty (e.g. an unset env var).
  if (!auth.tokenUrl || !auth.clientId || !auth.clientSecret || !auth.audience) {
    logger.warn('[mcp-auth] Incomplete OAuth config — skipping token mint');
    return undefined;
  }
  try {
    // Client-credentials grant, sent as a JSON body.
    const res = await fetch(auth.tokenUrl, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        grant_type: 'client_credentials',
        client_id: auth.clientId,
        client_secret: auth.clientSecret,
        audience: auth.audience,
      }),
    });
    if (!res.ok) {
      logger.warn(`[mcp-auth] Token request failed: ${res.status}`);
      return undefined;
    }
    const { access_token } = (await res.json()) as { access_token?: string };
    if (!access_token) {
      logger.warn('[mcp-auth] Token response missing access_token');
      return undefined;
    }
    return access_token;
  } catch (err) {
    // Network and JSON-parse errors land here; callers only ever see undefined.
    logger.warn(`[mcp-auth] Token request error: ${err instanceof Error ? err.message : String(err)}`);
    return undefined;
  }
}
3 changes: 3 additions & 0 deletions packages/eval-core/src/index.ts
@@ -62,17 +62,20 @@ export type {
  MCPServerConfig,
  MCPStdioServerConfig,
  MCPHttpServerConfig,
  MCPOAuthConfig,
  SkillsConfig,
  RemoteSkillRepo,
  JudgeConfig,
  ModelsConfig,
  WorkspaceConfig,
  BraintrustConfig,
  ScoringConfig,
  SandboxConfig,
} from './config/framework.js';
export { DEFAULT_FRAMEWORK_CONFIG } from './config/defaults.js';
export { defineConfig, loadConfig, deepMerge } from './config/loader.js';
export type { LoadConfigOptions } from './config/loader.js';
export { mintMcpToken } from './config/mcp-auth.js';

// Workspace
export {
56 changes: 56 additions & 0 deletions packages/eval-core/tests/config/mcp-auth.test.ts
@@ -0,0 +1,56 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { mintMcpToken } from '../../src/config/mcp-auth.js';
import type { MCPOAuthConfig } from '../../src/config/framework.js';

const validAuth: MCPOAuthConfig = {
  tokenUrl: 'https://tenant.us.auth0.com/oauth/token',
  clientId: 'client-id',
  clientSecret: 'client-secret',
  audience: 'https://tenant.us.auth0.com/api/v2/',
};

afterEach(() => {
  vi.restoreAllMocks();
});

describe('mintMcpToken', () => {
  it('returns the access_token on a successful exchange', async () => {
    const fetchMock = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({ access_token: 'tok-123' }),
    });
    vi.stubGlobal('fetch', fetchMock);

    const token = await mintMcpToken(validAuth);

    expect(token).toBe('tok-123');
    expect(fetchMock).toHaveBeenCalledOnce();
    const [url, init] = fetchMock.mock.calls[0]!;
    expect(url).toBe(validAuth.tokenUrl);
    const body = JSON.parse((init as RequestInit).body as string);
    expect(body).toMatchObject({
      grant_type: 'client_credentials',
      client_id: 'client-id',
      client_secret: 'client-secret',
      audience: validAuth.audience,
    });
  });

  it('returns undefined when the response is not ok', async () => {
    // Include a status so the warning logs a real code instead of "undefined".
    vi.stubGlobal(
      'fetch',
      vi.fn().mockResolvedValue({ ok: false, status: 401, json: async () => ({}) }),
    );
    expect(await mintMcpToken(validAuth)).toBeUndefined();
  });

  it('returns undefined without calling fetch when a credential is missing', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);
    const token = await mintMcpToken({ ...validAuth, clientSecret: '' });
    expect(token).toBeUndefined();
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('returns undefined when the body has no access_token', async () => {
    vi.stubGlobal('fetch', vi.fn().mockResolvedValue({ ok: true, json: async () => ({}) }));
    expect(await mintMcpToken(validAuth)).toBeUndefined();
  });
});
1 change: 1 addition & 0 deletions packages/eval/src/cli/run.ts
@@ -153,6 +153,7 @@ async function runAgentJob(
agentType,
apiKey,
ghToken: process.env.GH_TOKEN,
passthroughEnv: getFrameworkConfig().sandbox.passthroughEnv,
});
}
