1 change: 1 addition & 0 deletions docs.json
{
  "group": "Advanced features",
  "pages": [
    "features/sdk-tracing",
    "features/mcp",
    "features/synthetic-data-generation",
    "features/multi-turn-simulation",
314 changes: 153 additions & 161 deletions features/mcp.mdx

## Overview

Scorecard's MCP (Model Context Protocol) server lets you manage projects, create testsets, configure metrics, run evaluations, and analyze results through natural language in any MCP-compatible client.

## Available Tools

The MCP server exposes tools across ten categories, giving full programmatic access to Scorecard.

<Frame caption="Scorecard MCP server tools listed in Claude Code.">
<img src="/images/mcp-tools-overview.png" alt="Scorecard MCP server tool listing showing ~45 available tools across Metrics, Scores, Systems, Annotations, and Docs." />
</Frame>

<AccordionGroup>

<Accordion title="Projects">

| Tool | Description |
|------|-------------|
| `list_projects` | List all Projects, ordered by creation date |
| `create_projects` | Create a new Project |

</Accordion>

<Accordion title="Testsets">

| Tool | Description |
|------|-------------|
| `list_testsets` | List Testsets in a Project |
| `get_testsets` | Get a specific Testset by ID |
| `create_testsets` | Create a new Testset with a JSON schema and field mappings |
| `update_testsets` | Update a Testset's name, description, schema, or field mappings |
| `delete_testsets` | Delete a Testset |
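
For example, a `create_testsets` call might pass arguments shaped like the sketch below. The field names here are illustrative, not authoritative; the tool's own input schema is the source of truth:

```json
{
  "projectId": "123",
  "name": "Support Scenarios",
  "description": "Customer support prompts with ideal responses",
  "jsonSchema": {
    "type": "object",
    "properties": {
      "customerMessage": { "type": "string" },
      "idealResponse": { "type": "string" }
    },
    "required": ["customerMessage", "idealResponse"]
  },
  "fieldMapping": {
    "inputs": ["customerMessage"],
    "expected": ["idealResponse"],
    "metadata": []
  }
}
```

The field mappings tell Scorecard which schema fields are system inputs and which are expected outputs.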

</Accordion>

<Accordion title="Testcases">

| Tool | Description |
|------|-------------|
| `list_testcases` | List Testcases in a Testset |
| `get_testcases` | Get a specific Testcase by ID |
| `create_testcases` | Create up to 100 Testcases in a Testset |
| `update_testcases` | Replace the data of an existing Testcase |
| `delete_testcases` | Delete multiple Testcases by ID |
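
A `create_testcases` call takes a Testset ID plus a batch of items whose data must conform to that Testset's JSON schema. A hypothetical payload (the `items` and `jsonData` names are illustrative; check the tool schema):

```json
{
  "testsetId": "456",
  "items": [
    { "jsonData": { "customerMessage": "Where is my refund?", "idealResponse": "Explain the refund timeline and link the policy." } },
    { "jsonData": { "customerMessage": "I can't reset my password.", "idealResponse": "Walk through the password reset flow step by step." } }
  ]
}
```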

</Accordion>

<Accordion title="Metrics">

| Tool | Description |
|------|-------------|
| `list_metrics` | List Metrics configured for a Project |
| `get_metrics` | Get a specific Metric by ID |
| `create_metrics` | Create a Metric — supports `ai`, `human`, and `heuristic` eval types with `int`, `float`, or `boolean` output |
| `update_metrics` | Update an existing Metric |
| `delete_metrics` | Delete a Metric by ID |
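
As a rough sketch, an AI metric created via `create_metrics` pairs an output type with a judge prompt. The field names below are illustrative only; consult the tool's input schema for the exact shape:

```json
{
  "projectId": "123",
  "name": "Response Accuracy",
  "evalType": "ai",
  "outputType": "int",
  "promptTemplate": "Rate from 1 to 5 how well the response answers the customer's question."
}
```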

</Accordion>

<Accordion title="Runs">

| Tool | Description |
|------|-------------|
| `list_runs` | List Runs for a Project, most recent first |
| `get_runs` | Get a specific Run by ID |
| `create_runs` | Create a new Run against a Testset and System Version |
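
Conceptually, `create_runs` ties together a Testset, the Metrics to score with, and the System Version under test. An illustrative payload (field names are assumptions, not the definitive schema):

```json
{
  "projectId": "123",
  "testsetId": "456",
  "metricIds": ["789"],
  "systemVersionId": "abc-123"
}
```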

</Accordion>

<Accordion title="Records">

| Tool | Description |
|------|-------------|
| `list_records` | List Records for a Run, including all scores |
| `create_records` | Create a new Record (system execution result) in a Run |
| `delete_records` | Delete a specific Record by ID |

</Accordion>

<Accordion title="Scores">

| Tool | Description |
|------|-------------|
| `upsert_scores` | Create or update a Score for a Record and Metric — updates if one already exists |

</Accordion>

<Accordion title="Systems">

| Tool | Description |
|------|-------------|
| `list_systems` | List all Systems in a Project |
| `get_systems` | Get a specific System by ID |
| `upsert_systems` | Create a System, or update it if one with the same name exists |
| `update_systems` | Update an existing System's name, description, or production version |
| `delete_systems` | Delete a System by ID |
| `get_systems_versions` | Get a specific System Version by ID |
| `upsert_systems_versions` | Create a System Version, or update its name if the config already exists |

</Accordion>

<Accordion title="Annotations">

| Tool | Description |
|------|-------------|
| `list_annotations` | List annotations (ratings and comments) for a specific Record |

</Accordion>

<Accordion title="Docs">

| Tool | Description |
|------|-------------|
| `search_docs` | Search SDK/API documentation — supports Python, TypeScript, Go, and more |

</Accordion>

</AccordionGroup>

## Setting Up the MCP Server

### Claude Code

Add the Scorecard remote MCP server with a single command:

```bash
claude mcp add --transport http scorecard https://mcp.scorecard.io/mcp
```

Complete the OAuth authentication flow in your browser when prompted. Verify the connection:

```bash
claude mcp list
```

You should see `scorecard: https://mcp.scorecard.io/mcp (HTTP) - ✓ Connected`.

### Claude Desktop

Go to Claude Desktop settings and open the "Connectors" tab. Click "Add custom connector" and paste the URL `https://mcp.scorecard.io/mcp`. Click "Add", then "Connect" to log in to Scorecard.

<DarkLightImage
lightSrc="/images/claude-desktop-mcp-light.png"
caption="Adding the Scorecard MCP connector in Claude Desktop."
/>

### Local configuration

You can run the MCP server locally via npx:

```sh
export SCORECARD_API_KEY="your_api_key"
npx -y scorecard-ai-mcp@latest
```

For clients with a configuration JSON:

```json
{
"mcpServers": {
"scorecard_ai": {
"command": "npx",
"args": ["-y", "scorecard-ai-mcp", "--client=claude", "--tools=dynamic"],
"env": {
"SCORECARD_API_KEY": "ak_MyAPIKey"
}
}
}
}
```
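
For example, Cursor reads the same `mcpServers` shape from `.cursor/mcp.json` in your project root (or `~/.cursor/mcp.json` globally), and for the hosted server you can point it at the remote URL instead of running npx. This is a sketch; consult Cursor's MCP documentation for the current format:

```json
{
  "mcpServers": {
    "scorecard": {
      "url": "https://mcp.scorecard.io/mcp"
    }
  }
}
```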

## Examples

### Create a project and testset

```
Create a new Scorecard project called "Support Bot Eval". Then create a testset
called "Support Scenarios" with 10 testcases. Each testcase should have:
- inputs: "customerMessage" and "category" (billing, technical, or product)
- expected: "idealResponse"
```

### Create metrics

```
Create two metrics in the "Support Bot Eval" project:
1. "Response Accuracy" (integer 1-5) - How well does the response answer the question?
2. "Tone" (boolean) - Is the response professional and empathetic?
```

### Analyze results

```
Show me the latest run results for the "Support Bot Eval" project.
Which testcases scored lowest on Response Accuracy?
```

### Generate testcases from a codebase

In Claude Code, you can combine file access with the MCP server:

```
Read the API routes in src/api/ and generate 20 testcases covering
the edge cases for each endpoint. Add them to the "API Tests" testset
in project 1234.
```

### Iterate on metrics

```
The "Response Accuracy" metric is too lenient — update the prompt template
to penalize responses that miss key details from the ideal response.
```

## Technical Details

- Built on the [Model Context Protocol](https://modelcontextprotocol.io/) standard
- Compatible with any MCP client (Claude Code, Claude Desktop, Cursor, and more)
- Secured with OAuth authentication
- Open source: [github.com/scorecard-ai/scorecard-mcp](https://github.com/scorecard-ai/scorecard-mcp)