Test-Driven Agent Development for Salesforce Agentforce
A local web app that guides you through the full 7-stage TDAD pipeline — from agent spec to production deployment — powered by Claude AI, connected directly to your Salesforce DX project.
Version: v1.0.0-beta
- What is AgentKit?
- Architecture
- Prerequisites
- Installation
- Configuration
- Running the app
- The 7-Stage Pipeline
- Salesforce DX reference
- Troubleshooting
- Security
AgentKit is a local UI for Test-Driven Agent Development (TDAD) on Salesforce Agentforce. It covers the full lifecycle from spec to production across 7 stages.
Instead of juggling CLI commands, YAML files, and terminal windows, AgentKit gives you:
- 📋 Stage 01 — Generate `agentSpec.yaml` via AI or CLI, edit inline, save to project
- ✍️ Stage 02 — Author your `.agent` script via CLI command or Agent Skill prompt for Claude Code
- ✅ Stage 03 — Validate your agent with `sf agent validate`
- 🧪 Stage 03.5 — Local Test with `sf agent preview` — send utterances, collect traces, analyze routing and actions without deploying
- 🚀 Stage 04 — Deploy to your dev org with publish + activate commands
- 🎯 Stage 05 — Formal Test — generate `testSpec.yaml` (via AI, CLI, or Gherkin), run tests, track fix loops, and visualize pass rate history
- 📊 Stage 06 — Observability — STDM analysis with `sf agent analyze`
- 🏭 Stage 07 — Production — staging + production deployment pipeline with all commands ready to copy
Everything runs locally. Your API key never leaves your machine.
AgentKit uses a React frontend + local Express server pattern. The server acts as a secure proxy — your Anthropic API key stays server-side only.
Browser (React UI — Vite, port 5173)
│
│ HTTP / SSE
▼
Local Express server (port 3001) ←── tdad-server.js
│
├── POST /ai ──▶ Anthropic API (Claude)
├── POST /files/save ──▶ SFDX project / specs/ or tests/
├── GET /files/specs ◀── lists saved YAML specs
├── GET /files/agents ◀── lists .agent files in project
├── GET /files/tests ◀── lists testSpec YAML files
├── GET /files/aievals ◀── lists AiEvaluationDefinition XMLs
├── GET /files/formal-tests ◀── structured test history
├── GET /history/runs ◀── runs + fix loops per suite
├── GET /history/report ◀── raw results JSON for a run
├── GET /sf/run?cmd=... ──▶ sf CLI (streamed via SSE)
└── GET /status ◀── project health + org detection
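The `GET /sf/run` endpoint streams CLI output to the browser over Server-Sent Events. Purely as an illustration of the consuming side (this is a sketch, not AgentKit's actual code), a minimal parser for standard `data:`-framed SSE text might look like:

```javascript
// Minimal SSE chunk parser — an illustrative sketch, not AgentKit's
// actual implementation. Assumes standard "data: ..." framing with
// events separated by a blank line.
function parseSseChunk(chunk) {
  return chunk
    .split("\n\n") // events are blank-line separated
    .map((event) =>
      event
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n")
    )
    .filter((data) => data.length > 0);
}

// Example: two streamed lines of sf CLI output
const events = parseSseChunk("data: Deploying...\n\ndata: Done\n\n");
console.log(events); // → ["Deploying...", "Done"]
```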
Before installing AgentKit, make sure you have:
| Tool | Version | Install |
|---|---|---|
| Node.js | v18 or higher | nodejs.org |
| Salesforce CLI | latest | Install guide |
| Anthropic API key | — | console.anthropic.com |
| A Salesforce DX project | — | sf project generate -n my-project |
Verify your setup:
```bash
node --version   # should print v18.x or higher
sf --version     # should print @salesforce/cli/...
```

Install dependencies:

```bash
npm install
```

Create your config file from the template:

```bash
cp .env.example .env
```

Edit `.env` with your values:

```
# Your Anthropic API key (required)
ANTHROPIC_API_KEY=sk-ant-api03-...

# Absolute path to your Salesforce DX project (required)
SF_PROJECT_PATH=/Users/yourname/Projects/my-sfdx-project

# Default org alias used in generated sf CLI commands (optional)
TARGET_ORG=my-dev-org

# Server port (optional, default: 3001)
PORT=3001
```
⚠️ Never commit `.env` — it contains your API key. It is excluded by `.gitignore`.
If ANTHROPIC_API_KEY is already defined in your shell profile (~/.zshrc, ~/.bashrc, etc.), it may override the value in .env. Check with:
```bash
grep "ANTHROPIC_API_KEY" ~/.zshrc ~/.zprofile ~/.bashrc ~/.bash_profile 2>/dev/null
```

If found, remove that line from your shell profile, then restart the server.
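This precedence — a shell-exported variable winning over the `.env` file — can be sketched as follows (an illustrative loader, not AgentKit's actual code):

```javascript
// Illustrative sketch of .env precedence — not AgentKit's actual loader.
// A value already present in the process environment (e.g. exported in
// ~/.zshrc) wins over the one parsed from the .env file.
function resolveConfig(dotenvText, processEnv) {
  const fromFile = {};
  for (const line of dotenvText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip comments/blanks
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    fromFile[trimmed.slice(0, eq)] = trimmed.slice(eq + 1);
  }
  // Shell-exported variables override the file's values
  return { ...fromFile, ...processEnv };
}

const cfg = resolveConfig(
  "ANTHROPIC_API_KEY=sk-from-file\nPORT=3001",
  { ANTHROPIC_API_KEY: "sk-from-shell" } // exported in ~/.zshrc
);
console.log(cfg.ANTHROPIC_API_KEY); // → "sk-from-shell"
```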
AgentKit requires two terminals running simultaneously.
Terminal 1 — Start the Express server:
```bash
npm run server
```

Expected output:

```
⚡ TDAD Local Server
──────────────────────────────────────────────
Project    : /Users/yourname/Projects/my-sfdx-project
Specs dir  : /Users/yourname/Projects/my-sfdx-project/specs
Target org : my-dev-org (detected from .sf/config.json)
Port       : 3001
API key    : ✓ set
sf CLI     : @salesforce/cli/2.x.x darwin-arm64 node-v20.x.x
──────────────────────────────────────────────
✓ Server ready → http://localhost:3001
```
Terminal 2 — Start the React app:
```bash
npm run dev
```

Open http://localhost:5173 in your browser.
The header confirms the connection: ● <project_name> · Project connected · N specs
Generate or edit the agentSpec.yaml that defines your agent's role, tone, and topics.
Via AI — fill in the form (agent type, company, role, tone, max topics) and let Claude generate the YAML. Refine iteratively with natural language instructions.
Via CLI Command — generates the full sf agent generate agent-spec command with all flags pre-filled. Run it directly from the UI with live output.
Edit Spec — paste an existing YAML or pick a file from your specs/ folder to load and edit it.
```bash
# Generated command example:
sf agent generate agent-spec \
  --type customer \
  --role "Handles booking info and cancellations" \
  --company-name "SkyBlue Airlines" \
  --tone casual \
  --max-topics 3 \
  --output-file specs/skyblue-spec.yaml
```

Create the `.agent` script from your spec.
Via CLI Command — generates sf agent generate authoring-bundle with bundle name, API name, and target org pre-filled.
Via Agent Skill — generates a ready-to-paste prompt for Claude Code (or any AI coding agent with MCP skill support) using the sf-ai-agentscript skill. The AI agent authors the full .agent file autonomously.
```bash
# Generated command example:
sf agent generate authoring-bundle \
  --spec specs/skyblue-spec.yaml \
  --name "SkyBlue Airlines Service Agent" \
  --api-name SkyBlueAirlinesServiceAgent \
  --target-org my-dev-org
```

Validate your `.agent` file before deploying.
Select your .agent file, enter the Agent API name (auto-detected), and run:
```bash
sf agent validate --agent-api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
```

Smoke test your agent locally without deploying using `sf agent preview`.
- Select your `.agent` file and target org
- Step 1 — Start a preview session (generates `SESSION_ID`)
- Step 2 — Add utterances and send them one by one
- Step 3 — End the session and collect trace files
- Trace Analysis — automatically parses trace JSON files and displays per-utterance:
  - Topic routing (TransitionStep)
  - Actions invoked
  - Tools visible to planner
  - Agent response
  - Grounding category, safety score, latency
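The real trace schema is Salesforce-defined and may differ from release to release. Purely as a sketch, with hypothetical field names (`plan`, `type`, `topic`, `action` are assumptions, not the documented schema), extracting the routed topic and invoked actions from a parsed trace could look like:

```javascript
// Sketch only — the field names below (plan, type, topic, action) are
// hypothetical; inspect your own trace JSON for the real schema.
function summarizeTrace(trace) {
  const steps = trace.plan ?? [];
  return {
    // The topic the planner transitioned to
    topic: steps.find((s) => s.type === "TransitionStep")?.topic ?? null,
    // Every action the planner invoked, in order
    actions: steps.filter((s) => s.type === "ActionStep").map((s) => s.action),
  };
}

const summary = summarizeTrace({
  plan: [
    { type: "TransitionStep", topic: "Cancellations" },
    { type: "ActionStep", action: "cancel_booking" },
  ],
});
console.log(summary); // → { topic: "Cancellations", actions: ["cancel_booking"] }
```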
```bash
sf agent preview start --bundle-name SkyBlueAirlinesServiceAgent --target-org my-dev-org

sf agent preview send --bundle-name SkyBlueAirlinesServiceAgent --session-id "$SESSION_ID" \
  --utterance "I want to cancel my flight" --target-org my-dev-org

sf agent preview end --bundle-name SkyBlueAirlinesServiceAgent --session-id "$SESSION_ID" \
  --target-org my-dev-org
```

Publish and activate your agent in the dev org.
```bash
sf agent publish authoring-bundle --api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
sf agent activate --api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
```

Both commands are runnable directly from the UI with live streaming output.
The most comprehensive stage. Four tabs:
Generate a testSpec.yaml for your agent. Three modes:
Via CLI Command — interactive sf agent generate test-spec (prompts for test cases in terminal).
Via AI — select topics, set tests per topic, pick metrics, and let Claude generate structured test cases with expectedTopic, expectedActions, and expectedOutcome.
Via Gherkin (AI) — paste a Given / When / Then scenario to convert it to a test case.
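As an illustration of the conversion, a scenario like the one below could map to a test case such as this sketch. The `expectedTopic`, `expectedActions`, and `expectedOutcome` fields are the ones named above; the surrounding structure (`testCases`, `utterance`) is an assumption, so verify it against the `testSpec.yaml` your `sf` CLI version generates.

```yaml
# Hypothetical sketch — not a verified testSpec.yaml schema.
#
# Given a customer with an upcoming flight
# When they say "I want to cancel my flight"
# Then the agent routes to the Cancellations topic and offers to cancel
testCases:
  - utterance: "I want to cancel my flight"
    expectedTopic: Cancellations
    expectedActions:
      - cancel_booking
    expectedOutcome: "Agent confirms the cancellation request"
```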
Configure and run formal tests:
- Select `.agent` file and `testSpec.yaml`
- Choose target org (auto-detected or override)
- Select wait mode: async (get job ID) or sync (wait up to N minutes)
- Run `sf agent test create`, then `sf agent test run`
- Fetch results by job ID with `sf agent test results`
Generate a prompt for the sf-ai-agentforce-testing Agent Skill — a full autonomous test-fix-deploy cycle powered by Claude Code. The AI agent runs tests, diagnoses failures, applies fixes, and retries up to 3 times.
Visual history of all test runs for each agent and test suite, stored in formal-tests/:
formal-tests/
└── SkyBlueAgent/
└── SkyBlueAgent_General-testSpec/
├── Results/
│ ├── SkyBlueAgent-{RunId}-...-results.json
│ └── ...
└── FixLoop/
├── SkyBlueAgent-{RunId}-...-fix.json
└── ...
The history table shows:
- Pass rate per run with color coding (green ≥ 80%, amber ≥ 50%, red below)
- Average latency per run
- Δ vs previous — improvement or regression since last run
- Fix loop rows — expandable between runs showing files changed and field-level diffs
- Detailed view — per test case results with topic, actions, outcome, and metric scores
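The color thresholds and Δ computation above can be sketched as follows (assumed helpers for illustration, not AgentKit's actual code; rates are in whole percentage points):

```javascript
// Sketch of the pass-rate color coding described above:
// green ≥ 80%, amber ≥ 50%, red below — assumed helper, not AgentKit's code.
function passRateColor(ratePct) {
  if (ratePct >= 80) return "green";
  if (ratePct >= 50) return "amber";
  return "red";
}

// Δ vs previous run, in percentage points: positive means improvement
function deltaPct(currentPct, previousPct) {
  return previousPct == null ? null : currentPct - previousPct;
}

console.log(passRateColor(87)); // → "green"
console.log(deltaPct(87, 60));  // → 27 (improvement since last run)
```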
```json
{
  "schema_version": "1.0",
  "iteration": 2,
  "agent": "SkyBlueAgent",
  "date": "2026-03-15",
  "triggered_by_run_id": "4KBg80000000BoHGAU",
  "issues": [
    {
      "test_id": "TC2",
      "assertion": "actions_assertion",
      "status": "FAIL",
      "category": "TEST_SPEC_CORRECTION",
      "agent_behavior_correct": true,
      "description": "...",
      "root_cause": "..."
    }
  ],
  "changes": [
    {
      "file": "tests/SkyBlueAgent_General-testSpec.yaml",
      "type": "MODIFIED",
      "description": "...",
      "details": [
        { "field": "TC2.expectedActions", "old_value": "['go_booking_info']", "new_value": "[]" }
      ]
    }
  ],
  "expected_outcome": "TC2 fully resolved. Projected: 15/15 = 100%.",
  "result": null
}
```

Analyze your agent's behavior using Salesforce's STDM (Structured Test Data Model).
```bash
sf agent analyze --agent-api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
```

Full staging → production deployment pipeline with all commands pre-filled and ready to copy:
Staging:
```bash
sf agent publish authoring-bundle --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-staging
sf agent activate --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-staging
sf agent test run --name SkyBlueAirlinesServiceAgentTest --wait 10 --target-org skyblue-staging
```

Production:

```bash
sf agent publish authoring-bundle --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-prod
sf agent activate --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-prod
```

| Field | Required | Valid values |
|---|---|---|
| `agentType` | ✅ | `customer` or `internal` only |
| `tone` | ✅ | `casual`, `formal`, or `neutral` only |
| `subjectType` | ✅ (testSpec) | `AGENT` only |
| Metric ID | What it measures |
|---|---|
| `coherence` | Response is logically consistent and easy to read |
| `completeness` | All aspects of the request are addressed |
| `conciseness` | Response is appropriately brief |
| `output_latency_milliseconds` | Response time in milliseconds |
| `instruction_following` | Agent follows its reasoning instructions (API/CLI only) |
| `factuality` | Response is factually accurate (API/CLI only) |
⚠️ Use `output_latency_milliseconds` exactly — not `latency`. Wrong metric IDs are silently ignored by Salesforce.
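Because Salesforce ignores unknown metric IDs silently, a client-side guard can catch the mistake early. A sketch (assumed helper, not part of AgentKit or the `sf` CLI), using the metric IDs from the table above:

```javascript
// Sketch of a guard against the silent-ignore behavior — an assumed
// helper, not part of AgentKit or the sf CLI. The valid IDs come from
// the metrics table above.
const VALID_METRICS = new Set([
  "coherence",
  "completeness",
  "conciseness",
  "output_latency_milliseconds",
  "instruction_following",
  "factuality",
]);

// Returns the metric IDs Salesforce would silently drop
function invalidMetricIds(metricIds) {
  return metricIds.filter((id) => !VALID_METRICS.has(id));
}

console.log(invalidMetricIds(["coherence", "latency"])); // → ["latency"]
```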
| Category | Meaning | Fix |
|---|---|---|
| `TOPIC_NOT_MATCHED` | Agent routed to wrong topic | Improve topic description wording |
| `ACTION_NOT_INVOKED` | Expected action not called | Improve action description |
| `WRONG_ACTION_SELECTED` | Wrong action called | Differentiate action descriptions |
| `ACTION_INVOCATION_FAILED` | Action called but failed | Fix Flow or Apex logic |
| `TEST_SPEC_CORRECTION` | Spec was wrong, agent was correct | Update `expectedTopic`/`expectedActions` |
| `TEST_SPEC_IMPROVEMENT` | Spec needs enrichment (data, context) | Add org data or `conversationHistory` |
| `INFORMATIONAL` | Known metric false-negative | No change required |
The Express server isn't running. Start it: `node tdad-server.js`
A system environment variable is overriding your .env key.
Run `grep "ANTHROPIC_API_KEY" ~/.zshrc ~/.zprofile` and remove the line if found.
AgentKit looks for `topic <name>:` lines. Make sure your `.agent` file uses standard agentscript syntax.
- Check that `SF_PROJECT_PATH` in `.env` points to your SFDX project root
- In the server terminal, confirm `Project : /your/path` at startup
- For AiEvaluationDefinition files, retrieve them first:

  ```bash
  sf project retrieve start --metadata AiEvaluationDefinition --target-org my-dev-org
  ```
- The server extracts Run IDs from filenames using pattern `4K[A-Za-z0-9]{13,}`
- Make sure your results files follow the naming convention: `{AgentName}-{RunId}-{SuiteName}-results.json`
- Salesforce sometimes writes an incorrect `runId` inside the JSON — the server uses the filename as the source of truth
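The extraction rule above can be sketched in a few lines (an illustrative helper using the documented pattern, not AgentKit's actual code):

```javascript
// Sketch of filename-based Run ID extraction, using the pattern
// described above: "4K" followed by 13 or more alphanumerics.
// Illustrative helper — not AgentKit's actual implementation.
function runIdFromFilename(filename) {
  const match = filename.match(/4K[A-Za-z0-9]{13,}/);
  return match ? match[0] : null;
}

const id = runIdFromFilename("SkyBlueAgent-4KBg80000000BoHGAU-General-testSpec-results.json");
console.log(id); // → "4KBg80000000BoHGAU"
```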
- Check that `triggered_by_run_id` in your fix JSON matches the exact Run ID of the run that triggered the fix
- Fix loops are displayed below the run they were triggered by (the older run)
Restart Vite after changing `vite.config.js`: `Ctrl+C`, then `npm run dev`.
Set `PORT=3002` in `.env` and update all proxy entries in `vite.config.js` to `http://localhost:3002`.
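A sketch of what those proxy entries might look like after the change (the proxied paths below mirror the server endpoints from the Architecture section, but the exact keys in your `vite.config.js` may differ, so check your own file):

```javascript
// Illustrative vite.config.js sketch after moving the server to port 3002.
// Real Vite configs usually wrap this object in defineConfig(); the proxy
// paths here are assumptions based on the documented server endpoints.
const config = {
  server: {
    port: 5173, // the React dev server keeps its own port
    proxy: {
      "/ai": "http://localhost:3002",
      "/files": "http://localhost:3002",
      "/history": "http://localhost:3002",
      "/sf": "http://localhost:3002",
      "/status": "http://localhost:3002",
    },
  },
};

module.exports = config;
```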
```bash
sf org login web --alias my-dev-org
```

Salesforce has a known bug where `contextVariables` in test cases trigger an `INTERNAL_SERVER_ERROR: RETRY enum` error. Remove all `contextVariables` from your test cases and embed context in `conversationHistory` instead.
| Concern | How AgentKit handles it |
|---|---|
| API key exposure | Key is server-side only — the browser never sees it |
| Shell injection | Only `sf agent` and `sf project deploy` commands are allowed |
| Path traversal | File writes are scoped to `specs/` and `tests/` only |
| Secrets in git | `.env` is in `.gitignore` by default |
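The shell-injection mitigation amounts to a command allow-list. A minimal sketch of the idea (an assumed helper, not AgentKit's actual implementation):

```javascript
// Sketch of the command allow-listing idea from the table above —
// an assumed helper, not AgentKit's actual implementation. Only
// commands beginning with an allowed prefix are accepted.
const ALLOWED_PREFIXES = ["sf agent", "sf project deploy"];

function isAllowedCommand(cmd) {
  const normalized = cmd.trim();
  return ALLOWED_PREFIXES.some(
    (prefix) => normalized === prefix || normalized.startsWith(prefix + " ")
  );
}

console.log(isAllowedCommand("sf agent validate --agent-api-name X")); // → true
console.log(isAllowedCommand("rm -rf /"));                             // → false
```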
```bash
# Verify before pushing:
cat .gitignore | grep .env
git ls-files | grep env   # .env should NOT appear
```

```
agentkit/
├── src/
│   ├── App.jsx          # React UI — all components (7 stages)
│   └── main.jsx         # Vite entry point
├── tdad-server.js       # Express server
├── vite.config.js       # Proxy config
├── package.json
├── index.html
├── .env                 # Your secrets — never commit
├── .env.example         # Safe template — commit this
└── README.md
```
- Full 7-stage TDAD pipeline (Agent Spec → Production)
- Stage 03.5 Local Test with Trace Analysis panel
- Stage 05 Testing History with fix loop visualization
- Fix loop JSON format with field-level diffs
- Project name auto-detected from `sfdx-project.json`
- Run ID extraction from filename (tolerates Salesforce metadata bugs)
- Support for all Salesforce Run ID formats (`4K*`)
Built with Claude · Designed for Salesforce Agentforce TDAD workflows