Test-Driven Agent Development for Salesforce Agentforce
A local web app that guides you through the full 7-stage TDAD pipeline — from agent spec to production deployment — powered by Claude AI, connected directly to your Salesforce DX project.
Version: v1.0.0-beta
- What is AgentKit?
- Architecture
- Prerequisites
- Installation
- Configuration
- Running the app
- The 7-Stage Pipeline
- Salesforce DX reference
- Troubleshooting
- Security
AgentKit is a local UI for Test-Driven Agent Development (TDAD) on Salesforce Agentforce. It covers the full lifecycle from spec to production across 7 stages.
Instead of juggling CLI commands, YAML files, and terminal windows, AgentKit gives you:
- 📋 Stage 01 — Generate `agentSpec.yaml` via AI or CLI, edit inline, save to project
- ✍️ Stage 02 — Author your `.agent` script via CLI command or Agent Skill prompt for Claude Code
- ✅ Stage 03 — Validate your agent with `sf agent validate`
- 🧪 Stage 03.5 — Local Test with `sf agent preview` — send utterances, collect traces, analyze routing and actions without deploying
- 🚀 Stage 04 — Deploy to your dev org with publish + activate commands
- 🎯 Stage 05 — Formal Test — generate `testSpec.yaml` (via AI, CLI, or Gherkin), run tests, track fix loops, and visualize pass rate history
- 📊 Stage 06 — Observability — STDM analysis with `sf agent analyze`
- 🏭 Stage 07 — Production — staging + production deployment pipeline with all commands ready to copy
Everything runs locally. Your API key never leaves your machine.
AgentKit uses a React frontend + local Express server pattern. The server acts as a secure proxy — your Anthropic API key stays server-side only.
Browser (React UI — Vite, port 5173)
│
│ HTTP / SSE
▼
Local Express server (port 3001) ←── tdad-server.js
│
├── POST /ai ──▶ Anthropic API (Claude)
├── POST /files/save ──▶ SFDX project / specs/ or tests/
├── GET /files/specs ◀── lists saved YAML specs
├── GET /files/agents ◀── lists .agent files in project
├── GET /files/tests ◀── lists testSpec YAML files
├── GET /files/aievals ◀── lists AiEvaluationDefinition XMLs
├── GET /files/formal-tests ◀── structured test history
├── GET /history/runs ◀── runs + fix loops per suite
├── GET /history/report ◀── raw results JSON for a run
├── GET /sf/run?cmd=... ──▶ sf CLI (streamed via SSE)
└── GET /status ◀── project health + org detection
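The `GET /sf/run` endpoint streams CLI output to the browser over Server-Sent Events. Purely as an illustration of the consuming side (this is a sketch, not AgentKit's actual code), a minimal parser for standard `data:`-framed SSE text might look like:

```javascript
// Minimal SSE chunk parser — an illustrative sketch, not AgentKit's
// actual implementation. Assumes standard "data: ..." framing with
// events separated by a blank line.
function parseSseChunk(chunk) {
  return chunk
    .split("\n\n") // events are blank-line separated
    .map((event) =>
      event
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n")
    )
    .filter((data) => data.length > 0);
}

// Example: two streamed lines of sf CLI output
const events = parseSseChunk("data: Deploying...\n\ndata: Done\n\n");
console.log(events); // → ["Deploying...", "Done"]
```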
Before installing AgentKit, make sure you have:
| Tool | Version | Install |
|---|---|---|
| Node.js | v18 or higher | nodejs.org |
| Salesforce CLI | latest | Install guide |
| Anthropic API key | — | console.anthropic.com |
| A Salesforce DX project | — | sf project generate -n my-project |
Verify your setup:
```bash
node --version   # should print v18.x or higher
sf --version     # should print @salesforce/cli/...
```

Install dependencies:

```bash
npm install
```

Create your config file from the template:

```bash
cp .env.example .env
```

Edit `.env` with your values:

```
# Your Anthropic API key (required)
ANTHROPIC_API_KEY=sk-ant-api03-...

# Absolute path to your Salesforce DX project (required)
SF_PROJECT_PATH=/Users/yourname/Projects/my-sfdx-project

# Default org alias used in generated sf CLI commands (optional)
TARGET_ORG=my-dev-org

# Server port (optional, default: 3001)
PORT=3001
```
⚠️ Never commit `.env` — it contains your API key. It is excluded by `.gitignore`.
If ANTHROPIC_API_KEY is already defined in your shell profile (~/.zshrc, ~/.bashrc, etc.), it may override the value in .env. Check with:
```bash
grep "ANTHROPIC_API_KEY" ~/.zshrc ~/.zprofile ~/.bashrc ~/.bash_profile 2>/dev/null
```

If found, remove that line from your shell profile, then restart the server.
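This precedence — a shell-exported variable winning over the `.env` file — can be sketched as follows (an illustrative loader, not AgentKit's actual code):

```javascript
// Illustrative sketch of .env precedence — not AgentKit's actual loader.
// A value already present in the process environment (e.g. exported in
// ~/.zshrc) wins over the one parsed from the .env file.
function resolveConfig(dotenvText, processEnv) {
  const fromFile = {};
  for (const line of dotenvText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip comments/blanks
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    fromFile[trimmed.slice(0, eq)] = trimmed.slice(eq + 1);
  }
  // Shell-exported variables override the file's values
  return { ...fromFile, ...processEnv };
}

const cfg = resolveConfig(
  "ANTHROPIC_API_KEY=sk-from-file\nPORT=3001",
  { ANTHROPIC_API_KEY: "sk-from-shell" } // exported in ~/.zshrc
);
console.log(cfg.ANTHROPIC_API_KEY); // → "sk-from-shell"
```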
AgentKit requires two terminals running simultaneously.
Terminal 1 — Start the Express server:
```bash
npm run server
```

Expected output:

```
⚡ TDAD Local Server
──────────────────────────────────────────────
Project    : /Users/yourname/Projects/my-sfdx-project
Specs dir  : /Users/yourname/Projects/my-sfdx-project/specs
Target org : my-dev-org (detected from .sf/config.json)
Port       : 3001
API key    : ✓ set
sf CLI     : @salesforce/cli/2.x.x darwin-arm64 node-v20.x.x
──────────────────────────────────────────────
✓ Server ready → http://localhost:3001
```
Terminal 2 — Start the React app:
```bash
npm run dev
```

Open http://localhost:5173 in your browser.
The header confirms the connection: ● <project_name> · Project connected · N specs
Generate or edit the agentSpec.yaml that defines your agent's role, tone, and topics.
Via AI — fill in the form (agent type, company, role, tone, max topics) and let Claude generate the YAML. Refine iteratively with natural language instructions.
Via CLI Command — generates the full sf agent generate agent-spec command with all flags pre-filled. Run it directly from the UI with live output.
Edit Spec — paste an existing YAML or pick a file from your specs/ folder to load and edit it.
```bash
# Generated command example:
sf agent generate agent-spec \
  --type customer \
  --role "Handles booking info and cancellations" \
  --company-name "SkyBlue Airlines" \
  --tone casual \
  --max-topics 3 \
  --output-file specs/skyblue-spec.yaml
```

Create the `.agent` script from your spec.
Via CLI Command — generates sf agent generate authoring-bundle with bundle name, API name, and target org pre-filled.
Via Agent Skill — generates a ready-to-paste prompt for Claude Code (or any AI coding agent with MCP skill support) using the sf-ai-agentscript skill. The AI agent authors the full .agent file autonomously.
```bash
# Generated command example:
sf agent generate authoring-bundle \
  --spec specs/skyblue-spec.yaml \
  --name "SkyBlue Airlines Service Agent" \
  --api-name SkyBlueAirlinesServiceAgent \
  --target-org my-dev-org
```

Validate your `.agent` file before deploying.
Select your .agent file, enter the Agent API name (auto-detected), and run:
```bash
sf agent validate --agent-api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
```

Smoke test your agent locally without deploying using `sf agent preview`.
- Select your `.agent` file and target org
- Step 1 — Start a preview session (generates `SESSION_ID`)
- Step 2 — Add utterances and send them one by one
- Step 3 — End the session and collect trace files
- Trace Analysis — automatically parses trace JSON files and displays per-utterance:
  - Topic routing (TransitionStep)
  - Actions invoked
  - Tools visible to planner
  - Agent response
  - Grounding category, safety score, latency
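The real trace schema is Salesforce-defined and may differ from release to release. Purely as a sketch, with hypothetical field names (`plan`, `type`, `topic`, `action` are assumptions, not the documented schema), extracting the routed topic and invoked actions from a parsed trace could look like:

```javascript
// Sketch only — the field names below (plan, type, topic, action) are
// hypothetical; inspect your own trace JSON for the real schema.
function summarizeTrace(trace) {
  const steps = trace.plan ?? [];
  return {
    // The topic the planner transitioned to
    topic: steps.find((s) => s.type === "TransitionStep")?.topic ?? null,
    // Every action the planner invoked, in order
    actions: steps.filter((s) => s.type === "ActionStep").map((s) => s.action),
  };
}

const summary = summarizeTrace({
  plan: [
    { type: "TransitionStep", topic: "Cancellations" },
    { type: "ActionStep", action: "cancel_booking" },
  ],
});
console.log(summary); // → { topic: "Cancellations", actions: ["cancel_booking"] }
```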
```bash
sf agent preview start --bundle-name SkyBlueAirlinesServiceAgent --target-org my-dev-org

sf agent preview send --bundle-name SkyBlueAirlinesServiceAgent --session-id "$SESSION_ID" \
  --utterance "I want to cancel my flight" --target-org my-dev-org

sf agent preview end --bundle-name SkyBlueAirlinesServiceAgent --session-id "$SESSION_ID" \
  --target-org my-dev-org
```

Publish and activate your agent in the dev org.
```bash
sf agent publish authoring-bundle --api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
sf agent activate --api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
```

Both commands are runnable directly from the UI with live streaming output.
The most comprehensive stage. Four tabs:
Generate a testSpec.yaml for your agent. Three modes:
Via CLI Command — interactive sf agent generate test-spec (prompts for test cases in terminal).
Via AI — select topics, set tests per topic, pick metrics, and let Claude generate structured test cases with expectedTopic, expectedActions, and expectedOutcome.
Via Gherkin (AI) — paste a Given / When / Then scenario to convert it to a test case.
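As an illustration of the conversion, a scenario like the one below could map to a test case such as this sketch. The `expectedTopic`, `expectedActions`, and `expectedOutcome` fields are the ones named above; the surrounding structure (`testCases`, `utterance`) is an assumption, so verify it against the `testSpec.yaml` your `sf` CLI version generates.

```yaml
# Hypothetical sketch — not a verified testSpec.yaml schema.
#
# Given a customer with an upcoming flight
# When they say "I want to cancel my flight"
# Then the agent routes to the Cancellations topic and offers to cancel
testCases:
  - utterance: "I want to cancel my flight"
    expectedTopic: Cancellations
    expectedActions:
      - cancel_booking
    expectedOutcome: "Agent confirms the cancellation request"
```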
Configure and run formal tests:
- Select `.agent` file and `testSpec.yaml`
- Choose target org (auto-detected or override)
- Select wait mode: async (get job ID) or sync (wait up to N minutes)
- Run `sf agent test create`, then `sf agent test run`
- Fetch results by job ID with `sf agent test results`
Generate a prompt for the sf-ai-agentforce-testing Agent Skill — a full autonomous test-fix-deploy cycle powered by Claude Code. The AI agent runs tests, diagnoses failures, applies fixes, and retries up to 3 times.
Visual history of all test runs for each agent and test suite, stored in formal-tests/:
formal-tests/
└── SkyBlueAgent/
└── SkyBlueAgent_General-testSpec/
├── Results/
│ ├── SkyBlueAgent-{RunId}-...-results.json
│ └── ...
└── FixLoop/
├── SkyBlueAgent-{RunId}-...-fix.json
└── ...
The history table shows:
- Pass rate per run with color coding (green ≥ 80%, amber ≥ 50%, red below)
- Average latency per run
- Δ vs previous — improvement or regression since last run
- Fix loop rows — expandable between runs showing files changed and field-level diffs
- Detailed view — per test case results with topic, actions, outcome, and metric scores
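The color thresholds and Δ computation above can be sketched as follows (assumed helpers for illustration, not AgentKit's actual code; rates are in whole percentage points):

```javascript
// Sketch of the pass-rate color coding described above:
// green ≥ 80%, amber ≥ 50%, red below — assumed helper, not AgentKit's code.
function passRateColor(ratePct) {
  if (ratePct >= 80) return "green";
  if (ratePct >= 50) return "amber";
  return "red";
}

// Δ vs previous run, in percentage points: positive means improvement
function deltaPct(currentPct, previousPct) {
  return previousPct == null ? null : currentPct - previousPct;
}

console.log(passRateColor(87)); // → "green"
console.log(deltaPct(87, 60));  // → 27 (improvement since last run)
```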
```json
{
  "schema_version": "1.0",
  "iteration": 2,
  "agent": "SkyBlueAgent",
  "date": "2026-03-15",
  "triggered_by_run_id": "4KBg80000000BoHGAU",
  "issues": [
    {
      "test_id": "TC2",
      "assertion": "actions_assertion",
      "status": "FAIL",
      "category": "TEST_SPEC_CORRECTION",
      "agent_behavior_correct": true,
      "description": "...",
      "root_cause": "..."
    }
  ],
  "changes": [
    {
      "file": "tests/SkyBlueAgent_General-testSpec.yaml",
      "type": "MODIFIED",
      "description": "...",
      "details": [
        { "field": "TC2.expectedActions", "old_value": "['go_booking_info']", "new_value": "[]" }
      ]
    }
  ],
  "expected_outcome": "TC2 fully resolved. Projected: 15/15 = 100%.",
  "result": null
}
```

Analyze your agent's behavior using Salesforce's STDM (Structured Test Data Model).
```bash
sf agent analyze --agent-api-name SkyBlueAirlinesServiceAgent --target-org my-dev-org
```

Full staging → production deployment pipeline with all commands pre-filled and ready to copy:
Staging:
```bash
sf agent publish authoring-bundle --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-staging
sf agent activate --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-staging
sf agent test run --name SkyBlueAirlinesServiceAgentTest --wait 10 --target-org skyblue-staging
```

Production:

```bash
sf agent publish authoring-bundle --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-prod
sf agent activate --api-name SkyBlueAirlinesServiceAgent --target-org skyblue-prod
```

| Field | Required | Valid values |
|---|---|---|
| `agentType` | ✅ | `customer` or `internal` only |
| `tone` | ✅ | `casual`, `formal`, or `neutral` only |
| `subjectType` | ✅ (testSpec) | `AGENT` only |
| Metric ID | What it measures |
|---|---|
| `coherence` | Response is logically consistent and easy to read |
| `completeness` | All aspects of the request are addressed |
| `conciseness` | Response is appropriately brief |
| `output_latency_milliseconds` | Response time in milliseconds |
| `instruction_following` | Agent follows its reasoning instructions (API/CLI only) |
| `factuality` | Response is factually accurate (API/CLI only) |
⚠️ Use `output_latency_milliseconds` exactly — not `latency`. Wrong metric IDs are silently ignored by Salesforce.
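Because Salesforce ignores unknown metric IDs silently, a client-side guard can catch the mistake early. A sketch (assumed helper, not part of AgentKit or the `sf` CLI), using the metric IDs from the table above:

```javascript
// Sketch of a guard against the silent-ignore behavior — an assumed
// helper, not part of AgentKit or the sf CLI. The valid IDs come from
// the metrics table above.
const VALID_METRICS = new Set([
  "coherence",
  "completeness",
  "conciseness",
  "output_latency_milliseconds",
  "instruction_following",
  "factuality",
]);

// Returns the metric IDs Salesforce would silently drop
function invalidMetricIds(metricIds) {
  return metricIds.filter((id) => !VALID_METRICS.has(id));
}

console.log(invalidMetricIds(["coherence", "latency"])); // → ["latency"]
```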
| Category | Meaning | Fix |
|---|---|---|
| `TOPIC_NOT_MATCHED` | Agent routed to wrong topic | Improve topic description wording |
| `ACTION_NOT_INVOKED` | Expected action not called | Improve action description |
| `WRONG_ACTION_SELECTED` | Wrong action called | Differentiate action descriptions |
| `ACTION_INVOCATION_FAILED` | Action called but failed | Fix Flow or Apex logic |
| `TEST_SPEC_CORRECTION` | Spec was wrong, agent was correct | Update `expectedTopic`/`expectedActions` |
| `TEST_SPEC_IMPROVEMENT` | Spec needs enrichment (data, context) | Add org data or `conversationHistory` |
| `INFORMATIONAL` | Known metric false-negative | No change required |
The Express server isn't running. Start it: `node tdad-server.js`
A system environment variable is overriding your .env key.
Run `grep "ANTHROPIC_API_KEY" ~/.zshrc ~/.zprofile` and remove the line if found.
AgentKit looks for `topic <name>:` lines. Make sure your `.agent` file uses standard agentscript syntax.
- Check that `SF_PROJECT_PATH` in `.env` points to your SFDX project root
- In the server terminal, confirm `Project : /your/path` at startup
- For AiEvaluationDefinition files, retrieve them first:

  ```bash
  sf project retrieve start --metadata AiEvaluationDefinition --target-org my-dev-org
  ```
- The server extracts Run IDs from filenames using pattern `4K[A-Za-z0-9]{13,}`
- Make sure your results files follow the naming convention: `{AgentName}-{RunId}-{SuiteName}-results.json`
- Salesforce sometimes writes an incorrect `runId` inside the JSON — the server uses the filename as the source of truth
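The extraction rule above can be sketched in a few lines (an illustrative helper using the documented pattern, not AgentKit's actual code):

```javascript
// Sketch of filename-based Run ID extraction, using the pattern
// described above: "4K" followed by 13 or more alphanumerics.
// Illustrative helper — not AgentKit's actual implementation.
function runIdFromFilename(filename) {
  const match = filename.match(/4K[A-Za-z0-9]{13,}/);
  return match ? match[0] : null;
}

const id = runIdFromFilename("SkyBlueAgent-4KBg80000000BoHGAU-General-testSpec-results.json");
console.log(id); // → "4KBg80000000BoHGAU"
```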
- Check that `triggered_by_run_id` in your fix JSON matches the exact Run ID of the run that triggered the fix
- Fix loops are displayed below the run they were triggered by (the older run)
Restart Vite after changing `vite.config.js`: `Ctrl+C`, then `npm run dev`.
Set `PORT=3002` in `.env` and update all proxy entries in `vite.config.js` to `http://localhost:3002`.
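A sketch of what those proxy entries might look like after the change (the proxied paths below mirror the server endpoints from the Architecture section, but the exact keys in your `vite.config.js` may differ, so check your own file):

```javascript
// Illustrative vite.config.js sketch after moving the server to port 3002.
// Real Vite configs usually wrap this object in defineConfig(); the proxy
// paths here are assumptions based on the documented server endpoints.
const config = {
  server: {
    port: 5173, // the React dev server keeps its own port
    proxy: {
      "/ai": "http://localhost:3002",
      "/files": "http://localhost:3002",
      "/history": "http://localhost:3002",
      "/sf": "http://localhost:3002",
      "/status": "http://localhost:3002",
    },
  },
};

module.exports = config;
```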
```bash
sf org login web --alias my-dev-org
```

Salesforce has a known bug where `contextVariables` in test cases trigger an `INTERNAL_SERVER_ERROR: RETRY enum` error. Remove all `contextVariables` from your test cases and embed context in `conversationHistory` instead.
| Concern | How AgentKit handles it |
|---|---|
| API key exposure | Key is server-side only — the browser never sees it |
| Shell injection | Only `sf agent` and `sf project deploy` commands are allowed |
| Path traversal | File writes are scoped to `specs/` and `tests/` only |
| Secrets in git | `.env` is in `.gitignore` by default |
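The shell-injection mitigation amounts to a command allow-list. A minimal sketch of the idea (an assumed helper, not AgentKit's actual implementation):

```javascript
// Sketch of the command allow-listing idea from the table above —
// an assumed helper, not AgentKit's actual implementation. Only
// commands beginning with an allowed prefix are accepted.
const ALLOWED_PREFIXES = ["sf agent", "sf project deploy"];

function isAllowedCommand(cmd) {
  const normalized = cmd.trim();
  return ALLOWED_PREFIXES.some(
    (prefix) => normalized === prefix || normalized.startsWith(prefix + " ")
  );
}

console.log(isAllowedCommand("sf agent validate --agent-api-name X")); // → true
console.log(isAllowedCommand("rm -rf /"));                             // → false
```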
```bash
# Verify before pushing:
cat .gitignore | grep .env
git ls-files | grep env   # .env should NOT appear
```

```
agentkit/
├── src/
│   ├── App.jsx          # React UI — all components (7 stages)
│   └── main.jsx         # Vite entry point
├── tdad-server.js       # Express server
├── vite.config.js       # Proxy config
├── package.json
├── index.html
├── .env                 # Your secrets — never commit
├── .env.example         # Safe template — commit this
└── README.md
```
- Full 7-stage TDAD pipeline (Agent Spec → Production)
- Stage 03.5 Local Test with Trace Analysis panel
- Stage 05 Testing History with fix loop visualization
- Fix loop JSON format with field-level diffs
- Project name auto-detected from `sfdx-project.json`
- Run ID extraction from filename (tolerates Salesforce metadata bugs)
- Support for all Salesforce Run ID formats (`4K*`)
Built with Claude · Designed for Salesforce Agentforce TDAD workflows