ivanmkc · ivanmkc · Jul 3, 2026 · Jul 4, 2026 · Jul 4, 2026 · Jul 4, 2026
diff --git a/.github/workflows/agent-compat.yml b/.github/workflows/agent-compat.yml
@@ -0,0 +1,84 @@
+name: agent-compat
+
+# Verify external coding-agent CLIs (Claude Code, Gemini CLI, Antigravity) can drive
+# the termchart CLI to push valid boards to a live viewer. Orchestrated by
+# agent-generator's benchmark-runner; the real logic lives in
+# scripts/experiments/agent-compat/ (run.sh + definitions/).
+#
+# Runnable in CI (dispatch) and locally via act+podman
+# (scripts/experiments/agent-compat/run_locally.sh) — cloud-auth steps are ACT-gated.
+
+on:
+  workflow_dispatch:
+    inputs:
+      backends:
+        description: "Space-separated backend sets"
+        default: "tc-claude tc-gemini tc-antigravity"
+      case_set:
+        description: "Case set (pattern ^TC-)"
+        default: "termchart-compat"
+
+jobs:
+  compat:
+    runs-on: ubuntu-latest
+    env:
+      AGENT_GENERATOR_DIR: ${{ vars.AGENT_GENERATOR_DIR || '/home/ivanmkc/agent-generator' }}
+      GOOGLE_CLOUD_PROJECT: ${{ vars.GCP_PROJECT_ID || 'adk-coding-agents' }}
+      ANTHROPIC_VERTEX_PROJECT_ID: ${{ vars.GCP_PROJECT_ID || 'adk-coding-agents' }}
+      CASE_SET: ${{ inputs.case_set }}
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-node@v4
+        with: { node-version: "20" }
+      - uses: actions/setup-python@v5
+        with: { python-version: "3.11" }
+
+      - name: Install uv
+        run: curl -LsSf https://astral.sh/uv/install.sh | sh
+
+      - name: Install agent CLIs + termchart
+        run: |
+          npm i -g @anthropic-ai/claude-code @google/gemini-cli @ivanmkc/termchart
+          # Antigravity (agy) is not public; run_locally.sh bind-mounts the host binary
+          # under act. In hosted CI the tc-antigravity backend is skipped (agy absent).
+
+      - name: Install benchmark-runner (agent-generator)
+        run: |
+          if [ -d "$AGENT_GENERATOR_DIR" ]; then
+            (cd "$AGENT_GENERATOR_DIR" && uv tool install --prerelease allow .)
+          else
+            echo "::warning::AGENT_GENERATOR_DIR not present; provide agent-generator (bind-mount under act, or clone in CI)"
+            exit 1
+          fi
+          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
+
+      - name: Install skills into the coding agents (agents-cli setup)
+        run: |
+          # Installs skills into ~/.claude/skills, ~/.gemini/extensions, ~/.agents/skills —
+          # the locations the simulator sandbox mounts. termchart's own skills are then
+          # installed on top by run.sh (from plugin/skills/), since termchart ships them as
+          # a plugin that doesn't land in those dirs.
+          if command -v agents-cli >/dev/null 2>&1; then
+            agents-cli setup --skip-auth || echo "::warning::agents-cli setup failed (continuing; run.sh still installs termchart skills)"
+          else
+            echo "::warning::agents-cli not found; skipping general skill setup (run.sh still installs termchart skills)"
+          fi
+
+      # Hosted CI: authenticate to GCP via WIF (skipped under act — uses mounted ADC).
+      - name: GCP auth (CI only)
+        if: ${{ !env.ACT }}
+        uses: google-github-actions/auth@v3
+        with:
+          workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
+          service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}
+
+      # Local act: verify a host ADC was mounted (models are Vertex-routed).
+      - name: Verify ADC (act only)
+        if: ${{ env.ACT }}
+        run: |
+          test -f "$HOME/.config/gcloud/application_default_credentials.json" \
+            || { echo "Mount host ADC: run via run_locally.sh"; exit 1; }
+
+      - name: Run agent-compat
+        run: bash scripts/experiments/agent-compat/run.sh ${{ inputs.backends }}
diff --git a/scripts/experiments/agent-compat/README.md b/scripts/experiments/agent-compat/README.md
@@ -0,0 +1,85 @@
+# agent-compat — do Claude Code, Gemini CLI & Antigravity play nice with termchart?
+
+Runs each external coding-agent CLI **headless** on termchart tasks and verifies, via a
+live viewer, that each one:
+
+1. **can drive the `termchart` CLI** to push a valid board (smoke cases), and
+2. **activates the right journey/recipe** — given a *scenario only* (never told the
+   diagram type), does it pick the correct termchart type for the job? (journey cases)
+
+Orchestrated by [agent-generator]'s `benchmark-runner` (interactive-simulation cases,
+`output_format: direct` — prompts go straight to the CLI backend).
+
+## How it works
+
+```
+run.sh
+ ├─ build + run the repo's termchart CLI:  termchart serve  → local viewer, capture URL+token
+ ├─ PATH wrappers (under $HOME, NOT /tmp — the sandbox tmpfs-masks /tmp):
+ │    termchart   → inject viewer URL/token, exec the repo CLI
+ │    antigravity → exec `agy --dangerously-skip-permissions`  (harness calls `antigravity -p …`; agy -p == --print)
+ ├─ for each backend {claude, gemini-cli, antigravity}:
+ │    (antigravity: preflight agy auth under a fresh HOME → SKIP if it can't auth)
+ │    termchart clear --all         → fresh viewer state
+ │    benchmark-runner --config-dir definitions --case-set <set> --generator-set <backend>
+ └─ summary (PASS / FAIL / SKIP per backend)
+```
+
+**Why a `termchart` wrapper, not env vars:** agent-generator's sandbox forwards `PATH`
+to the agent but not arbitrary env (`TERMCHART_VIEWER_URL/TOKEN`). Baking the viewer
+config into a PATH wrapper lets every agent run `termchart push` with zero credential
+handling. It must live under `$HOME` (ro-bound into the sandbox), not `/tmp` (the sandbox
+mounts a fresh tmpfs over `/tmp`, which would hide it).
+
+**Verification is deterministic, scope-specific, and trace-independent.** Each case pins
+a unique scope (`--project … --agent <case>`) but lets the agent choose the `--type`;
+a `command` objective then checks *that scope's* board has the prescribed type via
+`termchart list` (e.g. `/compare …[component]`). Scope-specific ⇒ no cross-case
+contamination / false passes. Trace-independent ⇒ it's the only fair way to score
+**Antigravity**, whose harness produces no tool trace (the `cli_command` objective, which
+needs a trace, is `required: false`).
+
+## Cases (`definitions/cases/`)
+
+| Case set | Cases | Tests |
+|---|---|---|
+| `termchart-smoke` | `TC-FLOW-001`, `TC-COMPONENT-001` | Prompt names the type — can the agent drive `termchart push` at all |
+| `termchart-journeys` | `TC-JOURNEY-{COMPARE,ARCH,METRICS,ER,DASHBOARD}-001` | **Scenario only** — does the agent pick the right type/journey (component / flow / vegalite / flow / panes\|component) |
+| `termchart-compat` | all `TC-*` | smoke + journeys |
+
+Every case seeds one shared `AGENTS.md` "choose the diagram" decision guide (auto-read by
+claude/gemini/agy), so all backends have the same recipe knowledge — the test is whether
+they **activate** the right journey, not whether they know termchart exists.
+
+## Run it
+
+```bash
+scripts/experiments/agent-compat/run.sh                          # all backends, all cases
+CASE_SET=termchart-journeys scripts/experiments/agent-compat/run.sh tc-claude tc-gemini
+scripts/experiments/agent-compat/run_locally.sh                  # via GitHub Actions + act+podman
+```
+
+## Requirements
+
+- `claude`, `gemini`, `agy`, `node` on PATH (run.sh warns if any are missing).
+- `benchmark-runner` (`cd <agent-generator> && uv tool install --prerelease allow .`) or `uv` on PATH.
+- GCP ADC — models are Vertex-routed (`gcloud auth application-default login`).
+- The repo's termchart CLI + viewer are built automatically on first run.
+- `AGENT_GENERATOR_DIR` (default `/home/ivanmkc/agent-generator`).
+
+## Backend status & caveats
+
+- **Claude Code, Gemini CLI — supported and verified.** Both drive termchart correctly;
+  auth is forwarded into the sandbox via env (Vertex).
+- **Antigravity (agy) — SKIPPED by default (auth limitation).** agy authenticates via a
+  login stored in `$HOME`, but agent-generator's sandbox forces `HOME=/tmp`, so agy can't
+  authenticate inside a case (its creds are neither ADC- nor env-based, so they can't be
+  forwarded the way Claude/Gemini creds are). run.sh detects this and marks `tc-antigravity`
+  **SKIP** instead of a misleading FAIL. The wrapper + deterministic check are correct and
+  will work once agy's auth is available in the sandbox — e.g. by extending agent-generator's
+  (stub) antigravity harness `get_sandbox_auth` to mount agy's credential dir, or running
+  agy's backend outside the sandbox. Its harness also extracts no tool trace (stub), so only
+  the deterministic viewer check applies to it.
+- Add a case by dropping a `TC-*` interactive-simulation YAML in `definitions/cases/`.
+
+[agent-generator]: https://github.com/ivanmkc/agent-generator