diff --git a/docs/DEMO_SCRIPT.md b/docs/DEMO_SCRIPT.md new file mode 100644 index 0000000..179b7a7 --- /dev/null +++ b/docs/DEMO_SCRIPT.md @@ -0,0 +1,342 @@ +# Demo Recording Script — Revka Track 3 (2:00, three-window) + +A 2-minute cut built around a **three-window layout**: + +- **`[DASH]` — Revka dashboard (main window):** the governed orchestration — + steps lighting up, the human approval gates. The *product*. +- **`[LOGS]` — Cloud Run monitor:** the agents actually executing on Google + Cloud — A2A task calls, ADK/Vertex reasoning, clone/test/PR. The *proof it's + on GCP*. +- **`[GH]` — GitHub:** the demo repo — the issue, then the PR appearing, the + diff, the merge, the close. The *real-world outcome*. + +The real run takes ~6 min; the final video is 2:00 via continuous narration, +auto-approved gates (no clicking dead air), and time-lapsing the coder segment. + +## Before you hit record (prep, off-camera) + +1. **Token + env** (durable; reuse across takes): + ```bash + REVKA_ADMIN_TOKEN= DEVICE=demo bash scripts/cloud-paircode.sh + export BT=rk_... URL=https://revka-orchestrator-n22ujw2j2a-uc.a.run.app + ``` +2. **`[DASH]`** open `$URL/`, paste the token, go to Workflow Runs. +3. **`[LOGS]`** Cloud Run monitor. Pick one: + - **Console (best visual):** Logs Explorer → paste the query → **Stream logs** on: + ``` + resource.type="cloud_run_revision" + resource.labels.service_name=("revka-orchestrator" OR "coder-agent" OR "reviewer-agent") + ``` + - **Terminal (fallback):** `bash scripts/demo-logs.sh` — colored, service-tagged + (cyan=orchestrator, green=coder, magenta=reviewer). +4. **`[GH]`** open the demo repo: `github.com/KumihoIO/google-agentops-demo` — + start on the Issues tab. +5. **Pre-write the fresh issue** (additive, testable, distinct each take), e.g. + *"Add `apply_percentage_discount(items, percent_off)` to `cart.py`."* +6. **Auto-approver** ready in a hidden terminal (script at the bottom). 
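
The Logs Explorer filter from step 3 can be generated instead of pasted, which helps if the set of services changes between takes. A minimal Python sketch, assuming only the three service names listed above; the helper name `build_log_filter` is hypothetical and not part of the repo:

```python
def build_log_filter(services):
    """Build a Cloud Logging filter matching Cloud Run revisions of the given services."""
    if not services:
        raise ValueError("at least one service name is required")
    # Quote each name and join with OR, mirroring the filter shown in step 3.
    quoted = " OR ".join(f'"{name}"' for name in services)
    return (
        'resource.type="cloud_run_revision"\n'
        f"resource.labels.service_name=({quoted})"
    )


if __name__ == "__main__":
    print(build_log_filter(["revka-orchestrator", "coder-agent", "reviewer-agent"]))
```

Paste the printed text into Logs Explorer as in step 3; the two lines are combined with an implicit AND.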
+ +> Editing tip: record the full ~6-min session once; in your editor speed the +> coder segment to fit 0:48→1:16. Narration runs continuously over the cut. + +--- + +## Final-cut timeline (2:00) + +### 0:00 – 0:12 · Hook +- **[DASH]** dashboard idle (main). **[LOGS]** and **[GH]** visible alongside. +- **VO:** "Revka resolves GitHub issues autonomously — with human approval gates + — using a multi-agent system running entirely on Google Cloud. Dashboard's the + orchestrator, the logs are the agents on Cloud Run, and GitHub is where the + work lands." + +### 0:12 – 0:26 · New issue → auto-trigger (live) +- **[GH]** open the fresh feature issue, then add the **`revka`** label. + The repo's `revka-issue-trigger.yml` GitHub Action fires automatically and + POSTs to the Cloud Run orchestrator (bearer token in repo secrets) — no manual + command. +- **[DASH]** the new run appears in Workflow Runs on its own. +- **[LOGS]** first orchestrator lines scroll in. +- **VO:** "I open a feature request and label it for Revka. A GitHub Action + triggers the pipeline on Cloud Run automatically — no human touches the code." + +> The label trigger is the headline flow (issue → GitHub Action → Cloud Run). A +> manual `POST /api/workflows/run/...` (commands below) is the fallback if you'd +> rather not depend on the Action timing on camera. + +### 0:26 – 0:42 · Assess + AgentOps preflight + Gate 1 +- **[DASH]** `assess_issue` ✓ → `agentops_preflight` ✓ (click it: flash + `a2a_discovery_status: discovered`) → pause at gate 1, auto-approves. +- **[LOGS]** orchestrator: A2A discovery + identity-token mint. +- **VO:** "It plans the fix on Gemini, proves the Google AgentOps integration by + discovering the control plane over A2A, then pauses for human approval." + +### 0:42 – 1:16 · Coder agent works → PR (TIME-LAPSE) +- **[DASH]** `deploy_coder_agent` running (sped up). 
+- **[LOGS]** **green** coder lines — *the money shot, point at it:* A2A task + received → git clone → Vertex/Gemini → pytest → "PR opened". +- **[GH]** the new PR appears (`fix/issue-`); open the diff briefly. +- **VO:** "Watch the logs: the ADK coder agent on Cloud Run takes the A2A task, + clones the repo, reasons on Gemini via Vertex, writes the code and a test, runs + pytest, and opens this pull request — all on Google Cloud." + +### 1:16 – 1:34 · Review + Gate 2 +- **[DASH]** `review_pr` ✓ (flash verdict) → gate 2 auto-approves. +- **[LOGS]** **magenta** reviewer lines: diff fetched, verdict returned. +- **VO:** "A separate reviewer agent — its own identity — reviews the diff over + A2A. Then the second human gate before merge." + +### 1:34 – 1:52 · Merge + close (payoff) +- **[DASH]** `merge_and_close` ✓, run **completed**. +- **[LOGS]** green coder lines: merge + issue close. +- **[GH]** PR flips to **Merged**, issue to **Closed**; new function on `main`. +- **VO:** "The coder merges and closes the issue. A real feature, shipped + autonomously, governed by humans, entirely on Google Cloud." + +### 1:52 – 2:00 · Tagline +- **[DASH]** run all green, **[GH]** closed issue, **[LOGS]** final lines. +- **VO:** "Cloud-native runtime, Gemini intelligence, A2A interoperability. + Track 3, end to end." + +--- + +## Automatic trigger (headline flow) + +The demo repo's `revka-issue-trigger.yml` Action posts to Cloud Run when an issue +is labeled `revka`. 
Repo secrets (set once): + +- `REVKA_GATEWAY_URL` = `https://revka-orchestrator-n22ujw2j2a-uc.a.run.app` +- `REVKA_BEARER_TOKEN` = the stable admin token (`revka-GATEWAY_ADMIN_TOKEN`) + +On camera, just open the issue and add the label: + +```bash +gh issue create -R KumihoIO/google-agentops-demo \ + --title "Feature: percentage discount on cart subtotal" \ + --body "Add apply_percentage_discount(items, percent_off) to src/agentops_demo/cart.py returning the discounted subtotal in cents (round to nearest cent), with a regression test." \ + --label revka +``` + +## Manual trigger (fallback) + +```bash +# [GH] create the fresh issue (no label) +gh issue create -R KumihoIO/google-agentops-demo \ + --title "Feature: percentage discount on cart subtotal" \ + --body "Add apply_percentage_discount(items, percent_off) to src/agentops_demo/cart.py returning the discounted subtotal in cents (round to nearest cent), with a regression test." + +# [DASH] trigger (replace with the new issue number) +ISSUE=$(curl -s https://api.github.com/repos/KumihoIO/google-agentops-demo/issues/) +RUN=$(curl -s -X POST "$URL/api/workflows/run/github-issue-resolver" \ + -H "Authorization: Bearer $BT" -H "Content-Type: application/json" \ + -d "$(python3 -c 'import json,sys;print(json.dumps({"inputs":{"repo_name":"KumihoIO/google-agentops-demo","github_payload":sys.argv[1],"track3_a2a_url":"https://construct-agentops-a2a-1091585228963.us-central1.run.app"}}))' "$ISSUE")" \ + | python3 -c 'import json,sys;print(json.load(sys.stdin)["run_id"])') +echo "run: $RUN" + +# [LOGS] Cloud Run monitor (terminal fallback; prefer Console Logs Explorer) +bash scripts/demo-logs.sh + +# hidden terminal — hands-free gate approver (run right after triggering) +while true; do + S=$(curl -s -H "Authorization: Bearer $BT" "$URL/api/workflows/runs/$RUN" \ + | python3 -c 'import json,sys;print(json.load(sys.stdin).get("run",{}).get("status"))') + [ "$S" = "paused" ] && curl -s -X POST -H "Authorization: Bearer $BT" \ 
+ -H "Content-Type: application/json" -d '{"approved":true,"feedback":"demo"}' \ + "$URL/api/workflows/runs/$RUN/approve" >/dev/null + case "$S" in completed|failed|cancelled) break;; esac + sleep 8 +done +``` + +## Presenter narration (full pitch — recommended) + +This is the **presenter's script**, not a voice-over. You're making a case: +*here's the problem enterprises have with autonomous agents, here's how Revka +solves it, here's why it's different, and here it is doing it for real.* The +demo is your evidence, not your subject. ~620 words, ~4:00 at a measured keynote +pace. Section headers are beats, not on-screen text. (For a tight 2:00 cut, use +the compressed voice-over below instead.) + +### The problem — why enterprises can't ship autonomous agents *(~0:00–0:40)* + +> Every enterprise wants the same thing from AI: take the multi-step work that +> consumes engineering, security, and operations teams — and let it run itself. +> Almost none of them have. And it isn't because the agents aren't capable. +> +> It's because a raw autonomous agent is a black box. It acts without oversight. +> It forgets everything the moment a task ends. And it leaves no trail you could +> ever show an auditor. You cannot put that in front of production — not in +> finance, not in security, not anywhere a wrong move is expensive and +> irreversible. The capability arrived years ago. The *trust* didn't. + +### What Revka is *(~0:40–1:15)* + +> Revka closes that gap. It's a platform for building autonomous workflows an +> enterprise can actually turn on. +> +> You define the work **visually** — as a graph of steps, not a hidden prompt. +> Agents execute it. And at every decision that matters, a human approves. There +> is no black box: you see the workflow, you govern it, and every single step is +> recorded. 
It runs on your own cloud — here, entirely on Google Cloud — where +> each agent holds its own cryptographic identity and reasons on Gemini through +> Vertex AI, with no API keys anywhere in the system. + +### Why Revka is different — memory and the record *(~1:15–2:05)* + +> But governance isn't what sets Revka apart. Two things do. +> +> The first is **memory**. Most agents are amnesiac — or they bolt on a flat pile +> of vectors and call it memory. Revka's memory — we call it Kumiho — is +> **graph-native**: every workflow's outputs, decisions, and context are stored +> and versioned as a connected knowledge graph. The system doesn't just *act* — +> it remembers what was done, why, and how +> it relates to everything before it, and it grounds the next run in that +> accumulated understanding of your organization. Every workflow makes the next +> one smarter. +> +> The second is the **record**. Revka's infrastructure captures every workflow +> output — every agent action, every approval, every result — as a durable, +> attributable audit trail. Graph-native memory plus a complete record of every +> run: *that* is what turns "autonomous" from a liability your compliance team +> vetoes into an asset you can defend to a regulator. + +### See it run — one workflow of many *(~2:05–3:35)* + +> Let me show you one. This workflow resolves a software issue, end to end. +> +> I trigger it by labeling the issue — and it appears in the dashboard, running +> on its own. It assesses the work, verifies a partner agent over the A2A +> protocol, and then stops at a human gate before touching anything. I approve. ‖ +> Now the coder agent — built with the Agent Development Kit, reasoning on Gemini +> — takes the task over A2A, does the work on Cloud Run, and opens this pull +> request. No human wrote that code. 
‖ Then an independent reviewer agent — running +> on Vertex AI Agent Engine — checks it, grounded in our coding standards +> retrieved from Vertex AI Search, and cites the exact rule it applies. A second +> human gate. I approve. ‖ The agents merge the change and close the issue — agent +> to agent, on Google Cloud. And everything you just watched is now versioned in +> Kumiho and in the audit trail. + +### Close — the platform behind it *(~3:35–4:00)* + +> That was one workflow. The same engine runs a security audit, a data pipeline, +> a compliance review, a research task — anything you can define as steps. Revka +> is the platform underneath: visual, governed, auditable, and grounded in memory +> that compounds. That's the difference between an agent that's impressive in a +> demo — and one an enterprise can actually deploy. + +**Delivery notes:** the first two beats (*problem* and *what Revka is*) can run +over a slow pan of the dashboard or a title card — you don't need live action +until "See it run." Let the *differentiation* beat breathe; it's the argument +the judges remember. The three `‖` marks in the demo beat are your approval/cut +points. If you must trim to ~3:00, cut the close to one sentence and tighten the +problem to its first and last lines — never cut the memory/record beat. + +## Voice-over transcript — tight 2:00 cut (alternative) + +Continuous narration for the 2:00 cut — ~300 words, ~150 wpm. Read at a measured +demo pace; pause briefly at each `‖`. Timestamps are cut points, not hard cues. + +> **[0:00]** This is Revka — a platform for building autonomous, auditable +> workflows for real enterprise work. You define them visually, as steps on a +> canvas; agents execute them; and a human governs the decisions that matter. ‖ +> It all runs on Google Cloud. +> +> **[0:14]** What you're about to see is *one* workflow — resolving a software +> issue, end to end. 
But the same engine runs security audits, data pipelines, +> research, operations — any multi-step process you can define. ‖ Every agent +> here is a Cloud Run service with its own identity, reasoning on Gemini through +> Vertex AI. No API keys. +> +> **[0:36]** I start this workflow by labeling an issue — a GitHub Action triggers +> it, and it appears in the dashboard, running on its own. ‖ It assesses the +> work, verifies a partner agent over the A2A protocol, and then pauses for human +> approval before anything changes. I approve. +> +> **[0:58]** Now the coder agent — built with the Agent Development Kit, reasoning +> on Gemini — takes the task over A2A, does the work on Cloud Run, and opens a +> pull request. ‖ No human wrote that code. +> +> **[1:18]** Then an independent reviewer agent — on Vertex AI Agent Engine — +> checks it, grounded in our coding standards retrieved from Vertex AI Search, and +> cites the exact rule it applies. ‖ A second human gate, before merge. I approve. +> +> **[1:38]** The agents merge the change and close the issue — agent to agent, +> entirely on Google Cloud. The work was done autonomously, ‖ but every step is +> versioned in Kumiho, our graph memory — a complete, auditable trail. +> +> **[1:52]** That's one workflow. Revka is the platform behind it — visual, +> governed, auditable, multi-agent — turning everyday enterprise problems into +> autonomous workflows on Google Cloud. + +**Pacing notes:** the framing bookends ([0:00] and [1:52]) carry the message — +*Revka is the platform; issue resolution is one example.* If you run long, trim +the partner-agent clause in [0:36] and the "No human wrote that code" aside. If +short, hold on the PR diff (0:58–1:18) and the merged/closed GitHub state +(1:38–1:52). The two "I approve" beats land as you click (or as the auto-approver +fires). + +## On-screen captions (lower-thirds) + +Short overlays synced to the beats — keep each ≤ 6 words, sans-serif, lower +third, ~3s on screen. 
They carry the mandate keywords even with the sound off. + +| In → Out | Caption | +| --- | --- | +| 0:00 → 0:06 | **Revka — autonomous workflow platform** | +| 0:07 → 0:13 | *Visual · governed · auditable · on Google Cloud* | +| 0:16 → 0:22 | *One workflow of many — issue resolution* | +| 0:13 → 0:18 | **New GitHub issue → triggered** | +| 0:27 → 0:33 | **Plans the fix on Gemini (Vertex AI)** | +| 0:34 → 0:40 | **A2A preflight: control plane discovered** | +| 0:40 → 0:44 | **⏸ Human approval gate 1** | +| 0:45 → 0:52 | **ADK coder agent · Cloud Run** | +| 0:53 → 1:00 | *clone → Gemini → test → open PR* | +| 1:05 → 1:12 | **Pull request opened — by the agent** | +| 1:17 → 1:24 | **ADK reviewer agent · A2A** | +| 1:25 → 1:30 | **⏸ Human approval gate 2** | +| 1:35 → 1:42 | **Merged ✓ · Issue closed ✓** | +| 1:43 → 1:50 | *Shipped autonomously, governed by humans* | +| 1:53 → 2:00 | **Cloud Run · Gemini · A2A — Track 3** | + +Persistent corner tags (tiny, top-left, whole video) reinforce which window is +which: `DASHBOARD` / `CLOUD RUN LOGS` / `GITHUB`. + +## Pinned description (paste under the video) + +> **Revka — Google for Startups AI Agents Challenge, Track 3.** +> Revka is a platform for building **autonomous, auditable, visually-defined +> workflows** that run on Google Cloud — for software, security, data, research, +> operations, and more. Workflows are multi-agent and human-governed: agents +> reason on Gemini, coordinate over A2A, and pause at approval gates a human +> controls, with every step recorded as an audit trail. +> +> This video shows **one example workflow** — resolving a software issue end to +> end. 
Labeling the issue triggers it; an orchestrator on Cloud Run assesses the +> work and verifies the Google AgentOps control plane over A2A; after a human +> gate, an **ADK coder agent** (Gemini via Vertex AI, on Cloud Run) does the work +> and opens a pull request; an **ADK reviewer agent**, grounded in Vertex AI +> Search, reviews it; and after a second human gate, the change is merged and the +> issue closed — no human touches the code. +> +> **Left:** the Revka dashboard (visual workflow + human gates). **Top-right:** +> live Cloud Run logs (the agents executing on GCP). **Bottom-right:** the work +> target on GitHub. +> +> Stack: Cloud Run · Vertex AI (Gemini 2.5 Pro) · Agent Development Kit · A2A +> protocol · Vertex AI Search grounding · Workload Identity Federation (keyless) · +> per-agent service identities. No API keys; reasoning authenticated by each +> service's own account. + +## Window layout suggestion +- **Dashboard** large on the left (≈55% width). +- **Cloud Run logs** top-right; **GitHub** bottom-right (≈45% width, stacked). +- Camera/recording at 1080p+ so the log text is legible. + +## Notes +- **Use a different feature each take** so the coder always has a real diff. +- Prefer the **Console Logs Explorer (streaming)** for `[LOGS]` — live and + formatted; `demo-logs.sh` polls every 3s as a fallback. +- After any **orchestrator redeploy**, re-pair (`scripts/cloud-paircode.sh`); run + `gcloud auth login` first if the log-read step fails. +- The coder runs one task at a time — don't fire two runs at once. +- Want the gates clicked **on camera** instead of auto-approved? Skip the + approver and click **Approve** in `[DASH]` at ~0:38 and ~1:18. 
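
The hands-free gate approver in the manual-trigger script boils down to a small polling loop: read the run status, approve when it is `paused`, stop on a terminal status. A minimal Python sketch of that control flow, assuming the same status values the shell loop checks. The function name and the two injected callables are hypothetical; they stand in for the `GET .../runs/$RUN` and `POST .../runs/$RUN/approve` calls and are not part of the repo:

```python
import time

# Statuses on which the shell loop breaks.
TERMINAL = {"completed", "failed", "cancelled"}


def auto_approve(get_status, approve, poll_seconds=8, sleep=time.sleep, max_polls=None):
    """Poll a run, approve whenever it is paused, and stop on a terminal status.

    get_status() returns the current status string; approve() submits an approval.
    Both are caller-supplied (hypothetical wrappers around the HTTP calls).
    Returns the terminal status, or None if max_polls is exhausted first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        status = get_status()
        polls += 1
        if status == "paused":
            approve()
        if status in TERMINAL:
            return status
        sleep(poll_seconds)
    return None
```

Injecting `get_status`, `approve`, and `sleep` keeps the loop testable without a live orchestrator; `max_polls` is an added safety bound the shell version does not have.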
diff --git a/docs/ENTERPRISE_ROADMAP.md b/docs/ENTERPRISE_ROADMAP.md new file mode 100644 index 0000000..af69f2d --- /dev/null +++ b/docs/ENTERPRISE_ROADMAP.md @@ -0,0 +1,151 @@ +# Revka — Enterprise & Cloud Architecture Roadmap + +The Track 3 submission proves the **loop closes**: a GitHub issue is resolved +end-to-end by a governed, multi-agent system on Google Cloud — assess → AgentOps +A2A preflight → human gate → ADK/Gemini coder PR → grounded reviewer → human gate +→ merge → close, with per-agent identity and no API keys. This document is the +roadmap from that demonstrable engine to a sellable enterprise platform. + +> Scope note: nothing here is required for the submission. The deployed Cloud +> Run build satisfies all four Track 3 mandates today. This is the evolution. + +--- + +## 1. Runtime: GKE for the stateful core, Cloud Run / Agent Engine for agents + +Track 3 permits **Cloud Run or GKE**. Cloud Run is the right zero-ops choice for +the demo; **GKE is the right enterprise target** for three concrete reasons the +demo surfaced: + +| Need | Why Cloud Run falls short | GKE answer | +| --- | --- | --- | +| **Durable, auditable artifacts** | Container filesystem is in-memory and per-instance; artifacts vanish on recycle/scale (no mounted volume). | **Filestore (NFS, RWX)** or PersistentVolumes — one durable, shared artifact + audit tree across pods. | +| **Stateful orchestrator** | Scheduler, heartbeat, event listener, and WebSocket session affinity fight a request-scaled, instance-recycling model. | StatefulSet/Deployment with affinity; the orchestrator is genuinely long-lived. | +| **Multi-tenancy** | Limited isolation primitives. | Namespaces, network policies, resource quotas per customer. 
| + +**Recommended topology (hybrid):** + +```mermaid +flowchart TB + TRIG["Triggers\nwebhook · API · schedule"] --> ORCH + subgraph GKE["GKE cluster"] + ORCH["Revka orchestrator\n(StatefulSet)\ngovernance · gates · audit · scheduler"] + FS["Filestore / PV\ndurable artifacts + audit trail"] + ORCH --- FS + end + subgraph ENT["Enterprise agent tier — API-key only (hard rule)"] + AE["Agent Engine / Cloud Run\nADK · Gemini/Vertex"] + SM["session-manager\nnative agent SDKs"] + end + subgraph BYO["BYO / cost-sensitive tier — customer-supplied auth"] + BYOCLI["Subscription CLI agents\n(claude · codex · agy)"] + BYOKEY["Customer API keys\n(Anthropic · OpenAI · Gemini)"] + end + ORCH -->|A2A · identity token| AE + ORCH --> SM + ORCH -. "opt-in, customer accepts ToS" .-> BYO + AE --> VTX["Vertex AI · Gemini"] + SM --> VTX + SM --> ANT["Anthropic API"] + SM --> OAI["OpenAI API"] + BYOCLI -. "customer subscription / login" .-> EXT["External provider"] + BYOKEY -. "customer key" .-> EXT +``` + +- **GKE** hosts the stateful orchestrator + Filestore-backed durable artifacts. +- **Enterprise agent tier** (default): Agent Engine / Cloud Run via A2A, or the + session-manager via native agent SDKs — **metered API / Vertex auth only**. +- **BYO / cost-sensitive tier** (opt-in): subscription CLI agents or + customer-supplied keys, where the *customer* owns the credential and the ToS + liability — never a consumer subscription the platform carries itself. + +--- + +## 2. The agent-auth hard rule: native SDKs + metered API, never consumer subscriptions + +The cost arbitrage of driving consumer-subscription CLIs (Claude Pro, ChatGPT +Plus, Antigravity) is fine for a solo developer or internal use, but baking +**personal-tier subscriptions into a commercial product** almost certainly +violates those services' terms (personal-use licenses, not "serve your SaaS"). 
+ +**Hard rule for the enterprise tier:** every agent runs through a provider's +**agent SDK with metered API / Vertex auth — never a consumer CLI subscription.** + +The `session-manager` is the home for this (it already has the provider +abstraction). Use each vendor's *agent* SDK (not the raw completion API — the +agent SDK is the tool-using loop you'd otherwise rebuild): + +| Provider | Agent SDK | Auth | +| --- | --- | --- | +| Anthropic | `@anthropic-ai/claude-agent-sdk` (already wired) | `ANTHROPIC_API_KEY` | +| OpenAI | OpenAI Agents SDK / Responses API w/ tools | `OPENAI_API_KEY` | +| Gemini | Google ADK (already used by the cloud agents) | Vertex / ADC | + +**Result:** the session-manager becomes a **multi-model agent runtime** — pick +Claude, GPT, or Gemini per workflow step, all key-authed, all headless (no CLI +binaries; runs on Cloud Run *and* GKE), all emitting structured SDK events +instead of stdout scraping. + +**Deliberate trade-off:** this removes the subscription cost arbitrage — +everything is metered API spend. That is the cost of legal cleanliness, so it is +offered as a tier: + +- **Enterprise tier:** API-key agent SDKs only. Hard rule. Billable, multi-tenant-safe. +- **BYO / cost-sensitive tier:** subscription CLIs allowed, but the *customer* + supplies the login and accepts the ToS — the liability is theirs. + +--- + +## 3. The product: continuous, goal-driven autonomous improvement + +The issue-resolver is one instance of a larger loop. 
The same engine, scheduled +and goal-parameterized, becomes a **continuous codebase-improvement platform**: + +```mermaid +flowchart LR + CRON["scheduler / cron"] --> ASSESS["Assessor agent\ngoal: cybersecurity | perf | deps | tests | a11y"] + ASSESS -->|creates| ISSUE["detailed GitHub issue (labeled 'revka')"] + ISSUE --> PIPE["issue-resolver pipeline\n(assess → preflight → GATE → code → review → GATE → merge)"] + PIPE --> KUMIHO["Kumiho knowledge graph\naccumulated findings · decisions · conventions"] + KUMIHO -. grounds .-> PIPE +``` + +- **Goal-driven** = one engine, many verticals. The only new component is a + goal-parameterized **assessor agent** (a sibling of the coder/reviewer). +- **Governance is the moat.** Plenty of tools write code; what enterprises can't + buy is autonomous improvement they're *allowed to turn on* — every change + gated, attributed to a distinct cryptographic agent identity, and recorded. + Cybersecurity is a particularly strong wedge: continuous audit → triaged + findings → governed remediation PRs. +- **Kumiho is the data flywheel.** Every assessment, review, and decision is + stored and connected, so over time the knowledge graph *becomes* the repo's + living standards and audit history — grounding future reviews in accumulated + institutional memory that no competitor can replicate. + +**Required guardrails before "continuous" is safe at scale:** per-run scope +limits, blast-radius caps, rollback, and gates that stay meaningful rather than +rubber-stamped. The bones exist (gates, identity, audit); productionizing the +continuous loop is its own milestone. + +--- + +## 4. Hardening backlog (surfaced during the build) + +| Item | Issue | Fix | +| --- | --- | --- | +| **Durable artifacts** | Cloud Run artifacts are ephemeral/per-instance. | Filestore/PV on GKE, or a GCS volume / push-to-bucket on Cloud Run. | +| **Admin token rotation** | `REVKA_GATEWAY_ADMIN_TOKEN` is long-lived; rotation needs a redeploy, no instant revocation. 
| Support a token list (rotate with overlap); document the secret-version rotate flow; consider short-lived tokens. | +| **Admin token entropy** | The gateway accepts any non-empty string as a full-access bearer. | Reject tokens below a minimum length/entropy; warn on weak values. | +| **Deployment-specific workflows in OSS** | Cloud-bound workflows hardcoding deployment URLs must not ship as embedded builtins. | Keep them as deployment-owned Kumiho artifacts; OSS builtins stay generic. | +| **Secrets in repo** | Local trigger scripts holding tokens must never be committed. | gitignored; use env/Secret Manager, never literals. | + +--- + +## 5. Mandate alignment (today vs. enterprise) + +| Mandate | Demo (today) | Enterprise | +| --- | --- | --- | +| Cloud-Native Runtime | Cloud Run (orchestrator + agents), keyless CI (WIF) | GKE (stateful core + Filestore) + managed agent tier | +| Gemini / Vertex intelligence | Gemini 2.5 Pro via Vertex ADC, no keys | + multi-model session-manager (Claude/GPT/Gemini), API-key hard rule | +| A2A interoperability | A2A discovery + send_task across 4 services | + assessor agent; cross-org A2A federation | +| B2B multi-agent | Governed issue→PR→merge with human gates | Continuous, goal-driven, auditable improvement platform | diff --git a/docs/JUDGES.md b/docs/JUDGES.md index d67c831..e464a55 100644 --- a/docs/JUDGES.md +++ b/docs/JUDGES.md @@ -2,10 +2,14 @@ Revka runs natively on **Google Cloud Run** (project `construct-498201`, region `us-central1`) as the service `revka-orchestrator`. It is the -governance + orchestration brain of the Track 3 multi-agent system: it -receives GitHub issues, runs the Google AgentOps preflight, enforces -human-in-the-loop approval gates, and coordinates the **ADK / Gemini coder -and reviewer agents** (also on Cloud Run) over the **A2A protocol**. +governance + orchestration brain of a general multi-agent **workflow +platform**; the Track 3 demo runs one workflow (`github-issue-resolver`) end to +end. 
The orchestrator receives the trigger, runs the Google AgentOps preflight, +enforces human-in-the-loop approval gates, and coordinates two **ADK / Gemini** +agents — a **coder on Cloud Run** (it needs a git/shell sandbox) and a +**reviewer on Vertex AI Agent Engine** (pure reasoning + Vertex AI Search +grounding). Every run is stored and versioned in **Kumiho**, Revka's +graph-native memory. Reasoning runs on **Gemini via Vertex AI**, authenticated by the service's own Google service account (no API keys). Each agent — orchestrator, coder, @@ -98,7 +102,9 @@ GitHub issue ──webhook──▶ Revka orchestrator (Cloud Run) │ governance · gates · audit ├──A2A (identity token)──▶ AgentOps control plane (Cloud Run) ├──A2A (identity token)──▶ coder agent (ADK · Gemini/Vertex · Cloud Run) - └──A2A (identity token)──▶ reviewer agent (ADK · Gemini/Vertex · Cloud Run) + ├──query (identity token)─▶ reviewer agent (ADK · Gemini/Vertex · Agent Engine) + │ └─ grounded in Vertex AI Search + └──store + version every run──▶ Kumiho (graph-native memory) │ ▼ PR opened → reviewed → merged diff --git a/docs/TRACK3_SUBMISSION.md b/docs/TRACK3_SUBMISSION.md new file mode 100644 index 0000000..afdae73 --- /dev/null +++ b/docs/TRACK3_SUBMISSION.md @@ -0,0 +1,227 @@ +# Revka — Google for Startups AI Agents Challenge, Track 3 + +**A platform for building autonomous, auditable, multi-agent workflows on Google Cloud — governed by humans, grounded in memory that compounds.** + +Revka lets an enterprise define multi-step work *visually*, as a graph of steps; +**ADK agents** execute it, coordinating over the **A2A protocol** and reasoning on +**Gemini via Vertex AI**; a human approves the decisions that matter; and every +run is stored and versioned in **Kumiho**, Revka's graph-native memory. The same +engine runs security audits, compliance reviews, data pipelines, and research — +any process you can define. 
+ +This submission demonstrates **one** such workflow end to end — +**`github-issue-resolver`**. A GitHub issue triggers a governed pipeline: a +Gemini **assessor** plans the fix, an **AgentOps preflight** verifies the Google +AgentOps / A2A control plane, and — after a human gate — an ADK **coder agent** +(on Cloud Run, where it has a real git/shell sandbox) implements the fix and +opens a PR; an ADK **reviewer agent** (on **Vertex AI Agent Engine**, grounded in +**Vertex AI Search**) reviews it; and after a second gate the change is merged and +the issue closed. Two grounded specialists, governed by a human, doing what no +single agent could do reliably — and the whole run recorded in Kumiho. + +--- + +## Architecture + +The architecture centers on **Revka running on Google Cloud** and its +integration with Google Cloud services. Triggers are pluggable — a GitHub +webhook/Action, the REST API, or the scheduler; the demo uses the GitHub-label +trigger as one example. The GitHub repository is the *work target*, not the core. 
+ +```mermaid +flowchart TB + subgraph TRIG["Triggers (pluggable — GitHub label is the demo example)"] + T1["GitHub webhook / Action"] + T2["REST API"] + T3["Scheduler / cron"] + end + + subgraph GCP["Google Cloud — Revka deployment (Cloud Run · us-central1)"] + direction TB + subgraph ORCH["revka-orchestrator · Cloud Run"] + direction TB + WF["Workflow engine\nassess → preflight → GATE → code → review → GATE → merge"] + GATE["Human-in-the-loop gates"] + REASON["Operator reasoning\nGemini 2.5 Pro · Vertex AI"] + end + subgraph EXEC["Agent tier — right runtime per agent"] + direction TB + CODER["coder-agent\nADK · Gemini/Vertex\n· Cloud Run (git/shell sandbox)"] + REVIEWER["reviewer-agent\nADK · Gemini/Vertex\n· Vertex AI Agent Engine"] + end + CP["AgentOps control plane\n(A2A)"] + VERTEX["Vertex AI\nGemini 2.5 Pro"] + VSEARCH["Vertex AI Search\nconventions grounding (RAG)"] + SM["Secret Manager\ntokens · keys"] + AR["Artifact Registry\nimages (keyless CI / WIF)"] + end + + KUM["Kumiho — graph-native memory (private-data RAG)\nstores & versions every workflow + run\nthe heart of Revka (GCP-compliant; hosted on AWS today)"] + GHREPO["GitHub repository\nwork target: issue → PR → merge → close"] + + TRIG --> WF + WF --> GATE + WF -. "A2A discover · identity token" .-> CP + WF -- "A2A send_task · identity token" --> CODER + WF -- "query · aiplatform identity token" --> REVIEWER + REASON --> VERTEX + CODER --> VERTEX + REVIEWER --> VERTEX + REVIEWER -- "ground review (RAG)" --> VSEARCH + ORCH --> SM + CODER --> SM + WF -- "store + version every run\nground future runs" --> KUM + AR -. images .-> ORCH + AR -. images .-> CODER + CODER -- "open PR · merge · close" --> GHREPO + GHREPO -. "issue payload" .-> TRIG +``` + +### Identity & trust mesh (Agent Identity) + +Each service has its own Google service account; least-privilege IAM governs +every edge — there are **no long-lived keys** anywhere except a single, +repo-scoped GitHub token held in Secret Manager. 
+
+```mermaid
+flowchart LR
+  DEP["revka-deployer@\n(GitHub Actions via WIF)"]
+  ORCH["revka-orchestrator@\nVertex user"]
+  CODER["coder-agent@\nVertex user · secret accessor"]
+  REV["reviewer-agent@\nVertex user · secret accessor"]
+
+  DEP -- "iam.serviceAccountUser\n+ run.admin + AR writer" --> ORCH
+  DEP --> CODER
+  DEP --> REV
+  ORCH -- "run.invoker" --> CODER
+  ORCH -- "run.invoker" --> REV
+```
+
+---
+
+## How Revka addresses the five key considerations
+
+| Key consideration | How Revka delivers it |
+| --- | --- |
+| **Multi-agent design & orchestration with ADK** | The work is done by **ADK agents** — a coder and a reviewer, each with its own function tools — coordinated by Revka's declarative **workflow engine** over the **A2A protocol**. Orchestration is *visual and governed*: a step graph with structured I/O, human gates, audit, and checkpoint/resume — not a single prompt. |
+| **Deployment on Agent Engine** | The **reviewer** — a pure reasoning + grounding agent — is deployed to **Vertex AI Agent Engine** (`reasoning_engines.AdkApp`), the managed runtime built for exactly this. The **coder** stays on **Cloud Run** because it needs a real git/shell sandbox to clone, edit, test, and open PRs. *Right runtime per agent* is the design, not a compromise. |
+| **Compelling business use case** | Revka is a **platform** for autonomous, auditable enterprise workflows — security audits, compliance reviews, data pipelines, research, operations. The demoed `github-issue-resolver` is one concrete, high-value instance: autonomous dev-ops with a human in the loop and a full audit trail. |
+| **Grounding & RAG** | Two layers. **(1) Vertex AI Search** grounds the reviewer in the repo's coding conventions — it retrieves numbered rules and cites them in findings. **(2) Kumiho**, Revka's **graph-native memory**, is a *private-data RAG*: it stores and versions **every workflow and every run**, so past work grounds future runs. Kumiho is the heart of Revka. |
+| **Collaboration + grounding > a single agent** | See the dedicated argument [below](#why-multi-agent--grounding-beats-a-single-agent). In short: a *grounded reviewer* independently catches what the coder — optimizing to satisfy the task — is structurally blind to, and human gates bound the autonomy. The separation of duties **is** the capability. |
+
+## How it maps to the Track 3 mandates
+
+| Mandate | How Revka satisfies it |
+| --- | --- |
+| **B2B focus** | Autonomous, audited issue→PR→merge resolution with human approval gates — a developer-operations product for engineering orgs. |
+| **Cloud-Native Runtime** | The orchestrator and coder run on **Cloud Run** (`revka-orchestrator`, `coder-agent`); the reviewer runs on **Vertex AI Agent Engine** (`revka-reviewer`). All built and deployed via GitHub Actions with **Workload Identity Federation** (no service-account keys). |
+| **Google Cloud Powered Intelligence** | Every reasoning step runs on **Gemini 2.5 Pro through Vertex AI**, authenticated by each service's own account via Application Default Credentials — **no API keys**. |
+| **A2A Interoperability** | All work steps are **A2A** calls: discovery of the AgentOps control plane, and `send_task`/`get_task` to the coder and reviewer. Cross-service calls authenticate with **Cloud Run identity tokens minted from the metadata server**. |
+
+### Mandatory technologies
+
+- **Intelligence:** Gemini 2.5 Pro on Vertex AI.
+- **Orchestration of the work agents:** **Agent Development Kit (ADK)** — the coder and reviewer are ADK agents with function tools (`run_shell`, `read_file`/`write_file`, `github_open_pr`, `github_merge_pr`, `github_comment_and_close_issue`, `github_get_pr_diff`).
+- **Higher-order orchestration & governance:** the Revka workflow engine (declarative steps, structured I/O, human gates, audit, checkpoint/resume), with **Kumiho** graph-native memory storing and versioning every run.
+- **Agent runtimes:** **Vertex AI Agent Engine** for the reasoning reviewer; **Cloud Run** for the orchestrator and the sandbox-needing coder.
+- **Infrastructure:** Cloud Run + Vertex AI Agent Engine + Artifact Registry + Secret Manager + Vertex AI + Vertex AI Search + Workload Identity Federation.
+
+---
+
+## The workflow (`github-issue-resolver`, revision r19)
+
+8 steps, no local CLI agents — every step is an A2A call, an Agent Engine query, Python, or a human gate:
+
+| # | Step | Type | What it does |
+| --- | --- | --- | --- |
+| 1 | `assess_issue` | python | Parse the GitHub payload; derive issue number/title/body and a fix strategy. |
+| 2 | `agentops_preflight` | python | Mint a metadata-server identity token and A2A-discover the AgentOps control plane (recorded as evidence; never hard-fails the run). |
+| 3 | `human_approval_gate_1` | human gate | Approve before any repository mutation. |
+| 4 | `deploy_coder_agent` | **a2a** | Send the issue+strategy to the ADK coder; it clones, implements, tests, opens a PR. |
+| 5 | `extract_pr_number` | python | Extract the PR number from the coder's `pr_url` with a regex, for downstream steps. |
+| 6 | `review_pr` | **agent_engine** | Query the ADK reviewer on **Vertex AI Agent Engine** (orchestrator mints an `aiplatform.user` identity token); it fetches the diff, grounds in Vertex AI Search, and returns a verdict. |
+| 7 | `human_approval_gate_2` | human gate | Approve before merge. |
+| 8 | `merge_and_close` | **a2a** | Coder merges the PR and closes the issue via the GitHub REST API. |
+
+---
+
+## Live proof
+
+Multiple complete cloud-only runs, verified on GitHub:
+
+- **Fully automatic trigger:** opening an issue and adding the **`revka`** label
+  fires a GitHub Action that POSTs to the Cloud Run orchestrator (stable bearer
+  token in repo secrets) — no manual command. Issue → Action → Cloud Run →
+  PR → merge → close.
+- **Run `b5ee9d9d` — reviewer on Vertex AI Agent Engine** (issue
+  [#12](https://github.com/KumihoIO/google-agentops-demo/issues/12) → PR
+  [#13](https://github.com/KumihoIO/google-agentops-demo/pull/13), **MERGED /
+  CLOSED**): the coder (Cloud Run) opened the PR; the reviewer **on Agent Engine**
+  returned a grounded verdict citing **Rules 1, 5, 6, 7, 9, 10** and flagging a
+  missing negative-input guard (Rule 7) and missing test (Rule 9); a human
+  approved both gates. The reviewer's **Vertex AI Search** (`reviewer-conventions`
+  data store) query returned **HTTP 200** — confirmed in the engine logs — i.e.
+  *live* grounding, not the bundled fallback.
+- **Run `89bb9e5a`** (issue [#8](https://github.com/KumihoIO/google-agentops-demo/issues/8) → PR [#9](https://github.com/KumihoIO/google-agentops-demo/pull/9), **MERGED / CLOSED**):
+  the **grounded reviewer cited a specific rule** — *"violates Rule 1: 'Money is
+  integer cents, never floats'"* — retrieved from the **Vertex AI Search**
+  conventions data store.
+- **AgentOps preflight:** `a2a_discovery_status: discovered` against the Cloud
+  Run control plane (identity token minted from the metadata server).
+- **Reasoning:** Gemini 2.5 Pro via Vertex AI throughout; coder & reviewer are
+  ADK agents (reviewer on **Agent Engine**); grounding via **live Vertex AI Search**.
+
+### Grounding & RAG — two layers
+
+**1. Vertex AI Search (per-agent grounding).** The reviewer agent is grounded in
+the repository's coding conventions (`reviewer-conventions` Discovery Engine data
+store). It queries Vertex AI Search and checks the PR diff against the retrieved
+numbered rules, citing them in its findings (e.g. *"violates Rule 1: money is
+integer cents, never floats"*).
+
+**2. Kumiho — graph-native memory as private-data RAG.** Kumiho is the heart of
+Revka.
+Every workflow definition and every run — inputs, each step's structured
+output, agent verdicts, approvals, the final result — is **stored and versioned**
+as a connected graph, not flat logs. This is a *custom RAG over the organization's
+own private data*: past runs become retrievable, attributable context that grounds
+future runs, and the version history is the audit trail. Workflow revisions
+(e.g. `github-issue-resolver` r19) are themselves Kumiho-tracked artifacts.
+Kumiho is GCP-compliant (previously GCP-hosted; on AWS today), and a managed-GCP
+migration is on the roadmap.
+
+### Why multi-agent + grounding beats a single agent
+
+A single agent asked to "fix the issue and self-review" is structurally
+conflicted: the same context and objective that produce the change also produce
+the self-assessment, so it is blind to its own blind spots. Revka splits the work
+into **independent, separately-grounded specialists**:
+
+- The **coder** (Cloud Run sandbox) optimizes to *satisfy the task* — clone,
+  implement, test, open a PR.
+- The **reviewer** (Agent Engine) optimizes to *find what's wrong*, grounded in
+  named, retrieved standards it did not write and the coder never saw as a rubric.
+- **Human gates** bound the autonomy at the two decisions that carry risk
+  (first mutation, and merge), and **Kumiho** records every step so the outcome is
+  auditable and feeds the next run.
+
+The separation of duties — plus grounding that gives the reviewer an external
+source of truth — is precisely what makes the result trustworthy. That is the
+capability a single agent cannot reach: not more tokens, but **independent
+verification against grounded standards, under human governance**.
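The separation of duties argued above can be made concrete with a toy runner. This is a minimal sketch under stated assumptions, not Revka's workflow engine: the step names come from the workflow table in this document, while `coder`, `reviewer`, `merge`, `run`, and the `approve` callback are hypothetical stand-ins that only return canned data. It shows two properties from the text: the reviewer returns a verdict but never merges, and a rejected gate stops the run before the risky step.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A step is (name, function). A function of None marks a human gate.
Step = Tuple[str, Optional[Callable[[Dict], Dict]]]


def coder(ctx: Dict) -> Dict:
    """Stand-in for the coder agent: pretends a PR was opened."""
    return {"pr_url": "https://github.com/KumihoIO/google-agentops-demo/pull/13"}


def extract_pr_number(ctx: Dict) -> Dict:
    """Pull the PR number out of the coder's pr_url with a regex."""
    match = re.search(r"/pull/(\d+)", ctx["pr_url"])
    if match is None:
        raise ValueError(f"no PR number in {ctx['pr_url']!r}")
    return {"pr_number": int(match.group(1))}


def reviewer(ctx: Dict) -> Dict:
    """Stand-in for the grounded reviewer: returns findings, never merges."""
    return {"findings": ["Rule 7: missing negative-input guard"]}


def merge(ctx: Dict) -> Dict:
    """Stand-in for merge_and_close."""
    return {"merged": True}


PIPELINE: List[Step] = [
    ("human_approval_gate_1", None),
    ("deploy_coder_agent", coder),
    ("extract_pr_number", extract_pr_number),
    ("review_pr", reviewer),
    ("human_approval_gate_2", None),
    ("merge_and_close", merge),
]


def run(steps: List[Step], approve: Callable[[str, Dict], bool]):
    """Run steps in order; stop at the first gate the approver rejects."""
    ctx: Dict = {}
    audit: List[Tuple[str, str]] = []
    for name, fn in steps:
        if fn is None:  # human gate: ask, record the decision, maybe stop
            approved = approve(name, ctx)
            audit.append((name, "approved" if approved else "rejected"))
            if not approved:
                break
            continue
        ctx.update(fn(ctx))
        audit.append((name, "done"))
    return ctx, audit
```

Calling `run(PIPELINE, approve)` with an approver that rejects `human_approval_gate_2` leaves the reviewer's findings in the context but never reaches `merge_and_close`; the `audit` list plays the role that the recorded run history plays in the text.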
+
+## Service endpoints
+
+| Service | URL |
+| --- | --- |
+| Orchestrator | Cloud Run · `https://revka-orchestrator-n22ujw2j2a-uc.a.run.app` |
+| Coder agent | Cloud Run · `https://coder-agent-n22ujw2j2a-uc.a.run.app` |
+| Reviewer agent | **Vertex AI Agent Engine** · `projects/1091585228963/locations/us-central1/reasoningEngines/3625053003137941504` |
+| AgentOps control plane | Cloud Run · `https://construct-agentops-a2a-1091585228963.us-central1.run.app` |
+
+Access for judges: see [`docs/JUDGES.md`](./JUDGES.md).
+Enterprise & GKE roadmap: see [`docs/ENTERPRISE_ROADMAP.md`](./ENTERPRISE_ROADMAP.md).
+
+> Note: the cloud-native `github-issue-resolver` workflow is deployment-specific
+> (it A2A-calls this project's Cloud Run agents) and lives as a Kumiho-revisioned
+> artifact in this deployment — it is intentionally **not** shipped in the OSS
+> builtins.
diff --git a/scripts/demo-logs.sh b/scripts/demo-logs.sh
new file mode 100755
index 0000000..04bef8f
--- /dev/null
+++ b/scripts/demo-logs.sh
@@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+# Live-ish Cloud Run log tail for the Track 3 demo split pane.
+#
+# Streams recent logs from the three Revka services with a colored service tag,
+# so a screen-recording split window shows the agents actually executing on
+# Google Cloud (A2A calls, ADK/Vertex reasoning, clone/test/PR) while the Revka
+# dashboard drives the run in the other pane.
+#
+#   bash scripts/demo-logs.sh                          # all three services
+#   SERVICES="coder-agent" bash scripts/demo-logs.sh   # focus one
+#
+# Prefers the real-time `gcloud alpha logging tail`; falls back to a 3s poll
+# loop (no extra components needed) if tail is unavailable.
+set -euo pipefail
+
+PROJECT="${PROJECT:-construct-498201}"
+SERVICES="${SERVICES:-revka-orchestrator coder-agent reviewer-agent}"
+
+# Build a Logging filter over the selected services.
+names=$(printf '"%s" OR ' $SERVICES); names="(${names% OR })"
+FILTER="resource.type=\"cloud_run_revision\" AND resource.labels.service_name=${names}"
+
+color() { case "$1" in
+  revka-orchestrator) printf '\033[36m';;  # cyan
+  coder-agent)        printf '\033[32m';;  # green
+  reviewer-agent)     printf '\033[35m';;  # magenta
+  *)                  printf '\033[37m';;
+esac; }
+
+emit() {  # service \t text
+  local svc="${1%%$'\t'*}" txt="${1#*$'\t'}"
+  [ -z "$txt" ] && return
+  printf '%b%-18s\033[0m │ %s\n' "$(color "$svc")" "$svc" "$txt"
+}
+
+echo "── Cloud Run logs · $SERVICES · project $PROJECT ──"
+
+if gcloud alpha logging tail --help >/dev/null 2>&1; then
+  gcloud alpha logging tail "$FILTER" --project "$PROJECT" \
+    --format='value(resource.labels.service_name, textPayload)' 2>/dev/null \
+    | while IFS= read -r line; do emit "$line"; done
+else
+  echo "(alpha tail unavailable — polling every 3s)"
+  seen="/tmp/.demo-logs-seen.$$"; : > "$seen"
+  trap 'rm -f "$seen"' EXIT
+  while true; do
+    gcloud logging read "$FILTER" --project "$PROJECT" --freshness=30s --order=asc \
+      --format='value(timestamp, resource.labels.service_name, textPayload)' 2>/dev/null \
+      | while IFS= read -r row; do
+          key=$(printf '%s' "$row" | cksum | cut -d' ' -f1)
+          grep -q "^$key\$" "$seen" 2>/dev/null && continue
+          echo "$key" >> "$seen"
+          svc=$(printf '%s' "$row" | awk '{print $2}')
+          txt=$(printf '%s' "$row" | cut -d$'\t' -f3-)
+          emit "$svc"$'\t'"$txt"
+        done
+    sleep 3
+  done
+fi