release: v0.2.2 — IDP, observability, tiered architecture, org cycle#641

Closed
kokevidaurre wants to merge 158 commits into main from develop

@kokevidaurre
Contributor

See the full changelog in the develop branch commits. 157 files, 17K+ lines. All 1779 tests pass. 13/13 Docker fresh-user tests pass.

kokevidaurre and others added 30 commits February 21, 2026 12:32
Closes #342

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…351)

Prevents shell injection via crafted paths in background and watch
execution modes. Applies same escaping used in foreground mode (PR #324).

Adds shellEscape() helper that replaces single quotes with '\'' to
safely interpolate variables into single-quoted shell strings. Applied to:
- Watch mode: projectRoot, worktreeDir, branchName, logFile, pidFile
- Background mode: projectRoot, worktreeDir, branchName, logFile, pidFile
- Provider background mode: workDir, logFile, pidFile, provider args
- execSync worktree calls in foreground and provider modes
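
A minimal sketch of the escaping described above, assuming `shellEscape()` does nothing beyond the single-quote replacement. `buildCdCommand` is a hypothetical caller used only for illustration, not a function from this PR.

```typescript
// Sketch only: the PR's actual shellEscape() may differ in detail.
// Replaces each ' with '\'' (close the quote, emit an escaped quote, reopen).
function shellEscape(value: string): string {
  return value.replace(/'/g, "'\\''");
}

// Hypothetical caller: interpolates an escaped path into a single-quoted string.
function buildCdCommand(projectRoot: string): string {
  return `cd '${shellEscape(projectRoot)}'`;
}
```

Because the value always sits inside single quotes, a path containing `;`, `$`, or backticks stays inert. Only the single quote itself needs rewriting.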

Closes #340

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
v0.6.2 released, 3 security P1 issue-solvers dispatched,
751 tests passing, Q1 goals 2/3 achieved.

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…339)

Closes #319

Added default .action(() => cmd.outputHelp()) to 7 parent commands
(env, kpi, feedback, session, trigger, approval, autonomous) so they
exit 0 instead of 1 when invoked without a subcommand. Matches the
pattern already used by memory, goal, deploy, and exec commands.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
…354)

Replace scattered console.log calls with the project's writeLine()
utility from src/lib/terminal.ts. This provides a single output
layer for consistent formatting and future output control.

- Convert 238 console.log calls to writeLine across 10 files
- Remove 8 debug/placeholder log statements from anthropic.ts
- Keep console.log only for JSON.stringify output (--json flags)
  and raw prompt piping — standard CLI patterns
- Reduction: 269 → 31 occurrences (88% decrease)
- Zero new TypeScript errors

Files: init.ts, deploy.ts, autonomous.ts, trigger.ts, approval.ts,
eval.ts, login.ts, cli.ts, anthropic.ts, update.ts

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Replace minimal README with comprehensive 331-line version covering:
- Quick start with real output examples
- Why Squads (4 differentiators)
- Provider table (7 LLM providers)
- Feature showcase (dashboard, memory, sessions, autonomous, hooks)
- Command reference (21 active commands, no removed ones)
- Project structure and configuration examples
- Development guide and tech stack
- Contributing and community links

References only current commands (memory write/read instead of learn,
env show instead of context, exec list instead of history).

🤖 Generated with [Agents Squads](https://agents-squads.com)

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Closes agents-squads/engineering#51

Removed the base64-obfuscated API key from source code and replaced
with SQUADS_TELEMETRY_KEY env var. Telemetry send is skipped when key
is not set. The exposed key must be rotated server-side separately.
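
The skip-when-unset behaviour might look roughly like this. `resolveTelemetryKey` is a hypothetical helper name, and treating a blank value as unset is an assumption.

```typescript
// Hypothetical helper: returns the key, or null when telemetry should be skipped.
// Assumption: an empty or whitespace-only value counts as "not set".
function resolveTelemetryKey(env: Record<string, string | undefined>): string | null {
  const key = env["SQUADS_TELEMETRY_KEY"];
  return key !== undefined && key.trim() !== "" ? key : null;
}
```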

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #343

The daemon process was silently failing because Commander.js rejected
the unregistered --daemon CLI flag. Replace with SQUADS_DAEMON env var
to signal daemon mode, redirect child stdout/stderr to log file for
diagnosability, and show clear error when daemon fails to start.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
* feat(status): show milestones and open PRs from GitHub

squads status now queries GitHub API for real operational data:
- Milestone progress bars across product repos (cli, console, api)
- Open PRs targeting develop with repo and number

Replaces vanity-only output with actionable org health metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(status): discover repos dynamically from squad definitions

Replace hardcoded PRODUCT_REPOS array with dynamic discovery:
- Read `repo` field from each SQUAD.md frontmatter
- Deduplicate and pass to fetchOperationalStatus()
- GitHub org derived from squad config, not hardcoded
- Dynamic column widths based on actual repo names
- Show all open PRs (not just develop-targeted)

Any user's squads with `repo:` in SQUAD.md will show milestones + PRs.
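
A rough sketch of the discovery step, assuming each SQUAD.md starts with a `---` delimited frontmatter block. The function names and the regex-based parsing are illustrative; the actual code may use a YAML parser instead.

```typescript
// Extracts the `repo:` value from a SQUAD.md frontmatter block, if present.
function readRepoField(squadMd: string): string | null {
  const body = squadMd.match(/^---\r?\n([\s\S]*?)\r?\n---/)?.[1];
  if (body === undefined) return null;
  const value = body.match(/^repo:\s*(.+)$/m)?.[1];
  if (value === undefined) return null;
  // Drop surrounding quotes so `repo: "org/api"` and `repo: org/api` agree.
  return value.trim().replace(/^["']|["']$/g, "");
}

// Collects repo values across all squads and removes duplicates.
function discoverRepos(squadFiles: string[]): string[] {
  const repos = squadFiles
    .map(readRepoField)
    .filter((repo): repo is string => repo !== null);
  return [...new Set(repos)];
}
```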

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: rewrite CLAUDE.md as user-facing guide

Remove internal references, org names, and dev-specific content. Focus on
teaching users how to define squads, run agents, and monitor work. Git-provider
agnostic. Engineering standards now live in hq CLAUDE.md (internal only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Closes #24

Converts ~50 static command imports to dynamic import() inside action
handlers. Only the invoked command's dependencies (pg, supabase, inquirer,
ora) are loaded, saving ~300ms+ on cold start.

Changes:
- All command handlers use dynamic import() in their .action() callbacks
- autoUpdateOnStartup skipped for --help/--version (instant response)
- register*Command imports kept static (needed for subcommand structure)
- Type-only import for SessionSummaryData (zero runtime cost)

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: manual
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
)

Closes #297

Show "squads dash" hints at key touchpoints:
- After successful foreground/background agent execution
- After lead session completion
- After parallel agent launch
- In squad detail status commands section

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: manual
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Breaks down the 350-line executeWithClaude into 7 focused functions:
- buildAgentEnv: consolidates 3x duplicated env construction
- logVerboseExecution: DRYs up verbose config logging (was 2x identical)
- createAgentWorktree: isolates Node.js worktree creation
- buildDetachedShellScript: shared shell script for watch/background
- prepareLogFiles: shared log directory setup
- executeForeground: foreground spawn + status tracking
- executeWatch: watch mode (background + tail)

executeWithClaude is now a ~80-line coordinator that delegates to
the appropriate mode function.

Closes #158

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
…dless flags

Closes #371

Two fixes for Google/Gemini provider execution:

1. Add --yolo flag to Gemini CLI args for headless auto-approval.
   Without this, Gemini denies all tool calls when running in background
   because it can't prompt for interactive confirmation.

2. Copy .agents directory into worktree and rewrite prompt paths.
   Gemini CLI sandboxes file access to its workspace directory.
   The prompt references agent definitions at the original project root,
   which Gemini blocks as "Path not in workspace". Now we copy .agents
   into the worktree and rewrite absolute paths so Gemini can resolve them.
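
The path rewrite in fix 2 could be sketched as below. `rewriteAgentPaths` is a hypothetical name, and the sketch assumes POSIX-style separators and that only paths under `.agents` need rewriting.

```typescript
// Hypothetical helper: points absolute .agents paths at the worktree copy.
function rewriteAgentPaths(prompt: string, projectRoot: string, worktreeDir: string): string {
  const from = `${projectRoot}/.agents`;
  const to = `${worktreeDir}/.agents`;
  // split/join replaces every occurrence without regex-escaping the path.
  return prompt.split(from).join(to);
}
```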

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #280

Implements `squads create <name>` that creates:
- .agents/squads/<name>/SQUAD.md (from template)
- .agents/squads/<name>/lead.md (starter agent)
- .agents/memory/<name>/lead/ (memory directory)

Supports --description, --goal, --model flags for non-interactive use,
and interactive prompts via inquirer when flags are omitted.
Includes --force for overwriting and --yes for CI/scripting.

Note: organization.yaml is not used — squads are discovered dynamically
via filesystem (squad-parser.ts findSquadsDir + listSquads).

11 tests covering directory creation, content, naming, overwrite
protection, and squad discoverability.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: manual
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #366

When --cloud is set, the CLI dispatches agent execution to the platform
API instead of running locally. Requires `squads login` session and
SQUADS_API_URL environment variable.

Flow:
- POST /agent-dispatch to create dispatch request
- Poll /agent-executions for status updates
- Display execution summary on completion

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: smart
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #316

Added 63 tests covering 2 of the 6 lib modules listed in the issue:
- setup-checks.ts (48 tests): providers registry, commandExists,
  isDockerRunning, checkDockerPrereqs, checkGhCli, checkGhPermissions,
  checkClaudeCli, checkProviderAuth, runPrereqChecks, runAuthChecks,
  displayCheckResults, attemptFix, waitForService
- local.ts (15 tests): getLocalEnvVars, formatLocalStatus,
  isLangfuseLocal, getLocalStackStatus

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…urces (#382)

Closes #314. Adds 115 tests across 4 test files achieving 92% statement
coverage and 80% branch coverage on the dashboard module:

- dashboard-loader.test.ts: 16 tests for findDashboardsDir, listDashboards,
  loadDashboard, clearDashboardCache, loadAllDashboards, findDashboard
- dashboard-renderers.test.ts: 49 tests for formatValue (all formats),
  getThresholdColor, calculateColumnWidths, and renderView (all view types)
- dashboard-sources.test.ts: 31 tests for buildQuery, buildWhereClause,
  parseDateRange, and postgresSource stub
- dashboard-engine.test.ts: 19 tests for executeDashboard, renderDashboard,
  and showAvailableDashboards with mocked dependencies

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
…381)

Closes #51

Changes:
- db.test.ts: Enable 4 previously skipped baseline tests (saveBaseline,
  getLatestBaseline, getBaselineByName, listBaselines) — stubs are
  implemented, tests were incorrectly marked as not-yet-implemented
- sessions.test.ts: Add 30 new tests covering file-system operations:
  findAgentsDir, getSessionsDir, getHistoryFilePath, getActiveSessions,
  getSessionSummary, startSession, stopSession, updateHeartbeat,
  cleanupStaleSessions — all use temp dirs to avoid test pollution
  Also expanded detectSquad, detectAIProcessesFast, getLiveSessionSummaryFast

Total: 63 → 104 tests passing, 0 skipped

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Post-execution instructions (branch, commit, PR workflow) now loaded from
.agents/config/post-execution.md instead of inline template string in run.ts.
Separates prompt content from code. Same pattern as approval-instructions.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 9999f92700c02af522e15cae29097a60f249cf15.
…eck (#389)

* fix(ci): run CI on PRs to develop — quality gate for agent PRs

Agents create PRs targeting develop. Without CI on develop PRs,
broken code gets merged undetected. This is the #1 quality gap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(quality): pre-commit hook runs build + tests on source changes

Agents were committing broken code (e.g. #384: tests that fail on
import). Now any commit touching .ts/.tsx/.js files must pass both
`npm run build` and `npm run test` before the commit goes through.

This is the #1 quality gate — prevents slop at the source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): align failing tests with implementation

- deploy.test: capture process.stdout.write instead of console.log
  (deployCommand uses writeLine which writes to stdout)
- eval.test: same stdout capture fix for JSON output test
- infra.test: use POSTGRES_PORT env var (default 5433) to match
  docker-compose pattern
- local.test: expect port 5432 in DATABASE_URL matching getLocalEnvVars()
- setup-checks.test: expect 'warning' (not 'missing') when Docker
  is not installed, matching checkDockerPrereqs() implementation
- Deleted verify-token.test.ts (tested nonexistent verifyToken export)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(agents): proper PR workflow — target develop, daemon env, auth check

- Post-execution: agents now open PRs targeting `develop` with structured body
- Daemon (autonomous.ts): unset CLAUDECODE env to allow nested claude sessions
- Auth check: downgrade missing credentials from block to warn (keychain auth)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(run): extract post-execution prompt to template file

Post-execution instructions (branch, commit, PR workflow) now loaded from
.agents/config/post-execution.md instead of inline template string.
Separates prompt content from code. Same pattern as approval-instructions.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing env-config.ts (imported by run.ts but never committed)
- Fix Commander action spread types with @ts-expect-error directives
- Add inquirer type declaration for create command

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…tines' (#392)

Regex only matched '## Routines' exactly, missing Engineering squad's
'## Growth Routines' header. Now matches any word before 'Routines'.
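
One way the relaxed match could look. The exact pattern used in the fix is not shown here, so this regex is an assumption that allows at most one word between `##` and `Routines`.

```typescript
// Assumed pattern: "## Routines" or "## <word> Routines" on its own line.
const ROUTINES_HEADER = /^##\s+(?:\w+\s+)?Routines\s*$/m;

function hasRoutinesSection(markdown: string): boolean {
  return ROUTINES_HEADER.test(markdown);
}
```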

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Multi-agent conversation orchestration for squad runs:
- Lead briefs → scanners discover → workers execute → lead reviews → verifiers check
- Shared transcript between agents for context continuity
- Convergence detection (continuation signals beat convergence signals)
- Cost ceiling ($25 default) and max turns (20 default) safety limits
- --task flag for founder directives (replaces lead briefing)
- Transcript persistence to .agents/conversations/{squad}/
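
The stop conditions above can be sketched as a single predicate. The phrase lists are placeholders (the real signals are not listed in this message), and plain phrase matching is used here for simplicity. Only the two defaults, 20 turns and $25, come from the text.

```typescript
interface ConversationState {
  turns: number;
  costUsd: number;
  lastMessage: string;
}

// Placeholder phrases, invented for illustration.
const CONTINUATION_PHRASES = ["still working", "next step"];
const CONVERGENCE_PHRASES = ["all done", "nothing left"];

const MAX_TURNS = 20; // default from the commit message
const COST_CEILING_USD = 25; // default from the commit message

function containsAny(text: string, phrases: string[]): boolean {
  const lower = text.toLowerCase();
  return phrases.some((phrase) => lower.includes(phrase));
}

function shouldStop(state: ConversationState): boolean {
  // Safety limits apply regardless of what the agents say.
  if (state.turns >= MAX_TURNS || state.costUsd >= COST_CEILING_USD) return true;
  // A continuation signal beats a convergence signal in the same message.
  if (containsAny(state.lastMessage, CONTINUATION_PHRASES)) return false;
  return containsAny(state.lastMessage, CONVERGENCE_PHRASES);
}
```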

New files:
- src/lib/conversation.ts — types, transcript, agent classification, convergence
- src/lib/workflow.ts — turn execution, orchestration loop, transcript persistence

`squads run <squad>` now runs a full conversation instead of just the lead agent.
`squads run <squad> -a <agent>` still runs individual agents (unchanged).

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(auth): add verifyToken function and passing test suite

Closes #384

Adds verifyToken(token, apiUrl) to src/lib/auth.ts:
- Calls GET /auth/verify with Bearer token header
- Maps snake_case API response to camelCase (display_name→name, subscription_plan→plan)
- Returns null on non-ok responses, network errors, and timeouts/aborts
- 5-second abort timeout to prevent hanging

Creates test/verify-token.test.ts with all 6 specified tests:
1. Returns user data on 200 with snake_case→camelCase mapping
2. Returns null on non-ok response (e.g. 401)
3. Returns null on network error (silent)
4. Returns null on timeout/abort
5. Sends Bearer token in Authorization header
6. Builds correct URL from apiUrl param

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

* fix(auth): update verifyToken signature and response to match API spec

Revises the initial implementation based on actual API contract:
- Parameter order: verifyToken(apiUrl, token) — apiUrl first
- Endpoint: /auth/cli/verify (not /auth/verify)
- Response shape: { email, tenantId, tenantSlug, tenantName, status }
  mapping from snake_case { tenant_id, tenant_slug, tenant_name }
- Updates test/verify-token.test.ts to use vi.stubGlobal per-test
  with afterEach cleanup for better test isolation
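
The snake_case to camelCase mapping for the revised response shape might be written as follows. The interface names are hypothetical and every field is assumed to be a string. The HTTP call and the 5-second abort are left out of this sketch.

```typescript
// Assumed wire format returned by /auth/cli/verify.
interface VerifyApiResponse {
  email: string;
  tenant_id: string;
  tenant_slug: string;
  tenant_name: string;
  status: string;
}

interface VerifiedUser {
  email: string;
  tenantId: string;
  tenantSlug: string;
  tenantName: string;
  status: string;
}

function mapVerifyResponse(raw: VerifyApiResponse): VerifiedUser {
  return {
    email: raw.email,
    tenantId: raw.tenant_id,
    tenantSlug: raw.tenant_slug,
    tenantName: raw.tenant_name,
    status: raw.status,
  };
}
```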

All 6 tests pass.

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
* test(commands): add unit tests for goal and list commands

Adds 21 new tests covering:
- goal.test.ts (14 tests): goalSetCommand, goalListCommand,
  goalCompleteCommand, goalProgressCommand — including edge cases
  for invalid indexes, non-existent squads, metric annotations
- list.test.ts (7 tests): JSON output validation, agent counts,
  no-project error handling, table and agents view rendering

Partial fix for #47 — covers 2 of 19 untested command files.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

* test: add unit tests for feedback and progress commands

Closes #47 (partial — 2 of 15 untested commands)

Added 19 tests covering:
- feedback: add, show, parse history, rating validation, learnings
- progress: start/complete tasks, display, verbose mode, task IDs

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
…ification

- classifyAgent now uses role descriptions from SQUAD.md (primary) with
  name-based fallback — no more regex substring collisions
- Strip **bold** markers from agent names in table parser
- Replace regex convergence/continuation signals with phrase matching
- "keychain auth" → "OAuth" in run output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- session.test.ts: 11 tests covering sessionStartCommand,
  sessionStopCommand, sessionHeartbeatCommand, and detectSquadCommand
  (start/stop/heartbeat lifecycle, quiet mode, missing .agents dir)
- learn.test.ts: 14 tests covering learnCommand, learnShowCommand,
  and learnSearchCommand (default squad, specific squad, fallback,
  category inference, tag extraction, search, filters)

Part of #47 — adds coverage for 2 more previously untested commands.

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Jorge Vidaurre and others added 23 commits March 13, 2026 16:14
Co-Authored-By: Claude <noreply@anthropic.com>
…elease

Also adds permissions: contents: read to ci.yml (CodeQL fix)

Co-Authored-By: Claude <noreply@anthropic.com>
* feat: upgrade squads-cli skill template with full CLI reference

Rewrites the squads-cli skill from 85-line command list to 329-line
comprehensive guide: context cascade, autopilot patterns, agent self-context
patterns, memory workflows, troubleshooting. Full command reference extracted
to references/commands.md. Init now copies references/ alongside SKILL.md.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: substitute {{CURRENT_DATE}} template variable in state.md (#603)

Add CURRENT_DATE to the template variables map in initCommand() so
that state.md seed templates (research/lead, product/lead) get the
actual date instead of the raw placeholder.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
* refactor: extract run-types.ts and run-utils.ts from run.ts (#447)

Step 1-2 of run.ts decomposition:
- run-types.ts: RunOptions, ExecutionContext interfaces + constants
- run-utils.ts: pure utility functions (generateExecutionId, selectMcpConfig,
  detectTaskType, resolveModel, ensureProjectTrusted, getProjectRoot,
  formatDuration, checkClaudeCliAvailable, getClaudeModelAlias)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: extract execution-log.ts from run.ts (#447)

Step 3: Extract bridge/API helpers, execution logging, cooldown tracking,
and event emission into dedicated module.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: extract cloud-dispatch.ts from run.ts (#447)

Step 4: Extract self-contained cloud worker dispatch and polling logic.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: extract execution-engine.ts from run.ts (#447)

Step 5: Extract executeWithClaude, executeWithProvider, and all execution
helpers (worktree management, foreground/background/watch modes, auto-commit,
verification, preflight checks) into dedicated module. -769 lines from run.ts.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: extract agent-runner.ts and run-modes.ts from run.ts (#447)

Steps 6-7: Extract runAgent() into agent-runner.ts and all squad execution
modes (autopilot, lead, squad loop, post-evaluation) into run-modes.ts.

run.ts: 2930 → 320 lines (89% reduction). Now contains only command
registration, runCommand() routing, and runSquad() dispatch.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
- Add TOOL_USE_PROVIDERS constant (anthropic, google) to gate conversation/lead modes
- Route non-tool-use providers to new runSequentialMode() — runs agents one at a time
- Block --lead for non-tool-use providers with clear alternatives
- Add stdinPrompt to CLIConfig interface; Ollama pipes prompt via stdin
- Pass model option through to executeWithProvider for per-agent model selection
- Update dry-run preview to show sequential vs conversation mode
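
A sketch of the gating described in this list. `TOOL_USE_PROVIDERS` and its two members come from the message; `selectRunMode` and `canUseLead` are hypothetical names.

```typescript
const TOOL_USE_PROVIDERS: readonly string[] = ["anthropic", "google"];

type RunMode = "conversation" | "sequential";

// Providers without tool use run agents one at a time.
function selectRunMode(provider: string): RunMode {
  return TOOL_USE_PROVIDERS.includes(provider) ? "conversation" : "sequential";
}

// --lead is only available when the provider supports tool use.
function canUseLead(provider: string): boolean {
  return TOOL_USE_PROVIDERS.includes(provider);
}
```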

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…ion (#609)

* refactor: update context system — goals layer, remove SQUAD.md injection

Context loading changes:
- Removed L2 (SQUAD.md body injection) — SQUAD.md is now metadata
  only for CLI routing (repo, agents, config). Not injected into prompt.
- Split old L3 (priorities OR goals) into L2 (priorities.md) and
  L3 (goals.md) as separate layers loaded independently.
- Removed L7 (active-work.md) and L8 (briefs/) from context loading.
  These files still exist but no longer consume context budget.
- Renumbered: L6=feedback, L7=daily-briefing, L8=cross-squad learnings.

Role-based access updated:
- scanner: L1-L5 (company, priorities, goals, agent, state)
- worker/verifier: L1-L6 (+ feedback)
- lead/coo: L1-L8 (+ daily briefing + cross-squad)

Role resolution:
- Direct match for new schema (role: "lead" → lead, no scoring needed)
- Falls back to token scoring for legacy free-text roles
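
The role-based access table above can be expressed as a prefix of the layer list. The layer names follow the message; the identifiers `LAYERS`, `MAX_LAYER`, and `layersForRole` are illustrative only.

```typescript
// Context layers in load order, layer 1 through layer 8.
const LAYERS = [
  "company",
  "priorities",
  "goals",
  "agent",
  "state",
  "feedback",
  "daily-briefing",
  "cross-squad",
] as const;

type Role = "scanner" | "worker" | "verifier" | "lead" | "coo";

// Highest layer each role may load.
const MAX_LAYER: Record<Role, number> = {
  scanner: 5,
  worker: 6,
  verifier: 6,
  lead: 8,
  coo: 8,
};

function layersForRole(role: Role): string[] {
  return LAYERS.slice(0, MAX_LAYER[role]);
}
```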

Agent prompt updated to reflect new layer names.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: strip frontmatter from all context layers before injection

company.md, priorities.md, goals.md, state.md all have YAML
frontmatter for CLI metadata. LLMs don't need it — strip before
injecting into prompt. Saves ~80 tokens per run.

Also: DRYRUN_CONTEXT_MAX_CHARS now configurable via env var
SQUADS_DRYRUN_MAX_CHARS for debugging full context output.
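
Stripping a leading frontmatter block can be done with one regex, as in this sketch. It is a stand-in and may not match how the CLI actually does it; `stripFrontmatter` is a hypothetical name.

```typescript
// Removes a leading `---` ... `---` block; content without one is returned unchanged.
function stripFrontmatter(content: string): string {
  return content.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}
```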

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: add IDP commands — catalog, scorecard, release pre-check

New commands:
- squads catalog list — show all services grouped by type
- squads catalog show <service> — detailed service view
- squads catalog check [service] — run scorecard checks (all or one)
- squads release pre-check <service> — validate dependencies before deploy

New lib modules:
- lib/idp/types.ts — TypeScript interfaces matching IDP YAML schema
- lib/idp/resolver.ts — find IDP directory (env var → co-located → sibling → absolute)
- lib/idp/catalog-loader.ts — parse YAML catalog entries via gray-matter
- lib/idp/scorecard-engine.ts — evaluate services against quality checks

Scorecard sources: local filesystem, gh CLI, git log. Graceful
degradation when gh is unavailable (shows "unknown" vs failing).

No new dependencies — YAML parsed via gray-matter's engine.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: add Docker fresh-user test — simulates new user first-run

Dockerfile.fresh-user: clean Node 22 container, npm install -g
squads-cli, empty git repo. No config, no .agents, nothing.

test-fresh-user.sh: 9-step automated test suite covering the
complete first-run flow (version, help, init, status, list,
catalog, doctor, unknown command).

Current results: 4/9 pass. squads init is broken (#610).

Usage:
  ./test/docker/test-fresh-user.sh --auto    # automated
  ./test/docker/test-fresh-user.sh           # interactive

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: add missing CatalogEntry type import in catalog.ts

Fixes TypeScript build error: TS2304 Cannot find name 'CatalogEntry'

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
* fix(init): squads init works in clean environment

- Missing provider CLIs treated as warning, not error — users can
  scaffold first and install providers later
- Restored `squads list` as alias for `squads status`
- Dockerfile builds from local source via npm pack
- Added .dockerignore for fast Docker builds

All 9 fresh-user Docker test steps pass. Closes #610.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: update tests — missing provider is now warning, not error

Tests expected 'missing' status for uninstalled providers, but
the init fix changed this to 'warning' so users can scaffold first.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…, #614) (#616)

## Issue #613: IDP catalog creation
- `squads init` now creates `.agents/idp/catalog/<name>.yaml` with auto-detected service entry
- Service name derived from git remote (last segment) or directory name
- Type auto-detected: product (if package.json/go.mod/requirements.txt) or domain
- Stack detected: node/react/next/vue/astro, python, go, ruby, rust
- Build/test commands inferred from stack
- Branch workflow: pr-to-develop (product) or direct-to-main (domain)
- Existing `.agents/idp/` directory is never overwritten
- `squads catalog list` works immediately after init
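
The detection rules in this list could be sketched as below. The function names are hypothetical, and stripping a trailing `.git` from the remote is an assumption about what "last segment" means.

```typescript
type ServiceType = "product" | "domain";

const PRODUCT_MARKERS = ["package.json", "go.mod", "requirements.txt"];

// A repo is a product if any marker file sits at its root.
function detectServiceType(rootFiles: string[]): ServiceType {
  return rootFiles.some((file) => PRODUCT_MARKERS.includes(file)) ? "product" : "domain";
}

function branchWorkflow(type: ServiceType): "pr-to-develop" | "direct-to-main" {
  return type === "product" ? "pr-to-develop" : "direct-to-main";
}

// Assumption: the last segment is the final path component without ".git".
function serviceNameFromRemote(remoteUrl: string): string {
  const segments = remoteUrl.replace(/\.git$/, "").split(/[/:]/);
  return segments[segments.length - 1] ?? "";
}
```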

## Issue #614: Structured frontmatter schemas
- All 18 agent seed templates updated with full frontmatter: squad, provider, trigger, cooldown, timeout, max_retries
- Agent body sections standardized to: Role, How You Work, Output, Constraints
- priorities.md and goals.md created for each squad (with squad/owner/review_by frontmatter)
- company.md created at .agents/memory/company/company.md (Layer 1 context)
- SYSTEM.md updated with version/scope/authority frontmatter

Closes #613
Closes #614

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
#618)

Every squads run now logs to .agents/observability/executions.jsonl:
- Tokens (input, output, cache read, cache write)
- Cost in USD (calculated from model pricing table)
- Duration, status, squad, agent, model, trigger

Token capture: after each run, parses Claude Code's session JSONL
files (~/.claude/projects/) to extract actual usage data. No API
calls, no external dependencies.
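
A sketch of the cost calculation and the JSONL append format, assuming the pricing table stores USD per million tokens. The field names are illustrative, and any rates passed in are up to the caller; no real prices are implied.

```typescript
interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// USD per million tokens for one model (assumed unit).
interface ModelPricing {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

function costUsd(usage: TokenUsage, pricing: ModelPricing): number {
  const total =
    usage.input * pricing.input +
    usage.output * pricing.output +
    usage.cacheRead * pricing.cacheRead +
    usage.cacheWrite * pricing.cacheWrite;
  return total / 1_000_000;
}

// One JSONL record is a single JSON object followed by a newline.
function toJsonlLine(record: Record<string, unknown>): string {
  return JSON.stringify(record) + "\n";
}
```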

New commands:
- squads obs history — execution history with tokens and cost
- squads obs cost — spend summary by squad, model, time period

Tested: research-scanner run captured 5,976 tokens, $0.503 on
claude-sonnet-4-6. Cost summary shows per-squad breakdown.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
#619 tier-detect.ts: detects Tier 1 (files) vs Tier 2 (local services)
by probing localhost:8090 (API) and localhost:8088 (bridge). Cached
per process. Foundation for tier-aware behavior.

#620 Silenced Tier 1 noise:
- registerContextWithBridge: skip when no bridge URL
- checkPreflightGates: skip when no bridge URL
- cognition fetch: skip when no API URL
- All silent — zero warnings for file-only users

#621 squads tier command: shows active tier, data sources (executions,
squads, memory, IDP), service health, upgrade instructions.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…#631)

#624 repo-enforcement.ts: validates workspace before agent execution.
Checks: SQUAD.md repo exists as sibling, no nested .git in hq,
no .git in .agents/idp/. Blocks on errors, warns on mismatches.
Wired into agent-runner — runs before every squads run.

#622 services.ts: manages Tier 2 Docker lifecycle.
- squads services up: starts docker compose, waits for health, shows URLs
- squads services down: stops containers, falls back to Tier 1
- squads services status: shows containers, health, DB stats
Supports --webhooks and --telemetry profiles.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…st (#623, #627) (#632)

* test: expand Docker fresh-user test to 13 steps

Added: tier, obs history, obs cost, services status, unknown command.
All 13 pass in clean Docker environment (Tier 1, no services).

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: observability dual-write (#623) + context_from fix (#627) + tests

#623: Tier 2 dual-write — JSONL + API POST (fire-and-forget)
#627: context_from verbose warning for missing learnings
Tests: Docker fresh-user expanded to 13 steps, all pass

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
#628 SECURITY: replaced --dangerously-skip-permissions with scoped
--allowedTools. Agents get: Read, Write, Edit, Glob, Grep, Bash(git),
Bash(gh), Bash(npm), Bash(curl), Bash(docker), Bash(squads), Agent,
WebFetch, WebSearch. Opt-in bypass via SQUADS_SKIP_PERMISSIONS=1 for
sandboxed environments only.
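
Roughly, the permission arguments could be assembled like this. The tool list is copied from the message above. Passing it as one comma-joined value is an assumption, and `permissionArgs` is a hypothetical helper name.

```typescript
const ALLOWED_TOOLS = [
  "Read", "Write", "Edit", "Glob", "Grep",
  "Bash(git)", "Bash(gh)", "Bash(npm)", "Bash(curl)", "Bash(docker)", "Bash(squads)",
  "Agent", "WebFetch", "WebSearch",
];

// Scoped tools by default; full bypass only with the explicit opt-in.
function permissionArgs(env: Record<string, string | undefined>): string[] {
  if (env["SQUADS_SKIP_PERMISSIONS"] === "1") {
    return ["--dangerously-skip-permissions"];
  }
  // Assumption: the tool list is passed as a single comma-separated argument.
  return ["--allowedTools", ALLOWED_TOOLS.join(",")];
}
```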

#625: fetchTriggersFromApi() queries GET /triggers/pending when Tier 2
active. Falls back to local scoreSquads() when API unavailable.

13/13 Docker fresh-user tests pass.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
Reads all JSONL execution records and POSTs to API when Tier 2 is
active. Dedup by execution_id (409 = already exists, skipped).
Supports --dry-run to preview before sending.
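
One way to treat 409 responses as deduplicated rather than failed is sketched below. `classifyResponse` and `summarize` are hypothetical helpers, and counting any other non-2xx status as a failure is an assumption.

```typescript
type BackfillOutcome = "sent" | "skipped" | "failed";

// Map an HTTP status from the POST to a backfill outcome.
function classifyResponse(status: number): BackfillOutcome {
  if (status === 409) return "skipped"; // execution_id already exists
  if (status >= 200 && status < 300) return "sent";
  return "failed"; // assumption: everything else is a failure
}

// Tally outcomes for a batch of responses.
function summarize(statuses: number[]): Record<BackfillOutcome, number> {
  const counts: Record<BackfillOutcome, number> = { sent: 0, skipped: 0, failed: 0 };
  for (const status of statuses) {
    counts[classifyResponse(status)] += 1;
  }
  return counts;
}
```

Because duplicates are skipped, running the backfill twice over the same JSONL files should be safe.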

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
Runs the whole organization as a system:
1. SCAN: check all squads (priorities freshness, goals, frozen status)
2. PLAN: skip frozen, prioritize by goals + staleness
3. EXECUTE: run each lead sequentially
4. REPORT: org-level summary (completed, failed, duration)

Scans 19 squads, skips 5 frozen, plans 14 for execution.
Supports --dry-run to preview without running.
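
The PLAN step could be sketched as a filter plus a sort. `planCycle` and the `SquadScan` fields are hypothetical, and the ordering rule (more active goals first, then staler priorities first) is an assumption about what "prioritize by goals + staleness" means.

```typescript
// Hypothetical scan result for one squad.
interface SquadScan {
  name: string;
  frozen: boolean;
  activeGoals: number;
  daysSincePriorities: number; // staleness of the priorities file
}

// Returns squad names in the order their leads would run.
function planCycle(scans: SquadScan[]): string[] {
  return scans
    .filter((s) => !s.frozen) // skip frozen squads
    .sort(
      (a, b) =>
        b.activeGoals - a.activeGoals ||
        b.daysSincePriorities - a.daysSincePriorities,
    )
    .map((s) => s.name);
}
```

With `--dry-run`, printing this list without executing anything would match the preview behavior described above.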

New files: src/lib/org-cycle.ts (scan, plan, display)
Modified: run.ts (--org mode), cli.ts (--org flag), run-types.ts

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…#636)

Every execution now captures:
- goals_before: goal name → status before the run
- goals_after: goal name → status after the run
- goals_changed: list of goals that moved (e.g., not-started → in-progress)
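
Deriving `goals_changed` from the two snapshots can be sketched as a simple diff. `diffGoals` and the `GoalChange` shape are hypothetical; this version ignores goals that appear only in one snapshot, which is an assumption.

```typescript
// Goal name mapped to its status, as in goals_before / goals_after.
type GoalStatuses = Record<string, string>;

interface GoalChange {
  goal: string;
  from: string;
  to: string;
}

function diffGoals(before: GoalStatuses, after: GoalStatuses): GoalChange[] {
  const changes: GoalChange[] = [];
  for (const goal of Object.keys(after)) {
    const from = before[goal];
    const to = after[goal];
    // Only goals present in both snapshots with a different status count.
    if (from !== undefined && from !== to) {
      changes.push({ goal, from, to });
    }
  }
  return changes;
}
```

Dividing a run's cost by the number of returned changes would give the "cost per goal progressed" figure for that execution.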

Org cycle (squads run --org) shows goal changes summary at the end.
obs history shows goal changes per execution record.

Also added grade/grade_score fields to ObservabilityRecord for
future COO eval integration.

This enables measuring: cost per goal progressed — the key metric
for self-improving autonomous operation.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
Bug 1: runAgent was called with (path, name) instead of (name, path).
Caused agent names to show as full file paths in observability.

Bug 2: when Claude hits its quota limit ("hit your limit", exit code 1),
the org cycle now detects it (two consecutive failures, each in under
10s) and stops instead of trying all remaining squads.
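
The stop condition can be sketched as a check on the last two results. `shouldStopCycle` and `RunResult` are hypothetical names; the 10-second threshold and the "two consecutive" rule come from the description above.

```typescript
// Hypothetical per-squad result recorded by the org cycle.
interface RunResult {
  ok: boolean;
  durationMs: number;
}

const FAST_FAILURE_MS = 10000; // failures faster than this look like quota errors

// True when the two most recent runs both failed quickly.
function shouldStopCycle(results: RunResult[]): boolean {
  const lastTwo = results.slice(-2);
  return (
    lastTwo.length === 2 &&
    lastTwo.every((r) => !r.ok && r.durationMs < FAST_FAILURE_MS)
  );
}
```

Requiring two fast failures in a row keeps a single flaky run from aborting the whole cycle.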

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
Replace placeholder tests (that only tested Node.js filesystem ops) with
real tests that call initCommand with mocked dependencies:

- Core directory structure (squads, memory, skills, config)
- Squad file creation from templates
- Memory state files (writeIfNew idempotency)
- Root-level files (AGENTS.md, BUSINESS_BRIEF.md, README.md)
- Claude provider setup (.claude dir, CLAUDE.md, settings.json)
- Non-Claude provider skips Claude-specific files
- Pack support (engineering, marketing, operations, all, dedup)
- IDP catalog creation + stack detection (Node/Go/Python/Rust/Ruby)
- Template variable passing (business name, provider, date)
- Auto-commit behavior
- Telemetry tracking (event, agent/squad counts)
- Prerequisite check flow (--force bypass, exit on failure)
- Error handling (EACCES, ENOENT)

Closes #577

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Resolved conflicts in: package.json, cli.ts, init.ts, run.ts,
and 13 template seed files. Develop versions kept (ahead of main).

Co-Authored-By: Claude <noreply@anthropic.com>
The merge conflict resolution incorrectly used main's older templates.
Restored from pre-merge develop: templates/, cli.ts, init.ts, run.ts.

All 1779 tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
Restricts GITHUB_TOKEN to contents:read per CodeQL recommendation.
Workflow only needs npm registry access via NPM_TOKEN, not GitHub write.

Co-Authored-By: Claude <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly expands the CLI's capabilities by introducing an Internal Developer Platform (IDP) for service cataloging and scorecard validation, local observability for execution and cost tracking, and a tiered infrastructure system. It also refactors the core execution logic into modular components and implements a layered context system for agents. The review feedback correctly identifies a bug in a GitHub API query within the scorecard engine and suggests refactoring duplicated logic in the git push automation.

Comment on lines +45 to +48
const out = exec(`gh api repos/${repo}/actions/runs?per_page=1&status=completed --jq '.[0].conclusion // empty'`);
// GitHub API returns runs array directly
const out2 = exec(`gh api repos/${repo}/actions/runs --jq '.workflow_runs[0].conclusion // empty'`);
const conclusion = out || out2;

medium

The first exec call with jq '.[0].conclusion' appears to be incorrect, as the GitHub API for listing repository action runs returns an object with a workflow_runs array, not a direct array. The second call correctly queries .workflow_runs[0].conclusion. The first call is likely redundant and the accompanying comment is misleading. This can be simplified to a single API call.

Suggested change
- const out = exec(`gh api repos/${repo}/actions/runs?per_page=1&status=completed --jq '.[0].conclusion // empty'`);
- // GitHub API returns runs array directly
- const out2 = exec(`gh api repos/${repo}/actions/runs --jq '.workflow_runs[0].conclusion // empty'`);
- const conclusion = out || out2;
+ const conclusion = exec(`gh api repos/${repo}/actions/runs?per_page=1&status=completed --jq '.workflow_runs[0].conclusion // empty'`);
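
One caveat applies to both the original and the suggested call: if `exec` passes its string to a shell, the unquoted `&` in the query string is treated as a control operator, so `status=completed` and the `--jq` filter never reach `gh`. A hedged sketch of building the call as an argument array instead is shown below; `latestRunConclusionArgs` is a hypothetical helper, and the repo check reuses the `org/name` pattern from the push code in this PR.

```typescript
// Hypothetical helper: builds arguments for `gh` without going through a shell.
function latestRunConclusionArgs(repo: string): string[] {
  // Same org/name validation the PR applies before pushing.
  if (!/^[\w.-]+\/[\w.-]+$/.test(repo)) {
    throw new Error(`invalid repo: ${repo}`);
  }
  return [
    "api",
    // The whole URL, query string included, is a single argument.
    `repos/${repo}/actions/runs?per_page=1&status=completed`,
    "--jq",
    ".workflow_runs[0].conclusion // empty",
  ];
}
```

The array could then be handed to `execFileSync("gh", args, { encoding: "utf8" })` from `node:child_process`, which does not spawn a shell.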

Comment on lines +127 to +137
if (repo && /^[\w.-]+\/[\w.-]+$/.test(repo)) {
  const pushUrl = await getBotPushUrl(repo);
  if (pushUrl) {
    // Use spawnSync with args array to avoid shell injection
    spawnSync('git', ['push', pushUrl, 'HEAD'], { ...execOpts, stdio: 'pipe' });
  } else {
    spawnSync('git', ['push', 'origin', 'HEAD'], { ...execOpts, stdio: 'pipe' });
  }
} else {
  spawnSync('git', ['push', 'origin', 'HEAD'], { ...execOpts, stdio: 'pipe' });
}

medium

The else blocks on lines 132-134 and 135-137 are identical, leading to code duplication. This logic can be simplified by determining the push target first and then making a single spawnSync call. This improves readability and maintainability.

      let pushTarget = 'origin';
      // Validate repo format (org/name) to prevent injection
      if (repo && /^[\w.-]+\/[\w.-]+$/.test(repo)) {
        const pushUrl = await getBotPushUrl(repo);
        if (pushUrl) {
          pushTarget = pushUrl;
        }
      }
      spawnSync('git', ['push', pushTarget, 'HEAD'], { ...execOpts, stdio: 'pipe' });
