Closed
36 commits
9ffc3ec
feat(providers): add Anthropic Claude (OAuth) to Subscription/OAuth P…
Avi-Bendetsky Jun 7, 2026
213323b
Merge pull request #1 from Avi-Bendetsky/feat/anthropic-oauth-card
Avi-Bendetsky Jun 7, 2026
f4f9af6
fix(providers): thread selected profile through Anthropic OAuth persist
Avi-Bendetsky Jun 7, 2026
ec33e40
Merge pull request #2 from BAS-More/feat/anthropic-oauth-card
Avi-Bendetsky Jun 7, 2026
5aae6fc
Merge branch 'fathah:main' into main
Avi-Bendetsky Jun 8, 2026
2742efe
feat(providers): multi-account OAuth UX + credential-pool status badges
Avi-Bendetsky Jun 11, 2026
bd666ad
fix(gateway): trust local TLS-intercepting proxy CA on gateway start
Avi-Bendetsky Jun 11, 2026
c560ad5
feat(factory): Factory tab — governance/budget/orchestration/activity…
Avi-Bendetsky Jun 14, 2026
3d7d615
feat(factory): Factory tab v2 — editable, monitored, guard-railed
Avi-Bendetsky Jun 14, 2026
a8774df
feat(factory): per-agent model picker + editable Orchestration
Avi-Bendetsky Jun 14, 2026
bc7d0cd
feat(factory): orchestrator closed-loop UI — Builds oversight pane + …
Avi-Bendetsky Jun 14, 2026
4fcf2e2
feat(factory): click-through from Builds pane focuses the task in Kanban
Avi-Bendetsky Jun 14, 2026
c5eb035
Merge remote-tracking branch 'origin/main' into feat/factory-tab
Avi-Bendetsky Jun 14, 2026
bd807fe
Merge pull request #3 from BAS-More/feat/factory-tab
Avi-Bendetsky Jun 14, 2026
84071ad
feat(chat): in-chat Factory toggle + live Factory panel
Avi-Bendetsky Jun 15, 2026
65b82c3
Merge pull request #4 from BAS-More/feat/chat-factory-panel
Avi-Bendetsky Jun 15, 2026
7c81bcf
Merge tag 'v0.6.1' into chore/merge-upstream-0.6.1
Avi-Bendetsky Jun 15, 2026
5f998d7
chore(release): bump to 0.6.2 + repoint auto-update feed to the fork
Avi-Bendetsky Jun 15, 2026
71d4b22
chore: sync package-lock version to 0.6.2
Avi-Bendetsky Jun 15, 2026
413b574
ci: add workflow_dispatch job to build unsigned Mac DMG on macOS runners
Avi-Bendetsky Jun 15, 2026
cf55017
test: stabilize config-health CI fixture
Copilot Jun 15, 2026
00c3d9c
Merge pull request #5 from BAS-More/chore/merge-upstream-0.6.1
Avi-Bendetsky Jun 15, 2026
1b129c5
feat(factory+sessions): connect Factory to Office bots, panel clickth…
Avi-Bendetsky Jun 15, 2026
ab30f6e
Merge pull request #6 from BAS-More/feat/factory-office-bots-and-sess…
Avi-Bendetsky Jun 15, 2026
f94caf5
chore: bump version to 0.6.3
Avi-Bendetsky Jun 15, 2026
b300d83
fix(sessions): apply 3-agent UI/UX audit — tokens, a11y, undo, contrast
Avi-Bendetsky Jun 15, 2026
c7d881a
Merge pull request #7 from BAS-More/feat/sessions-audit-fixes
Avi-Bendetsky Jun 15, 2026
c9530bf
fix(sessions): Tier-3 a11y structural — focus-managed menu, modal tra…
Avi-Bendetsky Jun 15, 2026
6f6d3ec
Merge pull request #8 from BAS-More/feat/sessions-a11y-structural
Avi-Bendetsky Jun 15, 2026
ae47173
chore: bump version to 0.6.5
Avi-Bendetsky Jun 15, 2026
91b691d
feat(council): LLM Council — composer convene button + Models tab (PA…
Avi-Bendetsky Jun 18, 2026
219671d
test(soul): cover persona editor — load, per-agent read, debounced au…
Avi-Bendetsky Jun 18, 2026
c41b88e
Merge pull request #10 from BAS-More/feat/agent-detail-persona-skills…
Avi-Bendetsky Jun 18, 2026
e3a853e
fix(secrets): resolve POSIX shell cross-platform; fix config-health W…
Avi-Bendetsky Jun 18, 2026
dce61ce
feat(agent-detail): per-agent persona/skills/tools panel + coverage +…
Avi-Bendetsky Jun 18, 2026
b9e00e0
salvage(WIP): Claude Code Bridge status pill + CSP allowlist — DO NOT…
Avi-Bendetsky Jun 18, 2026
99 changes: 99 additions & 0 deletions .github/workflows/mac-dmg.yml
@@ -0,0 +1,99 @@
name: Mac DMG (unsigned, on-demand)

# One-shot, manually-triggered build of an UNSIGNED macOS .dmg on GitHub's
# hosted macOS runners. Unlike release.yml's release_mac job, this needs NO
# Apple secrets (no notarization, no code-signing) and publishes nothing; it
# just uploads the .dmg as a workflow artifact for download.
#
# Gatekeeper note: because the result is unsigned + un-notarized, on first launch
# macOS will block it. The user opens it once via right-click -> Open, or runs:
#   xattr -dr com.apple.quarantine "/Applications/Hermes One.app"

on:
  workflow_dispatch:
    inputs:
      arch:
        description: "Target architecture"
        type: choice
        options:
          - both
          - arm64
          - x64
        default: both

jobs:
  build:
    name: Build macOS DMG (${{ matrix.arch }})
    runs-on: macos-latest
    strategy:
      fail-fast: false
      matrix:
        # Both arches are always in the matrix; the gate step and the per-step
        # if-guards skip the arch the user didn't pick.
        arch: [arm64, x64]
    steps:
      - name: Skip unselected arch
        id: gate
        run: |
          want='${{ github.event.inputs.arch }}'
          if [ "$want" = "both" ] || [ "$want" = "${{ matrix.arch }}" ]; then
            echo "run=true" >> "$GITHUB_OUTPUT"
          else
            echo "run=false" >> "$GITHUB_OUTPUT"
            echo "Skipping ${{ matrix.arch }} (user selected $want)"
          fi

      - name: Check out repository
        if: steps.gate.outputs.run == 'true'
        uses: actions/checkout@v4

      - name: Set up Node.js
        if: steps.gate.outputs.run == 'true'
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm

      - name: Install dependencies
        if: steps.gate.outputs.run == 'true'
        run: npm ci

      - name: Rebuild native dependencies for target arch
        if: steps.gate.outputs.run == 'true'
        run: npx electron-builder install-app-deps --arch=${{ matrix.arch }}

      - name: Build app
        if: steps.gate.outputs.run == 'true'
        run: npm run build

      - name: Package UNSIGNED macOS DMG
        if: steps.gate.outputs.run == 'true'
        env:
          # Disable signing-identity auto-discovery so electron-builder does not try to sign.
          CSC_IDENTITY_AUTO_DISCOVERY: "false"
        run: >-
          npx electron-builder --mac dmg --${{ matrix.arch }}
          --publish never
          -c.mac.notarize=false
          -c.mac.identity=null

      - name: Verify native module architecture
        if: steps.gate.outputs.run == 'true'
        run: |
          set -euo pipefail
          NODE_FILE="dist/mac-${{ matrix.arch }}/Hermes One.app/Contents/Resources/app.asar.unpacked/node_modules/better-sqlite3/build/Release/better_sqlite3.node"
          if [ ! -f "$NODE_FILE" ]; then
            NODE_FILE="dist/mac/Hermes One.app/Contents/Resources/app.asar.unpacked/node_modules/better-sqlite3/build/Release/better_sqlite3.node"
          fi
          file "$NODE_FILE"
          case "${{ matrix.arch }}" in
            x64) file "$NODE_FILE" | grep -q "x86_64" ;;
            arm64) file "$NODE_FILE" | grep -q "arm64" ;;
          esac

      - name: Upload DMG artifact
        if: steps.gate.outputs.run == 'true'
        uses: actions/upload-artifact@v4
        with:
          name: hermes-mac-${{ matrix.arch }}-dmg
          path: dist/*.dmg
          if-no-files-found: error
4 changes: 2 additions & 2 deletions dev-app-update.yml
@@ -1,4 +1,4 @@
 provider: github
-owner: fathah
-repo: hermes-desktop
+owner: BAS-More
+repo: hermes-desktop-Working-
 updaterCacheDirName: hermes-desktop-updater
175 changes: 175 additions & 0 deletions docs/orchestrator-loop-PRD.md
@@ -0,0 +1,175 @@
# Orchestrator Closed-Loop — PRD

Status: DRAFT FOR APPROVAL
Engine: BAS-More/hermes-agent fork · Desktop: feat/factory-tab
Author: Avi + Claude (Opus 4.8), 2026-06-14

## The vision (Avi's words)

> The orchestrator is given the results he must achieve and the guidelines (code
> quality, security, etc.). He picks the team of agents/sub-agents for the task,
> oversees what they're doing, and keeps them running in the react loop to the
> successful finish.

## What exists vs. the gap

| Duty | Status | Note |
|---|---|---|
| Given guidelines (quality/security) | ✅ done | the governor + secret scanner + avi-os-gates/tdad skills |
| Picks the team | ✅ done | the decomposer routes each child to the best profile |
| Given results to achieve | ◑ partial | a card body is the goal; no structured acceptance criteria |
| **Oversees in-flight** | ❌ gap | fans out ONCE, then sleeps until all children finish |
| **React loop to finish** | ❌ gap | when children finish the root just auto-promotes; nobody verifies the ASSEMBLED result vs the goal and spawns corrective work |

Today = **fan-out-once-then-assemble**. Target = **plan → verify → re-plan → loop-until-done**. The difference is the closed loop.

## Architecture decision (grounded)

Avi chose "engine dispatcher, autonomous, headless." The honest synthesis: the
**dispatcher (Python) owns the loop control-flow** (detect state, spawn, enforce
bounds), and a **spawned orchestrator worker (LLM) does the judgment** — the
dispatcher cannot itself judge "does this meet the goal?". This reuses the proven
review-worker machinery (`status='review'`, `claim_review_task`, the review
dispatch block at kanban_db.py:6439) — the orchestrator-verify is essentially a
review worker for the build ROOT that can re-open the build.

## The closed loop (new behavior)

```
triage card (goal + guidelines)
   │  decompose (EXISTING): orchestrator picks team, seeds .ezra governance
   ▼
children run (EXISTING): per-task goal-mode workers, governed
   │  all children done
   ▼
ROOT → 'review'  (NEW: instead of auto-promote to ready/done)
   │  dispatcher spawns the ORCHESTRATOR profile as a build-verify worker
   ▼
orchestrator-verify worker (NEW skill: orchestrator-verify)
   reads:  the build goal + acceptance criteria + each child's result/artifacts
   judges: does the ASSEMBLED result meet the goal + guidelines?
   ├─ PASS → complete the root (build done) ✅
   └─ FAIL → record the gap + create N corrective child tasks under the root
             → root back to 'todo' (waits on the new children)
corrective children run → root → 'review' again → re-verify (THE LOOP)
   bounded by: build iteration ceiling + per-block retry cap + a NEW
               max_verify_rounds; on exhaustion → park root 'blocked' for human.
```
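The loop above is effectively a small state machine on the root task. The sketch below replays it against a scripted sequence of verify verdicts so the round accounting is easy to check. `simulate_build` and its arguments are hypothetical; it models only the control flow (which the dispatcher owns), not the LLM judgment or the real kanban_db.py code.

```python
from typing import Iterable, List


def simulate_build(verdicts: Iterable[str], max_verify_rounds: int = 3) -> List[str]:
    """Replay the closed loop for a scripted sequence of verify verdicts.

    Hypothetical model: each verdict is "PASS" or "FAIL" (anything that is
    not "PASS" is treated as a FAIL); real verdicts would come from the
    orchestrator-verify worker.
    """
    trace = ["todo"]                      # root waits on its decomposed children
    rounds = 0
    for verdict in verdicts:
        trace.append("review")            # all children done -> verify the root
        rounds += 1
        if verdict == "PASS":
            trace.append("done")          # assembled result meets the goal
            return trace
        if rounds >= max_verify_rounds:   # verify loop exhausted -> human
            trace.append("blocked")
            return trace
        trace.append("todo")              # corrective children spawned; wait again
    return trace


print(simulate_build(["FAIL", "PASS"]))
```

A build that never passes visits `review` exactly `max_verify_rounds` times and then parks, which is the "loop doesn't run away" property the verification plan asks for.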

## Engine work

### E1. Acceptance criteria as a first-class build record
- At decompose, the orchestrator records the build's **acceptance criteria**
("done when…") into the build's `.ezra/` (new `acceptance.yaml` or a field in
governance.yaml) + the root task metadata. Source: extracted from the card
body by the decomposer LLM (it already produces structured JSON — add an
`acceptance` field to its output schema).
- Read by the verify worker; surfaced in `govern --json`.
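A minimal sketch of E1, assuming the decomposer's JSON output gains a top-level `acceptance` array of strings next to its existing fields. The output shape, `extract_acceptance`, and `render_acceptance_yaml` are illustrative assumptions, not the engine's real schema. Rendering uses JSON-quoted strings because those are valid YAML double-quoted scalars, which avoids a YAML dependency.

```python
import json
from typing import List


def extract_acceptance(decomposer_output: str) -> List[str]:
    """Pull the (hypothetical) `acceptance` field out of the decomposer's JSON."""
    data = json.loads(decomposer_output)
    criteria = data.get("acceptance", [])
    if not isinstance(criteria, list) or not all(isinstance(c, str) for c in criteria):
        raise ValueError("acceptance must be a list of strings")
    # Drop blank entries so an empty LLM line never becomes a criterion.
    return [c.strip() for c in criteria if c.strip()]


def render_acceptance_yaml(criteria: List[str]) -> str:
    """Render criteria as an acceptance.yaml-style list."""
    lines = ["acceptance:"]
    lines += [f"  - {json.dumps(c)}" for c in criteria]
    return "\n".join(lines) + "\n"


raw = '{"children": [], "acceptance": ["tests pass", "no secrets in diff"]}'
print(render_acceptance_yaml(extract_acceptance(raw)), end="")
```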

### E2. Root → review instead of auto-promote
- `recompute_ready` / the child-completion path: when ALL children of a root are
done AND the root is a build-root (has an acceptance record / a new
`is_build_root` flag), transition the root to `review` + assign the
orchestrator profile, instead of promoting to `ready`/auto-done.
- Gate this on a config flag `kanban.orchestrator_loop: true` (default OFF first
— this changes core dispatch; opt-in until proven) so existing builds are
unaffected.
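The E2 decision point can be sketched as a pure function of three inputs: are all children done, is this a build root, and is `kanban.orchestrator_loop` on. `RootTask` and `on_children_complete` are hypothetical stand-ins for the real task row and the `recompute_ready` path, and the existing promotion is shown simply as `"ready"`. The point of the shape is that with the flag off, the function takes exactly the old branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RootTask:
    # Hypothetical fields; the real task row lives in kanban_db.py.
    status: str
    is_build_root: bool
    assignee: Optional[str] = None


def on_children_complete(root: RootTask, all_children_done: bool,
                         orchestrator_loop: bool) -> RootTask:
    """Decide the root's next state when its children finish."""
    if not all_children_done:
        return root                       # still waiting on children
    if orchestrator_loop and root.is_build_root:
        root.status = "review"            # NEW: hand the root to the verifier
        root.assignee = "orchestrator"
    else:
        root.status = "ready"             # EXISTING promotion, untouched when the flag is off
    return root


print(on_children_complete(RootTask("todo", True), True, orchestrator_loop=True))
```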

### E3. orchestrator-verify skill + worker
- A new skill `orchestrator-verify` (HOME skills/, like sdlc-review): instructs
the worker to read the goal + acceptance + child results, judge PASS/FAIL with
reasons, and on FAIL emit a structured list of corrective child tasks.
- The review dispatch already force-loads `sdlc-review` for review workers; add a
branch: if the review task is a BUILD ROOT, load `orchestrator-verify` instead.
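A sketch of the two E3 touch points: the skill branch in the review dispatch, and parsing the worker's PASS/FAIL output. The JSON verdict shape (`verdict`, `reasons`, `corrective_tasks`) and both function names are assumptions for illustration. Rejecting a FAIL that lists no corrective tasks is an extra guard added here, since such a verdict would leave the root with nothing to wait on.

```python
import json
from dataclasses import dataclass, field
from typing import List


def review_skill_for(is_build_root: bool) -> str:
    """Pick the skill force-loaded for a review worker."""
    return "orchestrator-verify" if is_build_root else "sdlc-review"


@dataclass
class Verdict:
    passed: bool
    reasons: List[str]
    corrective_tasks: List[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse a hypothetical JSON verdict emitted by the verify worker."""
    data = json.loads(raw)
    verdict = data["verdict"].upper()
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    tasks = list(data.get("corrective_tasks", []))
    if verdict == "FAIL" and not tasks:
        # A FAIL with no corrective work would stall the loop.
        raise ValueError("FAIL verdict must list corrective tasks")
    return Verdict(verdict == "PASS", list(data.get("reasons", [])), tasks)


print(parse_verdict('{"verdict": "FAIL", "reasons": ["no tests"], '
                    '"corrective_tasks": ["add unit tests for the parser"]}'))
```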

### E4. Re-decompose on FAIL (the corrective spawn)
- A helper `reopen_build_with_children(root_id, children, reason)`: creates the
corrective child tasks under the existing root, links them, transitions the
root `review → todo` (waits on new children), records the verify verdict +
reason as a root event. Reuses `decompose_triage_task`'s child-insert logic
(factor out the shared insert).
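A minimal in-memory sketch of `reopen_build_with_children`. The helper name comes from the PRD, but the `Board` class, the task-dict fields, and the `t1`/`t2` id scheme are hypothetical stand-ins for the kanban tables. `insert_child` plays the role of the child-insert logic factored out of `decompose_triage_task`, so both paths create children the same way.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Board:
    # Hypothetical in-memory stand-in for the kanban tables.
    tasks: Dict[str, dict] = field(default_factory=dict)
    next_id: int = 1

    def insert_child(self, parent_id: str, body: str) -> str:
        """Shared child-insert step (the part factored out of decompose)."""
        task_id = f"t{self.next_id}"
        self.next_id += 1
        self.tasks[task_id] = {"status": "todo", "body": body, "parent": parent_id}
        self.tasks[parent_id].setdefault("children", []).append(task_id)
        return task_id


def reopen_build_with_children(board: Board, root_id: str,
                               children: List[str], reason: str) -> List[str]:
    """Create corrective children, send the root review -> todo, log the verdict."""
    root = board.tasks[root_id]
    if root["status"] != "review":
        raise ValueError("only a root under review can be reopened")
    new_ids = [board.insert_child(root_id, body) for body in children]
    root["status"] = "todo"               # root now waits on the new children
    root.setdefault("events", []).append({"type": "verify_fail", "reason": reason})
    return new_ids


board = Board(tasks={"root": {"status": "review"}})
print(reopen_build_with_children(board, "root", ["fix A", "fix B"], "criterion 2 unmet"))
```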

### E5. Loop bounds (the runaway guards — non-negotiable)
- `max_verify_rounds` (default 3): root metadata counter; each FAIL→reopen
increments. On exhaustion the root parks `blocked` ("verify loop exhausted —
human review") rather than looping forever.
- The EXISTING budget breaker (wallclock + iteration ceiling) still bounds the
whole subtree — the corrective children count as iterations.
- The EXISTING per-block retry cap still applies per worker.
- So three independent ceilings bound the loop; it cannot run away (critical
given the ~93% weekly quota reality).
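The ceilings checked at FAIL time can be sketched as one small decision function. `max_verify_rounds = 3` is from the PRD; the iteration and wallclock limits below are placeholder values standing in for the existing budget breaker, and `LoopLimits` / `after_fail` are hypothetical names. The per-block retry cap is not modeled because it acts inside each worker, not at the root.

```python
from dataclasses import dataclass


@dataclass
class LoopLimits:
    max_verify_rounds: int = 3          # PRD default
    max_iterations: int = 50            # placeholder for the budget breaker's ceiling
    max_wallclock_s: float = 3600.0     # placeholder for the budget breaker's ceiling


def after_fail(verify_round: int, iterations: int, elapsed_s: float,
               limits: LoopLimits) -> str:
    """Return the root's next status after a FAIL verdict.

    verify_round counts FAIL -> reopen cycles, including this one.
    """
    if verify_round >= limits.max_verify_rounds:
        return "blocked"    # verify loop exhausted -> escalate to human
    if iterations >= limits.max_iterations or elapsed_s >= limits.max_wallclock_s:
        return "blocked"    # budget breaker trips for the whole subtree
    return "todo"           # reopen with corrective children


print(after_fail(verify_round=1, iterations=5, elapsed_s=120.0, limits=LoopLimits()))
```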

### E6. govern --json surfacing
- Build status: per-build goal, acceptance criteria, current verify round,
last verdict + reason, loop state (running / verifying / corrective / done /
parked). Feeds the Factory tab.
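One possible shape for the per-build entry in `govern --json`, covering the fields E6 lists. The key names and the `build_status` helper are assumptions; only the five loop states come from the PRD. Validating the state up front keeps the Factory tab from receiving a value it cannot render.

```python
import json
from typing import List, Optional

LOOP_STATES = ("running", "verifying", "corrective", "done", "parked")


def build_status(goal: str, acceptance: List[str], verify_round: int,
                 max_rounds: int, last_verdict: Optional[str],
                 last_reason: Optional[str], loop_state: str) -> dict:
    """Assemble one build's entry for a hypothetical `govern --json` payload."""
    if loop_state not in LOOP_STATES:
        raise ValueError(f"unknown loop state: {loop_state}")
    return {
        "goal": goal,
        "acceptance": acceptance,
        "verify_round": verify_round,
        "max_verify_rounds": max_rounds,
        "last_verdict": last_verdict,
        "last_reason": last_reason,
        "loop_state": loop_state,
    }


entry = build_status("Ship login page", ["tests pass"], 2, 3, "FAIL",
                     "criterion 1 unmet", "corrective")
print(json.dumps({"builds": [entry]}, indent=2))
```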

## Desktop work (after engine proven)
- Factory tab: a "Builds" view or section showing each active build — goal,
acceptance criteria, which agents are on it, verify round N/max, last verdict,
live state. This is the "oversees what they're doing" pane.
- (Optional) a build detail with the corrective-task history (the loop made
visible).

## Verification plan
- Unit: acceptance record write/read; root→review transition gated by flag;
reopen_build_with_children; max_verify_rounds parks at the cap.
- Live proof: a real build whose first attempt is deliberately incomplete →
orchestrator-verify FAILS → spawns a corrective task → it runs → re-verify
PASSES → root done. Then a build that can never pass → confirm it parks at
max_verify_rounds (loop doesn't run away).
- Regression: with `orchestrator_loop: false` (default), existing fan-out
behavior is byte-identical (no regression for current builds).

## Safety + rollout
- `orchestrator_loop` defaults OFF. Turn ON for one board / one build to prove,
then default-on once trusted.
- Fork-durable (engine), restore-guard markers, all the usual.
- This touches CORE dispatch (root completion) — the flag-gate + the
byte-identical-when-off regression are mandatory before it ships on.

## LOCKED DECISIONS (Avi, 2026-06-14)
1. **Acceptance criteria: AUTO-extract from the card** (decomposer LLM derives the
"done when…"; editable in UI later).
2. **max_verify_rounds = 3.**
3. **On FAIL the orchestrator COMMANDS DIRECTIVE adjustments** — corrective tasks
are specific ("builder did X wrong; do Y to meet criterion Z"), getting more
pointed each round, guiding the builder to success. NOT vague re-decompose.
4. **On exhaustion (3 rounds): ESCALATE to human with full diagnosis + the
specific recommended fix, and park.** Not silent give-up, not ship-best-effort.
The orchestrator guides autonomously within the 3-round envelope; the ceiling
exists only because unbounded looping risks the ~93% weekly quota.
5. **Rollout: flag OFF by default** (`kanban.orchestrator_loop`), prove on one
build, then consider default-on. Byte-identical-when-off regression mandatory.

Design consequence of #3: E3 (orchestrator-verify) must output, on FAIL, a
structured PER-CRITERION gap analysis + directive correction per gap — the
corrective task bodies are those directives. E4 carries the orchestrator's
reason/diagnosis into each corrective task so the builder gets specific guidance,
not a re-statement of the original goal.
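A sketch of turning the per-criterion gap analysis into directive corrective task bodies, as decision #3 requires. The `Gap` fields and the body layout are illustrative assumptions. Each body names the failing criterion, what was observed, the specific correction, and the orchestrator's diagnosis, so the builder is not handed a restatement of the original goal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gap:
    criterion: str     # the acceptance criterion that failed
    observed: str      # what the builder actually produced
    directive: str     # the specific correction the orchestrator commands


def corrective_bodies(gaps: List[Gap], diagnosis: str, verify_round: int) -> List[str]:
    """Turn a per-criterion gap analysis into one corrective task body per gap."""
    bodies = []
    for gap in gaps:
        bodies.append(
            f"[verify round {verify_round}] Criterion: {gap.criterion}\n"
            f"Observed: {gap.observed}\n"
            f"Do: {gap.directive}\n"
            f"Orchestrator diagnosis: {diagnosis}"
        )
    return bodies


print(corrective_bodies([Gap("tests pass", "2 tests fail", "fix the date parser")],
                        "parser ignores time zones", 2)[0])
```

Including the round number lets later rounds, and the final escalation, show how the directives narrowed over time.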

## FOLDED FROM "Loop Engineering" research (Avi, 2026-06-14)

The datasciencedojo loop-engineering guide names two guardrails we were missing;
both are folded into the build:

6. **No-progress detection (REQUIRED guardrail).** The verify worker records a
fingerprint of each round's assembled result. If round N's fingerprint equals
round N-1's, the corrective work produced NO change ("silent failure" /
"insistent failure" — the article's hardest-to-catch mode). Don't burn the
remaining rounds: escalate to human immediately with the diagnosis. Lives in
the verify-result handler (E5), alongside the round cap.
7. **Deterministic-first verification.** The article: verification must be
deterministic (tests/type-check) OR a separate evaluator — never agent
self-assessment alone. Our verify worker IS the separate evaluator (good), but
the orchestrator-verify skill now instructs: for any criterion checkable by a
command (tests pass, build green, lint/type clean), RUN it and judge on the
real exit/result; reserve LLM judgment for genuinely subjective criteria. A
criterion marked PASS on a deterministic check is more trustworthy than
"looks done".

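The no-progress check in item 6 can be sketched with a content hash. This assumes the assembled result can be represented as a mapping from artifact path to content; a real implementation might hash a git tree or diff instead. `fingerprint` and `no_progress` are hypothetical names. Sorting keys makes the hash independent of the order in which artifacts were collected.

```python
import hashlib
import json
from typing import Dict, Optional


def fingerprint(result: Dict[str, str]) -> str:
    """Stable SHA-256 of an assembled result (artifact path -> content)."""
    canonical = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def no_progress(previous_fp: Optional[str], current: Dict[str, str]) -> bool:
    """True if this round's assembled result is identical to the last round's."""
    return previous_fp is not None and previous_fp == fingerprint(current)


round1 = {"src/a.py": "v1", "README": "x"}
fp1 = fingerprint(round1)
print(no_progress(fp1, {"README": "x", "src/a.py": "v1"}))
```

When `no_progress` returns True, the handler would escalate immediately instead of spending the remaining verify rounds.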
These cost almost nothing relative to the loop and close the two failure modes
the article stresses most (silent failure + weak verification). The three
runaway ceilings (max_verify_rounds + budget breaker + retry cap) stay; the
no-progress detector is a FOURTH, earlier guard.
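The deterministic-first rule from item 7 can be sketched as: if a criterion carries a command, run it and judge on the exit code; otherwise fall back to the LLM evaluator. `Criterion`, `judge`, and the `llm_judge` callable are hypothetical; the callable stands in for the separate LLM evaluator. Recording `method` lets the verdict show which criteria were proven by a command and which rest on judgment.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Criterion:
    text: str
    command: Optional[List[str]] = None   # set when the criterion is command-checkable


def judge(criterion: Criterion, llm_judge: Callable[[str], bool]) -> dict:
    """Deterministic-first: run the command if there is one, else ask the LLM."""
    if criterion.command is not None:
        proc = subprocess.run(criterion.command, capture_output=True, text=True)
        return {"criterion": criterion.text, "passed": proc.returncode == 0,
                "method": "deterministic", "exit_code": proc.returncode}
    return {"criterion": criterion.text, "passed": llm_judge(criterion.text),
            "method": "llm"}


# Usage: a criterion checked by a real exit code (here, the Python interpreter).
print(judge(Criterion("interpreter runs", [sys.executable, "-c", "pass"]), lambda t: False))
```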
4 changes: 2 additions & 2 deletions electron-builder.yml
@@ -69,5 +69,5 @@ rpm:
 npmRebuild: false
 publish:
   provider: github
-  owner: fathah
-  repo: hermes-desktop
+  owner: BAS-More
+  repo: hermes-desktop-Working-