Releases · harnessclaw/harnessclaw-engine

26 May 12:53

github-actions

v0.0.15

5d74d28

v0.0.15 Latest

Latest

Added

L2 scheduler kernel (v3.1) under internal/engine/scheduler/: top-level Scheduler.New + Submit + Start wires bus, kernel, dispatch strategies, and runtime handlers (onSpawn / onResult / onLifecycle / onTerminal / onExpire / onCancellingDrained / onCompletedFromStaging) into a single message-driven L2
8-state task state machine in scheduler/tstate: Kernel with RollbackAdmit + epoch guards, Reader / Writer / StagingWriter interfaces (R2/R4/R10), in-memory + SQLite Store implementations sharing the sessions *sql.DB, named-field UpdateField CAS path
Dispatch strategies in scheduler/dispatch: react.Strategy spawns one leaf via SpawnAndWaitOne (R1/R6 Subscribe-before-Publish); plan.Strategy runs the planner-agent + plan-executor-agent two-phase LLM pipeline, with PlanJudge rule-tier validation and FallbackAggregator graceful degradation
Routing layer in scheduler/router: HeuristicKindSelector (react vs plan, mode-select heuristics) + HeuristicAgentResolver (keyword-scored sub-agent selection); both pluggable via Coordinator config so the project can swap in LLM-based selectors later
scheduler/runtime/host: StartStrategyHost forks G1 + G2, stages refs, publishes lifecycle, runs a 3-pass reaper scan (lease / deadline / cancelling) with notify-on-expiry
Msgbus (internal/msgbus): in-process Bus with Publish / Subscribe / SubscribeOnce / Dequeue / Ack / Nack, six message kinds (lifecycle / control / agent.msg / notify / task / result), six typed payloads, in-memory + SQLite Store implementations with reaper requeue and per-kind typed-struct revival
L1 (internal/engine/emma) and L2 (internal/engine/scheduler) now sit in dedicated subpackages; new public accessors on QueryEngine — ApplyMainAgentConfig / Config / PromptProfile — let emma cross the package boundary without touching private fields
L2 dispatch tool scheduler: emma calls scheduler(task) to enter the L2 layer; coordinator-mode (react / plan) flows via tool.WithCoordinatorMode on the parent context, never via emma's input schema (ops-only knob, D-mode auto-escalation stays internal)
plan_read tool: read-only access to plan.json for plan-executor-agent; TDD-built with full coverage
Plan-mode profiles + agent definitions: plan-agent writes the task breakdown to plan.json, plan-executor-agent reads it back, dispatches step tasks via freelance, updates status; profiles include principles tailored to write/read responsibility split
SubagentType field on spec.TaskSpec: explicit agent pinning for plan-mode steps so the planner can request specific roles instead of relying on resolver heuristics
EscalationInfo on TaskSpec + transient failure reason constants (timeout / rate_limit / overloaded / network) with Retryable accessor for the react → plan D-mode escalation context
cmd/test/ directory consolidates all local-only e2e probes (ask_e2e / emit_e2e / l3_e2e / metrics_e2e / plan_e2e / sched_e2e / subagent_e2e / tool_catalog); the entire subtree is gitignored — only cmd/server/ ships in version control

Changed

L1 → L2 → L3 layering becomes visible in the directory tree: internal/engine/ now exposes emma/ (L1), scheduler/ (L2 kernel + dispatch + coordinator), worker/ (L3 placeholder), loop/ (cross-cutting query-loop helpers — SkillTracker / retry context / drain channel)
L2 dispatch tool renamed from task to freelance (clearer intent: emma delegates a single piece of work to an L3 freelancer; multi-step orchestration uses scheduler instead)
Specialists tool renamed to scheduler with lowercase tool names project-wide (LLM-facing identifiers normalised)
Per-role principles split into internal/engine/prompt/principles/{emma,specialists,worker,explorer,planner,plan-agent,plan-executor-agent}; per-role packages replace the previous monolith so adding a new role no longer touches unrelated principle text
SchedulerCoordinator renamed to scheduler.Coordinator after moving into the scheduler subpackage (the Scheduler prefix was redundant inside the package)
internal/engine/orchestrate moved to internal/engine/scheduler/legacy/ and marked DEPRECATED — Phase-1 plan executor still backs the Orchestrate tool until parity with the new plan strategy is reached
Coordinator-tier SpawnSync now routes through Coordinator.Run (traffic cutover): emma's scheduler tool hits the new scheduler internals by default; the engine wiring deletes the obsolete coordinator_*.go files (Phase 3 migration complete)
QueryEngine exposes only the three accessors the L1 wrapper needs; MainAgentProfile / MainAgentDisplayName / MainAgentAllowedTools / MainAgentMaxTurns are applied through ApplyMainAgentConfig instead of direct field assignment
L2 / L3 principles now instruct sub-agents to use scheduler-provided workspace tools (Promote / ArtifactWrite) instead of Bash mkdir / mv / cp; D13 path-scope boundary documented in the Bash tool description
cmd/ reorganised to cmd/server/ (production) + cmd/test/ (local probes); .gitignore collapsed to a single cmd/test/ rule with no exceptions

Fixed

Plan-mode E2E: tool stripping and session_id propagation for plan-executor-agent so spawned step tasks see the parent session and the executor's tool palette is correctly narrowed
metaRefToLoopResult reads meta.json to surface the L3 summary back to the L1 caller — previously the summary was lost when the engine routed through Coordinator.Run instead of the legacy direct path
scheduler/tstate.RenewLease runs InTx, surfaces cancel cascade errors, validates epoch, and references the lease column via a named field constant (no more silently dropped lease updates)
msgbus/store SQLite revives Payload as the concrete typed struct per Kind so queue consumers can rely on type assertions instead of map-fishing through any
metatool derives task_id + agent from context and stat-fills output bytes; emit/v2 persists parent links across Close for ancestor heartbeat continuity
server/bifrost resolves provider quirks by YAML key first (matches user-visible config) instead of by manifest name
translator clears toolNames + toolsFromPlanning on ToolEnd so subsequent cards do not inherit stale state from a previous tool's lifecycle

Removed

internal/engine/coordinator_*.go files (Phase 3 migration): coordinator_judge.go / coordinator_fallback.go / coordinator_subagent_resolver.go / coordinator_mode_select.go / coordinator_scheduler.go etc. — their responsibilities are now owned by scheduler/dispatch/{plan,react} (judge + fallback aggregator) and scheduler/router (kind selector + agent resolver)
cmd/test/ content is no longer tracked in git: tool_catalog/main.go and metrics_e2e/main.go reverted to local-only status (previously partially tracked); local copies preserved on disk for ad-hoc runs

Assets 8

19 May 13:28

github-actions

v0.0.14

9c5d73b

v0.0.14

Added

Multimodal user input (image + PDF) end-to-end (internal/engine/multimodal): typed IncomingContentBlock parser + size-capped builder (per-block / per-message base64 limits, see MaxBase64BlockBytes / MaxTotalBytesPerMessage), capability Gate enforced at the router before engine dispatch, deterministic UnsupportedModalityError with user-facing message + rejected-modality list, redactor for log-safe payload previews
Per-endpoint capability override: endpoint.model_type yaml field (vision / pdf / audio / video / reasoning / tools / search) overrides the manifest baseline SupportsFlags; unknown tokens warn-and-drop at startup, rejected with 400 on PATCH /api/v1/providers/{p}/endpoints/{e} (*[]string semantics: omitted = leave alone, [] = clear override and revert to manifest)
Fallback-chain capability intersection: Manager.ChainSupports AND-intersects SupportsFlags across primary + every fallback entry so the multimodal gate rejects inputs that would fail mid-chain on fail-over (correctness over availability — "switch model" upfront beats an opaque 400 from a fallback hop)
GET /api/v1/agent/capabilities endpoint: serves the resolved active-model SupportsFlags + derived capability buckets (multimodal / tools / reasoning / search) using the same bridge the gate uses, so the client can never disagree with the server about what's allowed
capabilities array on /api/v1/models responses: collapsed bucket list derived from SupportsFlags for ergonomic UI chip rendering (granular flags remain authoritative for per-feature gating)
Bifrost adapter image / file block conversion: typed ContentTypeImage / ContentTypeFile blocks rendered as Anthropic / OpenAI Vision payloads via data: URL synthesis from base64 + media_type, or pass-through for remote URLs; Anthropic ephemeral cache_control breakpoints added to image / PDF blocks with post-conversion capImageCacheBreakpoints clamp to the 4-breakpoint per-request limit (oldest dropped first)
L3 freelancer execution mode: skill lifecycle tools (loadskill / unloadskill / searchskill / listloadedskills) with per-session skill_tracker context propagation, skill_block engine event surfacing loaded-skills state to UI, freelancer hydration of agent definition + prompt sections (skills section + principles text)
Artifact blob store: filesystem half at ~/.harnessclaw/artifact-blobs backing large binary artifacts alongside sqlite metadata; new /api/v1/artifacts HTTP handler for client download
ArtifactWrite source_path allow-list: pinned reads to ~/.harnessclaw/workspace + configured skills.dirs (no arbitrary filesystem ingest by the LLM)
Sub-agent token attribution: sessionstats distinguishes immediate parent vs root session buckets so multi-agent dispatches roll up to the right conversation; L3+ sub-agents dual-write LLM and tool stats to the root session tracker
Tools management API (/api/v1/tools): GET (list + per-tool config) + PATCH (per-tool config hot-swap with yaml rollback) backed by tool.Registry.Replace for atomic adapter swap; config/persist.SetToolConfig writes tools.<name>.* yaml blocks while preserving comments / key order
CardSystem card kind in emit/v2 (icon=info, role=system, untracked) + SystemPayload.topic field for framework-level notices independent of card lifecycle; EngineEventSystemNotice event + SystemNotice payload type
SearchGapDetector (internal/engine/capability_gap_detector.go): detects when a user asks for web search but no search tool is enabled, emits a session-deduped card_kind=system notice pointing at the settings page; wired into QueryEngine and sub-agent spawn step 6
iFlyTek Spark v2/search Bearer-token API: WebSearch tool migrated to the new Spark search v2 endpoint with Bearer auth
error.unsupported_modality error type + structured details map on ErrorInfo so the wire format can carry model / rejected_modalities / user_message / error_code to the client in a single error frame
WebSocket per-frame read limit raised to 32 MB to accommodate multimodal user.message frames; wire-layer size-cap check rejects oversized payloads with payload_too_large before they reach the engine

Changed

pkg/types.ContentBlock extended with multimodal fields (MediaType / Data / URL / Path / Filename / Size); zero-valued + omitempty keeps text-only payloads byte-identical on the wire
router.New signature now takes a ModelInfoProvider for capability gating; nil disables gating (used by older tests); production wires a bridge from provider.Manager.ActiveModelKey + registry LookupModel + endpoint-override merge
Emma principles tightened: long user requirements must be passed verbatim to Specialists — task field can't shrink N-item lists to one item, can't add reverse constraints the user didn't state, can't pre-split into "first read, then implement" steps
internal/artifact/sqlite_store applies the same pragmas as the sessions store (busy_timeout=5000 / journal_mode=WAL / synchronous=NORMAL) so concurrent sub-agent writes no longer hit SQLITE_BUSY
llm.call.stream_stuck log line downgraded from WARN to INFO (upstream slowness is not a server-side defect; the retry budget handles it on its own — WARN was inflating monitoring alerts)
agent.Message.ParentMessages semantics documented: text-only by design; if extended to carry typed content blocks, callers MUST re-run multimodal.Gate against the sub-agent's resolved model (else a text-only sub-agent silently receives image data and fails at the provider)
Bifrost adapter convertMessages applies capImageCacheBreakpoints post-conversion clamp (per-block cache_control is added eagerly; the global cap is enforced once at the end)

Fixed

L2 sub-agent loop swallowed EngineEventSystemNotice: the L2 forward-switch whitelist in subagent.go was missing the type, so any system notice (e.g. SearchGapDetector firing) emitted under an L2 dispatcher never reached the WS translator — added to the pass-through case list
Team sub-agents lost web search when TavilySearch was disabled: AllowedTools now lists WebSearch as a fallback so the agent doesn't silently lose the capability when credentials are missing for the primary search provider
/api/v1/tools PATCH could race on concurrent updates: mutex now guards cfg mutation; empty-cfgPath errors surface to the caller instead of silently no-op'ing; rollback failures get logged

Removed

Assets 8

15 May 15:33

github-actions

v0.0.13

1c3f636

v0.0.13

Added

Multi-provider failover dispatcher (internal/provider/failover): four-tier RetryPolicy (Probe 5s / Fast 15s / Medium 30s / Full) plus three-state per-provider health (healthy / tripped / ready_to_probe) with exponential cooldown backoff; classifies which errors cross provider boundaries and which stay local (prompt_too_long / max_output_tokens / ctx.Canceled never trip the provider)
Hot-swappable provider manager (internal/provider/manager): atomic.Pointer wraps the live Failover dispatcher so chain mutations don't disturb in-flight calls; adapter cache reuses bifrost adapters across chain-only mutations
Top-level agent: yaml block replacing llm.fallback_chain: primary (single dotted ref "provider:endpoint"), fallback_chain (ordered backups), max_tokens / temperature (adapter-baked defaults, unified [0,1] temperature scaled per provider type — anthropic ×1, openai/gemini ×2), context_window, max_turns (moved from engine.max_turns), max_tool_calls (0 = unlimited), thinking_intensity (low / medium / high / blank)
endpoint.context_window field per endpoint declares the model's intrinsic context limit; Manager.EffectiveContextWindow() returns min(agent.context_window, primary_endpoint.context_window) with 200_000 fallback; surfaced as effective_context_window in GET /api/v1/agent and the engine initialized startup log
GET / PATCH /api/v1/agent endpoint (replaces /api/v1/fallback-chain): partial-updates any subset of agent fields, validates max_turns ≥ 1, max_tool_calls ≥ 0, thinking_intensity enum
POST /api/v1/providers runtime API for creating new provider entries without restart; gemini added to type whitelist alongside openai / anthropic
Comment-preserving yaml persistence (internal/config/persist): yaml.Node-based mutator rewrites only changed keys, preserving inline comments / key order / hand-tuned indentation across PATCH-driven persistence cycles
Per-call token usage on llm.call ok log: input_tokens / output_tokens / cache_read / cache_write / thinking_tokens now at INFO level (was DEBUG-only via bifrost stream MessageEnd)
Bifrost stream lifecycle DEBUG logs: llm.call.dial (before SDK call) / llm.call.connected (after stream returned) / llm.call.stream_closed (when wrapper goroutine exits) — a hang can now be located between dial / connected / streaming / closing
Stream-stuck WARN watchdog: doSingleLLMCall emits llm.call.stream_stuck every 30 s of no new chunk; observability only — firstByteTimeout / apiTimeout still own hard cancellation
Provider endpoint disabled field (cascading with provider.disabled): effective disabled = provider.Disabled OR endpoint.Disabled; auto-removed from chain on PATCH disabled:true, chain auto-fills with first enabled endpoint when empty

Changed

agent.max_tokens / agent.temperature baked into each bifrost adapter as defaults; ChatRequest.Temperature/MaxTokens == 0 falls back to these. PATCH /api/v1/agent on these fields invalidates the adapter cache so subsequent calls pick up the new defaults
ChatRequest.ContextWindow field added as observability hint (not sent to vendor); stats classify.limit now reads it instead of req.MaxTokens — dashboard ContextWindow.Limit correctly reflects configured budget instead of response cap
Manager.AdapterBuilder signature now takes agent config.AgentConfig so adapters can resolve effective temperature / max_tokens at build time
emmaPrinciples trimmed from 8142 → 3824 chars (−53%): removed redundant repetition, anti-pattern lists rewritten as positive instructions, reordered for LLM head-attention bias; TestPrinciplesSection_* invariants kept intact
Manager AgentSnapshot payload now includes effective_context_window so operators can compare configured vs. capped value
Provider type field is now required (was optional with implicit default); empty / unknown types dropped at startup with WARN, not FATAL — server stays bootable with valid providers when one entry is malformed
Empty fallback chain and bad yaml entries now warn-and-drop instead of FATAL: server enters degraded mode (Chat returns ErrNoEndpoint, management API stays mountable) until operator PATCHes a valid agent config

Fixed

GET /api/v1/sessions/{id}/metrics returned empty sub_agents: executor.go never injected ToolUseContext before t.Execute, so Specialists read parent_session_id="", short-circuited StartSubAgent, and bypassed stats_provider entirely — Specialists + all L3 sub-agent token spend was invisible. Fix injects ToolUseContext (SessionID / ToolCallID / ToolName / ToolInput) at the executor boundary
Task tool rejected legitimate sub-agent types (writer / researcher / analyst / developer / lifestyle / scheduler): hardcoded 3-entry whitelist in agenttool/input.go contradicted the tool description, costing one wasted LLM round-trip per Specialists dispatch (~6.5k token / 6 s). Validate now defers name resolution to the defRegistry at spawn time
Orchestration tool cards (Specialists / Task) false-positive orphan_timeout on the EngineEventToolCall path: only EngineEventToolStart had the isOrchestrationTool check appending WithoutLifecycle(); client-side tool calls were still subject to the 120 s CardTool watchdog, causing UI to show "工具失败" while the underlying multi-minute plan was healthy. Both paths now opt out
QueryEngineConfig.MaxTokens was overloaded as both "response cap" and "context budget" — compactor was misclassifying agent.max_tokens (e.g. 2048) as the context window, triggering auto-compact at trivial token counts. Split into distinct MaxTokens (response cap) and ContextWindow (compactor budget) fields

Removed

/api/v1/fallback-chain endpoint (replaced by /api/v1/agent)
llm.fallback_chain yaml field (migrated to agent.fallback_chain); persist.SetAgent strips the legacy key on first save
engine.max_turns yaml field (moved to agent.max_turns)
Hardcoded 200_000 literal for prompt-context window size in internal/engine/queryloop.go and subagent.go — now derived from qe.contextWindow()

Assets 8

13 May 16:59

github-actions

v0.0.12

ed9885f

v0.0.12

Added

Session metrics dashboard: per-session LLM / sub-agent / tool counters tracked by sessionstats.Tracker, persisted as metrics_json column on session rows, snapshotted via debounced worker, and served at GET /api/v1/sessions/{id}/metrics on the Console port; survives service restart via Tracker.RestoreFrom
Provider-stats decorator: wraps any provider.Provider to record per-model token usage (input / output / reasoning / cache hits) into sessionstats.Registry from a single ctx-key plumbing point, no per-call instrumentation
thinking_tokens field on Usage: bifrost adapter now surfaces upstream reasoning-token counts so the dashboard can break out chain-of-thought cost separately from regular completion tokens
Model registry catalog (internal/provider/registry): YAML manifest declaring ProviderSpec (auth / endpoints / quirks) and ModelSpec (modalities / supports / limits / defaults) for 8 vendors and 16 models (OpenAI, Anthropic, Google, DeepSeek, Zhipu, Moonshot, MiniMax, iFlyTek); embedded default manifest loaded at startup, queryable by manifest key or by provider+model_id
Public models endpoint: GET /api/v1/models and GET /api/v1/models/{provider}/{model_id} on the Console server, JSON-tagged snake_case payload for the front-end capability gate; documented in docs/api/models-registry-api.md
Provider quirks routing in bifrost adapter: ThinkingParamStyle selects the wire format per provider (deepseek_type → extra_params.thinking.type, openai_effort → reasoning.effort, anthropic_budget → reasoning.max_tokens, openrouter → reasoning.enabled); ExtraParamsPassthroughRequired and ToolCallsRequireReasoningField now gate behavior per-quirk instead of being applied unconditionally
enable_thinking provider config option: per-provider override that drives the new quirk routing without changing the call site
iFlyTek Spark X2 Flash support: xunfei/spark-x (256K context, text-only, function_calling + reasoning + web_search) wired through the registry and bifrost adapter
Structured tool error classification: ToolErrorType (tool_timeout / rate_limit / overloaded / contract_fail / dependency_fail / permission_denied / invalid_input / user_aborted / model_error / internal) set by the executor and every built-in tool at the failure site; WebSocket translator emits it as card.close.error.type instead of defaulting all failures to internal

Changed

Session-metrics endpoint moved from the WebSocket channel mux to the Console HTTP server (port 8090) so the front-end has one management origin to consult
Bifrost adapter Config accepts a *ProviderQuirks mirror struct; quirks come from the registry at startup and gate behavior that was previously hard-coded (e.g. ExtraParams passthrough was always-on; now opt-in only for providers that declare it)
Claude model family field regrouped to claude4.5 / claude4.6 / claude4.7 (was claude-haiku / claude-sonnet / claude-opus) so the UI groups by generation rather than tier

Fixed

DeepSeek thinking mode now actually disables reasoning: adapter sends thinking: {"type": "disabled"} via ExtraParams per the official docs, and opts into bifrost's BifrostContextKeyPassthroughExtraParams so the field reaches the wire (the SDK was silently dropping Params.ExtraParams during marshalling, so the legacy enable_thinking: false field was both ignored by the server and never sent)
reasoning_content is now preserved on assistant messages across DeepSeek thinking-mode turns when thinking is enabled, and emitted as an empty-string placeholder on tool_calls assistant messages when the provider quirk requires the field present — fixes 400 "reasoning_content must be passed back to the API"
Reasoning replay across turns is suppressed when enable_thinking is disabled, instead of inflating input tokens with every prior turn's chain-of-thought
Bifrost adapter waits for the synthetic usage chunk after finish_reason instead of emitting MessageEnd immediately, so token totals are no longer lost when bifrost's OpenAI provider holds usage back to a trailing chunk
Per-model stats now key off the provider-reported model name (echoed back in Usage) instead of the adapter's configured default, so usage attribution stays correct when fallbacks or model overrides kick in
New session rows are persisted eagerly in Manager.GetOrCreate so the foreign key for SessionStats.SaveSessionStats exists by the time the first metrics flush fires (previously dropped with no such row)

Assets 8

11 May 15:03

github-actions

v0.0.11

4880bc6

v0.0.11

Added

LLM retry visibility on the wire: every retry the engine schedules emits card.tick(kind=note) carrying attempt number, planned backoff delay, classified error type, and HTTP status — front-ends can render "重试中 (N/M, Xms 后再试)" instead of staring at silent waits
LLM heartbeat events: 30 s keep-alive ticks during in-flight LLM calls propagate up the parent_card_id chain so long-thinking models no longer cause the surrounding step / agent / message cards to orphan-timeout
First-byte timeout on LLM calls: llm.first_byte_timeout (default 120 s) catches "TCP black hole" upstreams that accept the request but never send a chunk; disarms once the first chunk lands so legitimate long thinking preludes are not penalised
Configurable LLM timeouts via yaml: llm.api_timeout, llm.first_byte_timeout, llm.max_retries exposed as top-level config keys
Step-decision gate: when retries / replans exhaust, Scheduler and PlanCoordinator emit prompt.user(kind=step_decision) so the user picks continue / retry / cancel instead of silently falling back
Chain-only lifecycle for orchestration tool cards: Specialists and Task tool cards now opt out of the orphan watchdog but stay tracked in the parent chain — descendant heartbeats still refresh ancestors above
LLM-driven LLMSubagentResolver replaces keyword scoring: structured-output tool call picks the executor from AvailableSubagents with rationale, falls back to heuristic on nil-provider / LLM-error / out-of-set-pick
Planner and SubagentResolver route through retryLLMCall: both now inherit transport-level retry, heartbeats, retry-status events, and FirstByte/API timeouts that L1 emma and L3 sub-agents already have
Per-frame WebSocket trace logging behind channels.websocket.trace_frames toggle for lifecycle-level debugging without dropping the global log level to DEBUG

Changed

Plan card parent is now the emitting Specialists agent card (was the turn card); fixes the topology so writer heartbeats walk writer → step → plan → specialists agent → tool → turn and the Specialists tool card stays alive through the whole plan
ping / pong are now top-level wire frames ({"type":"ping"} / {"type":"pong"}) with no envelope / seq / severity — pure liveness signal, decoupled from session.event
Specialists L2 worker no longer carries a 15-min hard timeout; AgentTool dispatch no longer carries a 5-min hard timeout — long-running plans are bounded by per-call LLM timeouts and the step_decision gate instead of one wall-clock cap that killed legitimate work mid-flight
SubmitTaskResult summary over 200 chars is now truncated with … instead of rejected, so the sub-agent isn't forced to redraft when slightly over budget
LLM retry plumbing uses retry.Retryer.DoWith(ctx, fn, onRetry) — per-call observer keeps retry events scoped to one caller's out channel even when the Retryer is shared across concurrent sub-agents

Fixed

Specialists tool card no longer suffers spurious orphan_timeout closes mid-plan: 120 s CardTool watchdog used to kill the tool card once the planner stopped tick-ing it (e.g. while writer was running), surfacing "工具失败" while the underlying step succeeded
DNS / network errors at the L1 main loop now retry via retry.Retryer with the same exponential backoff path L3 sub-agents already used, instead of failing on the first attempt
LLM call ctx cancellation no longer counts as a retryable transient error: a dead ctx propagates through the Retryer as non-retryable so we stop wasting attempts when the parent has already given up
Wire-frame card.close for sub-agent abort no longer maps to StatusOK: "aborted" now produces StatusCancelled (was silently dropped to ok)

Added

Roster-agnostic LLM planner: replaces the keyword-rule HeuristicPlanner; the LLM produces a step DAG via the emit_plan tool with retry-on-validation, capped at maxSteps, and intentionally does not see the available sub-agent list — SubagentResolver picks the executor at dispatch time
Step-level retry on transient failures: scheduler retries up to two attempts on timeout / rate-limit / overloaded / 5xx / terminal_blocking_limit / terminal_model_error signals before falling back to plan-level replan; step_started fires per attempt and step_completed / step_failed carry cumulative attempts

Changed

plan_review confirmation now matches prompt.user(question) semantics: no card.add(plan) while waiting, no orphan watchdog, ctx deadline stripped — the user can take as long as they need to review (protocol v0.4.0)

Fixed

Tool-result metadata (e.g. WebSearch urls / query / result_count) now flows through the v2.2 translator to ToolPayload.Metadata instead of being dropped after promoting render_hint / language / file_path
Scheduler step failures triggered by terminal sub-agent reasons (model_error / blocking_limit / prompt_too_long) now populate StepResult.Failures with terminal_<reason>: <message>, log a structured Warn at the scheduler boundary, and feed the retry classifier — previously these failed silently with empty step_failed payloads and no server log

Assets 8

08 May 13:24

github-actions

v0.0.10

5979f39

v0.0.10

Added

Roster-agnostic LLM planner: replaces the keyword-rule HeuristicPlanner; the LLM produces a step DAG via the emit_plan tool with retry-on-validation, capped at maxSteps, and intentionally does not see the available sub-agent list — SubagentResolver picks the executor at dispatch time
Step-level retry on transient failures: scheduler retries up to two attempts on timeout / rate-limit / overloaded / 5xx / terminal_blocking_limit / terminal_model_error signals before falling back to plan-level replan; step_started fires per attempt and step_completed / step_failed carry cumulative attempts

Changed

plan_review confirmation now matches prompt.user(question) semantics: no card.add(plan) while waiting, no orphan watchdog, ctx deadline stripped — the user can take as long as they need to review (protocol v0.4.0)

Fixed

Tool-result metadata (e.g. WebSearch urls / query / result_count) now flows through the v2.2 translator to ToolPayload.Metadata instead of being dropped after promoting render_hint / language / file_path
Scheduler step failures triggered by terminal sub-agent reasons (model_error / blocking_limit / prompt_too_long) now populate StepResult.Failures with terminal_<reason>: <message>, log a structured Warn at the scheduler boundary, and feed the retry classifier — previously these failed silently with empty step_failed payloads and no server log

Assets 8

07 May 14:29

github-actions

v0.0.9

c1f7c16

v0.0.9

Added

L2 multi-mode coordinator framework: routes Specialists / Orchestrate work through one of pluggable modes (ReAct, Plan), with Planner, Scheduler, Judge, Escalation, Fallback, Budget, ModeSelect, and SubagentResolver as independently testable components
Plan-mode user-confirmation flow: when user.message.plan_confirmation: "required", the framework emits plan.proposed (carrying the editable step DAG) and blocks until the client returns plan.response with approval / edits / rejection; mirrored by plan.approved echo (protocol v1.15+)
Plan / Step lifecycle emit events from the PlanCoordinator path: plan.created / plan.updated / plan.completed / plan.failed and step.dispatched / step.completed / step.failed / step.skipped (protocol v1.16; envelope/display/metrics are placeholder, payload is fully populated)
Coordinator-mode threading: user.message.coordinator_mode selects ReAct vs Plan per turn; tool.WithCoordinatorMode ctx value flows from router → Specialists → SpawnConfig
LLM call timing breakdown on llmCallResult (firstByteAt / lastChunkAt / endAt) for diagnosing gateway hangs, extended thinking, and network buffering separately

Changed

Plan-step skill field renamed to subagent_type and available_skills to available_subagents to disambiguate from AgentDefinition.Skills (capability tags); subagent_type is now optional, with SubagentResolver picking the executor at dispatch time
emma's task-dispatch principles now require an explicit clarification-merge self-check before any Specialists spawn — original-request plus user-clarification answer must be combined into the task string

Fixed

Bifrost adapter no longer hangs ~6m40s after the model finishes streaming: consumeStream returns as soon as a chunk carries FinishReason, instead of waiting for the upstream chunk channel to close on its own (which only happens at the underlying HTTP idle timeout)
AskUserQuestion tool description gained an explicit reminder that the user's clarification answer must be folded into the next task / prompt — previous wording let the LLM forward the original (un-clarified) request

Assets 8

05 May 15:28

github-actions

v0.0.7

a5ca9b3

v0.0.7

Added

Three-tier agent architecture: emma (L1, user-facing) → Specialists (L2, coordinator) → workers (L3, leaf executors), with strict per-tier tool filtering and prompt isolation
TierSubAgent contract on AgentDefinition: every L3 worker declares OutputSchema, InputSchema, Limitations, ExampleTasks, CostTier, and Temperature; Tier routes spawns to the strict L3 driver
runSubAgentDriver: leaf-only ReAct loop with self-check, nudge cap (3x), reject cap (3x), and SubmitTaskResult-or-EscalateToPlanner termination
7 redesigned built-in workers (writer, researcher, analyst, developer, travel_planner, recommender, scheduler) as pure functional L3 with specialized SystemPrompt and per-worker schemas
SubmitTaskResult tool with server-side validation (lineage, temporal window, role/type/mime, OutputSchema)
EscalateToPlanner tool: L3 needs-planning escape hatch (paired with SubmitTaskResult)
SpawnConfig.Inputs + InputSchema validation in SpawnSync — validated before any LLM call
Artifact subsystem with task contract: SQLite-backed store at ~/.harnessclaw/db/artifacts.db, ArtifactRead / ArtifactWrite tools, available-artifacts preamble auto-injected on sub-agent spawn, TTL janitor, parent-version chains
WebSocket forwarding of artifact refs and L3 sub-agent task/intent for richer client observability (protocol v1.12)
Plan-based DAG executor for Orchestrate tool with PlannerListing rich-metadata routing
Structured lifecycle event protocol via internal/emit package
AskUserQuestion tool for clarification flows; WebSearch updates
SafetyLevel classification (safe / caution / dangerous) on every built-in tool, with WithoutDangerousUnless filter for L3 pools
Minimal in-house JSON-schema validator (ValidateAgainstSchema) shared by SubmitTaskResult and InputSchema enforcement
BuildFunctionalIdentity for L3 workers — team-free, persona-free identity that does not leak emma's L1 prompt

Changed

Worker system prompts no longer inject team affiliation or personality for TierSubAgent; identity becomes purely functional
lifestyle worker split into single-responsibility travel_planner and recommender
Specialists / Orchestrate consume structured PlannerListing for richer routing decisions
All user-facing tool descriptions translated to Chinese for consistency
emma identity phrasing tightened in the L1 prompt
Artifact subsystem redesigned around the task contract (producer task_id stamping, expected-outputs render block)

Fixed

ArtifactRead defends against artifact_id hallucination: detects UUID-shaped fabrications (8-4-4-4-12 dashed and 32-hex compact) and instructs the LLM to escalate instead of retry, preventing the retry-then-fabricate loop that previously consumed long stretches of LLM time
Artifacts guidance section now shows the real ID format example and explicit "don't know an ID? EscalateToPlanner" instruction

Removed

Personality field is no longer rendered into TierSubAgent system prompts (kept on the definition for team-table metadata only)

Assets 8

27 Apr 14:26

github-actions

v0.0.6

46ca4c1

v0.0.6

Added

Agent definition persistence: SQLite-backed AgentService and console HTTP API (/console/v1/agents) for create/list/get/update/delete/import operations; built-in definitions are synced on startup and YAML files are imported on demand
Console management API server with configurable host/port (console.enabled/host/port config keys, defaults to 0.0.0.0:8090)
Per-agent skill whitelist enforcement: AgentDefinition.Skills and SpawnConfig.AllowedSkills are now respected by SkillTool, which rejects invocations of skills not on the list
Per-agent tool whitelist enforcement: AgentDefinition.AllowedTools filters the sub-agent tool pool via the new ToolPool.FilterByNames method
Personality field is injected into the auto-generated worker identity prompt
WebSocket deliverable.ready event: sub-agent file outputs (FileWrite results) surface to the client with file_path, language, and byte_size for direct rendering or download
<summary> tag protocol for sub-agent outputs: Worker / Explore / Plan profiles require sub-agents to wrap their core conclusion in <summary>...</summary>; the engine extracts it into SpawnResult.Summary and returns only summary + deliverables to the parent agent
TaskRegistry: full sub-agent results are stored in-engine by agent ID for context passing and debugging while the parent only sees summaries
SpawnResult now carries structured Summary, Status, and Attempts fields

Changed

Emma system prompt restructured into a three-layer architecture: persona/team/judgment/delivery only; dispatch protocol, retry rules, and multi-agent coordination paragraphs moved to application code
Sub-agent system prompts no longer hard-code agent names in role overrides; dynamic SystemPromptOverride from agent definition takes precedence over static profile SectionOverrides for the role section
Section headings dropped numeric prefixes ("一、二、三") so reordering does not break the prompt
All built-in section content translated to Chinese (env, tools, memory, skills, task, currentdate, artifacts)
FileWrite no longer auto-creates parent directories; callers must ensure the directory exists or the tool returns an explicit error pointing at the missing path; the schema description now hints at a default working directory
Agent definitions are no longer auto-scanned from .harnessclaw/agents/ on startup; use the import endpoint instead

Removed

Coordinator system prompt and CoordinatorProfile: orchestration is now an L2 application-code concern, planned to land as a code-driven Orchestrate tool in a follow-up
Static output / rules prompt sections: their delivery rules are folded into principles

Assets 8

22 Apr 14:57

github-actions

v0.0.5

6bb2051

v0.0.5

Added

Universal artifact store: session-scoped content store for large tool results with automatic threshold-based replacement, frozen replacement decisions for prompt cache stability, and pre-LLM compaction
ArtifactGet tool: LLM retrieves full artifact content by ID without regenerating
Write tool artifact_ref parameter: write artifact content to files by reference, saving output tokens
Artifact-aware compaction: replaces artifact-backed tool results with compact references before LLM summarization
SQLite artifact persistence: artifacts table for persisting large tool results across server restarts
Multi-agent orchestration system: sub-agent spawning (sync/async/fork), loopConfig-parameterized query loop, coordinator mode with team management
Agent tool with SpawnSync/SpawnAsync, InheritedChecker permission inheritance, and LongRunningTool interface for timeout bypass
Task system: TaskCreate/TaskGet/TaskList/TaskUpdate tools with in-memory and SQLite-backed stores
Team management: TeamCreate/TeamDelete tools, MessageBroker with mailbox-based inter-agent messaging, SendMessage tool
@-mention routing: MentionParser extracts @agent_name from user messages, routes to registered agent definitions loaded from YAML
Coordinator mode: system prompt rewrite to dispatcher role with 4-phase workflow (research → synthesis → implementation → verification)
WebSocket sub-agent and multi-agent event protocol: subagent.start/end, agent.routed/spawned/idle/completed/failed, task.created/updated, agent.message, team.created/member_join/member_left/deleted
Render hint metadata on tool results: render_hint, language, file_path fields promoted to top-level in tool.end WebSocket messages
Language detection utility mapping file extensions to language identifiers for render hints
Web search tools: Tavily search and iFly/Xunfei search integrations
LLM retry with exponential backoff for transient provider failures
Mock LLM provider and stream builder for unit testing
Current date section in system prompt
Artifact guidance section in system prompt teaching LLM about artifact usage patterns

Changed

Storage architecture: removed memory/sqlite switch; SQLite is now always the persistence backend, Manager.active map serves as in-memory cache
Bifrost adapter error messages now include HTTP status code, error type/code, and underlying error details for easier troubleshooting
Bifrost stream idle timeout increased to 300s and request timeout to 600s for sub-agent workloads
System prompt role and output style sections rewritten with improved anti-AI-speak guidance

Fixed

Session persistence across server restarts: eliminated config-level memory/sqlite choice that prevented SQLite from being used
Bifrost error messages no longer hide HTTP-level error details behind generic constant strings
Missing SourceSkillDir constant in command source enumeration

Assets 8

Releases: harnessclaw/harnessclaw-engine

v0.0.15

Added

Changed

Fixed

Removed

Uh oh!

v0.0.14

Added

Changed

Fixed

Removed

Uh oh!

v0.0.13

Added

Changed

Fixed

Removed

Uh oh!

v0.0.12

Added

Changed

Fixed

Uh oh!

v0.0.11

Added

Changed

Fixed

Added

Changed

Fixed

Uh oh!

v0.0.10

Added

Changed

Fixed

Uh oh!

v0.0.9

Added

Changed

Fixed

Uh oh!

v0.0.7

Added

Changed

Fixed

Removed

Uh oh!

v0.0.6

Added

Changed

Removed

Uh oh!

v0.0.5

Added

Changed

Fixed

Uh oh!