Releases: harnessclaw/harnessclaw-engine
v0.0.15
Added
- L2 scheduler kernel (v3.1) under `internal/engine/scheduler/`: top-level `Scheduler.New` + `Submit` + `Start` wires bus, kernel, dispatch strategies, and runtime handlers (`onSpawn` / `onResult` / `onLifecycle` / `onTerminal` / `onExpire` / `onCancellingDrained` / `onCompletedFromStaging`) into a single message-driven L2
- 8-state task state machine in `scheduler/tstate`: `Kernel` with `RollbackAdmit` + epoch guards, `Reader` / `Writer` / `StagingWriter` interfaces (R2/R4/R10), in-memory + SQLite `Store` implementations sharing the sessions `*sql.DB`, named-field UpdateField CAS path
- Dispatch strategies in `scheduler/dispatch`: `react.Strategy` spawns one leaf via `SpawnAndWaitOne` (R1/R6 Subscribe-before-Publish); `plan.Strategy` runs the planner-agent + plan-executor-agent two-phase LLM pipeline, with `PlanJudge` rule-tier validation and `FallbackAggregator` graceful degradation
- Routing layer in `scheduler/router`: `HeuristicKindSelector` (react vs plan, mode-select heuristics) + `HeuristicAgentResolver` (keyword-scored sub-agent selection); both pluggable via `Coordinator` config so the project can swap in LLM-based selectors later
- `scheduler/runtime/host`: `StartStrategyHost` forks G1 + G2, stages refs, publishes lifecycle, runs a 3-pass reaper scan (lease / deadline / cancelling) with notify-on-expiry
- Msgbus (`internal/msgbus`): in-process `Bus` with `Publish` / `Subscribe` / `SubscribeOnce` / `Dequeue` / `Ack` / `Nack`, six message kinds (`lifecycle` / `control` / `agent.msg` / `notify` / `task` / `result`), six typed payloads, in-memory + SQLite `Store` implementations with reaper requeue and per-kind typed-struct revival
- L1 (`internal/engine/emma`) and L2 (`internal/engine/scheduler`) now sit in dedicated subpackages; new public accessors on `QueryEngine` (`ApplyMainAgentConfig` / `Config` / `PromptProfile`) let emma cross the package boundary without touching private fields
- L2 dispatch tool `scheduler`: emma calls `scheduler(task)` to enter the L2 layer; `coordinator-mode` (react / plan) flows via `tool.WithCoordinatorMode` on the parent context, never via emma's input schema (ops-only knob, D-mode auto-escalation stays internal)
- `plan_read` tool: read-only access to `plan.json` for plan-executor-agent; TDD-built with full coverage
- Plan-mode profiles + agent definitions: `plan-agent` writes the task breakdown to `plan.json`, `plan-executor-agent` reads it back, dispatches step tasks via `freelance`, updates status; profiles include principles tailored to the write/read responsibility split
- `SubagentType` field on `spec.TaskSpec`: explicit agent pinning for plan-mode steps so the planner can request specific roles instead of relying on resolver heuristics
- `EscalationInfo` on `TaskSpec` + transient failure reason constants (`timeout` / `rate_limit` / `overloaded` / `network`) with `Retryable` accessor for the react → plan D-mode escalation context
- `cmd/test/` directory consolidates all local-only e2e probes (`ask_e2e` / `emit_e2e` / `l3_e2e` / `metrics_e2e` / `plan_e2e` / `sched_e2e` / `subagent_e2e` / `tool_catalog`); the entire subtree is gitignored, and only `cmd/server/` ships in version control
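As a rough illustration of the transient failure reasons and their `Retryable` accessor, the sketch below models them as a string-typed enum. Only the four reason strings come from the notes; the `FailureReason` type name, the constant names, and the method receiver are assumptions made for this example and may not match the engine's actual definitions.

```go
package main

import "fmt"

// FailureReason is a hypothetical type for this sketch; only the four
// reason strings below are taken from the release notes.
type FailureReason string

const (
	ReasonTimeout    FailureReason = "timeout"
	ReasonRateLimit  FailureReason = "rate_limit"
	ReasonOverloaded FailureReason = "overloaded"
	ReasonNetwork    FailureReason = "network"
)

// Retryable reports whether the reason is one of the transient failures
// that could justify escalating or retrying instead of failing outright.
func (r FailureReason) Retryable() bool {
	switch r {
	case ReasonTimeout, ReasonRateLimit, ReasonOverloaded, ReasonNetwork:
		return true
	default:
		return false
	}
}

func main() {
	// "invalid_input" stands in for any non-transient reason.
	for _, r := range []FailureReason{ReasonTimeout, "invalid_input"} {
		fmt.Printf("%s retryable=%v\n", r, r.Retryable())
	}
}
```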
Changed
- L1 → L2 → L3 layering becomes visible in the directory tree: `internal/engine/` now exposes `emma/` (L1), `scheduler/` (L2 kernel + dispatch + coordinator), `worker/` (L3 placeholder), `loop/` (cross-cutting query-loop helpers: `SkillTracker` / retry context / drain channel)
- L2 dispatch tool renamed from `task` to `freelance` (clearer intent: emma delegates a single piece of work to an L3 freelancer; multi-step orchestration uses `scheduler` instead)
- Specialists tool renamed to `scheduler` with lowercase tool names project-wide (LLM-facing identifiers normalised)
- Per-role principles split into `internal/engine/prompt/principles/{emma,specialists,worker,explorer,planner,plan-agent,plan-executor-agent}`; per-role packages replace the previous monolith so adding a new role no longer touches unrelated principle text
- `SchedulerCoordinator` renamed to `scheduler.Coordinator` after moving into the scheduler subpackage (the `Scheduler` prefix was redundant inside the package)
- `internal/engine/orchestrate` moved to `internal/engine/scheduler/legacy/` and marked DEPRECATED; the Phase-1 plan executor still backs the `Orchestrate` tool until parity with the new plan strategy is reached
- Coordinator-tier `SpawnSync` now routes through `Coordinator.Run` (traffic cutover): emma's `scheduler` tool hits the new scheduler internals by default; the engine wiring deletes the obsolete `coordinator_*.go` files (Phase 3 migration complete)
- `QueryEngine` exposes only the three accessors the L1 wrapper needs; `MainAgentProfile` / `MainAgentDisplayName` / `MainAgentAllowedTools` / `MainAgentMaxTurns` are applied through `ApplyMainAgentConfig` instead of direct field assignment
- L2 / L3 principles now instruct sub-agents to use scheduler-provided workspace tools (Promote / ArtifactWrite) instead of `Bash mkdir / mv / cp`; D13 path-scope boundary documented in the Bash tool description
- `cmd/` reorganised to `cmd/server/` (production) + `cmd/test/` (local probes); `.gitignore` collapsed to a single `cmd/test/` rule with no exceptions
Fixed
- Plan-mode E2E: tool stripping and session_id propagation for `plan-executor-agent` so spawned step tasks see the parent session and the executor's tool palette is correctly narrowed
- `metaRefToLoopResult` reads `meta.json` to surface the L3 summary back to the L1 caller; previously the summary was lost when the engine routed through `Coordinator.Run` instead of the legacy direct path
- `scheduler/tstate.RenewLease` runs `InTx`, surfaces cancel cascade errors, validates epoch, and references the lease column via a named field constant (no more silently dropped lease updates)
- `msgbus/store` SQLite revives `Payload` as the concrete typed struct per `Kind` so queue consumers can rely on type assertions instead of map-fishing through `any`
- `metatool` derives `task_id` + `agent` from context and stat-fills output bytes; `emit/v2` persists parent links across `Close` for ancestor heartbeat continuity
- `server/bifrost` resolves provider quirks by YAML key first (matches user-visible config) instead of by manifest name
- `translator` clears `toolNames` + `toolsFromPlanning` on `ToolEnd` so subsequent cards do not inherit stale state from a previous tool's lifecycle
Removed
- `internal/engine/coordinator_*.go` files (Phase 3 migration): `coordinator_judge.go` / `coordinator_fallback.go` / `coordinator_subagent_resolver.go` / `coordinator_mode_select.go` / `coordinator_scheduler.go` etc.; their responsibilities are now owned by `scheduler/dispatch/{plan,react}` (judge + fallback aggregator) and `scheduler/router` (kind selector + agent resolver)
- `cmd/test/` content is no longer tracked in git: `tool_catalog/main.go` and `metrics_e2e/main.go` reverted to local-only status (previously partially tracked); local copies preserved on disk for ad-hoc runs
v0.0.14
Added
- Multimodal user input (image + PDF) end-to-end (`internal/engine/multimodal`): typed `IncomingContentBlock` parser + size-capped builder (per-block / per-message base64 limits, see `MaxBase64BlockBytes` / `MaxTotalBytesPerMessage`), capability `Gate` enforced at the router before engine dispatch, deterministic `UnsupportedModalityError` with user-facing message + rejected-modality list, redactor for log-safe payload previews
- Per-endpoint capability override: `endpoint.model_type` yaml field (`vision` / `pdf` / `audio` / `video` / `reasoning` / `tools` / `search`) overrides the manifest baseline `SupportsFlags`; unknown tokens warn-and-drop at startup, rejected with 400 on `PATCH /api/v1/providers/{p}/endpoints/{e}` (`*[]string` semantics: omitted = leave alone, `[]` = clear override and revert to manifest)
- Fallback-chain capability intersection: `Manager.ChainSupports` AND-intersects `SupportsFlags` across primary + every fallback entry so the multimodal gate rejects inputs that would fail mid-chain on fail-over (correctness over availability: "switch model" upfront beats an opaque 400 from a fallback hop)
- `GET /api/v1/agent/capabilities` endpoint: serves the resolved active-model `SupportsFlags` + derived capability buckets (`multimodal` / `tools` / `reasoning` / `search`) using the same bridge the gate uses, so the client can never disagree with the server about what's allowed
- `capabilities` array on `/api/v1/models` responses: collapsed bucket list derived from `SupportsFlags` for ergonomic UI chip rendering (granular flags remain authoritative for per-feature gating)
- Bifrost adapter image / file block conversion: typed `ContentTypeImage` / `ContentTypeFile` blocks rendered as Anthropic / OpenAI Vision payloads via `data:` URL synthesis from base64 + media_type, or pass-through for remote URLs; Anthropic ephemeral `cache_control` breakpoints added to image / PDF blocks with post-conversion `capImageCacheBreakpoints` clamp to the 4-breakpoint per-request limit (oldest dropped first)
- L3 freelancer execution mode: skill lifecycle tools (`loadskill` / `unloadskill` / `searchskill` / `listloadedskills`) with per-session `skill_tracker` context propagation, `skill_block` engine event surfacing loaded-skills state to UI, freelancer hydration of agent definition + prompt sections (skills section + principles text)
- Artifact blob store: filesystem half at `~/.harnessclaw/artifact-blobs` backing large binary artifacts alongside sqlite metadata; new `/api/v1/artifacts` HTTP handler for client download
- `ArtifactWrite` source_path allow-list: pinned reads to `~/.harnessclaw/workspace` + configured `skills.dirs` (no arbitrary filesystem ingest by the LLM)
- Sub-agent token attribution: `sessionstats` distinguishes immediate parent vs root session buckets so multi-agent dispatches roll up to the right conversation; L3+ sub-agents dual-write LLM and tool stats to the root session tracker
- Tools management API (`/api/v1/tools`): GET (list + per-tool config) + PATCH (per-tool config hot-swap with yaml rollback) backed by `tool.Registry.Replace` for atomic adapter swap; `config/persist.SetToolConfig` writes `tools.<name>.*` yaml blocks while preserving comments / key order
- `CardSystem` card kind in `emit/v2` (icon=info, role=system, untracked) + `SystemPayload.topic` field for framework-level notices independent of card lifecycle; `EngineEventSystemNotice` event + `SystemNotice` payload type
- `SearchGapDetector` (`internal/engine/capability_gap_detector.go`): detects when a user asks for web search but no search tool is enabled, emits a session-deduped `card_kind=system` notice pointing at the settings page; wired into `QueryEngine` and sub-agent spawn step 6
- iFlyTek Spark v2/search Bearer-token API: WebSearch tool migrated to the new Spark search v2 endpoint with Bearer auth
- `error.unsupported_modality` error type + structured `details` map on `ErrorInfo` so the wire format can carry `model` / `rejected_modalities` / `user_message` / `error_code` to the client in a single error frame
- WebSocket per-frame read limit raised to 32 MB to accommodate multimodal `user.message` frames; wire-layer size-cap check rejects oversized payloads with `payload_too_large` before they reach the engine
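The fallback-chain capability intersection can be pictured with a small sketch. The `SupportsFlags` struct here is a simplified stand-in with three assumed boolean fields, and `chainSupports` is a hypothetical free function, not the real `Manager.ChainSupports` signature; the point is only that a capability survives when the primary and every fallback declare it.

```go
package main

import "fmt"

// SupportsFlags is a simplified stand-in for the real capability struct;
// the three fields are assumptions chosen for illustration.
type SupportsFlags struct {
	Vision bool
	PDF    bool
	Tools  bool
}

// chainSupports AND-intersects the flags of the primary entry with every
// fallback entry, so a capability is only reported when the whole chain has it.
func chainSupports(primary SupportsFlags, fallbacks ...SupportsFlags) SupportsFlags {
	out := primary
	for _, f := range fallbacks {
		out.Vision = out.Vision && f.Vision
		out.PDF = out.PDF && f.PDF
		out.Tools = out.Tools && f.Tools
	}
	return out
}

func main() {
	primary := SupportsFlags{Vision: true, PDF: true, Tools: true}
	fallback := SupportsFlags{Vision: true, Tools: true} // no PDF support
	fmt.Printf("%+v\n", chainSupports(primary, fallback))
}
```

With this shape, a PDF upload would be rejected upfront because one hop in the chain cannot handle it, which matches the "correctness over availability" trade-off described in the entry.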
Changed
- `pkg/types.ContentBlock` extended with multimodal fields (`MediaType` / `Data` / `URL` / `Path` / `Filename` / `Size`); zero-valued + `omitempty` keeps text-only payloads byte-identical on the wire
- `router.New` signature now takes a `ModelInfoProvider` for capability gating; `nil` disables gating (used by older tests); production wires a bridge from `provider.Manager.ActiveModelKey` + registry `LookupModel` + endpoint-override merge
- `Emma` principles tightened: long user requirements must be passed verbatim to Specialists; the task field can't shrink N-item lists to one item, can't add reverse constraints the user didn't state, can't pre-split into "first read, then implement" steps
- `internal/artifact/sqlite_store` applies the same pragmas as the sessions store (`busy_timeout=5000` / `journal_mode=WAL` / `synchronous=NORMAL`) so concurrent sub-agent writes no longer hit `SQLITE_BUSY`
- `llm.call.stream_stuck` log line downgraded from WARN to INFO (upstream slowness is not a server-side defect; the retry budget handles it on its own, and WARN was inflating monitoring alerts)
- `agent.Message.ParentMessages` semantics documented: text-only by design; if extended to carry typed content blocks, callers MUST re-run `multimodal.Gate` against the sub-agent's resolved model (else a text-only sub-agent silently receives image data and fails at the provider)
- Bifrost adapter `convertMessages` applies `capImageCacheBreakpoints` post-conversion clamp (per-block `cache_control` is added eagerly; the global cap is enforced once at the end)
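A minimal sketch of the "add eagerly, clamp once at the end" idea follows. The `block` type and `capBreakpoints` function are hypothetical, and the sketch assumes "oldest" means earliest in message order; the real `capImageCacheBreakpoints` may track age differently.

```go
package main

import "fmt"

// maxCacheBreakpoints mirrors the 4-breakpoint per-request limit named in the notes.
const maxCacheBreakpoints = 4

// block is a hypothetical content block; Cached marks a cache_control breakpoint.
type block struct {
	ID     string
	Cached bool
}

// capBreakpoints clears breakpoints from the front of the slice (assumed to be
// the oldest blocks) until at most limit breakpoints remain.
func capBreakpoints(blocks []block, limit int) {
	count := 0
	for _, b := range blocks {
		if b.Cached {
			count++
		}
	}
	for i := range blocks {
		if count <= limit {
			return
		}
		if blocks[i].Cached {
			blocks[i].Cached = false
			count--
		}
	}
}

func main() {
	blocks := make([]block, 6)
	for i := range blocks {
		blocks[i] = block{ID: fmt.Sprintf("img-%d", i), Cached: true}
	}
	capBreakpoints(blocks, maxCacheBreakpoints)
	for _, b := range blocks {
		fmt.Println(b.ID, b.Cached)
	}
}
```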
Fixed
- L2 sub-agent loop swallowed `EngineEventSystemNotice`: the L2 forward-switch whitelist in `subagent.go` was missing the type, so any system notice (e.g. `SearchGapDetector` firing) emitted under an L2 dispatcher never reached the WS translator; the type is now in the pass-through case list
- Team sub-agents lost web search when `TavilySearch` was disabled: `AllowedTools` now lists `WebSearch` as a fallback so the agent doesn't silently lose the capability when credentials are missing for the primary search provider
- `/api/v1/tools` PATCH could race on concurrent updates: a mutex now guards `cfg` mutation; empty-cfgPath errors surface to the caller instead of silently no-op'ing; rollback failures get logged
Removed
- N/A
v0.0.13
Added
- Multi-provider failover dispatcher (`internal/provider/failover`): four-tier RetryPolicy (Probe 5s / Fast 15s / Medium 30s / Full) plus three-state per-provider health (healthy / tripped / ready_to_probe) with exponential cooldown backoff; classifies which errors cross provider boundaries and which stay local (prompt_too_long / max_output_tokens / ctx.Canceled never trip the provider)
- Hot-swappable provider manager (`internal/provider/manager`): atomic.Pointer wraps the live Failover dispatcher so chain mutations don't disturb in-flight calls; adapter cache reuses bifrost adapters across chain-only mutations
- Top-level `agent:` yaml block replacing `llm.fallback_chain`: `primary` (single dotted ref `"provider:endpoint"`), `fallback_chain` (ordered backups), `max_tokens` / `temperature` (adapter-baked defaults, unified [0,1] temperature scaled per provider type: anthropic ×1, openai/gemini ×2), `context_window`, `max_turns` (moved from `engine.max_turns`), `max_tool_calls` (0 = unlimited), `thinking_intensity` (`low` / `medium` / `high` / blank)
- `endpoint.context_window` field per endpoint declares the model's intrinsic context limit; `Manager.EffectiveContextWindow()` returns `min(agent.context_window, primary_endpoint.context_window)` with 200_000 fallback; surfaced as `effective_context_window` in `GET /api/v1/agent` and the `engine initialized` startup log
- `GET/PATCH /api/v1/agent` endpoint (replaces `/api/v1/fallback-chain`): partial-updates any subset of agent fields, validates `max_turns ≥ 1`, `max_tool_calls ≥ 0`, `thinking_intensity` enum
- `POST /api/v1/providers` runtime API for creating new provider entries without restart; `gemini` added to `type` whitelist alongside `openai` / `anthropic`
- Comment-preserving yaml persistence (`internal/config/persist`): `yaml.Node`-based mutator rewrites only changed keys, preserving inline comments / key order / hand-tuned indentation across PATCH-driven persistence cycles
- Per-call token usage on `llm.call ok` log: `input_tokens` / `output_tokens` / `cache_read` / `cache_write` / `thinking_tokens` now at INFO level (was DEBUG-only via `bifrost stream MessageEnd`)
- Bifrost stream lifecycle DEBUG logs: `llm.call.dial` (before SDK call) / `llm.call.connected` (after stream returned) / `llm.call.stream_closed` (when wrapper goroutine exits); a hang can now be located between dial / connected / streaming / closing
- Stream-stuck WARN watchdog: `doSingleLLMCall` emits `llm.call.stream_stuck` every 30 s of no new chunk; observability only, since `firstByteTimeout` / `apiTimeout` still own hard cancellation
- Provider endpoint `disabled` field (cascading with `provider.disabled`): effective disabled = `provider.Disabled OR endpoint.Disabled`; auto-removed from chain on PATCH `disabled:true`, chain auto-fills with first enabled endpoint when empty
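The temperature scaling and effective-context-window rules above reduce to a few lines of arithmetic, sketched below. The function names are hypothetical. Two behaviours are assumptions of this sketch and are not stated in the notes: unknown provider types fall through to ×1, and a value of 0 is treated as "unset", with the 200_000 fallback applying only when both values are unset.

```go
package main

import "fmt"

// defaultContextWindow is the 200_000 fallback mentioned in the notes.
const defaultContextWindow = 200_000

// scaleTemperature maps the unified [0,1] temperature onto a provider's range.
// Unknown provider types use ×1 here, which is an assumption of this sketch.
func scaleTemperature(providerType string, unified float64) float64 {
	switch providerType {
	case "openai", "gemini":
		return unified * 2
	default: // "anthropic" and anything else
		return unified
	}
}

// effectiveContextWindow returns the smaller of the agent-level and
// endpoint-level windows. Treating 0 as "unset" is an assumption.
func effectiveContextWindow(agentWindow, endpointWindow int) int {
	switch {
	case agentWindow > 0 && endpointWindow > 0:
		if agentWindow < endpointWindow {
			return agentWindow
		}
		return endpointWindow
	case agentWindow > 0:
		return agentWindow
	case endpointWindow > 0:
		return endpointWindow
	default:
		return defaultContextWindow
	}
}

func main() {
	fmt.Println(scaleTemperature("anthropic", 0.5), scaleTemperature("openai", 0.5))
	fmt.Println(effectiveContextWindow(128_000, 64_000), effectiveContextWindow(0, 0))
}
```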
Changed
- `agent.max_tokens` / `agent.temperature` baked into each bifrost adapter as defaults; `ChatRequest.Temperature` / `MaxTokens == 0` falls back to these. `PATCH /api/v1/agent` on these fields invalidates the adapter cache so subsequent calls pick up the new defaults
- `ChatRequest.ContextWindow` field added as observability hint (not sent to vendor); `stats classify.limit` now reads it instead of `req.MaxTokens`, so dashboard `ContextWindow.Limit` correctly reflects the configured budget instead of the response cap
- `Manager.AdapterBuilder` signature now takes `agent config.AgentConfig` so adapters can resolve effective temperature / max_tokens at build time
- `emmaPrinciples` trimmed from 8142 → 3824 chars (−53%): removed redundant repetition, anti-pattern lists rewritten as positive instructions, reordered for LLM head-attention bias; `TestPrinciplesSection_*` invariants kept intact
- Manager `AgentSnapshot` payload now includes `effective_context_window` so operators can compare configured vs. capped value
- Provider `type` field is now required (was optional with implicit default); empty / unknown types dropped at startup with WARN, not FATAL, so the server stays bootable with valid providers when one entry is malformed
- Empty fallback chain and bad yaml entries now warn-and-drop instead of FATAL: server enters degraded mode (Chat returns `ErrNoEndpoint`, management API stays mountable) until operator PATCHes a valid agent config
Fixed
- `GET /api/v1/sessions/{id}/metrics` returned empty `sub_agents`: `executor.go` never injected `ToolUseContext` before `t.Execute`, so Specialists read `parent_session_id=""`, short-circuited `StartSubAgent`, and bypassed `stats_provider` entirely; Specialists + all L3 sub-agent token spend was invisible. Fix injects `ToolUseContext` (SessionID / ToolCallID / ToolName / ToolInput) at the executor boundary
- Task tool rejected legitimate sub-agent types (`writer` / `researcher` / `analyst` / `developer` / `lifestyle` / `scheduler`): hardcoded 3-entry whitelist in `agenttool/input.go` contradicted the tool description, costing one wasted LLM round-trip per Specialists dispatch (~6.5k token / 6 s). Validate now defers name resolution to the `defRegistry` at spawn time
- Orchestration tool cards (Specialists / Task) false-positive `orphan_timeout` on the `EngineEventToolCall` path: only `EngineEventToolStart` had the `isOrchestrationTool` check appending `WithoutLifecycle()`; client-side tool calls were still subject to the 120 s `CardTool` watchdog, causing UI to show "工具失败" ("tool failed") while the underlying multi-minute plan was healthy. Both paths now opt out
- `QueryEngineConfig.MaxTokens` was overloaded as both "response cap" and "context budget": the compactor was misclassifying agent.max_tokens (e.g. 2048) as the context window, triggering auto-compact at trivial token counts. Split into distinct `MaxTokens` (response cap) and `ContextWindow` (compactor budget) fields
Removed
- `/api/v1/fallback-chain` endpoint (replaced by `/api/v1/agent`)
- `llm.fallback_chain` yaml field (migrated to `agent.fallback_chain`); `persist.SetAgent` strips the legacy key on first save
- `engine.max_turns` yaml field (moved to `agent.max_turns`)
- Hardcoded `200_000` literal for prompt-context window size in `internal/engine/queryloop.go` and `subagent.go`; now derived from `qe.contextWindow()`
v0.0.12
Added
- Session metrics dashboard: per-session LLM / sub-agent / tool counters tracked by `sessionstats.Tracker`, persisted as `metrics_json` column on session rows, snapshotted via debounced worker, and served at `GET /api/v1/sessions/{id}/metrics` on the Console port; survives service restart via `Tracker.RestoreFrom`
- Provider-stats decorator: wraps any `provider.Provider` to record per-model token usage (input / output / reasoning / cache hits) into `sessionstats.Registry` from a single ctx-key plumbing point, no per-call instrumentation
- `thinking_tokens` field on `Usage`: bifrost adapter now surfaces upstream reasoning-token counts so the dashboard can break out chain-of-thought cost separately from regular completion tokens
- Model registry catalog (`internal/provider/registry`): YAML manifest declaring `ProviderSpec` (auth / endpoints / quirks) and `ModelSpec` (modalities / supports / limits / defaults) for 8 vendors and 16 models (OpenAI, Anthropic, Google, DeepSeek, Zhipu, Moonshot, MiniMax, iFlyTek); embedded default manifest loaded at startup, queryable by manifest key or by provider+model_id
- Public models endpoint: `GET /api/v1/models` and `GET /api/v1/models/{provider}/{model_id}` on the Console server, JSON-tagged snake_case payload for the front-end capability gate; documented in `docs/api/models-registry-api.md`
- Provider quirks routing in bifrost adapter: `ThinkingParamStyle` selects the wire format per provider (`deepseek_type` → `extra_params.thinking.type`, `openai_effort` → `reasoning.effort`, `anthropic_budget` → `reasoning.max_tokens`, `openrouter` → `reasoning.enabled`); `ExtraParamsPassthroughRequired` and `ToolCallsRequireReasoningField` now gate behavior per-quirk instead of being applied unconditionally
- `enable_thinking` provider config option: per-provider override that drives the new quirk routing without changing the call site
- iFlyTek Spark X2 Flash support: `xunfei/spark-x` (256K context, text-only, function_calling + reasoning + web_search) wired through the registry and bifrost adapter
- Structured tool error classification: `ToolErrorType` (`tool_timeout` / `rate_limit` / `overloaded` / `contract_fail` / `dependency_fail` / `permission_denied` / `invalid_input` / `user_aborted` / `model_error` / `internal`) set by the executor and every built-in tool at the failure site; WebSocket translator emits it as `card.close.error.type` instead of defaulting all failures to `internal`
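The `ThinkingParamStyle` routing listed above is essentially a lookup from style name to the request field that carries the thinking setting. The sketch below encodes just that table; the `thinkingParamPath` function is hypothetical, and how the adapter actually writes the value into the request is not shown in the notes.

```go
package main

import "fmt"

// thinkingParamPath returns the dotted request path that carries the thinking
// setting for a given ThinkingParamStyle. The four mappings come from the
// release notes; the function itself is a hypothetical helper.
func thinkingParamPath(style string) (string, bool) {
	switch style {
	case "deepseek_type":
		return "extra_params.thinking.type", true
	case "openai_effort":
		return "reasoning.effort", true
	case "anthropic_budget":
		return "reasoning.max_tokens", true
	case "openrouter":
		return "reasoning.enabled", true
	default:
		return "", false // unknown style: caller decides how to degrade
	}
}

func main() {
	for _, s := range []string{"deepseek_type", "openai_effort", "unknown"} {
		path, ok := thinkingParamPath(s)
		fmt.Printf("%s -> %q (known=%v)\n", s, path, ok)
	}
}
```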
Changed
- Session-metrics endpoint moved from the WebSocket channel mux to the Console HTTP server (port 8090) so the front-end has one management origin to consult
- Bifrost adapter `Config` accepts a `*ProviderQuirks` mirror struct; quirks come from the registry at startup and gate behavior that was previously hard-coded (e.g. `ExtraParams` passthrough was always-on; now opt-in only for providers that declare it)
- Claude model `family` field regrouped to `claude4.5` / `claude4.6` / `claude4.7` (was `claude-haiku` / `claude-sonnet` / `claude-opus`) so the UI groups by generation rather than tier
Fixed
- DeepSeek thinking mode now actually disables reasoning: adapter sends `thinking: {"type": "disabled"}` via ExtraParams per the official docs, and opts into bifrost's `BifrostContextKeyPassthroughExtraParams` so the field reaches the wire (the SDK was silently dropping `Params.ExtraParams` during marshalling, so the legacy `enable_thinking: false` field was both ignored by the server and never sent)
- `reasoning_content` is now preserved on assistant messages across DeepSeek thinking-mode turns when thinking is enabled, and emitted as an empty-string placeholder on tool_calls assistant messages when the provider quirk requires the field present; fixes 400 "reasoning_content must be passed back to the API"
- Reasoning replay across turns is suppressed when `enable_thinking` is disabled, instead of inflating input tokens with every prior turn's chain-of-thought
- Bifrost adapter waits for the synthetic usage chunk after `finish_reason` instead of emitting `MessageEnd` immediately, so token totals are no longer lost when bifrost's OpenAI provider holds usage back to a trailing chunk
- Per-model stats now key off the provider-reported model name (echoed back in `Usage`) instead of the adapter's configured default, so usage attribution stays correct when fallbacks or model overrides kick in
- New session rows are persisted eagerly in `Manager.GetOrCreate` so the foreign key for `SessionStats.SaveSessionStats` exists by the time the first metrics flush fires (previously dropped with `no such row`)
v0.0.11
Added
- LLM retry visibility on the wire: every retry the engine schedules emits `card.tick(kind=note)` carrying attempt number, planned backoff delay, classified error type, and HTTP status, so front-ends can render "重试中 (N/M, Xms 后再试)" ("retrying (N/M, next attempt in X ms)") instead of staring at silent waits
- LLM heartbeat events: 30 s keep-alive ticks during in-flight LLM calls propagate up the parent_card_id chain so long-thinking models no longer cause the surrounding step / agent / message cards to orphan-timeout
- First-byte timeout on LLM calls: `llm.first_byte_timeout` (default 120 s) catches "TCP black hole" upstreams that accept the request but never send a chunk; disarms once the first chunk lands so legitimate long thinking preludes are not penalised
- Configurable LLM timeouts via yaml: `llm.api_timeout`, `llm.first_byte_timeout`, `llm.max_retries` exposed as top-level config keys
- Step-decision gate: when retries / replans exhaust, Scheduler and PlanCoordinator emit `prompt.user(kind=step_decision)` so the user picks `continue` / `retry` / `cancel` instead of silently falling back
- Chain-only lifecycle for orchestration tool cards: Specialists and Task tool cards now opt out of the orphan watchdog but stay tracked in the parent chain, so descendant heartbeats still refresh ancestors above
- LLM-driven `LLMSubagentResolver` replaces keyword scoring: structured-output tool call picks the executor from `AvailableSubagents` with rationale, falls back to heuristic on nil-provider / LLM-error / out-of-set-pick
- Planner and SubagentResolver route through `retryLLMCall`: both now inherit transport-level retry, heartbeats, retry-status events, and FirstByte/API timeouts that L1 emma and L3 sub-agents already have
- Per-frame WebSocket trace logging behind `channels.websocket.trace_frames` toggle for lifecycle-level debugging without dropping the global log level to DEBUG
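The "disarms once the first chunk lands" behaviour of the first-byte timeout can be sketched with a timer whose channel is set to `nil` after the first chunk. This is a generic Go pattern and not the engine's code: `readStream`, `errFirstByteTimeout`, and the string chunk type are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFirstByteTimeout is a hypothetical sentinel for this sketch.
var errFirstByteTimeout = errors.New("first byte timeout")

// readStream collects chunks until the channel closes. The timer only guards
// the wait for the first chunk; after that it is disarmed.
func readStream(chunks <-chan string, firstByte time.Duration) ([]string, error) {
	timer := time.NewTimer(firstByte)
	defer timer.Stop()
	timeout := timer.C
	var got []string
	for {
		select {
		case c, ok := <-chunks:
			if !ok {
				return got, nil
			}
			timeout = nil // a nil channel never fires in select: timer disarmed
			got = append(got, c)
		case <-timeout:
			return got, errFirstByteTimeout
		}
	}
}

func main() {
	chunks := make(chan string)
	go func() {
		defer close(chunks)
		chunks <- "thinking"
		time.Sleep(150 * time.Millisecond) // longer than the first-byte window
		chunks <- "answer"
	}()
	got, err := readStream(chunks, 50*time.Millisecond)
	fmt.Println(got, err)
}
```

In `main`, the pause between the two chunks exceeds the first-byte window, yet the read should still complete because the guard only applies before the first chunk.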
Changed
- Plan card parent is now the emitting Specialists agent card (was the turn card); fixes the topology so writer heartbeats walk writer → step → plan → specialists agent → tool → turn and the Specialists tool card stays alive through the whole plan
- `ping` / `pong` are now top-level wire frames (`{"type":"ping"}` / `{"type":"pong"}`) with no envelope / seq / severity: a pure liveness signal, decoupled from `session.event`
- Specialists L2 worker no longer carries a 15-min hard timeout; AgentTool dispatch no longer carries a 5-min hard timeout. Long-running plans are bounded by per-call LLM timeouts and the step_decision gate instead of one wall-clock cap that killed legitimate work mid-flight
- SubmitTaskResult `summary` over 200 chars is now truncated with `…` instead of rejected, so the sub-agent isn't forced to redraft when slightly over budget
- LLM retry plumbing uses `retry.Retryer.DoWith(ctx, fn, onRetry)`: the per-call observer keeps retry events scoped to one caller's `out` channel even when the Retryer is shared across concurrent sub-agents
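For the `summary` truncation, a rune-safe helper might look like the sketch below. It assumes "chars" means Unicode code points rather than bytes, and that the ellipsis counts toward the 200-character budget; neither detail is confirmed by the notes, and `truncateSummary` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateSummary shortens s to at most limit runes, ending with an ellipsis
// when anything was cut. limit is expected to be at least 1.
func truncateSummary(s string, limit int) string {
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	// Reserve one rune for the ellipsis so the result stays within limit.
	return string(r[:limit-1]) + "…"
}

func main() {
	long := strings.Repeat("x", 250)
	out := truncateSummary(long, 200)
	fmt.Println(len([]rune(out)), strings.HasSuffix(out, "…"))
}
```

Converting to `[]rune` before slicing avoids cutting a multi-byte character in half, which matters for summaries written in Chinese.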
Fixed
- Specialists tool card no longer suffers spurious `orphan_timeout` closes mid-plan: the 120 s `CardTool` watchdog used to kill the tool card once the planner stopped tick-ing it (e.g. while writer was running), surfacing "工具失败" ("tool failed") while the underlying step succeeded
- DNS / network errors at the L1 main loop now retry via `retry.Retryer` with the same exponential backoff path L3 sub-agents already used, instead of failing on the first attempt
- LLM call ctx cancellation no longer counts as a retryable transient error: a dead ctx propagates through the Retryer as non-retryable so we stop wasting attempts when the parent has already given up
- Wire-frame `card.close` for sub-agent abort no longer maps to `StatusOK`: `"aborted"` now produces `StatusCancelled` (was silently dropped to ok)
v0.0.10
Added
- Roster-agnostic LLM planner: replaces the keyword-rule HeuristicPlanner; the LLM produces a step DAG via the `emit_plan` tool with retry-on-validation, capped at `maxSteps`, and intentionally does not see the available sub-agent list; `SubagentResolver` picks the executor at dispatch time
- Step-level retry on transient failures: scheduler retries up to two attempts on timeout / rate-limit / overloaded / 5xx / `terminal_blocking_limit` / `terminal_model_error` signals before falling back to plan-level replan; `step_started` fires per attempt and `step_completed` / `step_failed` carry cumulative `attempts`
Changed
- `plan_review` confirmation now matches `prompt.user(question)` semantics: no `card.add(plan)` while waiting, no orphan watchdog, ctx deadline stripped, so the user can take as long as they need to review (protocol v0.4.0)
Fixed
- Tool-result metadata (e.g. WebSearch `urls` / `query` / `result_count`) now flows through the v2.2 translator to `ToolPayload.Metadata` instead of being dropped after promoting `render_hint` / `language` / `file_path`
- Scheduler step failures triggered by terminal sub-agent reasons (`model_error` / `blocking_limit` / `prompt_too_long`) now populate `StepResult.Failures` with `terminal_<reason>: <message>`, log a structured Warn at the scheduler boundary, and feed the retry classifier; previously these failed silently with empty `step_failed` payloads and no server log
v0.0.9
Added
- L2 multi-mode coordinator framework: routes Specialists / Orchestrate work through one of pluggable modes (ReAct, Plan), with `Planner`, `Scheduler`, `Judge`, `Escalation`, `Fallback`, `Budget`, `ModeSelect`, and `SubagentResolver` as independently testable components
- Plan-mode user-confirmation flow: when `user.message.plan_confirmation: "required"`, the framework emits `plan.proposed` (carrying the editable step DAG) and blocks until the client returns `plan.response` with approval / edits / rejection; mirrored by `plan.approved` echo (protocol v1.15+)
- Plan / Step lifecycle emit events from the PlanCoordinator path: `plan.created` / `plan.updated` / `plan.completed` / `plan.failed` and `step.dispatched` / `step.completed` / `step.failed` / `step.skipped` (protocol v1.16; envelope/display/metrics are placeholder, payload is fully populated)
- Coordinator-mode threading: `user.message.coordinator_mode` selects ReAct vs Plan per turn; `tool.WithCoordinatorMode` ctx value flows from router → Specialists → SpawnConfig
- LLM call timing breakdown on `llmCallResult` (`firstByteAt` / `lastChunkAt` / `endAt`) for diagnosing gateway hangs, extended thinking, and network buffering separately
Changed
- Plan-step `skill` field renamed to `subagent_type` and `available_skills` to `available_subagents` to disambiguate from `AgentDefinition.Skills` (capability tags); `subagent_type` is now optional, with `SubagentResolver` picking the executor at dispatch time
- emma's task-dispatch principles now require an explicit clarification-merge self-check before any Specialists spawn: the original request plus the user's clarification answer must be combined into the task string
Fixed
- Bifrost adapter no longer hangs ~6m40s after the model finishes streaming: `consumeStream` returns as soon as a chunk carries `FinishReason`, instead of waiting for the upstream chunk channel to close on its own (which only happens at the underlying HTTP idle timeout)
- AskUserQuestion tool description gained an explicit reminder that the user's clarification answer must be folded into the next task / prompt; the previous wording let the LLM forward the original (un-clarified) request
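The stream-hang fix boils down to leaving the read loop on the finishing chunk rather than on channel close. The sketch below shows that shape with a hypothetical `chunk` type and a simplified `consumeStream`; the real adapter's types differ, and per the v0.0.12 notes it later also waits for a trailing usage chunk, which this sketch does not model.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical stream element for this sketch.
type chunk struct {
	Text         string
	FinishReason string // non-empty on the final content chunk
}

// consumeStream concatenates chunk text and returns as soon as a chunk carries
// a FinishReason, without waiting for the producer to close the channel.
func consumeStream(chunks <-chan chunk) string {
	var sb strings.Builder
	for c := range chunks {
		sb.WriteString(c.Text)
		if c.FinishReason != "" {
			break
		}
	}
	return sb.String()
}

func main() {
	// The channel is deliberately never closed, mimicking an upstream that
	// keeps the connection open after the model has finished.
	ch := make(chan chunk, 2)
	ch <- chunk{Text: "hello "}
	ch <- chunk{Text: "world", FinishReason: "stop"}
	fmt.Println(consumeStream(ch))
}
```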
v0.0.7
Added
- Three-tier agent architecture: emma (L1, user-facing) → Specialists (L2, coordinator) → workers (L3, leaf executors), with strict per-tier tool filtering and prompt isolation
- TierSubAgent contract on AgentDefinition: every L3 worker declares OutputSchema, InputSchema, Limitations, ExampleTasks, CostTier, and Temperature; Tier routes spawns to the strict L3 driver
- runSubAgentDriver: leaf-only ReAct loop with self-check, nudge cap (3x), reject cap (3x), and SubmitTaskResult-or-EscalateToPlanner termination
- 7 redesigned built-in workers (writer, researcher, analyst, developer, travel_planner, recommender, scheduler) as pure functional L3 with specialized SystemPrompt and per-worker schemas
- SubmitTaskResult tool with server-side validation (lineage, temporal window, role/type/mime, OutputSchema)
- EscalateToPlanner tool: L3 needs-planning escape hatch (paired with SubmitTaskResult)
- SpawnConfig.Inputs + InputSchema validation in SpawnSync, validated before any LLM call
- Artifact subsystem with task contract: SQLite-backed store at ~/.harnessclaw/db/artifacts.db, ArtifactRead / ArtifactWrite tools, available-artifacts preamble auto-injected on sub-agent spawn, TTL janitor, parent-version chains
- WebSocket forwarding of artifact refs and L3 sub-agent task/intent for richer client observability (protocol v1.12)
- Plan-based DAG executor for Orchestrate tool with PlannerListing rich-metadata routing
- Structured lifecycle event protocol via internal/emit package
- AskUserQuestion tool for clarification flows; WebSearch updates
- SafetyLevel classification (safe / caution / dangerous) on every built-in tool, with WithoutDangerousUnless filter for L3 pools
- Minimal in-house JSON-schema validator (ValidateAgainstSchema) shared by SubmitTaskResult and InputSchema enforcement
- BuildFunctionalIdentity for L3 workers: team-free, persona-free identity that does not leak emma's L1 prompt
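To give a feel for what a minimal in-house JSON-schema validator might check, here is a small Go sketch. It is not the engine's ValidateAgainstSchema: the validate function, the schema and property types, and the supported subset (object-level required plus a per-property type) are all assumptions chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// property and schema model a hypothetical, heavily reduced JSON-schema
// subset: "required" on the object and "type" on each property.
type property struct {
	Type string `json:"type"`
}

type schema struct {
	Required   []string            `json:"required"`
	Properties map[string]property `json:"properties"`
}

// typeOf maps a decoded JSON value to its JSON-schema type name.
// "integer" is not distinguished from "number" in this sketch.
func typeOf(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case float64:
		return "number"
	case bool:
		return "boolean"
	case []any:
		return "array"
	case map[string]any:
		return "object"
	case nil:
		return "null"
	}
	return "unknown"
}

// validate checks required fields first, then declared property types.
func validate(schemaJSON, docJSON []byte) error {
	var s schema
	if err := json.Unmarshal(schemaJSON, &s); err != nil {
		return fmt.Errorf("bad schema: %w", err)
	}
	var doc map[string]any
	if err := json.Unmarshal(docJSON, &doc); err != nil {
		return fmt.Errorf("bad document: %w", err)
	}
	for _, name := range s.Required {
		if _, ok := doc[name]; !ok {
			return fmt.Errorf("missing required field %q", name)
		}
	}
	for name, p := range s.Properties {
		v, ok := doc[name]
		if !ok || p.Type == "" {
			continue // optional field absent, or no type constraint
		}
		if got := typeOf(v); got != p.Type {
			return fmt.Errorf("field %q: want %s, got %s", name, p.Type, got)
		}
	}
	return nil
}

func main() {
	s := []byte(`{"required":["summary"],"properties":{"summary":{"type":"string"}}}`)
	fmt.Println(validate(s, []byte(`{"summary":"done"}`)))
	fmt.Println(validate(s, []byte(`{"summary":42}`)))
}
```

Running the same check on both worker inputs and submitted results keeps the two enforcement paths consistent, which is the point of sharing one validator.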
Changed
- Worker system prompts no longer inject team affiliation or personality for TierSubAgent; identity becomes purely functional
- lifestyle worker split into single-responsibility travel_planner and recommender
- Specialists / Orchestrate consume structured PlannerListing for richer routing decisions
- All user-facing tool descriptions translated to Chinese for consistency
- emma identity phrasing tightened in the L1 prompt
- Artifact subsystem redesigned around the task contract (producer task_id stamping, expected-outputs render block)
Fixed
- ArtifactRead defends against artifact_id hallucination: detects UUID-shaped fabrications (8-4-4-4-12 dashed and 32-hex compact) and instructs the LLM to escalate instead of retry, preventing the retry-then-fabricate loop that previously consumed long stretches of LLM time
- Artifacts guidance section now shows the real ID format example and explicit "don't know an ID? EscalateToPlanner" instruction
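The two UUID shapes named above (8-4-4-4-12 dashed and 32-hex compact) can be detected with two anchored regular expressions. The sketch below is an assumption about how such a check could look, not the engine's code: looksLikeFabricatedID is a hypothetical name, and art_demo_1 is a placeholder, since these notes do not give the real artifact ID format.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// 8-4-4-4-12 hex groups separated by dashes.
	dashedUUID = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
	// The same 32 hex digits without dashes.
	compactHex = regexp.MustCompile(`^[0-9a-fA-F]{32}$`)
)

// looksLikeFabricatedID reports whether id has one of the UUID shapes
// that the release note treats as a sign of hallucination.
func looksLikeFabricatedID(id string) bool {
	return dashedUUID.MatchString(id) || compactHex.MatchString(id)
}

func main() {
	ids := []string{
		"123e4567-e89b-12d3-a456-426614174000",
		"123e4567e89b12d3a456426614174000",
		"art_demo_1", // hypothetical non-UUID placeholder
	}
	for _, id := range ids {
		fmt.Printf("%-38s fabricated=%v\n", id, looksLikeFabricatedID(id))
	}
}
```

A caller could use a positive match to return an "escalate, do not retry" message instead of a plain not-found error.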
Removed
- Personality field is no longer rendered into TierSubAgent system prompts (kept on the definition for team-table metadata only)
v0.0.6
Added
- Agent definition persistence: SQLite-backed AgentService and console HTTP API (/console/v1/agents) for create/list/get/update/delete/import operations; built-in definitions are synced on startup and YAML files are imported on demand
- Console management API server with configurable host/port (console.enabled/host/port config keys, defaults to 0.0.0.0:8090)
- Per-agent skill whitelist enforcement: AgentDefinition.Skills and SpawnConfig.AllowedSkills are now respected by SkillTool, which rejects invocations of skills not on the list
- Per-agent tool whitelist enforcement: AgentDefinition.AllowedTools filters the sub-agent tool pool via the new ToolPool.FilterByNames method
- Personality field is injected into the auto-generated worker identity prompt
- WebSocket deliverable.ready event: sub-agent file outputs (FileWrite results) surface to the client with file_path, language, and byte_size for direct rendering or download
- <summary> tag protocol for sub-agent outputs: Worker / Explore / Plan profiles require sub-agents to wrap their core conclusion in <summary>...</summary>; the engine extracts it into SpawnResult.Summary and returns only summary + deliverables to the parent agent
- TaskRegistry: full sub-agent results are stored in-engine by agent ID for context passing and debugging while the parent only sees summaries
- SpawnResult now carries structured Summary, Status, and Attempts fields
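The <summary> tag protocol above amounts to pulling one delimited span out of a sub-agent's raw output. A minimal Go sketch follows, assuming the first complete tag pair wins and surrounding whitespace is trimmed; extractSummary is a hypothetical helper, not the engine's extraction code.

```go
package main

import (
	"fmt"
	"strings"
)

// extractSummary returns the text inside the first <summary>...</summary>
// pair and reports whether a complete pair was found.
func extractSummary(output string) (string, bool) {
	const openTag, closeTag = "<summary>", "</summary>"
	start := strings.Index(output, openTag)
	if start < 0 {
		return "", false
	}
	start += len(openTag)
	end := strings.Index(output[start:], closeTag)
	if end < 0 {
		return "", false // opening tag without a closing tag
	}
	return strings.TrimSpace(output[start : start+end]), true
}

func main() {
	raw := "Long reasoning...\n<summary>Report written to report.md</summary>\ntrailing notes"
	summary, ok := extractSummary(raw)
	fmt.Println(summary, ok)
}
```

Returning a boolean lets the caller decide what to do when a sub-agent ignores the protocol, for example falling back to the full output or marking the result as malformed.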
Changed
- Emma system prompt restructured into a three-layer architecture: persona/team/judgment/delivery only; dispatch protocol, retry rules, and multi-agent coordination paragraphs moved to application code
- Sub-agent system prompts no longer hard-code agent names in role overrides; dynamic SystemPromptOverride from agent definition takes precedence over static profile SectionOverrides for the role section
- Section headings dropped numeric prefixes ("一、二、三") so reordering does not break the prompt
- All built-in section content translated to Chinese (env, tools, memory, skills, task, currentdate, artifacts)
- FileWrite no longer auto-creates parent directories; callers must ensure the directory exists or the tool returns an explicit error pointing at the missing path; the schema description now hints at a default working directory
- Agent definitions are no longer auto-scanned from .harnessclaw/agents/ on startup; use the import endpoint instead
Removed
- Coordinator system prompt and CoordinatorProfile: orchestration is now an L2 application-code concern, planned to land as a code-driven Orchestrate tool in a follow-up
- Static output/rules prompt sections: their delivery rules are folded into principles
v0.0.5
Added
- Universal artifact store: session-scoped content store for large tool results with automatic threshold-based replacement, frozen replacement decisions for prompt cache stability, and pre-LLM compaction
- ArtifactGet tool: LLM retrieves full artifact content by ID without regenerating
- Write tool artifact_ref parameter: write artifact content to files by reference, saving output tokens
- Artifact-aware compaction: replaces artifact-backed tool results with compact references before LLM summarization
- SQLite artifact persistence: artifacts table for persisting large tool results across server restarts
- Multi-agent orchestration system: sub-agent spawning (sync/async/fork), loopConfig-parameterized query loop, coordinator mode with team management
- Agent tool with SpawnSync / SpawnAsync, InheritedChecker permission inheritance, and LongRunningTool interface for timeout bypass
- Task system: TaskCreate / TaskGet / TaskList / TaskUpdate tools with in-memory and SQLite-backed stores
- Team management: TeamCreate / TeamDelete tools, MessageBroker with mailbox-based inter-agent messaging, SendMessage tool
- @-mention routing: MentionParser extracts @agent_name from user messages, routes to registered agent definitions loaded from YAML
- Coordinator mode: system prompt rewrite to dispatcher role with 4-phase workflow (research → synthesis → implementation → verification)
- WebSocket sub-agent and multi-agent event protocol: subagent.start/end, agent.routed/spawned/idle/completed/failed, task.created/updated, agent.message, team.created/member_join/member_left/deleted
- Render hint metadata on tool results: render_hint, language, file_path fields promoted to top-level in tool.end WebSocket messages
- Language detection utility mapping file extensions to language identifiers for render hints
- Web search tools: Tavily search and iFly/Xunfei search integrations
- LLM retry with exponential backoff for transient provider failures
- Mock LLM provider and stream builder for unit testing
- Current date section in system prompt
- Artifact guidance section in system prompt teaching LLM about artifact usage patterns
Changed
- Storage architecture: removed memory/sqlite switch; SQLite is now always the persistence backend, Manager.active map serves as in-memory cache
- Bifrost adapter error messages now include HTTP status code, error type/code, and underlying error details for easier troubleshooting
- Bifrost stream idle timeout increased to 300s and request timeout to 600s for sub-agent workloads
- System prompt role and output style sections rewritten with improved anti-AI-speak guidance
Fixed
- Session persistence across server restarts: eliminated config-level memory/sqlite choice that prevented SQLite from being used
- Bifrost error messages no longer hide HTTP-level error details behind generic constant strings
- Missing SourceSkillDir constant in command source enumeration