feat: implement OpenSkald Agent Evolution — memory-native runtime, reflection engine, collaborative multi-agent mode#6
Full implementation of the agent evolution plan including:
Phase A - Memory & Runtime Tracking:
- New domain models: MemoryRecord, AgentExperience, AgentReflection,
AgentMetric, AgentRun, AgentSpec, AgentContext, AgentResult
- MemoryStore extended with namespace query (search_namespace,
list_reflections, list_experiences)
- Experience recording wired into ContentAgent.generate()
and PublishingAgent.publish_content()
- API: GET /api/memory/records, GET /api/memory/reflections,
POST /api/memory/reflections/discover
- CLI: memory-list, reflections-discover, metrics-import
Phase B - Reflection Engine:
- New ReflectionAgent with LLM-based structured reflection generation
- SkillEvolutionAgent.discover_proposals() now prioritizes
reflection-based discovery before heuristic rules
Phase C - Runtime Facade:
- New OpenSkaldAgentRuntime wrapping all agents with
AgentRun lifecycle tracking
- API: POST /api/agent/runs, GET /api/agent/runs, GET /api/agent/runs/{id}
- CLI: agent-run (single/collaborative mode)
Phase D - Collaborative Mode:
- MultiAgentOrchestrator with deterministic 6-step workflow:
Research → Writing → Review → Store → Reflection → Growth
- ResearchAgent, WritingAgent, ReviewAgent, GrowthAgent
- ReviewAgent quality veto with one-shot revision
Phase E - OpenViking Workspace Pattern:
- MemoryBackend abstraction with JsonlMemoryBackend
and OpenVikingMemoryBackend interface
Per-Agent LLM Support:
- AgentLLMConfig with PartialLLMConfig for per-agent overrides
- content/reflection/writing each support independent LLM config
- Falls back to global llm config when no override specified
Config: agent.mode, agent.reflection, agent.collaboration, agent_llm
Tests: 111 passing (28 new)
Code Review
This pull request introduces a multi-agent collaborative framework, adding several specialized agents (Growth, Reflection, Research, Review, Writing, and an Orchestrator), a unified runtime, namespace-based memory backends, and associated API/CLI endpoints. The feedback highlights critical concurrency issues where JSONL files are fully rewritten instead of appended, which could lead to race conditions and data loss. Additionally, the reviewer identified correctness bugs in the orchestrator's post-revision and error-handling loops, redundant imports and validations inside nested loops, dead code, and opportunities to optimize LLM context usage by filtering research articles.
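The append-versus-rewrite concern can be illustrated with a minimal sketch. It assumes one JSON object per line and a hypothetical "namespace" field on each record; the function names are illustrative and are not the PR's MemoryStore API. Appending narrows the window for lost updates, but concurrent writers from several processes may still need a file lock.

```python
import json
import os
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one JSON line instead of rewriting the whole file."""
    line = json.dumps(record, ensure_ascii=False) + "\n"
    # On POSIX systems, mode "a" opens the file with O_APPEND,
    # so each write goes to the current end of the file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())


def read_namespace(path: Path, namespace: str) -> list[dict]:
    """Return records whose (assumed) "namespace" field matches."""
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r for r in records if r.get("namespace") == namespace]
```

Existing lines are never touched on write, so a reader that opens the file mid-append sees at worst a missing final record rather than a truncated file.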
This PR implements the OpenSkald Agent Evolution Plan, evolving the single ContentAgent into a memory-native, reflection-capable, multi-agent runtime while keeping backward compatibility.
Design Philosophy: Agent Collaboration & Interaction Flow
Why Deterministic Orchestration Over LLM Handoff
The architecture deliberately avoids letting agents freely hand off to each other (a common pattern in frameworks like AutoGen or OpenAI Swarm). Instead, a central Orchestrator controls the workflow deterministically:
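A minimal sketch of this kind of fixed-order control, under stated assumptions: the step names follow the 6-step workflow (Research → Writing → Review → Store → Reflection → Growth), while RunLog, run_workflow, and the handler signature are hypothetical stand-ins, not the actual MultiAgentOrchestrator or AgentRun code. Treating max_turns as a cap on executed steps is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Step order taken from the PR description; everything else is illustrative.
STEPS = ["research", "writing", "review", "store", "reflection", "growth"]


@dataclass
class RunLog:
    """Hypothetical stand-in for AgentRun: records what each step did."""
    status: str = "running"
    completed: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def run_workflow(handlers: dict[str, Callable[[dict], dict]],
                 payload: dict, max_turns: int = 8) -> tuple[dict, RunLog]:
    """Run the steps in a fixed order; no step chooses its successor."""
    log = RunLog()
    for turn, name in enumerate(STEPS):
        if turn >= max_turns:
            log.errors.append(f"{name}: skipped, max_turns reached")
            continue
        try:
            payload = handlers[name](payload)
            log.completed.append(name)
        except Exception as exc:  # record the failure and keep going
            log.errors.append(f"{name}: {exc}")
    log.status = "degraded" if log.errors else "completed"
    return payload, log
```

Because the loop owns the step order, a failing handler can only add an error entry; it cannot redirect control to another agent.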
Benefits of this design:
- Steps exchange structured artifacts (SourceBrief → PlatformDraft[] → ReviewReport), not unbounded conversation histories. Each step sees exactly the data it needs.
- Every step is recorded in the AgentRun. You can inspect exactly which agent produced what, when, and with which errors.
- Runs are bounded by max_turns (default 8). If a step fails, the error is recorded and the run continues with degraded status instead of looping forever.
Two Operating Modes
Single mode (agent.mode: single, default): ContentAgent remains the single content generator. Reflection happens after generation. No orchestration overhead. Zero behavioral change for existing callers.
Collaborative mode (agent.mode: collaborative): Needed only when the task benefits from separated concerns (e.g., "research this topic, write for blog+X, and let the quality checker validate").
Three Human-in-the-Loop Gates
Human oversight is preserved as hard gates, not optional callbacks:
- Content release: All generated content starts as PENDING_REVIEW. Publishing requires explicit human approval (POST /api/review/{id}/approve). This is enforced at the config level (review.require_human_approval: true) and is mandatory in production.
- Skill proposals: SkillEvolutionAgent.discover_proposals() creates SkillProposal objects in PENDING_REVIEW status. Even after human approval, the generated skill file is written with enabled: false; a human must manually flip the switch to activate it.
- Review queue: GET /api/review?status=pending_review and review-list --status pending_review expose the queue for dashboard or CLI inspection.
Every approve/reject/publish action writes to viking://agent/experience, creating a closed feedback loop for the ReflectionAgent.
Shared Memory, No Silos
All agents share the same memory pool via namespaced JSONL records:
- viking://agent/experience
- viking://agent/reflections
- viking://agent/metrics
- viking://agent/plans
The namespace pattern (viking://*) is designed for future migration to a remote OpenViking workspace while keeping the local JSONL implementation as the default and fallback.
Per-Agent LLM Configuration
Each agent that uses an LLM can be configured independently:
This lets you route cheap tasks (reflection summarization) to a lightweight model and expensive tasks (content generation) to a capable one — without changing any agent code.
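The fallback behavior can be sketched as a field-by-field merge. The class names LLMConfig and PartialLLMConfig echo the PR, but the fields (provider, model, temperature), the resolve_llm helper, and the model names are assumptions made for illustration, not the project's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    """Global LLM settings (field names are assumed for illustration)."""
    provider: str
    model: str
    temperature: float = 0.7


@dataclass(frozen=True)
class PartialLLMConfig:
    """Per-agent override; None means "inherit the global value"."""
    provider: Optional[str] = None
    model: Optional[str] = None
    temperature: Optional[float] = None


def resolve_llm(global_cfg: LLMConfig,
                overrides: dict[str, PartialLLMConfig],
                agent: str) -> LLMConfig:
    """Merge an agent's override onto the global config, field by field."""
    partial = overrides.get(agent)
    if partial is None:
        return global_cfg  # no override: fall back to the global config
    changes = {f.name: getattr(partial, f.name)
               for f in fields(partial)
               if getattr(partial, f.name) is not None}
    return replace(global_cfg, **changes)


# Hypothetical usage: only the reflection agent gets a lighter model.
global_cfg = LLMConfig(provider="example-provider", model="large-model")
overrides = {"reflection": PartialLLMConfig(model="small-model")}
reflection_cfg = resolve_llm(global_cfg, overrides, "reflection")
content_cfg = resolve_llm(global_cfg, overrides, "content")
```

Checking for None rather than truthiness keeps an explicit override such as temperature 0.0 from being mistaken for "not set".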
Agent Boundaries (Least Privilege)
Each agent exposes the minimal tool set needed for its role:
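One way to enforce such boundaries is a per-agent allowlist checked before every tool call. The sketch below is purely illustrative: the agent keys, tool names, and the call_tool helper are hypothetical and do not reproduce the PR's actual per-agent tool assignments.

```python
from typing import Any, Callable

# Hypothetical allowlists; the real per-agent tool sets are not shown here.
ALLOWED_TOOLS: dict[str, frozenset[str]] = {
    "research": frozenset({"search_sources"}),
    "writing": frozenset({"generate_draft"}),
    "review": frozenset({"score_draft"}),
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


def call_tool(agent: str, tool: str,
              registry: dict[str, Callable[..., Any]],
              *args: Any, **kwargs: Any) -> Any:
    """Dispatch a tool call only if the agent is allowed to use that tool."""
    if tool not in ALLOWED_TOOLS.get(agent, frozenset()):
        raise ToolNotAllowed(f"{agent} may not call {tool}")
    return registry[tool](*args, **kwargs)
```

Unknown agents get an empty allowlist, so the default is deny.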
What Changed
New Files (8 agent modules + 1 backend module)
- agents/reflection_agent.py
- agents/research_agent.py
- agents/writing_agent.py
- agents/review_agent.py
- agents/growth_agent.py
- agents/runtime.py: OpenSkaldAgentRuntime, unified facade
- agents/orchestrator.py: MultiAgentOrchestrator, deterministic 6-step flow
- memory/backend.py: MemoryBackend abstraction
Modified Files (12)
- domain/models.py
- memory/store.py
- config/settings.py
- bootstrap.py
- api/routes.py
- cli.py
- agents/content_agent.py
- agents/publishing_agent.py
- agents/skill_evolution_agent.py
- config/demo.yaml
Tests
Verification
- ruff check . → All checks passed
- pytest → 111 passed
- scripts/check.sh config/demo.yaml → Full integration test passed
- agent-run --mode single → status=completed, 0 errors
- agent-run --mode collaborative → status=completed, 7 artifacts, 0 errors