
feat: implement OpenSkald Agent Evolution — memory-native runtime, reflection engine, collaborative multi-agent mode #6

Merged
deitxfge merged 2 commits into main from codex/agent-evolution on Jun 8, 2026

skyloevil commented Jun 7, 2026


This PR implements the OpenSkald Agent Evolution Plan, evolving the single ContentAgent into a memory-native, reflection-capable, multi-agent runtime while keeping backward compatibility.


Design Philosophy: Agent Collaboration & Interaction Flow

Why Deterministic Orchestration Over LLM Handoff

The architecture deliberately avoids letting agents freely hand off to each other (a common pattern in frameworks like AutoGen or OpenAI Swarm). Instead, a central Orchestrator controls the workflow deterministically:

User objective
  → ResearchAgent    (collect source material)
  → WritingAgent     (generate platform drafts)
  → ReviewAgent      (quality gate — veto power)
  → Store            (persist as pending_review)
  → ReflectionAgent  (distill lessons into memory)
  → GrowthAgent      (propose skills / strategy)

Benefits of this design:

  • No context loss: Agents pass structured artifacts (SourceBrief → PlatformDraft[] → ReviewReport), not unbounded conversation histories. Each step sees exactly the data it needs.
  • Traceable: Every turn is logged in AgentRun. You can inspect exactly which agent produced what, when, and with which errors.
  • Recoverable: The Orchestrator enforces max_turns (default 8). If a step fails, the error is recorded and the run continues with degraded status instead of looping forever.
  • Testable: Deterministic = predictable. Unit tests cover every step in isolation and the full workflow end-to-end.
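The deterministic flow and its failure handling can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the MultiAgentOrchestrator code from this PR: each agent is modelled as a plain callable that turns one artifact into the next, and RunRecord, run_pipeline, and the "degraded" status string are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical: one step = (agent name, function from input artifact to output artifact).
Step = tuple[str, Callable[[Any], Any]]


@dataclass
class RunRecord:
    """Hypothetical, simplified stand-in for what an AgentRun records."""

    status: str = "running"
    turns: list[tuple[str, str]] = field(default_factory=list)  # (agent, "ok" or "error")
    errors: list[str] = field(default_factory=list)


def run_pipeline(objective: str, steps: list[Step], max_turns: int = 8) -> tuple[Any, RunRecord]:
    record = RunRecord()
    artifact: Any = objective
    for turn, (name, fn) in enumerate(steps):
        if turn >= max_turns:
            # Hard budget: the run stops instead of looping forever.
            record.errors.append(f"max_turns={max_turns} reached before {name}")
            break
        try:
            artifact = fn(artifact)  # structured artifact in, structured artifact out
            record.turns.append((name, "ok"))
        except Exception as exc:
            # Record the failure and continue; the failed step's input is passed on unchanged.
            record.turns.append((name, "error"))
            record.errors.append(f"{name}: {exc}")
    record.status = "degraded" if record.errors else "completed"
    return artifact, record


def failing_review(_drafts: Any) -> Any:
    raise ValueError("no sources")


steps: list[Step] = [
    ("research", lambda objective: {"topic": objective}),  # SourceBrief-like dict
    ("writing", lambda brief: [f"draft about {brief['topic']}"]),  # list of drafts
    ("review", failing_review),
]
final_artifact, record = run_pipeline("agent memory", steps)
```

Here the review step fails, so record.status ends as "degraded" and record.errors holds "review: no sources", while the drafts from the writing step are still returned. Because every step is an ordinary function, each one can be unit-tested in isolation.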

Two Operating Modes

Single mode (agent.mode: single, default):

Runtime._run_single()
  → ContentAgent.generate()       (LLM-driven, uses existing Skills)
  → ReflectionAgent.discover()    (post-hoc reflection)
  → AgentRun recorded

ContentAgent remains the single content generator. Reflection happens after generation. No orchestration overhead. Zero behavioral change for existing callers.

Collaborative mode (agent.mode: collaborative):

MultiAgentOrchestrator.run()
  1. ResearchAgent.research()      → SourceBrief
  2. WritingAgent.write()          → PlatformDraft[]
  3. ReviewAgent.review()          → ReviewReport
     ├─ approved  → continue
     └─ rejected  → one-shot revise, then continue
  4. stored as pending_review content
  5. ReflectionAgent.reflect_on_experiences()
  6. GrowthAgent.analyze()
  → AgentResult (artifacts + memory_writes)

Use this mode only when the task benefits from separated concerns (e.g., "research this topic, write for blog+X, and let the quality checker validate").
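The review gate with a single revision attempt might look like the sketch below. The ReviewReport fields, the review/revise callables, the 20-character example rule, and the re-review after revising are assumptions for illustration; the PR only states that a rejected draft gets one revision before the flow continues.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewReport:
    """Hypothetical, simplified shape; the real ReviewReport may carry more fields."""

    approved: bool
    issues: list[str] = field(default_factory=list)


def review_with_one_revision(
    drafts: list[str],
    review: Callable[[list[str]], ReviewReport],
    revise: Callable[[list[str], ReviewReport], list[str]],
) -> tuple[list[str], ReviewReport, bool]:
    """Return (drafts, final_report, was_revised); revise is called at most once."""
    report = review(drafts)
    if report.approved:
        return drafts, report, False
    revised = revise(drafts, report)
    # Re-review once for the record, but never loop: the result moves on to the
    # store step as pending_review, where a human makes the final call.
    return revised, review(revised), True


# Example platform rule (hypothetical): drafts longer than 20 characters are vetoed.
def max_len_review(drafts: list[str]) -> ReviewReport:
    too_long = [d for d in drafts if len(d) > 20]
    return ReviewReport(approved=not too_long, issues=[f"too long: {d!r}" for d in too_long])


def truncate(drafts: list[str], report: ReviewReport) -> list[str]:
    return [d[:20] for d in drafts]


final_drafts, final_report, was_revised = review_with_one_revision(
    ["short", "x" * 30], max_len_review, truncate
)
```

Capping revision at one attempt keeps the run bounded even when the reviewer and writer disagree; the remaining judgment is deferred to the human review queue.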

Three Human-in-the-Loop Gates

Human oversight is preserved as hard gates, not optional callbacks:

  1. Content release: All generated content starts as PENDING_REVIEW. Publishing requires explicit human approval (POST /api/review/{id}/approve). This is enforced at the config level (review.require_human_approval: true) and is mandatory in production.

  2. Skill proposals: SkillEvolutionAgent.discover_proposals() creates SkillProposal objects in PENDING_REVIEW status. Even after human approval, the generated skill file is written with enabled: false — a human must manually flip the switch to activate it.

  3. Review queue: GET /api/review?status=pending_review and review-list --status pending_review expose the queue for dashboard or CLI inspection.

Every approve/reject/publish action writes to viking://agent/experience, creating a closed feedback loop for the ReflectionAgent.
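The release gates can be thought of as a small state machine in which publishing is unreachable without an explicit approval, and every human action is logged as an experience. The sketch below is hypothetical: apart from pending_review, the status names, the TRANSITIONS table, and apply_action are assumptions, and the experience log is a plain list rather than the real memory store.

```python
from enum import Enum


class ContentStatus(str, Enum):
    # Only pending_review is named in this PR; the other states are assumptions.
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


# Allowed (state, action) pairs; there is deliberately no edge from
# PENDING_REVIEW straight to PUBLISHED.
TRANSITIONS: dict[tuple[ContentStatus, str], ContentStatus] = {
    (ContentStatus.PENDING_REVIEW, "approve"): ContentStatus.APPROVED,
    (ContentStatus.PENDING_REVIEW, "reject"): ContentStatus.REJECTED,
    (ContentStatus.APPROVED, "publish"): ContentStatus.PUBLISHED,
}


def apply_action(
    status: ContentStatus, action: str, content_id: str, experience_log: list[dict]
) -> ContentStatus:
    new_status = TRANSITIONS.get((status, action))
    if new_status is None:
        raise PermissionError(f"cannot {action} content {content_id} in status {status.value}")
    # Close the feedback loop: each human decision becomes an experience record.
    experience_log.append(
        {
            "namespace": "viking://agent/experience",
            "action": action,
            "content_id": content_id,
            "from": status.value,
            "to": new_status.value,
        }
    )
    return new_status


log: list[dict] = []
state = apply_action(ContentStatus.PENDING_REVIEW, "approve", "c1", log)
state = apply_action(state, "publish", "c1", log)
```

Rejected transitions raise before anything is logged, so the experience stream only contains actions that actually took effect.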

Shared Memory, No Silos

All agents share the same memory pool via namespaced JSONL records:

| Namespace | Purpose |
|---|---|
| viking://agent/experience | Every action (generate/approve/reject/publish) |
| viking://agent/reflections | Structured lessons from ReflectionAgent / GrowthAgent |
| viking://agent/metrics | External metrics imported via GrowthAgent |
| viking://agent/plans | Plan records from MultiAgentOrchestrator |

The namespace pattern (viking://*) is designed for future migration to a remote OpenViking workspace while keeping the local JSONL implementation as the default and fallback.
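A namespaced, append-only JSONL backend behind a swappable interface could look roughly like this. It is a sketch under assumptions: search_namespace mirrors the MemoryStore method name from this PR, but the MemoryBackend signatures, the record layout (a "namespace" key on every line), and build_backend are hypothetical, and the OpenViking class is only a placeholder. Appending one line per record, rather than rewriting the whole file, would also address the full-rewrite concern raised in the code review on this PR.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional


class MemoryBackend(ABC):
    """Hypothetical minimal interface shared by local and remote backends."""

    @abstractmethod
    def append(self, namespace: str, record: dict[str, Any]) -> None: ...

    @abstractmethod
    def search_namespace(self, namespace: str) -> list[dict[str, Any]]: ...


class JsonlMemoryBackend(MemoryBackend):
    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, namespace: str, record: dict[str, Any]) -> None:
        # Append mode: existing lines are never rewritten, one new line is added.
        line = json.dumps({"namespace": namespace, **record}, ensure_ascii=False)
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")

    def search_namespace(self, namespace: str) -> list[dict[str, Any]]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            rows = [json.loads(line) for line in fh if line.strip()]
        return [row for row in rows if row.get("namespace") == namespace]


class OpenVikingMemoryBackend(MemoryBackend):
    """Placeholder for a future remote workspace; intentionally not implemented."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def append(self, namespace: str, record: dict[str, Any]) -> None:
        raise NotImplementedError("remote OpenViking backend is not implemented in this sketch")

    def search_namespace(self, namespace: str) -> list[dict[str, Any]]:
        raise NotImplementedError("remote OpenViking backend is not implemented in this sketch")


def build_backend(jsonl_path: Path, openviking_endpoint: Optional[str] = None) -> MemoryBackend:
    # Local JSONL is the default; the remote backend is used only when configured.
    if openviking_endpoint:
        return OpenVikingMemoryBackend(openviking_endpoint)
    return JsonlMemoryBackend(jsonl_path)


store = build_backend(Path(tempfile.mkdtemp()) / "memory.jsonl")
store.append("viking://agent/experience", {"action": "generate", "content_id": "c1"})
store.append("viking://agent/reflections", {"lesson": "keep X drafts short"})
experiences = store.search_namespace("viking://agent/experience")
```

Because callers depend only on MemoryBackend, a later move to a remote workspace would mean changing what build_backend returns rather than touching agent code.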

Per-Agent LLM Configuration

Each agent that uses an LLM can be configured independently:

llm:  # global default
  provider: deepseek
  model: deepseek-v4-flash

agent_llm:  # per-agent overrides (partial — unspecified fields inherit from global)
  reflection:
    provider: openai
    model: gpt-4o
  writing:
    model: deepseek-chat

This lets you route each agent to an appropriate model, for example cheap tasks (reflection summarization) to a lightweight model and expensive tasks (content generation) to a more capable one, without changing any agent code.
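Field-by-field inheritance from the global llm block can be expressed as a small merge. This is a sketch, not the code in bootstrap.py: PartialLLMConfig is the name used in this PR, but LLMConfig, resolve_llm, and the two-field shape of both classes are assumptions (a real config probably also carries keys, base URLs, and sampling options).

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical global llm block (trimmed to two fields)."""

    provider: str
    model: str


@dataclass(frozen=True)
class PartialLLMConfig:
    """Per-agent override; None means 'inherit this field from the global llm block'."""

    provider: Optional[str] = None
    model: Optional[str] = None


def resolve_llm(global_cfg: LLMConfig, override: Optional[PartialLLMConfig]) -> LLMConfig:
    if override is None:
        return global_cfg  # no agent_llm entry: use the global config as-is
    changes = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    return replace(global_cfg, **changes)


global_llm = LLMConfig(provider="deepseek", model="deepseek-v4-flash")
agent_llm = {
    "reflection": PartialLLMConfig(provider="openai", model="gpt-4o"),
    "writing": PartialLLMConfig(model="deepseek-chat"),
}
print(resolve_llm(global_llm, agent_llm.get("writing")))
# LLMConfig(provider='deepseek', model='deepseek-chat')
```

The writing override only sets model, so provider is inherited from the global block; an agent with no entry (here, content) gets the global config unchanged.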

Agent Boundaries (Least Privilege)

Each agent exposes the minimal tool set needed for its role:

| Agent | Can do | Cannot do |
|---|---|---|
| ResearchAgent | Query knowledge base + memory | Write content, publish |
| WritingAgent | Generate drafts via Skills | Modify skills, publish |
| ReviewAgent | Validate platform rules, veto | Generate content, publish |
| PublishingAgent | Validate + publish | Generate content, modify skills |
| GrowthAgent | Read metrics/reflections, propose | Publish, modify skills |
| ReflectionAgent | Read experiences, write reflections | Modify content, publish |
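One way to enforce these boundaries is a deny-by-default allowlist checked before every tool call. The tool names below are hypothetical labels derived from the "Can do" column, and ALLOWED_TOOLS, call_tool, and ToolPermissionError are illustrative, not the PR's API.

```python
from typing import Any, Callable

# Hypothetical tool names, one set per agent, derived from the "Can do" column.
ALLOWED_TOOLS: dict[str, frozenset[str]] = {
    "ResearchAgent": frozenset({"query_knowledge_base", "search_memory"}),
    "WritingAgent": frozenset({"run_skill"}),
    "ReviewAgent": frozenset({"validate_platform_rules", "veto"}),
    "PublishingAgent": frozenset({"validate_platform_rules", "publish"}),
    "GrowthAgent": frozenset({"read_metrics", "read_reflections", "propose_skill"}),
    "ReflectionAgent": frozenset({"read_experiences", "write_reflection"}),
}


class ToolPermissionError(PermissionError):
    """Raised when an agent calls a tool outside its allowlist."""


def call_tool(agent: str, tool: str, registry: dict[str, Callable[..., Any]], *args: Any) -> Any:
    # Unknown agents get an empty allowlist, so everything is denied by default.
    if tool not in ALLOWED_TOOLS.get(agent, frozenset()):
        raise ToolPermissionError(f"{agent} may not call {tool}")
    return registry[tool](*args)


registry: dict[str, Callable[..., Any]] = {
    "publish": lambda content_id: f"published {content_id}",
    "run_skill": lambda skill, topic: f"{skill} draft on {topic}",
}
print(call_tool("PublishingAgent", "publish", registry, "c1"))  # published c1
# call_tool("WritingAgent", "publish", registry, "c1") raises ToolPermissionError
```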

What Changed

New Files (7 agent modules + 1 memory backend module)

| File | Purpose |
|---|---|
| agents/reflection_agent.py | Structured reflection from experiences |
| agents/research_agent.py | Source collection for collaborative mode |
| agents/writing_agent.py | Draft generation for collaborative mode |
| agents/review_agent.py | Quality gate with veto |
| agents/growth_agent.py | Metric consumption + skill proposal |
| agents/runtime.py | OpenSkaldAgentRuntime — unified facade |
| agents/orchestrator.py | MultiAgentOrchestrator — deterministic 6-step flow |
| memory/backend.py | MemoryBackend abstraction |

Modified Files (12)

| File | Changes |
|---|---|
| domain/models.py | +13 model types (MemoryRecord, AgentExperience, AgentReflection, AgentMetric, AgentRun, AgentSpec, AgentContext, AgentResult, SourceBrief, PlatformDraft, ReviewReport, AgentMode, AgentRunStatus) |
| memory/store.py | +append_memory_record, search_namespace, list_reflections, list_experiences |
| config/settings.py | +AgentConfig, ReflectionConfig, CollaborationConfig, PartialLLMConfig, AgentLLMConfig |
| bootstrap.py | +_build_agent_llm, per-agent LLM wiring |
| api/routes.py | +7 endpoints (agent/runs, memory/records, memory/reflections, reflections/discover, metrics/import) |
| cli.py | +4 commands (agent-run, memory-list, reflections-discover, metrics-import) |
| agents/content_agent.py | Experience recording on generate |
| agents/publishing_agent.py | Experience recording on publish |
| agents/skill_evolution_agent.py | Reflection-based discovery |
| config/demo.yaml | Agent config section |

Tests

  • 28 new tests across 9 test files
  • 111 tests total, all passing
  • Coverage: MemoryStore namespace, ReflectionAgent, Runtime, Orchestrator, GrowthAgent, ReviewAgent, ResearchAgent, skill evolution reflection discovery, per-agent LLM config, CLI experience recording

Verification

  • ruff check . — All checks passed
  • pytest — 111 passed
  • scripts/check.sh config/demo.yaml — Full integration test passed
  • agent-run --mode single — status=completed, 0 errors
  • agent-run --mode collaborative — status=completed, 7 artifacts, 0 errors

Full implementation of the agent evolution plan including:

Phase A - Memory & Runtime Tracking:
- New domain models: MemoryRecord, AgentExperience, AgentReflection,
  AgentMetric, AgentRun, AgentSpec, AgentContext, AgentResult
- MemoryStore extended with namespace query (search_namespace,
  list_reflections, list_experiences)
- Experience recording wired into ContentAgent.generate()
  and PublishingAgent.publish_content()
- API: GET /api/memory/records, GET /api/memory/reflections,
  POST /api/memory/reflections/discover
- CLI: memory-list, reflections-discover, metrics-import

Phase B - Reflection Engine:
- New ReflectionAgent with LLM-based structured reflection generation
- SkillEvolutionAgent.discover_proposals() now prioritizes
  reflection-based discovery before heuristic rules
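Reflection-first discovery can be sketched as: turn reflections that suggest a skill into proposals first, then append heuristic proposals that are not already covered. Everything here is an assumption for illustration (the reflection keys suggested_skill and lesson, the SkillProposal fields, and this exact merge rule); the grounded detail is that proposals start as pending_review and generated skills start disabled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SkillProposal:
    # Hypothetical fields; status and enabled mirror the PR's "pending review,
    # disabled until a human enables it" rule.
    name: str
    rationale: str
    source: str  # "reflection" or "heuristic"
    status: str = "pending_review"
    enabled: bool = False


def discover_proposals(
    reflections: Iterable[dict],
    heuristic: Callable[[], list[SkillProposal]],
) -> list[SkillProposal]:
    proposals: list[SkillProposal] = []
    seen: set[str] = set()
    # 1) Reflection-backed proposals come first.
    for r in reflections:
        name = r.get("suggested_skill")
        if name and name not in seen:
            proposals.append(SkillProposal(name=name, rationale=r["lesson"], source="reflection"))
            seen.add(name)
    # 2) Heuristic rules only add skills that reflections did not already propose.
    for p in heuristic():
        if p.name not in seen:
            proposals.append(p)
            seen.add(p.name)
    return proposals


def heuristic_rules() -> list[SkillProposal]:
    return [
        SkillProposal("x_thread_splitter", "length rule fired", "heuristic"),
        SkillProposal("hashtag_trimmer", "hashtag rule fired", "heuristic"),
    ]


reflections = [
    {"lesson": "X drafts keep exceeding the length limit", "suggested_skill": "x_thread_splitter"},
    {"lesson": "blog drafts were approved quickly"},  # no skill suggested
]
for p in discover_proposals(reflections, heuristic_rules):
    print(p.name, p.source, p.status)
# x_thread_splitter reflection pending_review
# hashtag_trimmer heuristic pending_review
```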

Phase C - Runtime Facade:
- New OpenSkaldAgentRuntime wrapping all agents with
  AgentRun lifecycle tracking
- API: POST /api/agent/runs, GET /api/agent/runs, GET /api/agent/runs/{id}
- CLI: agent-run (single/collaborative mode)
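The AgentRun lifecycle around both modes can be sketched as a thin facade that dispatches on agent.mode and always closes the run record. This is hypothetical: AgentRun and AgentRunStatus are model names from this PR, but their fields and members here, the Runtime class, and its in-memory runs dict are assumptions; in the PR the records are exposed via GET /api/agent/runs and GET /api/agent/runs/{id}.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Optional


class AgentRunStatus(str, Enum):
    # Hypothetical members; only "completed" appears in the PR's verification output.
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class AgentRun:
    objective: str
    mode: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: AgentRunStatus = AgentRunStatus.RUNNING
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    finished_at: Optional[datetime] = None
    errors: list[str] = field(default_factory=list)


class Runtime:
    """Hypothetical facade: one entry point for single and collaborative mode."""

    def __init__(self, single: Callable[[str], object], collaborative: Callable[[str], object]) -> None:
        self._handlers = {"single": single, "collaborative": collaborative}
        self.runs: dict[str, AgentRun] = {}

    def run(self, objective: str, mode: str = "single") -> AgentRun:
        if mode not in self._handlers:
            raise ValueError(f"unknown agent.mode: {mode!r}")
        run = AgentRun(objective=objective, mode=mode)
        self.runs[run.run_id] = run  # registered before work starts, so it is visible as RUNNING
        try:
            self._handlers[mode](objective)
            run.status = AgentRunStatus.COMPLETED
        except Exception as exc:
            run.errors.append(str(exc))
            run.status = AgentRunStatus.FAILED
        finally:
            run.finished_at = datetime.now(timezone.utc)
        return run
```

The try/finally guarantees every run gets a terminal status and a finish time, which is what makes a listing endpoint trustworthy even when a handler raises.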

Phase D - Collaborative Mode:
- MultiAgentOrchestrator with deterministic 6-step workflow:
  Research → Writing → Review → Store → Reflection → Growth
- ResearchAgent, WritingAgent, ReviewAgent, GrowthAgent
- ReviewAgent quality veto with one-shot revision

Phase E - OpenViking Workspace Pattern:
- MemoryBackend abstraction with JsonlMemoryBackend
  and OpenVikingMemoryBackend interface

Per-Agent LLM Support:
- AgentLLMConfig with PartialLLMConfig for per-agent overrides
- content/reflection/writing each support independent LLM config
- Falls back to global llm config when no override specified

Config: agent.mode, agent.reflection, agent.collaboration, agent_llm
Tests: 111 passing (28 new)
skyloevil marked this pull request as ready for review June 7, 2026 08:49
skyloevil assigned skyloevil, ideazw and deitxfge and unassigned skyloevil Jun 7, 2026
skyloevil added the enhancement (New feature or request) label Jun 7, 2026

gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a multi-agent collaborative framework, adding several specialized agents (Growth, Reflection, Research, Review, Writing, and an Orchestrator), a unified runtime, namespace-based memory backends, and associated API/CLI endpoints. The feedback highlights critical concurrency issues where JSONL files are fully rewritten instead of appended, which could lead to race conditions and data loss. Additionally, the reviewer identified correctness bugs in the orchestrator's post-revision and error-handling loops, redundant imports and validations inside nested loops, dead code, and opportunities to optimize LLM context usage by filtering research articles.


Comment thread backend/app/memory/store.py
Comment thread backend/app/agents/runtime.py
Comment thread backend/app/agents/runtime.py
Comment thread backend/app/agents/orchestrator.py
Comment thread backend/app/agents/orchestrator.py
Comment thread backend/app/agents/research_agent.py
Comment thread backend/app/agents/writing_agent.py
Comment thread backend/app/agents/writing_agent.py Outdated
Comment thread backend/app/memory/backend.py Outdated
Comment thread backend/app/agents/orchestrator.py
skyloevil changed the title from "feat: implement OpenSkald Agent Evolution (Phase A-E)" to "feat: implement OpenSkald Agent Evolution — memory-native runtime, reflection engine, collaborative multi-agent mode" Jun 7, 2026
deitxfge merged commit 34a3912 into main Jun 8, 2026
1 check passed