feat: implement OpenSkald Agent Evolution — memory-native runtime, reflection engine, collaborative multi-agent mode by skyloevil · Pull Request #6 · skyloevil/OpenSkald

skyloevil · 2026-06-07T08:48:37Z

This PR implements the OpenSkald Agent Evolution Plan, evolving the single ContentAgent into a memory-native, reflection-capable, multi-agent runtime while keeping backward compatibility.

Design Philosophy: Agent Collaboration & Interaction Flow

Why Deterministic Orchestration Over LLM Handoff

The architecture deliberately avoids letting agents freely hand off to each other (a common pattern in frameworks like AutoGen or OpenAI Swarm). Instead, a central Orchestrator controls the workflow deterministically:

User objective
  → ResearchAgent    (collect source material)
  → WritingAgent     (generate platform drafts)
  → ReviewAgent      (quality gate — veto power)
  → Store            (persist as pending_review)
  → ReflectionAgent  (distill lessons into memory)
  → GrowthAgent      (propose skills / strategy)

Benefits of this design:

No context loss: Agents pass structured artifacts (SourceBrief → PlatformDraft[] → ReviewReport), not unbounded conversation histories. Each step sees exactly the data it needs.
Traceable: Every turn is logged in AgentRun. You can inspect exactly which agent produced what, when, and with which errors.
Recoverable: The Orchestrator enforces max_turns (default 8). If a step fails, the error is recorded and the run continues with degraded status instead of looping forever.
Testable: Deterministic orchestration is predictable, so unit tests can cover every step in isolation as well as the full workflow end-to-end.
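To make the control flow concrete, here is a minimal sketch of a deterministic step loop with a turn budget and error recording. It is not the code in `agents/orchestrator.py`: the `AgentRun` fields, the `run_pipeline` helper, and the `"degraded"` / `"completed"` status strings are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentRun:
    """Hypothetical run log; the real AgentRun model may differ."""
    status: str = "running"
    turns: list[dict[str, Any]] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


# A step is a (name, callable) pair; the callable reads earlier artifacts.
Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_pipeline(steps: list[Step], objective: str, max_turns: int = 8) -> AgentRun:
    run = AgentRun()
    artifacts: dict[str, Any] = {"objective": objective}
    for turn, (name, step) in enumerate(steps):
        if turn >= max_turns:
            # Turn budget exhausted: stop instead of looping forever.
            run.errors.append(f"max_turns={max_turns} reached before {name}")
            break
        try:
            artifacts[name] = step(artifacts)
            run.turns.append({"agent": name, "ok": True})
        except Exception as exc:
            # Record the failure and keep going with the remaining steps.
            run.errors.append(f"{name}: {exc}")
            run.turns.append({"agent": name, "ok": False})
    run.status = "degraded" if run.errors else "completed"
    return run
```

Because the step order is fixed in a list rather than chosen by a model, the same input always produces the same sequence of turns, which is what makes the log easy to assert on in tests.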

Two Operating Modes

Single mode (agent.mode: single, default):

Runtime._run_single()
  → ContentAgent.generate()       (LLM-driven, uses existing Skills)
  → ReflectionAgent.discover()    (post-hoc reflection)
  → AgentRun recorded

ContentAgent remains the single content generator. Reflection happens after generation. No orchestration overhead. Zero behavioral change for existing callers.

Collaborative mode (agent.mode: collaborative):

MultiAgentOrchestrator.run()
  1. ResearchAgent.research()      → SourceBrief
  2. WritingAgent.write()          → PlatformDraft[]
  3. ReviewAgent.review()          → ReviewReport
     ├─ approved  → continue
     └─ rejected  → one-shot revise, then continue
  4. stored as pending_review content
  5. ReflectionAgent.reflect_on_experiences()
  6. GrowthAgent.analyze()
  → AgentResult (artifacts + memory_writes)

Use this mode only when the task benefits from separated concerns (e.g., "research this topic, write for blog+X, and let the quality checker validate").
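The approved/rejected branch in step 3 can be sketched as a bounded review gate. This is an illustration only: the `PlatformDraft` and `ReviewReport` fields, the length rule, and the truncating `revise` helper are hypothetical, and re-checking the drafts once after revision is an assumption about what "one-shot revise, then continue" means.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformDraft:
    platform: str
    text: str


@dataclass
class ReviewReport:
    approved: bool
    issues: list[str] = field(default_factory=list)


def review(drafts: list[PlatformDraft], max_chars: int = 280) -> ReviewReport:
    # Stand-in platform rule: reject drafts that exceed a length limit.
    issues = [
        f"{d.platform}: {len(d.text)} chars exceeds {max_chars}"
        for d in drafts
        if len(d.text) > max_chars
    ]
    return ReviewReport(approved=not issues, issues=issues)


def revise(drafts: list[PlatformDraft], max_chars: int = 280) -> list[PlatformDraft]:
    # Stand-in revision: truncate. A real WritingAgent would rewrite instead.
    return [PlatformDraft(d.platform, d.text[:max_chars]) for d in drafts]


def review_with_one_revision(
    drafts: list[PlatformDraft], max_chars: int = 280
) -> tuple[list[PlatformDraft], ReviewReport]:
    report = review(drafts, max_chars)
    if not report.approved:
        drafts = revise(drafts, max_chars)
        report = review(drafts, max_chars)  # exactly one retry, never a loop
    return drafts, report
```

Capping the retry at one keeps the workflow's turn count predictable even when the reviewer keeps rejecting.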

Three Human-in-the-Loop Gates

Human oversight is preserved as hard gates, not optional callbacks:

Content release: All generated content starts as PENDING_REVIEW. Publishing requires explicit human approval (POST /api/review/{id}/approve). This is enforced at the config level (review.require_human_approval: true) and is mandatory in production.
Skill proposals: SkillEvolutionAgent.discover_proposals() creates SkillProposal objects in PENDING_REVIEW status. Even after human approval, the generated skill file is written with enabled: false — a human must manually flip the switch to activate it.
Review queue: GET /api/review?status=pending_review and review-list --status pending_review expose the queue for dashboard or CLI inspection.

Every approve/reject/publish action writes to viking://agent/experience, creating a closed feedback loop for the ReflectionAgent.
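A hard gate of this kind could look roughly like the sketch below. The `ReviewStatus` enum, the `Content` dataclass, and the in-memory `experience_log` list are hypothetical stand-ins; only the `pending_review` status, the `require_human_approval` setting, and the `viking://agent/experience` namespace come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(str, Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    PUBLISHED = "published"


@dataclass
class Content:
    id: str
    body: str
    status: ReviewStatus = ReviewStatus.PENDING_REVIEW  # every item starts gated


def _record(log: list[dict], action: str, content: Content) -> None:
    # Each human action becomes an experience record for later reflection.
    log.append({
        "namespace": "viking://agent/experience",
        "action": action,
        "content_id": content.id,
    })


def approve(content: Content, experience_log: list[dict]) -> None:
    content.status = ReviewStatus.APPROVED
    _record(experience_log, "approve", content)


def publish(content: Content, experience_log: list[dict],
            require_human_approval: bool = True) -> None:
    if require_human_approval and content.status is not ReviewStatus.APPROVED:
        # The gate raises instead of invoking an optional callback.
        raise PermissionError(f"content {content.id} is {content.status.value}")
    content.status = ReviewStatus.PUBLISHED
    _record(experience_log, "publish", content)
```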

Shared Memory, No Silos

All agents share the same memory pool via namespaced JSONL records:

| Namespace | Purpose |
| --- | --- |
| `viking://agent/experience` | Every action (generate/approve/reject/publish) |
| `viking://agent/reflections` | Structured lessons from ReflectionAgent / GrowthAgent |
| `viking://agent/metrics` | External metrics imported via GrowthAgent |
| `viking://agent/plans` | Plan records from MultiAgentOrchestrator |

The namespace pattern (viking://*) is designed for future migration to a remote OpenViking workspace while keeping the local JSONL implementation as the default and fallback.
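As a rough illustration of namespaced JSONL records, the sketch below appends one JSON object per line and filters by namespace on read. The function names mirror `append_memory_record` and `search_namespace` from `memory/store.py`, but the signatures and the record layout here are assumptions, not the PR's actual implementation.

```python
import json
from pathlib import Path


def append_memory_record(path: Path, namespace: str, payload: dict) -> None:
    # Open in append mode so existing records are never rewritten.
    record = {**payload, "namespace": namespace}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def search_namespace(path: Path, namespace: str) -> list[dict]:
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    # All agents share one file; the namespace field separates their views.
    return [r for r in records if r["namespace"] == namespace]
```

Keeping the namespace as a plain string field means a remote backend could later serve the same queries without changing callers.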

Per-Agent LLM Configuration

Each agent that uses an LLM can be configured independently:

llm:  # global default
  provider: deepseek
  model: deepseek-v4-flash

agent_llm:  # per-agent overrides (partial — unspecified fields inherit from global)
  reflection:
    provider: openai
    model: gpt-4o
  writing:
    model: deepseek-chat

This lets you route cheap tasks (reflection summarization) to a lightweight model and expensive tasks (content generation) to a capable one — without changing any agent code.
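The inheritance rule for partial overrides can be sketched as a field-by-field merge. The class names echo `PartialLLMConfig` from `config/settings.py`, but the fields shown and the `resolve_llm` helper are assumptions for illustration; the real wiring lives in `bootstrap.py` and may differ.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str


@dataclass(frozen=True)
class PartialLLMConfig:
    # None means "inherit this field from the global llm section".
    provider: Optional[str] = None
    model: Optional[str] = None


def resolve_llm(global_cfg: LLMConfig, override: Optional[PartialLLMConfig]) -> LLMConfig:
    if override is None:
        return global_cfg
    changes = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    return replace(global_cfg, **changes)
```

With the YAML above, the writing agent would keep the global `deepseek` provider and change only the model, while the reflection agent would replace both fields.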

Agent Boundaries (Least Privilege)

Each agent exposes the minimal tool set needed for its role:

| Agent | Can do | Cannot do |
| --- | --- | --- |
| ResearchAgent | Query knowledge base + memory | Write content, publish |
| WritingAgent | Generate drafts via Skills | Modify skills, publish |
| ReviewAgent | Validate platform rules, veto | Generate content, publish |
| PublishingAgent | Validate + publish | Generate content, modify skills |
| GrowthAgent | Read metrics/reflections, propose | Publish, modify skills |
| ReflectionAgent | Read experiences, write reflections | Modify content, publish |
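One way to picture these boundaries is an allow-list checked before any tool call. The action names and the `authorize` helper below are hypothetical; the PR describes each agent as exposing a minimal tool set, and this sketch only shows the same idea as an explicit lookup table.

```python
# Hypothetical action names derived from the "Can do" column.
ALLOWED_ACTIONS: dict[str, frozenset[str]] = {
    "ResearchAgent": frozenset({"query_knowledge_base", "query_memory"}),
    "WritingAgent": frozenset({"generate_draft"}),
    "ReviewAgent": frozenset({"validate", "veto"}),
    "PublishingAgent": frozenset({"validate", "publish"}),
    "GrowthAgent": frozenset({"read_metrics", "read_reflections", "propose"}),
    "ReflectionAgent": frozenset({"read_experiences", "write_reflections"}),
}


def authorize(agent: str, action: str) -> None:
    # Deny by default: unknown agents and unlisted actions are both rejected.
    if action not in ALLOWED_ACTIONS.get(agent, frozenset()):
        raise PermissionError(f"{agent} may not perform {action!r}")
```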

What Changed

New Files (8 agent modules + 1 backend module)

| File | Purpose |
| --- | --- |
| `agents/reflection_agent.py` | Structured reflection from experiences |
| `agents/research_agent.py` | Source collection for collaborative mode |
| `agents/writing_agent.py` | Draft generation for collaborative mode |
| `agents/review_agent.py` | Quality gate with veto |
| `agents/growth_agent.py` | Metric consumption + skill proposal |
| `agents/runtime.py` | `OpenSkaldAgentRuntime`, the unified facade |
| `agents/orchestrator.py` | `MultiAgentOrchestrator`, the deterministic 6-step flow |
| `memory/backend.py` | `MemoryBackend` abstraction |

Modified Files (12)

| File | Changes |
| --- | --- |
| `domain/models.py` | +13 model types (MemoryRecord, AgentExperience, AgentReflection, AgentMetric, AgentRun, AgentSpec, AgentContext, AgentResult, SourceBrief, PlatformDraft, ReviewReport, AgentMode, AgentRunStatus) |
| `memory/store.py` | +append_memory_record, search_namespace, list_reflections, list_experiences |
| `config/settings.py` | +AgentConfig, ReflectionConfig, CollaborationConfig, PartialLLMConfig, AgentLLMConfig |
| `bootstrap.py` | +_build_agent_llm, per-agent LLM wiring |
| `api/routes.py` | +7 endpoints (agent/runs, memory/records, memory/reflections, reflections/discover, metrics/import) |
| `cli.py` | +4 commands (agent-run, memory-list, reflections-discover, metrics-import) |
| `agents/content_agent.py` | Experience recording on generate |
| `agents/publishing_agent.py` | Experience recording on publish |
| `agents/skill_evolution_agent.py` | Reflection-based discovery |
| `config/demo.yaml` | Agent config section |

Tests

28 new tests across 9 test files
111 tests total, all passing
Coverage: MemoryStore namespace, ReflectionAgent, Runtime, Orchestrator, GrowthAgent, ReviewAgent, ResearchAgent, skill evolution reflection discovery, per-agent LLM config, CLI experience recording

Verification

ruff check . — All checks passed
pytest — 111 passed
scripts/check.sh config/demo.yaml — Full integration test passed
agent-run --mode single — status=completed, 0 errors
agent-run --mode collaborative — status=completed, 7 artifacts, 0 errors

Full implementation of the agent evolution plan including:

Phase A - Memory & Runtime Tracking:
- New domain models: MemoryRecord, AgentExperience, AgentReflection, AgentMetric, AgentRun, AgentSpec, AgentContext, AgentResult
- MemoryStore extended with namespace query (search_namespace, list_reflections, list_experiences)
- Experience recording wired into ContentAgent.generate() and PublishingAgent.publish_content()
- API: GET /api/memory/records, GET /api/memory/reflections, POST /api/memory/reflections/discover
- CLI: memory-list, reflections-discover, metrics-import

Phase B - Reflection Engine:
- New ReflectionAgent with LLM-based structured reflection generation
- SkillEvolutionAgent.discover_proposals() now prioritizes reflection-based discovery before heuristic rules

Phase C - Runtime Facade:
- New OpenSkaldAgentRuntime wrapping all agents with AgentRun lifecycle tracking
- API: POST /api/agent/runs, GET /api/agent/runs, GET /api/agent/runs/{id}
- CLI: agent-run (single/collaborative mode)

Phase D - Collaborative Mode:
- MultiAgentOrchestrator with deterministic 6-step workflow: Research → Writing → Review → Store → Reflection → Growth
- ResearchAgent, WritingAgent, ReviewAgent, GrowthAgent
- ReviewAgent quality veto with one-shot revision

Phase E - OpenViking Workspace Pattern:
- MemoryBackend abstraction with JsonlMemoryBackend and OpenVikingMemoryBackend interface

Per-Agent LLM Support:
- AgentLLMConfig with PartialLLMConfig for per-agent overrides
- content/reflection/writing each support independent LLM config
- Falls back to global llm config when no override specified

Config: agent.mode, agent.reflection, agent.collaboration, agent_llm

Tests: 111 passing (28 new)

gemini-code-assist

Code Review

This pull request introduces a multi-agent collaborative framework, adding several specialized agents (Growth, Reflection, Research, Review, Writing, and an Orchestrator), a unified runtime, namespace-based memory backends, and associated API/CLI endpoints. The feedback highlights critical concurrency issues where JSONL files are fully rewritten instead of appended, which could lead to race conditions and data loss. Additionally, the reviewer identified correctness bugs in the orchestrator's post-revision and error-handling loops, redundant imports and validations inside nested loops, dead code, and opportunities to optimize LLM context usage by filtering research articles.


skyloevil marked this pull request as ready for review June 7, 2026 08:49

skyloevil assigned skyloevil, ideazw and deitxfge and unassigned skyloevil Jun 7, 2026

skyloevil added the enhancement New feature or request label Jun 7, 2026

gemini-code-assist Bot reviewed Jun 7, 2026


skyloevil changed the title ~~feat: implement OpenSkald Agent Evolution (Phase A-E)~~ feat: implement OpenSkald Agent Evolution — memory-native runtime, reflection engine, collaborative multi-agent mode Jun 7, 2026

fix: address agent review feedback

dcfa48b

deitxfge approved these changes Jun 8, 2026


deitxfge merged commit 34a3912 into main Jun 8, 2026
1 check passed

deitxfge merged 2 commits into main from codex/agent-evolution

3 participants
