This document proposes migrating the current class-based CodeMode pipeline
to a SKILLS-driven, agent-agnostic orchestration model with deterministic protocol enforcement.
The goal is to decouple:
The evaluation methodology (CodeMode approach)
The internal orchestration agents
The concrete LLM used
The execution control layer
By moving to SKILLS + TestSession, the system becomes:
Agent-agnostic
Protocol-driven
Deterministically enforced
Replayable
Production-compatible
More native to the code-writing agent
More portable across LLM systems
Motivation
Current architecture:
CodeModeGenerator owns:
Exploration agent
Spec generation agent
Refinement agent
Orchestration logic is programmatic
Internal agents are fixed
This means:
Pipeline quality depends partly on internal prompt design
Code-writing agent is separate from eval-writing agent
Codebase awareness (fixtures, serializers, helpers) is limited
Determinism is achieved only through code-level control
We want the same agent that writes code to also execute the evaluation protocol, using structured SKILLS, while still being programmatically constrained.
Core Idea
Instead of:
Pipeline calls internal agents
We move to:
External agent executes protocol steps via SKILLS
under a deterministic TestSession controller
CodeMode becomes a strict evaluation protocol,
not a specific generator implementation.
The agent is no longer the orchestrator.
It is a participant in a state machine.
SKILL DRIVEN ORCHESTRATION
High-Level Transformation
Current Model
Internal orchestration is embedded in Python classes.
SKILLS + TestSession Model
The agent may reason freely inside each step,
but it may not:
If it does, TestSession fails.
TestSession: Deterministic Protocol Enforcement
Purpose
TestSession acts as:
Think:
The agent does not control progression.
TestSession does.
TestSession Responsibilities
TestSession maintains:
If a step is skipped or produces invalid output:
This guarantees production safety.
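As a rough illustration of the controller described here, the following is a minimal sketch, not the project's implementation. It assumes a fixed, caller-supplied step order and one validator per step. The `submit` method, the `ProtocolViolation` exception, and the `history` field are hypothetical names introduced for this example. The sketch rejects any out-of-order step or invalid output and marks the session as failed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


class ProtocolViolation(Exception):
    """Raised when the agent skips a step or submits invalid output (hypothetical name)."""


@dataclass
class TestSession:
    # Keeps pytest from collecting this class as a test case (its name starts with "Test").
    __test__ = False

    steps: list[str]                                # ordered step names, supplied by the caller
    validators: dict[str, Callable[[Any], bool]]    # one validator per step name
    cursor: int = 0                                 # index of the next expected step
    failed: bool = False
    history: list[tuple[str, Any]] = field(default_factory=list)  # accepted outputs, kept for replay

    @property
    def expected_step(self) -> Optional[str]:
        """Name of the only step the session will accept next, or None when done."""
        return self.steps[self.cursor] if self.cursor < len(self.steps) else None

    def submit(self, step: str, output: Any) -> None:
        """Accept output for `step` only if it is the expected step and passes validation."""
        if self.failed:
            raise ProtocolViolation("session already failed")
        if step != self.expected_step:
            self.failed = True
            raise ProtocolViolation(f"expected {self.expected_step!r}, got {step!r}")
        if not self.validators[step](output):
            self.failed = True
            raise ProtocolViolation(f"invalid output for {step!r}")
        self.history.append((step, output))
        self.cursor += 1  # progression is decided here, not by the agent

    @property
    def complete(self) -> bool:
        return not self.failed and self.cursor == len(self.steps)
```

The agent only ever calls `submit`. It has no way to advance `cursor` directly, which is the sense in which the session, not the agent, controls progression.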
SKILL Structure (Strict Contracts)
Each skill must define:
No free-form orchestration.
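One possible shape for such a contract is sketched below, assuming the four fields that each skill in this proposal lists (Input, Output, Validator, Next Step). The class name `SkillContract`, the key-set representation of inputs and outputs, and the example keys `task` and `plan` are all hypothetical. They are not taken from the existing codebase.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class SkillContract:
    name: str
    input_keys: frozenset[str]                    # keys the skill must receive (assumed representation)
    output_keys: frozenset[str]                   # keys the skill must produce (assumed representation)
    validator: Callable[[dict[str, Any]], bool]   # extra check on the produced output
    next_step: Optional[str]                      # name of the following skill, None at the end

    def check_input(self, payload: dict[str, Any]) -> bool:
        """True when every required input key is present."""
        return self.input_keys.issubset(payload)

    def check_output(self, payload: dict[str, Any]) -> bool:
        """True when every required output key is present and the validator accepts it."""
        return self.output_keys.issubset(payload) and self.validator(payload)


# Hypothetical contract for the first skill; the keys are illustrative only.
exploration_plan = SkillContract(
    name="ExplorationPlan",
    input_keys=frozenset({"task"}),
    output_keys=frozenset({"plan"}),
    validator=lambda out: len(out["plan"]) > 0,
    next_step="ExecutionAnalysis",
)
```

Freezing the dataclass keeps a contract from being altered mid-session, which fits the "no free-form orchestration" rule.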
Skill 1: ExplorationPlan
Input:
Output:
Validator:
Next Step:
Skill 2: ExecutionAnalysis
Input:
Output:
Validator:
Next Step:
Skill 3: SpecSynthesis
Input:
Output:
Validator:
Next Step:
Skill 4: SpecValidation
Input:
Output:
Validation report
Validator:
Next Step:
Skill 5: SpecRefinement
Input:
Output:
Validator:
Next Step:
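The five skills above can be read as a transition table. The sketch below assumes a strictly linear order matching the numbering of the skills. The Next Step values are not spelled out in this text, so that ordering is an assumption, and a real protocol might loop from SpecRefinement back to SpecValidation. The helper names `next_skill` and `is_legal_trace` are hypothetical.

```python
from typing import Optional, Sequence

# ASSUMPTION: linear order taken from the skill numbering, not from stated Next Step values.
SKILL_ORDER = (
    "ExplorationPlan",
    "ExecutionAnalysis",
    "SpecSynthesis",
    "SpecValidation",
    "SpecRefinement",
)


def next_skill(current: str) -> Optional[str]:
    """Return the skill that follows `current`, or None after the last one."""
    idx = SKILL_ORDER.index(current)  # raises ValueError for an unknown skill name
    return SKILL_ORDER[idx + 1] if idx + 1 < len(SKILL_ORDER) else None


def is_legal_trace(trace: Sequence[str]) -> bool:
    """True when `trace` is a prefix of the protocol, with no skipped or reordered steps."""
    return tuple(trace) == SKILL_ORDER[: len(trace)]
```

A check like `is_legal_trace` is also enough to audit a recorded run after the fact, since any skipped step makes the trace stop being a prefix of the protocol.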
Determinism vs Agent Freedom
Agent freedom:
Agent constraints:
This preserves:
Why Not Pure Prompt-Orchestrated SKILLS?
Pure skill prompting risks:
TestSession solves this by enforcing:
Production Implications
Current stacked pipelines are ideal for:
But production requires:
SKILLS + TestSession provides:
Migration Strategy
Phase 1
Keep CodeModeGenerator intact.
Phase 2
Extract steps into formal skill contracts.
Phase 3
Introduce TestSession controller.
Phase 4
Run A/B test:
Phase 5
Compare:
Evaluation Criteria
SKILLS + TestSession is justified only if it improves:
If determinism degrades, revert.
Long-Term Vision
If successful, SKILLS CodeMode becomes:
This shifts the project identity from:
to:
Guiding Principle
CodeMode defines the evaluation method.
SKILLS define capability boundaries.
TestSession defines execution legality.
Only keep this architecture if it preserves rigor
while enabling agent-agnostic orchestration.