SKILL DRIVEN ORCHESTRATION

fswair · 2026-03-16T00:33:35Z

fswair
Mar 16, 2026
Maintainer

Overview

This document proposes migrating the current class-based CodeMode pipeline
to a SKILLS-driven, agent-agnostic orchestration model with deterministic protocol enforcement.

The goal is to decouple:

The evaluation methodology (CodeMode approach)
The internal orchestration agents
The concrete LLM used
The execution control layer

By moving to SKILLS + TestSession, the system becomes:

Agent-agnostic
Protocol-driven
Deterministically enforced
Replayable
Production-compatible
More native to the code-writing agent
More portable across LLM systems

Motivation

Current architecture:

CodeModeGenerator owns:
- Exploration agent
- Spec generation agent
- Refinement agent
Orchestration logic is programmatic
Internal agents are fixed

This means:

Pipeline quality depends partly on internal prompt design
Code-writing agent is separate from eval-writing agent
Codebase awareness (fixtures, serializers, helpers) is limited
Determinism is achieved only through code-level control

We want:

The same agent that writes code
to also execute the evaluation protocol
using structured SKILLS
while still being programmatically constrained.

Core Idea

Instead of:

Pipeline calls internal agents

We move to:

External agent executes protocol steps via SKILLS
under a deterministic TestSession controller

CodeMode becomes a strict evaluation protocol,
not a specific generator implementation.

The agent is no longer the orchestrator.
It is a participant in a state machine.

High-Level Transformation

Current Model

CodeModeGenerator
    → exploration()
    → execution()
    → spec_generation()
    → validation()
    → refinement()

Internal orchestration is embedded in Python classes.

SKILLS + TestSession Model

TestSession (deterministic controller)
    → Agent executes Skill: ExplorationPlan
    → Agent executes Skill: ExecutionAnalysis
    → Agent executes Skill: SpecSynthesis
    → Agent executes Skill: SpecValidation
    → Agent executes Skill: SpecRefinement

The agent may reason freely inside each step,
but it may not:

Skip steps
Reorder steps
Produce invalid artifacts
Escape the protocol

If it does, TestSession fails the session.

TestSession: Deterministic Protocol Enforcement

Purpose

TestSession acts as:

A state machine
A durable execution container
A replayable evaluation trace
A protocol validator

Think:

Durable Execution + VCR (record and replay) + State Machine

The agent does not control progression.
TestSession does.
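The record-and-replay part of that analogy could look roughly like the sketch below. It is an illustration only: `ReplayableLog` and its methods are hypothetical names, not part of the proposal, and artifacts are assumed to be JSON-serializable.

```python
import json


class ReplayableLog:
    """Records each step's artifact on a live run and serves it back on replay.

    Hypothetical helper for illustration; not an existing TestSession API.
    """

    def __init__(self, recorded=None):
        self.entries = list(recorded or [])  # [{"step": str, "artifact": ...}, ...]
        self.replaying = recorded is not None
        self._cursor = 0

    def step(self, name, produce):
        """Return the artifact for `name`; `produce` is called only when recording."""
        if self.replaying:
            if self._cursor >= len(self.entries):
                raise RuntimeError(f"replay has no recorded step for {name!r}")
            entry = self.entries[self._cursor]
            if entry["step"] != name:
                raise RuntimeError(
                    f"replay diverged: expected {entry['step']!r}, got {name!r}"
                )
            self._cursor += 1
            return entry["artifact"]
        artifact = produce()
        self.entries.append({"step": name, "artifact": artifact})
        return artifact

    def dumps(self):
        """Serialize the recorded steps so they can be persisted."""
        return json.dumps(self.entries)

    @classmethod
    def loads(cls, text):
        """Build a replay-mode log from a previously serialized run."""
        return cls(recorded=json.loads(text))
```

On replay the agent callable is never invoked, so a failed run can be stepped through again without a model call, and any change in step order surfaces as an explicit divergence error.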

TestSession Responsibilities

TestSession maintains:

Current stage
Allowed next steps
Produced artifacts
Consumed artifacts
Validation gates
Execution trace log
Replayable state

If a step is skipped or produces invalid output:

Session → FAIL

Because the session fails closed, a run that violates the protocol cannot surface in production as a passing result.
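A minimal sketch of these responsibilities, assuming the transition table is read off the "Next Step" entries of the skills in this proposal. `MiniTestSession`, `ProtocolViolation`, and the `advance` signature are hypothetical, and the `ExecuteSnippets` to `ExecutionAnalysis` edge is inferred from Skill 2 taking snippet execution results as input.

```python
class ProtocolViolation(Exception):
    """Raised when a step is illegal or its artifact fails validation."""


# Legal transitions. The ExecuteSnippets -> ExecutionAnalysis edge is an assumption.
TRANSITIONS = {
    "START": {"ExplorationPlan"},
    "ExplorationPlan": {"ExecuteSnippets"},
    "ExecuteSnippets": {"ExecutionAnalysis"},
    "ExecutionAnalysis": {"SpecSynthesis"},
    "SpecSynthesis": {"SpecValidation"},
    "SpecValidation": {"SpecRefinement", "SUCCESS"},
    "SpecRefinement": {"SpecValidation"},
}


class MiniTestSession:
    """Hypothetical controller: tracks stage, artifacts, and a trace log."""

    def __init__(self):
        self.stage = "START"
        self.artifacts = {}  # step name -> latest artifact produced by that step
        self.trace = []      # (from_stage, attempted_step, outcome) tuples

    def advance(self, step, artifact, is_valid):
        """Record one step, or move the session to FAIL and raise."""
        if self.stage == "FAIL":
            raise ProtocolViolation("session already failed")
        if step not in TRANSITIONS.get(self.stage, set()):
            self._fail(step, "illegal transition")
        if not is_valid(artifact):
            self._fail(step, "invalid artifact")
        self.artifacts[step] = artifact
        self.trace.append((self.stage, step, "ok"))
        self.stage = step

    def _fail(self, step, reason):
        # Log the attempt before failing so the trace explains the failure.
        self.trace.append((self.stage, step, reason))
        self.stage = "FAIL"
        raise ProtocolViolation(f"{reason}: {step}")
```

The agent only ever proposes a step and an artifact; whether the session moves is decided by the table and the validator, which is the sense in which progression belongs to TestSession.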

SKILL Structure (Strict Contracts)

Each skill must define:

Required inputs
Expected outputs
Validation logic
Allowed next steps

No free-form orchestration.
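One way such a contract might be written down, as a sketch under assumptions: `SkillContract`, `run_skill`, and the field names below are illustrative, and the agent is modeled as a plain callable from an input dict to an output dict.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class SkillContract:
    """Hypothetical container for the four things each skill must define."""
    name: str
    required_inputs: Tuple[str, ...]
    expected_outputs: Tuple[str, ...]
    validate: Callable[[Dict[str, Any]], bool]
    next_steps: FrozenSet[str]


def run_skill(contract, inputs, agent):
    """Call `agent(inputs)` and enforce the contract on both sides of the call."""
    missing = [k for k in contract.required_inputs if k not in inputs]
    if missing:
        raise ValueError(f"{contract.name}: missing inputs {missing}")
    outputs = agent(inputs)
    absent = [k for k in contract.expected_outputs if k not in outputs]
    if absent:
        raise ValueError(f"{contract.name}: missing outputs {absent}")
    if not contract.validate(outputs):
        raise ValueError(f"{contract.name}: outputs failed validation")
    return outputs


# Example contract; the key names are assumptions, not a fixed schema.
exploration_plan = SkillContract(
    name="ExplorationPlan",
    required_inputs=("function_source", "description"),
    expected_outputs=("snippets",),
    validate=lambda out: len(out["snippets"]) > 0,
    next_steps=frozenset({"ExecuteSnippets"}),
)
```

Keeping the contract as data rather than prompt text means the same checks apply regardless of which agent or LLM fills the `agent` slot.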

Skill 1: ExplorationPlan

Input:

Function source
Description

Output:

Executable snippet plan

Validator:

Non-empty
No duplicates
Function reference valid

Next Step:

ExecuteSnippets (the snippet run whose results feed ExecutionAnalysis)
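The three validator checks for this skill could be sketched as follows. The assumptions are mine: the plan is a list of Python source strings, duplicates are compared after stripping surrounding whitespace, and "function reference valid" is approximated as "the snippet parses and names the function under test". `validate_exploration_plan` is a hypothetical name.

```python
import ast


def validate_exploration_plan(snippets, function_name):
    """Return a list of problems; an empty list means the plan passes."""
    if not snippets:
        return ["plan is empty"]
    problems = []
    seen = set()
    for i, snippet in enumerate(snippets):
        key = snippet.strip()
        if key in seen:
            problems.append(f"snippet {i} is a duplicate")
        seen.add(key)
        try:
            tree = ast.parse(key)
        except SyntaxError:
            problems.append(f"snippet {i} does not parse")
            continue
        # Only bare names are checked; a call such as module.func(...) would
        # need an extra check on ast.Attribute nodes.
        names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        if function_name not in names:
            problems.append(f"snippet {i} never references {function_name}")
    return problems
```

Parsing with `ast` keeps the check static: nothing is executed until the plan has passed validation.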

Skill 2: ExecutionAnalysis

Input:

Snippet execution results

Output:

Behavior clusters
Output shape summary
Error surface summary

Validator:

Derived only from observed traces
No hallucinated behaviors

Next Step:

SpecSynthesis
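"Derived only from observed traces" can be made checkable if every behavior cluster has to cite the traces it was derived from. The sketch below assumes that shape (a `name` plus a list of `trace_ids` per cluster); both the field names and `unsupported_clusters` are hypothetical.

```python
def unsupported_clusters(clusters, observed_trace_ids):
    """Return the names of clusters that cite no trace, or cite an unrecorded one."""
    observed = set(observed_trace_ids)
    bad = []
    for cluster in clusters:
        cited = set(cluster["trace_ids"])
        # A cluster is supported only if it cites at least one trace and
        # every cited trace was actually recorded during snippet execution.
        if not cited or not cited <= observed:
            bad.append(cluster["name"])
    return bad
```

This only proves that the cited evidence exists, not that the cluster describes it accurately; the second half would still need a semantic check.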

Skill 3: SpecSynthesis

Input:

Verified behaviors
Error results

Output:

YAML spec

Validator:

YAML valid
DSL schema valid
No behavior that was inferred rather than observed

Next Step:

SpecValidation

Skill 4: SpecValidation

Input:

YAML spec
Runtime evaluator

Output:

Validation report
- Coverage
- Failures
- Structural errors

Validator:

Report complete
Coverage measurable

Next Step:

SpecRefinement or SUCCESS
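The branch after this skill could be decided from the report alone, roughly as below. This is a sketch: `ValidationReport`, `next_step`, the representation of coverage as a fraction, and the `min_coverage` threshold are assumptions, since the proposal does not fix a report format or a pass criterion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationReport:
    """Hypothetical report shape mirroring the three output items above."""
    coverage: float  # assumed to be a fraction between 0.0 and 1.0
    failures: List[str] = field(default_factory=list)
    structural_errors: List[str] = field(default_factory=list)


def next_step(report, min_coverage=1.0):
    """Pick the next step; reject reports whose coverage is not measurable."""
    if not 0.0 <= report.coverage <= 1.0:
        raise ValueError("coverage is not measurable as a fraction")
    clean = not report.failures and not report.structural_errors
    if clean and report.coverage >= min_coverage:
        return "SUCCESS"
    return "SpecRefinement"
```

Because the decision is a pure function of the report, the controller makes it, not the agent.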

Skill 5: SpecRefinement

Input:

Validation failures
Original spec

Output:

Revised spec

Validator:

Only spec changes allowed
No function modification
Previously valid cases preserved

Next Step:

SpecValidation
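Two of these checks lend themselves to a mechanical test: the function under test must be byte-identical before and after refinement, and no case that passed against the original spec may stop passing. A sketch, assuming cases carry stable ids and that pass/fail sets come from running SpecValidation on both specs; `source_digest` and `check_refinement` are hypothetical names.

```python
import hashlib


def source_digest(source):
    """Fingerprint the function source so later modification is detectable."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def check_refinement(original_digest, current_source, passing_before, passing_after):
    """Return a list of violations; an empty list means the refinement is acceptable.

    passing_before / passing_after: ids of the cases that pass validation
    against the original spec and against the revised spec.
    """
    violations = []
    if source_digest(current_source) != original_digest:
        violations.append("function source was modified")
    regressed = sorted(set(passing_before) - set(passing_after))
    if regressed:
        violations.append(f"previously valid cases no longer pass: {regressed}")
    return violations
```

Taking the digest when the session starts, rather than at refinement time, closes the gap where the agent edits the function between steps.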

Determinism vs Agent Freedom

Agent freedom:

Chooses exploration strategy
Chooses edge-case emphasis
Chooses spec structuring style
Chooses refinement reasoning

Agent constraints:

Cannot change step order
Cannot skip steps
Cannot fabricate artifacts
Cannot bypass validation

This preserves:

Flexibility inside the step
Determinism across steps

Why Not Pure Prompt-Orchestrated SKILLS?

Pure skill prompting risks:

Orchestration drift
Step skipping
Non-deterministic execution order
Hard-to-replay failures

TestSession solves this by enforcing:

Legal transitions
Artifact contracts
Strict stage progression

Production Implications

Current stacked pipelines are ideal for:

Comparing approaches
Measuring reasoning differences
Controlled benchmarking

But production requires:

Replayability
Observability
State persistence
Artifact lineage
Deterministic failure semantics

SKILLS + TestSession provides:

Protocol-level guarantees
with agent-level reasoning flexibility

Migration Strategy

Phase 1

Keep CodeModeGenerator intact.

Phase 2

Extract steps into formal skill contracts.

Phase 3

Introduce TestSession controller.

Phase 4

Run A/B test:

Internal pipeline orchestration
SKILLS + TestSession orchestration

Phase 5

Compare:

Pass rate
Coverage
Refinement count
YAML validity
Fixture correctness
Replay stability
Latency
Token cost

Evaluation Criteria

SKILLS + TestSession is justified only if it improves:

Agent portability
Production safety
Fixture naturalness
Structured output stability
Weak-model compliance with the protocol

If determinism degrades, revert to the internal pipeline orchestration.

Long-Term Vision

If successful, SKILLS CodeMode becomes:

A portable, deterministic evaluation protocol
for AI code agents

This shifts the project identity from:

"A test generator"

to:

"A state-machine-enforced behavioral validation framework"

Guiding Principle

CodeMode defines the evaluation method.
SKILLS define capability boundaries.
TestSession defines execution legality.

Only keep this architecture if it preserves rigor
while enabling agent-agnostic orchestration.

SKILL DRIVEN ORCHESTRATION | AGENT-AGNOSTIC APPROACH #9
