harness0 — Development TODO

Tracks all planned work items after the initial L1–L5 core implementation (v0.0.3). Items are grouped by priority tier. Each item is self-contained and can be implemented independently.

Tier 1 — Blockers (needed to run the engine end-to-end)

T1-1: LLM Provider Layer (`llm/`)

Why: HarnessEngine.run() accepts any llm_client, but there's no built-in provider. Users must pass their own OpenAI client. The llm/ module should handle key loading, retries, and provide a standard interface.

Files to create:

src/harness0/llm/base.py — LLMProvider abstract base class
src/harness0/llm/openai.py — OpenAI adapter (wraps openai.AsyncOpenAI)
src/harness0/llm/anthropic.py — Anthropic adapter

API design:

class LLMProvider(ABC):
    @abstractmethod
    async def chat(
        self,
        messages: list[Message],
        tools: list[dict] | None = None,
    ) -> LLMResponse:
        ...

class LLMResponse(BaseModel):
    content: str
    finish_reason: Literal["stop", "tool_calls", "length", "error"]
    tool_calls: list[ToolCallRequest] = []
    usage: TokenUsage | None = None

class TokenUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

Integration: HarnessConfig.llm already defines provider, model, api_key, base_url. The engine should auto-build the provider from config when llm_client is not explicitly passed.

Scope: OpenAI adapter + base class. Anthropic optional.

T1-2: Built-in Tool Plugins (`plugins/`)

Why: Every user currently hand-writes the same 4 tools (read_file, write_file, list_dir, run_command). These should ship with the library.

Files to create:

src/harness0/plugins/base.py — ToolPlugin ABC
src/harness0/plugins/builtin/file_tools.py — read_file, write_file, list_directory
src/harness0/plugins/builtin/shell_tools.py — run_command (uses ProcessSandbox)
src/harness0/plugins/builtin/search_tools.py — grep_search, glob_search

API design:

class ToolPlugin(ABC):
    name: str

    @abstractmethod
    def get_tools(self) -> list[ToolDefinition]:
        ...

# Registration on the engine
engine.register_plugin(file_tools)
engine.register_plugin(shell_tools)
engine.register_all_builtins()  # convenience: all built-in plugins

# Or standalone
from harness0.plugins.builtin import file_tools, shell_tools, search_tools

Scope: ToolPlugin ABC + 3 builtin plugin files.

Tier 2 — Core Quality (needed before open-source release)

T2-1: Test Suite (`tests/`)

Why: Zero tests currently. All core layers are implemented and stable enough for tests.

Files to create (one per layer, plus integration):

tests/test_context.py — ContextAssembler, DisclosureLevel, Freshness, sources
tests/test_tools.py — ToolInterceptor pipeline, RiskLevel, schema validation
tests/test_security.py — CommandGuard, ProcessSandbox, ApprovalManager
tests/test_feedback.py — FeedbackSignal, FeedbackTranslator factory methods, rendering
tests/test_entropy.py — EntropyManager (compression, dedup, conflicts), EntropyGardener, GoldenRule checkers
tests/test_engine.py — HarnessEngine (decorator, execute_tool, run with stub LLM)
tests/test_config.py — HarnessConfig.from_yaml, field defaults, validation

Test setup (tests/conftest.py):

import pytest
from harness0 import HarnessEngine, HarnessConfig

@pytest.fixture
def engine():
    return HarnessEngine.default()

@pytest.fixture
def config():
    return HarnessConfig.default()

Key behaviors to cover:

Progressive disclosure: INDEX always selected, DETAIL keyword matching
Freshness cache: static/per_session/per_turn reload behavior
Truncation: per-layer and total token budget enforcement
Tool interceptor: missing tool, schema validation, truncation, timeout
CommandGuard: blocklist match, safe command pass-through, fix_instructions content
ApprovalManager: fingerprint cache hit/miss, auto-approve/deny backends
EntropyManager: dedup, compression, stale signal removal
EntropyGardener: staleness detection, duplicate tool detection, golden rule enforcement
FeedbackSignal: XML/Markdown/JSON rendering, fix_instructions in XML

T2-2: Working Example (`examples/simple_agent.py`)

Why: Docs reference this file, it doesn't exist. Should be a minimal but runnable example.

File to create: examples/simple_agent.py

What it should demonstrate:

HarnessEngine.from_config("harness.yaml") — config-driven setup
@engine.tool for read_file, write_file, run_command
engine.run(task, llm_client=AsyncOpenAI())
Printing result.output, result.turn_count, result.signals

Also create examples/harness.yaml with a minimal config and examples/prompts/base.md with a basic system prompt.

T2-3: Checkpoint Persistence

Why: checkpoint_enabled: true is in config but does nothing. Long-running agents benefit from crash recovery.

What to implement:

Serialize AgentState to JSON after each turn: {session_id}.json
Load from checkpoint: engine.run(task, resume_from="path/to/checkpoint.json")
AgentState is already a Pydantic model → model_dump_json() / model_validate_json()

Files to modify:

src/harness0/engine.py — add resume_from param to run(), add _save_checkpoint() call each turn
src/harness0/core/types.py — verify AgentState is fully JSON-serializable

T2-4: Update Project Context Rule

File: .cursor/rules/harness0-project-context.mdc

What to update:

Status: change from "Pre-Alpha (placeholder)" to "v0.0.3 — core implemented"
Current State & Next Steps: replace with references to TODO.md items
Project Structure: update to match actual file layout (remove loop.py, state.py, compressor.py, decay.py, hints.py; add gardener.py)

Tier 3 — Framework Integrations (`integrations/`)

Each adapter is independent. Implement in order of community size.

T3-1: OpenAI Agents SDK Integration

File: src/harness0/integrations/openai_sdk.py

How it maps:

harness0	OpenAI SDK
L1 ContextAssembler	Input guardrail pre-processing
L2 ToolInterceptor	Tool wrapper / guardrail
L3 SecurityGuard	Tool wrapper
L4 FeedbackTranslator	Output processing
L5 EntropyManager	Input guardrail pre-processing

Install extra: pip install harness0[openai] → add openai-agents to optional deps.

T3-2: LangChain Integration

File: src/harness0/integrations/langchain.py

How it maps:

harness0	LangChain
L1 ContextAssembler	`before_model` middleware
L2+L3 ToolInterceptor + SecurityGuard	`wrap_tool_call`
L4 FeedbackTranslator	`after_model` + `wrap_tool_call`
L5 EntropyManager	`before_model`

Install extra: pip install harness0[langchain]

T3-3: PydanticAI Integration

File: src/harness0/integrations/pydantic_ai.py

Pattern: Inject as deps_type. Tools wrapped via RunContext[HarnessDeps].

Install extra: pip install harness0[pydantic-ai]

T3-4: CrewAI Integration

File: src/harness0/integrations/crewai.py

Pattern: @harness_tool decorator wraps CrewAI tools.

Install extra: pip install harness0[crewai]

Tier 4 — Enhancements

T4-1: LLM-based Summarization for L5

Why: Current L5 compression drops old messages. LLM summarization would preserve more information.

What to add: EntropyManager._llm_summarize(messages) — requires LLMProvider (T1-1).

Config field (already in architecture doc): entropy.compression_strategy: "targeted" | "summarize" | "sliding_window".

T4-2: CallableSource in harness.yaml

Why: Currently CallableSource must be created in Python code. Config YAML can't reference callables.

Spec: Support "callable:module.function_name" in source field — make_source() imports and wraps it.

T4-3: Token Usage Tracking

Why: RunResult has no token usage summary. Useful for cost monitoring.

What to add:

TokenUsage model in core/types.py
Accumulate usage in engine.run() from LLM responses
Expose as result.token_usage: TokenUsage | None

T4-4: `HarnessScore` Concept

Why: Described in docs/growth-strategy.md as a key differentiator. A per-session reliability score based on signal types, entropy actions, and tool success rates.

What to add:

HarnessScore model: overall score + per-dimension breakdown (context_quality, tool_reliability, security_events, entropy_health)
Computed from RunResult.signals and interceptor.audit_log()
Expose as result.harness_score: HarnessScore

T4-5: Async-safe ContextLayer caching

Why: ContextLayer._content_cache uses private attributes which Pydantic v2 doesn't track, and there's no async lock protecting concurrent access.

What to fix: Replace _content_cache with an asyncio.Lock + proper __init__-style setup, or convert ContextLayer to a non-Pydantic class.

Deferred / Out of Scope

core/loop.py — AgentLoop was merged into engine.py. Keep merged unless there's a use case for standalone loop.
core/state.py — AgentState is in core/types.py. No reason to split unless state management becomes complex.
entropy/compressor.py / entropy/decay.py — functionality is in manager.py. Extract only if the file grows > 300 lines.
feedback/hints.py — SystemHint builder was merged into translator.py. No need to split.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

harness0 — Development TODO

Tier 1 — Blockers (needed to run the engine end-to-end)

T1-1: LLM Provider Layer (`llm/`)

T1-2: Built-in Tool Plugins (`plugins/`)

Tier 2 — Core Quality (needed before open-source release)

T2-1: Test Suite (`tests/`)

T2-2: Working Example (`examples/simple_agent.py`)

T2-3: Checkpoint Persistence

T2-4: Update Project Context Rule

Tier 3 — Framework Integrations (`integrations/`)

T3-1: OpenAI Agents SDK Integration

T3-2: LangChain Integration

T3-3: PydanticAI Integration

T3-4: CrewAI Integration

Tier 4 — Enhancements

T4-1: LLM-based Summarization for L5

T4-2: CallableSource in harness.yaml

T4-3: Token Usage Tracking

T4-4: `HarnessScore` Concept

T4-5: Async-safe ContextLayer caching

Deferred / Out of Scope

FilesExpand file tree

TODO.md

Latest commit

History

TODO.md

File metadata and controls

harness0 — Development TODO

Tier 1 — Blockers (needed to run the engine end-to-end)

T1-1: LLM Provider Layer (llm/)

T1-2: Built-in Tool Plugins (plugins/)

Tier 2 — Core Quality (needed before open-source release)

T2-1: Test Suite (tests/)

T2-2: Working Example (examples/simple_agent.py)

T2-3: Checkpoint Persistence

T2-4: Update Project Context Rule

Tier 3 — Framework Integrations (integrations/)

T3-1: OpenAI Agents SDK Integration

T3-2: LangChain Integration

T3-3: PydanticAI Integration

T3-4: CrewAI Integration

Tier 4 — Enhancements

T4-1: LLM-based Summarization for L5

T4-2: CallableSource in harness.yaml

T4-3: Token Usage Tracking

T4-4: HarnessScore Concept

T4-5: Async-safe ContextLayer caching

Deferred / Out of Scope

T1-1: LLM Provider Layer (`llm/`)

T1-2: Built-in Tool Plugins (`plugins/`)

T2-1: Test Suite (`tests/`)

T2-2: Working Example (`examples/simple_agent.py`)

Tier 3 — Framework Integrations (`integrations/`)

T4-4: `HarnessScore` Concept