Tracks all planned work items after the initial L1–L5 core implementation (v0.0.3). Items are grouped by priority tier. Each item is self-contained and can be implemented independently.
Why: HarnessEngine.run() accepts any llm_client, but there's no built-in provider. Users must pass their own OpenAI client. The llm/ module should handle key loading and retries, and expose a standard interface.
Files to create:
- `src/harness0/llm/base.py` — `LLMProvider` abstract base class
- `src/harness0/llm/openai.py` — OpenAI adapter (wraps `openai.AsyncOpenAI`)
- `src/harness0/llm/anthropic.py` — Anthropic adapter
API design:
```python
class LLMProvider(ABC):
    @abstractmethod
    async def chat(
        self,
        messages: list[Message],
        tools: list[dict] | None = None,
    ) -> LLMResponse:
        ...


class LLMResponse(BaseModel):
    content: str
    finish_reason: Literal["stop", "tool_calls", "length", "error"]
    tool_calls: list[ToolCallRequest] = []
    usage: TokenUsage | None = None


class TokenUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
```

Integration: `HarnessConfig.llm` already defines `provider`, `model`, `api_key`, `base_url`. The engine should auto-build the provider from config when `llm_client` is not explicitly passed.
Scope: OpenAI adapter + base class. Anthropic optional.
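A minimal sketch of the OpenAI adapter under these definitions. The `Message` field names (`role`, `content`), the `ToolCallRequest` constructor, and the import paths are assumptions about `core/types.py`, not settled API:

```python
# src/harness0/llm/openai.py — sketch, not a final implementation.
import os

from openai import AsyncOpenAI

from harness0.core.types import Message, ToolCallRequest  # assumed paths
from harness0.llm.base import LLMProvider, LLMResponse, TokenUsage


class OpenAIProvider(LLMProvider):
    def __init__(self, model: str, api_key: str | None = None, base_url: str | None = None):
        self.model = model
        # Fall back to the environment when no key is configured.
        self.client = AsyncOpenAI(api_key=api_key or os.getenv("OPENAI_API_KEY"), base_url=base_url)

    async def chat(self, messages: list[Message], tools: list[dict] | None = None) -> LLMResponse:
        kwargs: dict = {
            "model": self.model,
            "messages": [{"role": m.role, "content": m.content} for m in messages],
        }
        if tools:  # omit the param entirely when no tools are registered
            kwargs["tools"] = tools
        resp = await self.client.chat.completions.create(**kwargs)
        choice = resp.choices[0]
        tool_calls = [
            ToolCallRequest(id=tc.id, name=tc.function.name, arguments=tc.function.arguments)
            for tc in (choice.message.tool_calls or [])
        ]
        # Collapse OpenAI's finish reasons onto the four-value Literal above.
        finish = choice.finish_reason if choice.finish_reason in ("stop", "tool_calls", "length") else "error"
        usage = (
            TokenUsage(
                prompt_tokens=resp.usage.prompt_tokens,
                completion_tokens=resp.usage.completion_tokens,
                total_tokens=resp.usage.total_tokens,
            )
            if resp.usage
            else None
        )
        return LLMResponse(content=choice.message.content or "", finish_reason=finish,
                           tool_calls=tool_calls, usage=usage)
```

Retry handling (mentioned in the Why above) is left out here; the `openai` client's built-in `max_retries` option may already cover it.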
Why: Every user currently hand-writes the same 4 tools (read_file, write_file, list_dir, run_command). These should ship with the library.
Files to create:
- `src/harness0/plugins/base.py` — `ToolPlugin` ABC
- `src/harness0/plugins/builtin/file_tools.py` — read_file, write_file, list_directory
- `src/harness0/plugins/builtin/shell_tools.py` — run_command (uses `ProcessSandbox`)
- `src/harness0/plugins/builtin/search_tools.py` — grep_search, glob_search
API design:
```python
class ToolPlugin(ABC):
    name: str

    @abstractmethod
    def get_tools(self) -> list[ToolDefinition]:
        ...


# Registration on the engine
engine.register_plugin(file_tools)
engine.register_plugin(shell_tools)
engine.register_all_builtins()  # convenience: all built-in plugins

# Or standalone
from harness0.plugins.builtin import file_tools, shell_tools, search_tools
```

Scope: `ToolPlugin` ABC + 3 builtin plugin files.
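A sketch of what one builtin plugin could look like. The `ToolDefinition` field names (`description`, `parameters`, `handler`) and its import path are assumptions about the existing L2 types:

```python
# src/harness0/plugins/builtin/file_tools.py — sketch only.
from pathlib import Path

from harness0.core.types import ToolDefinition  # assumed import path
from harness0.plugins.base import ToolPlugin


class FileTools(ToolPlugin):
    name = "file_tools"

    def get_tools(self) -> list[ToolDefinition]:
        return [
            ToolDefinition(
                name="read_file",
                description="Read a UTF-8 text file and return its contents.",
                parameters={
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
                handler=self._read_file,
            ),
            # write_file and list_directory follow the same shape.
        ]

    async def _read_file(self, path: str) -> str:
        return Path(path).read_text(encoding="utf-8")


file_tools = FileTools()  # module-level instance, matching the imports above
```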
Why: Zero tests currently. All core layers are implemented and stable enough for tests.
Files to create (one per layer, plus integration):
- `tests/test_context.py` — ContextAssembler, DisclosureLevel, Freshness, sources
- `tests/test_tools.py` — ToolInterceptor pipeline, RiskLevel, schema validation
- `tests/test_security.py` — CommandGuard, ProcessSandbox, ApprovalManager
- `tests/test_feedback.py` — FeedbackSignal, FeedbackTranslator factory methods, rendering
- `tests/test_entropy.py` — EntropyManager (compression, dedup, conflicts), EntropyGardener, GoldenRule checkers
- `tests/test_engine.py` — HarnessEngine (decorator, execute_tool, run with stub LLM)
- `tests/test_config.py` — HarnessConfig.from_yaml, field defaults, validation
Test setup (tests/conftest.py):
```python
import pytest

from harness0 import HarnessEngine, HarnessConfig


@pytest.fixture
def engine():
    return HarnessEngine.default()


@pytest.fixture
def config():
    return HarnessConfig.default()
```

Key behaviors to cover (an example test sketch follows the list):
- Progressive disclosure: INDEX always selected, DETAIL keyword matching
- Freshness cache: static/per_session/per_turn reload behavior
- Truncation: per-layer and total token budget enforcement
- Tool interceptor: missing tool, schema validation, truncation, timeout
- CommandGuard: blocklist match, safe command pass-through, fix_instructions content
- ApprovalManager: fingerprint cache hit/miss, auto-approve/deny backends
- EntropyManager: dedup, compression, stale signal removal
- EntropyGardener: staleness detection, duplicate tool detection, golden rule enforcement
- FeedbackSignal: XML/Markdown/JSON rendering, fix_instructions in XML
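As a concrete starting point, a sketch of two of the CommandGuard cases. The import path and the `check()` result shape (`allowed`, `fix_instructions`) are assumptions about the existing L3 API:

```python
# tests/test_security.py — sketch; adjust names to the real CommandGuard API.
from harness0.security import CommandGuard  # assumed import path


def test_blocklist_match_produces_fix_instructions():
    guard = CommandGuard()
    verdict = guard.check("rm -rf /")
    assert not verdict.allowed
    assert verdict.fix_instructions  # should tell the model how to recover


def test_safe_command_passes_through():
    guard = CommandGuard()
    assert guard.check("ls -la").allowed
```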
Why: Docs reference this file, but it doesn't exist. It should be a minimal but runnable example.
File to create: examples/simple_agent.py
What it should demonstrate:
- `HarnessEngine.from_config("harness.yaml")` — config-driven setup
- `@engine.tool` for read_file, write_file, run_command
- `engine.run(task, llm_client=AsyncOpenAI())`
- Printing `result.output`, `result.turn_count`, `result.signals`
Also create examples/harness.yaml with a minimal config and examples/prompts/base.md with a basic system prompt.
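A sketch of the example along those lines, assuming `engine.run()` is async:

```python
# examples/simple_agent.py — sketch; mirrors the four points above.
import asyncio
import subprocess
from pathlib import Path

from openai import AsyncOpenAI

from harness0 import HarnessEngine

engine = HarnessEngine.from_config("examples/harness.yaml")


@engine.tool
def read_file(path: str) -> str:
    """Read a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")


@engine.tool
def write_file(path: str, content: str) -> str:
    """Write content to a file, creating parent directories."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {path}"


@engine.tool
def run_command(command: str) -> str:
    """Run a shell command (the harness's L3 guard should screen it first)."""
    out = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return out.stdout or out.stderr


async def main() -> None:
    result = await engine.run(
        "Summarize README.md into notes.md",
        llm_client=AsyncOpenAI(),
    )
    print(result.output)
    print(f"turns: {result.turn_count}, signals: {result.signals}")


if __name__ == "__main__":
    asyncio.run(main())
```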
Why: checkpoint_enabled: true is in config but does nothing. Long-running agents benefit from crash recovery.
What to implement:
- Serialize `AgentState` to JSON after each turn: `{session_id}.json`
- Load from checkpoint: `engine.run(task, resume_from="path/to/checkpoint.json")`
- `AgentState` is already a Pydantic model → `model_dump_json()` / `model_validate_json()`
Files to modify:
- `src/harness0/engine.py` — add `resume_from` param to `run()`, add a `_save_checkpoint()` call each turn (see the sketch below)
- `src/harness0/core/types.py` — verify `AgentState` is fully JSON-serializable
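A sketch of the two engine hooks, assuming a `checkpoint_dir` config field (hypothetical name):

```python
# engine.py — sketch; both methods lean entirely on Pydantic's JSON round-trip.
from pathlib import Path

from harness0.core.types import AgentState


def _save_checkpoint(self, state: AgentState) -> None:
    path = Path(self.config.checkpoint_dir) / f"{state.session_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(state.model_dump_json(), encoding="utf-8")


def _load_checkpoint(self, path: str) -> AgentState:
    return AgentState.model_validate_json(Path(path).read_text(encoding="utf-8"))
```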
File: .cursor/rules/harness0-project-context.mdc
What to update:
- Status: change from "Pre-Alpha (placeholder)" to "v0.0.3 — core implemented"
- Current State & Next Steps: replace with references to TODO.md items
- Project Structure: update to match the actual file layout (remove `loop.py`, `state.py`, `compressor.py`, `decay.py`, `hints.py`; add `gardener.py`)
Each adapter is independent. Implement in order of community size.
File: src/harness0/integrations/openai_sdk.py
How it maps:
| harness0 | OpenAI SDK |
|---|---|
| L1 ContextAssembler | Input guardrail pre-processing |
| L2 ToolInterceptor | Tool wrapper / guardrail |
| L3 SecurityGuard | Tool wrapper |
| L4 FeedbackTranslator | Output processing |
| L5 EntropyManager | Input guardrail pre-processing |
Install extra: pip install harness0[openai] → add openai-agents to optional deps.
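One possible shape for the tool-wrapping half of this mapping. `function_tool` is real openai-agents API; the `harness_tool` helper and the `engine.execute_tool(name, args)` signature are assumptions:

```python
# src/harness0/integrations/openai_sdk.py — sketch of the L2+L3 tool wrapper.
import functools

from agents import function_tool  # openai-agents


def harness_tool(engine):
    """Route an openai-agents tool through harness0's interceptor pipeline."""
    def decorator(fn):
        @functools.wraps(fn)  # preserves name/docstring/signature for schema inference
        async def guarded(*args, **kwargs):
            # execute_tool applies L2 validation and L3 security before dispatch;
            # the tool must also be registered on the engine (e.g. @engine.tool).
            return await engine.execute_tool(fn.__name__, kwargs)
        return function_tool(guarded)
    return decorator
```

The guardrail rows of the table (L1/L5 on input, L4 on output) would hang off the SDK's guardrail hooks rather than the tool wrapper.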
File: src/harness0/integrations/langchain.py
How it maps:
| harness0 | LangChain |
|---|---|
| L1 ContextAssembler | before_model middleware |
| L2+L3 ToolInterceptor + SecurityGuard | wrap_tool_call |
| L4 FeedbackTranslator | after_model + wrap_tool_call |
| L5 EntropyManager | before_model |
Install extra: pip install harness0[langchain]
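A wiring sketch against LangChain's agent-middleware hooks. Hook names follow the table; exact signatures differ across LangChain versions, and the harness0 calls (`entropy.compress`, `security.check_tool_call`) are hypothetical internals, so read this as pseudocode:

```python
# src/harness0/integrations/langchain.py — wiring sketch, not drop-in code.
from langchain.agents.middleware import AgentMiddleware  # assumed import


class HarnessMiddleware(AgentMiddleware):
    def __init__(self, engine):
        self.engine = engine

    def before_model(self, state):
        # L1 assembly + L5 entropy pruning before every model call.
        state["messages"] = self.engine.entropy.compress(state["messages"])
        return state

    def wrap_tool_call(self, request, handler):
        # L2 + L3: validate and guard, then defer to the real tool.
        verdict = self.engine.security.check_tool_call(request)
        if not verdict.allowed:
            return verdict.fix_instructions  # L4-style feedback to the model
        return handler(request)
```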
File: src/harness0/integrations/pydantic_ai.py
Pattern: Inject as deps_type. Tools wrapped via RunContext[HarnessDeps].
Install extra: pip install harness0[pydantic-ai]
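A sketch of the deps_type pattern. `Agent`, `RunContext`, and `deps_type` are real pydantic-ai API; `HarnessDeps` and the `execute_tool` call are the assumed harness0 side:

```python
# src/harness0/integrations/pydantic_ai.py — sketch.
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext

from harness0 import HarnessEngine


@dataclass
class HarnessDeps:
    engine: HarnessEngine


agent = Agent("openai:gpt-4o", deps_type=HarnessDeps)


@agent.tool
async def read_file(ctx: RunContext[HarnessDeps], path: str) -> str:
    """Read a file through harness0's L2/L3 pipeline instead of directly."""
    return await ctx.deps.engine.execute_tool("read_file", {"path": path})
```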
File: src/harness0/integrations/crewai.py
Pattern: @harness_tool decorator wraps CrewAI tools.
Install extra: pip install harness0[crewai]
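The decorator itself is plain Python; only the `execute_tool` signature is an assumption. CrewAI tools are synchronous, so the sketch bridges into the async pipeline:

```python
# src/harness0/integrations/crewai.py — sketch of the @harness_tool pattern.
import asyncio
import functools


def harness_tool(engine):
    def decorator(fn):
        @functools.wraps(fn)
        def guarded(*args, **kwargs):
            # asyncio.run suffices for sync tool calls; if CrewAI ever invokes
            # tools inside a running event loop, this needs a different bridge.
            return asyncio.run(engine.execute_tool(fn.__name__, kwargs))
        return guarded
    return decorator
```

Applied beneath CrewAI's own `@tool` decorator, so CrewAI sees the guarded callable.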
Why: Current L5 compression drops old messages. LLM summarization would preserve more information.
What to add: EntropyManager._llm_summarize(messages) — requires LLMProvider (T1-1).
Config field (already in architecture doc): entropy.compression_strategy: "targeted" | "summarize" | "sliding_window".
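A sketch of the summarize strategy, assuming the T1-1 `LLMProvider` is available on the manager and `Message` has `role`/`content` fields:

```python
# entropy/manager.py — sketch; prompt wording is a placeholder.
async def _llm_summarize(self, messages: list[Message]) -> Message:
    transcript = "\n".join(f"{m.role}: {m.content}" for m in messages)
    resp = await self.llm.chat([
        Message(role="system", content=(
            "Summarize this agent transcript. Keep decisions made, "
            "file paths touched, and open questions."
        )),
        Message(role="user", content=transcript),
    ])
    return Message(role="system", content=f"[compressed history] {resp.content}")
```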
Why: Currently CallableSource must be created in Python code. Config YAML can't reference callables.
Spec: Support "callable:module.function_name" in source field — make_source() imports and wraps it.
Why: RunResult has no token usage summary. Useful for cost monitoring.
What to add (accumulation sketch below):
- `TokenUsage` model in `core/types.py`
- Accumulate usage in `engine.run()` from LLM responses
- Expose as `result.token_usage: TokenUsage | None`
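A sketch of the per-turn accumulation, assuming each turn yields the T1-1 `LLMResponse`:

```python
# engine.py — sketch; called once per turn inside engine.run().
def _accumulate(total: TokenUsage, turn: TokenUsage | None) -> None:
    if turn is None:  # providers don't always report usage
        return
    total.prompt_tokens += turn.prompt_tokens
    total.completion_tokens += turn.completion_tokens
    total.total_tokens += turn.total_tokens
```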
Why: Described in docs/growth-strategy.md as a key differentiator. A per-session reliability score based on signal types, entropy actions, and tool success rates.
What to add (model sketch below):
- `HarnessScore` model: overall score + per-dimension breakdown (context_quality, tool_reliability, security_events, entropy_health)
- Computed from `RunResult.signals` and `interceptor.audit_log()`
- Expose as `result.harness_score: HarnessScore`
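A sketch of the model; the 0–100 scale and the equal-weight average are placeholder choices, not settled design:

```python
# core/types.py — sketch of HarnessScore.
from pydantic import BaseModel, Field


class HarnessScore(BaseModel):
    context_quality: float = Field(ge=0, le=100)
    tool_reliability: float = Field(ge=0, le=100)
    security_events: float = Field(ge=0, le=100)
    entropy_health: float = Field(ge=0, le=100)

    @property
    def overall(self) -> float:
        return (self.context_quality + self.tool_reliability
                + self.security_events + self.entropy_health) / 4
```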
Why: ContextLayer._content_cache uses private attributes which Pydantic v2 doesn't track, and there's no async lock protecting concurrent access.
What to fix: Replace _content_cache with an asyncio.Lock + proper __init__-style setup, or convert ContextLayer to a non-Pydantic class.
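A sketch of the first option, staying on Pydantic v2 via `PrivateAttr` (real Pydantic v2 API; `_load()` stands in for the existing content loader):

```python
# context/layers.py — sketch; keeps ContextLayer a Pydantic model.
import asyncio

from pydantic import BaseModel, PrivateAttr


class ContextLayer(BaseModel):
    name: str
    # ... existing declared fields ...

    _content_cache: str | None = PrivateAttr(default=None)
    _cache_lock: asyncio.Lock = PrivateAttr(default_factory=asyncio.Lock)

    async def get_content(self) -> str:
        async with self._cache_lock:  # serializes concurrent loads
            if self._content_cache is None:
                self._content_cache = await self._load()
            return self._content_cache
```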
- `core/loop.py` — AgentLoop was merged into `engine.py`. Keep merged unless there's a use case for a standalone loop.
- `core/state.py` — `AgentState` is in `core/types.py`. No reason to split unless state management becomes complex.
- `entropy/compressor.py` / `entropy/decay.py` — functionality is in `manager.py`. Extract only if the file grows > 300 lines.
- `feedback/hints.py` — SystemHint builder was merged into `translator.py`. No need to split.