codesm

An AI coding agent built with Python + Rust. Features a TUI interface, multi-provider support (Anthropic, OpenAI, OpenRouter), and a powerful tool system for reading, writing, and executing code.

Still in early development. Contributions welcome!

Mermaid Graph

flowchart TD
    A[User Input] --> B[Agent.stream]
    B --> C[ReAct Loop Execute]
    C --> D[Claude API Call]
    D --> E{Response Type?}
    
    E -->|Text Response| F[Add Assistant Message]
    E -->|Tool Call| G[Extract Tool Call]
    
    G --> H{Tool Type?}
    H -->|MCP Tool| I[MCP Execute Tool]
    H -->|Built-in Tool| J[Direct Tool Execution]
    H -->|Parallel Tasks| PA[Parallel Subagent Spawning]
    
    I --> K[Generate Python Code]
    K --> L[Execute in Subprocess]
    L --> M[MCP Client Call]
    M --> N[MCP Server Process]
    N --> O[Tool Implementation]
    O --> P[Tool Result]
    
    J --> Q[Tool.execute method]
    Q --> R{Tool Category?}
    R -->|File Operations| S[Read/Write/Edit Tools]
    R -->|Search Tools| T[Grep/Glob/CodeSearch]
    R -->|External Tools| U[Bash/Web Tools]
    R -->|Agent Tools| V[Task/Oracle/Subagents]
    
    S --> W[File System Operations]
    T --> X[Search Operations]
    U --> Y[External Process/API]
    V --> Z[Spawn Single Subagent]
    
    W --> P
    X --> P
    Y --> P
    Z --> AA[Subagent Result]
    AA --> P
    
    PA --> PB{Orchestration Type?}
    PB -->|parallel_tasks| PC[Concurrent Execution]
    PB -->|orchestrate| PD[Staged Execution]
    PB -->|pipeline| PE[Sequential Chain]
    
    PC --> PF[asyncio.gather]
    PD --> PG[Stage 1 Parallel] --> PH[Stage 2 Parallel] --> PI[Stage N Parallel]
    PE --> PJ[Step 1] --> PK[Pass Result] --> PL[Step 2]
    
    PF --> PM[Subagent 1]
    PF --> PN[Subagent 2]
    PF --> PO[Subagent N]
    
    PM --> PQ[Aggregate Results]
    PN --> PQ
    PO --> PQ
    PI --> PQ
    PL --> PQ
    PQ --> P
    
    P --> BB[Add Tool Result Message]
    BB --> CC[Update Session State]
    CC --> DD{More Tool Calls?}
    
    DD -->|Yes| G
    DD -->|No| EE[Continue ReAct Loop]
    EE --> D
    
    F --> FF[Session Complete]
    
    CC --> GG[Context Management]
    GG --> HH[Message Storage]
    HH --> II[Session Persistence]

Quick Start

# Install
uv pip install -e .

# Run
codesm

# Or with uv directly
uv run codesm

Parallel Subagents

codesm supports spawning multiple subagents in parallel for independent tasks. This is inspired by opencode's batch/task pattern and allows faster execution of parallelizable work.

Using `parallel_tasks` Tool

The parallel_tasks tool allows you to run up to 10 subagent tasks concurrently:

# Example: Run multiple research tasks in parallel
{
    "tasks": [
        {
            "subagent_type": "researcher",
            "prompt": "Find all API endpoints in the codebase",
            "description": "Find API endpoints"
        },
        {
            "subagent_type": "researcher", 
            "prompt": "Analyze the authentication flow",
            "description": "Analyze auth flow"
        },
        {
            "subagent_type": "finder",
            "prompt": "Find all test files",
            "description": "Find test files"
        }
    ],
    "fail_fast": false
}

Features

Up to 10 concurrent tasks - Prevent resource exhaustion
Auto-routing - Use subagent_type: "auto" to let the router pick the best agent
Fail-fast mode - Cancel remaining tasks on first failure with fail_fast: true
Progress tracking - Per-task timing and success/failure indicators
Result aggregation - Combined output with truncation for long results

Subagent Types

Type	Best For	Model
`coder`	Multi-file edits, features	Claude Sonnet
`researcher`	Code analysis (read-only)	Claude Sonnet
`reviewer`	Bug detection, security	Claude Sonnet
`planner`	Implementation plans	Claude Sonnet
`finder`	Fast code search	Gemini Flash
`oracle`	Deep reasoning	o1
`librarian`	Multi-repo research	Claude Sonnet
`auto`	Router picks best	Varies

Advanced Orchestration

For complex multi-stage workflows, use the orchestrate tool with stages:

# Example: Staged execution (stages run sequentially, tasks within stages run in parallel)
{
    "stages": [
        [  # Stage 1: Research (parallel)
            {"subagent_type": "researcher", "prompt": "Analyze current auth system", "description": "Research auth"},
            {"subagent_type": "finder", "prompt": "Find all auth-related files", "description": "Find auth files"}
        ],
        [  # Stage 2: Planning (after research completes)
            {"subagent_type": "planner", "prompt": "Plan auth improvements based on research", "description": "Plan improvements"}
        ],
        [  # Stage 3: Implementation (after planning)
            {"subagent_type": "coder", "prompt": "Implement planned auth improvements", "description": "Implement changes"},
            {"subagent_type": "coder", "prompt": "Add tests for new auth code", "description": "Add tests"}
        ]
    ],
    "fail_fast": true
}

Pipeline Tool

For sequential workflows where each step receives the previous step's output:

{
    "steps": [
        {"subagent_type": "researcher", "prompt_template": "Find all TODO comments in the codebase", "description": "Find TODOs"},
        {"subagent_type": "planner", "prompt_template": "Prioritize these TODOs: {previous_result}", "description": "Prioritize"},
        {"subagent_type": "coder", "prompt_template": "Fix the top priority TODO: {previous_result}", "description": "Fix top TODO"}
    ],
    "initial_context": ""
}

Programmatic Usage

from codesm.agent.orchestrator import spawn_parallel_subagents, SubAgentOrchestrator, OrchestrationPlan

# Simple parallel execution
results = await spawn_parallel_subagents(
    tasks=[
        ("researcher", "Find all database queries", "Find DB queries"),
        ("coder", "Add input validation to user.py", "Add validation"),
    ],
    directory=Path("."),
    parent_tools=tool_registry,
    max_concurrent=5,
)

for task in results:
    print(f"{task.description}: {task.status}")

# Advanced: Staged orchestration
orchestrator = SubAgentOrchestrator(directory=Path("."), parent_tools=registry)
plan = OrchestrationPlan.staged([
    [orchestrator.create_task("researcher", "Research phase 1", "Research")],
    [orchestrator.create_task("coder", "Implement based on research", "Implement")],
])
await orchestrator.execute_plan(plan)

Agent Modes

codesm supports two agent modes for different task types:

Smart Mode (Default)

Uses powerful models (Claude Sonnet 4, GPT-4o)
Best for complex tasks, architecture decisions, debugging
Full capability, thorough reasoning

Rush Mode

Uses fast models (Claude Haiku 3.5, GPT-4o-mini)
67% cheaper, 50% faster
Best for simple, well-defined tasks: quick edits, small fixes, simple features

Switch modes:

Press tab to toggle between modes
Use /rush or /smart commands
Use /mode to open mode selector

Providers

Supported Providers

Anthropic - Claude models (Sonnet, Opus, Haiku)
OpenAI - GPT-4o, GPT-4-turbo, o1
OpenRouter - Access 100+ models with one API key
Ollama - Local models (Llama, Qwen, DeepSeek, Mistral)
Google - Gemini models (coming soon)

Connect a Provider

Press Ctrl+A or use /connect command
Select provider
Enter API key

Feature Progress

Implemented

TUI interface (Textual-based)
Session management & persistence
Multi-provider support (Anthropic, OpenAI, OpenRouter)
Rush Mode - Fast/cheap mode for simple tasks
Agent loop with tool execution
Command palette (Ctrl+P or /)
Sidebar with session list
Tools: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, Diagnostics, LSP, CodeSearch, Todo, Ls, Batch, ParallelTasks, Orchestrate, Pipeline

To Be Implemented

Core Features (from opencode)

CLI Features (from amp)

Execute Mode (from amp)

-x, --execute - Non-interactive execute mode (run prompt, print last message, exit)
--stream-json - Output in Claude Code-compatible stream JSON format
--stream-json-thinking - Include thinking blocks in stream JSON
--stream-json-input - Read JSON Lines from stdin
-l, --label - Add labels to threads
Stdin piping - echo "message" | codesm
Stdout redirect detection - Auto-enable execute mode when redirecting

Global Options (from amp)

--visibility - Set thread visibility (private, public, workspace, group)
--notifications / --no-notifications - Sound notifications toggle
--settings-file - Custom settings file path
--log-level - Set log level (error, warn, info, debug, audit)
--log-file - Set log file location
--dangerously-allow-all - Disable all confirmation prompts
--mcp-config - JSON config or file path for MCP servers
-m, --mode - Set agent mode (free, rush, smart)

Settings System (from amp)

Smart Multi-Model Architecture

The system uses task-specialized models across three tiers:

Tier 1: Agent Modes (Primary interaction)

Smart Mode - Claude Sonnet 4 / GPT-4o for complex reasoning
Rush Mode - Claude Haiku 3.5 / GPT-4o-mini for fast, cheap tasks
Mode switching via tab key or /mode command

Tier 2: Feature Models (Low-latency UI/UX tasks)

Tab Completion - Custom fine-tuned model for autocomplete/next-action
Code Review - Gemini 2.5 Pro (via OpenRouter) for bug detection and review assistance
Titling - Claude 3.5 Haiku (via OpenRouter) for fast thread title generation
Look At - Gemini 2.0 Flash (via OpenRouter) for image/PDF/media analysis

Tier 3: Specialized Subagents (Background processing)

Oracle - GPT-5/o1 for complex reasoning, planning, debugging
Finder/Search - Gemini 2.5 Flash for high-speed codebase retrieval
Librarian - Claude Sonnet 4 for multi-repo research & external code

Workflow Management

Handoff System - Gemini 2.5 Flash for context analysis & task continuation
Topics/Indexing - Gemini 2.0 Flash-Lite for thread categorization
Task Router - Route tasks based on reasoning depth vs speed tradeoff

Infrastructure

Multi-provider model registry (Anthropic, OpenAI, OpenRouter, Google)
Model selection logic per task type
Subagent spawning and orchestration
Context passing between agents
Cost/latency optimization layer

Other Smart Features

Mermaid diagram generation
Thread search & cross-thread context
Auto todo planning & tracking during tasks
File citations with clickable links
Web page reading (WebFetch tool)
Parallel tool execution optimization

Roadmap: Competitive Feature

Critical (To match Amp/Claude Code/OpenCode)

Agent Architecture

Parallel Subagent Spawning - Run multiple subagents concurrently for independent tasks
Context Window Management - Smart context compression, summarization, and handoff
Automatic Thread Handoff - When context gets long, seamlessly continue in new thread
Task Decomposition Engine - Break complex tasks into parallelizable subtasks (via orchestrate/pipeline tools)
Agent Self-Correction - Detect and retry failed tool calls with different approaches

Tool System

Undo/Redo System - Revert any file edit with full history
File Watcher - React to file changes in real-time
Diff Preview - Show unified diff before applying edits
Format on Save - Auto-format files after edits (ruff, prettier, gofmt, rustfmt)
Multi-file Atomic Edits - Transactional edits across files

Context & Memory

AGENTS.md Support - Read project-specific instructions automatically
Cross-Session Memory - Remember user preferences and past solutions
Codebase Indexing - Pre-index repos for instant semantic search
Git Integration - Auto-commit, branch management, PR creation
Conversation Branching - Fork conversations to explore alternatives

Important (Competitive Advantages)

Intelligence Layer

Code Review Agent - Automatic PR review with actionable feedback
Test Generation - Auto-generate tests for new/changed code
Bug Localization - Given an error, find the root cause automatically
Refactoring Suggestions - Proactive code improvement recommendations

Developer Experience

Streaming Tool Output - Real-time output for long-running tools (tests, builds)
Progress Indicators - Show what the agent is doing at each step
Keyboard-First Navigation - Vim-style motions throughout TUI
Split Pane View - Code preview alongside chat
File Tree Browser - Navigate and select files visually
Syntax Highlighted Diffs - Beautiful, readable code changes

Safety & Permissions

Sandboxed Execution - Run bash commands in isolated environment
Permission Prompts - Ask before destructive operations
Allowlist/Blocklist - Configure which commands/paths are allowed
Audit Log - Track all agent actions for review
Dry Run Mode - Preview all changes without applying

Nice to Have (Killer Features)

Advanced Capabilities

Mermaid Diagram Generation - Auto-generate architecture/flow diagrams
Image/Screenshot Analysis - Describe UI, extract text from images
PDF/Document Reading - Extract and summarize document contents
Browser Automation - Navigate and interact with web pages
API Testing - Make HTTP requests and validate responses

Local-First Features

Ollama Integration - Run fully local with open models
Embedding Cache - Local vector DB for semantic search (ChromaDB/LanceDB)
Offline Fallback - Graceful degradation when no internet
Model Switching Mid-Task - Hot-swap models during execution
Cost Tracking Dashboard - Monitor API spend in real-time

Collaboration

Session Sharing - Share conversations via URL
Team Workspaces - Shared sessions, shared memory
Real-time Collaboration - Multiple users in same session
Export Formats - Export to Markdown, JSON, or executable scripts

IDE Integration

VS Code Extension - Native integration with file syncing
Neovim Plugin - Lua-based plugin for Neovim users
JetBrains Plugin - Support for IntelliJ-based IDEs
Language Server - Act as an LSP for AI-powered completions

Voice & Accessibility

Voice Input - Whisper-based speech-to-text
Voice Output - TTS for responses (optional)
Screen Reader Support - Full accessibility compliance
High Contrast Themes - Accessibility-focused UI themes

Experimental (Moonshots)

Self-Improving Agent - Learn from user corrections
Codebase-Specific Fine-tuning - LoRA adapters for your repos
Multi-Agent Debates - Multiple agents argue to find best solution
Autonomous Mode - Run overnight, wake user for decisions
Plugin Marketplace - skills.sh registry integration (vercel-labs/agent-skills)
Natural Language Git - "Undo my last 3 commits" → executes git commands
Code Generation Streaming - Token-by-token code preview as it's written
Predictive Actions - Suggest next action before user asks

How to Make codesm Better

vs Claude Code

Open source & extensible - Full transparency, community-driven
Multi-provider - Not locked to Anthropic; use any LLM
Rust core - Performance-critical ops in Rust (grep, file ops)
Self-hostable - No cloud dependency, run fully local
Custom skills - User-defined agent behaviors + skills.sh registry

vs Opencode

Python ecosystem - Easier contribution, rich ML/AI libraries
Simpler architecture - Less overhead, faster iteration
Better defaults - Opinionated but sensible out-of-box experience

Unique Differentiators to Build

Hybrid Python+Rust - Python for flexibility, Rust for speed
Local-first AI - First-class support for Ollama, llama.cpp
Agent memory - Long-term context across sessions
Code understanding graphs - AST-based code intelligence
Collaborative mode - Multiple users, shared sessions
Voice interface - Speech-to-code capabilities
Custom model fine-tuning - Train on your codebase patterns
Offline mode - Full functionality without internet

Keyboard Shortcuts

Key	Action
`tab`	Toggle Smart/Rush mode
`Ctrl+P`	Command palette
`Ctrl+A`	Connect provider
`Ctrl+N`	New session
`Ctrl+T`	Toggle theme
`Ctrl+B`	Toggle sidebar
`Ctrl+C`	Quit
`Escape`	Cancel current operation

Commands

Command	Description
`/mode`	Open mode selector
`/rush`	Switch to Rush mode
`/smart`	Switch to Smart mode
`/models`	Select model
`/connect`	Connect a provider
`/session`	Browse sessions
`/new`	New session
`/theme`	Change theme
`/status`	Show current status
`/help`	Show help

Post-Development Roadmap

Phase 1: Polish & Reliability (1-2 weeks)

Goal: Make it daily-drivable

Undo/Redo system - Critical for trust
Permission prompts - Ask before destructive ops
AGENTS.md support - Match Amp/Claude Code behavior
Streaming tool output - Real-time feedback for bash/tests

Phase 2: Differentiation (2-4 weeks)

Goal: Create your unique angle

Ollama/local models - First-class offline support (big differentiator)
Cost tracking dashboard - Show users their spend in real-time
Git integration - Auto-commit, branch, PR creation
Cross-session memory - Remember preferences/past solutions

Phase 3: Community & Growth (4-8 weeks)

Goal: Build traction

Plugin marketplace/skills - Let community extend
VS Code extension - Meet devs where they are
Session sharing - Viral loop for growth
Better docs + demo videos

Recommended Focus Order

Undo system → builds trust
Ollama support → unique selling point vs Cursor/Claude Code
Cost tracking → users love transparency
Git integration → makes it a complete workflow

Strategic Note: Ship Ollama support early—"fully local AI coding agent" is a strong positioning no major player owns yet.

Name		Name	Last commit message	Last commit date
Latest commit History 97 Commits
assets		assets
codesm		codesm
examples		examples
packages		packages
prompts		prompts
tests		tests
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
example-mcp.json		example-mcp.json
main.py		main.py
mcp-servers.json		mcp-servers.json
pyproject.toml		pyproject.toml
test_subagent.py		test_subagent.py
uv.lock		uv.lock

License

Aditya-PS-05/codesm

Folders and files

Latest commit

History

Repository files navigation