CodeFRAME is a Fully Remote Autonomous Multi-Agent Environment for coding - the next generation IDE for agentic coding. Launch self-correcting AI swarms that plan, build, test, and deploy software autonomously.

CodeFRAME


AI coding agents that work autonomously while you sleep. Check in like a coworker, answer questions when needed, ship features continuously.


Overview

CodeFRAME is an autonomous AI development system where specialized agents collaborate to build software features end-to-end. It combines multi-agent orchestration, human-in-the-loop blockers, and intelligent context management to enable truly autonomous software development cycles.

Unlike traditional AI coding assistants that wait for your prompts, CodeFRAME agents work independently on tasks, ask questions when blocked, and coordinate with each other to ship complete features—day and night.

Two modes of operation:

  • CLI-first (v2) — Complete Golden Path workflow from the command line, no server required
  • Dashboard (v1) — Real-time web UI with WebSocket updates for monitoring and interaction

What's New (Updated: 2026-01-16)

Tech Stack Configuration (Simplified)

Describe your tech stack — Tell CodeFRAME what technologies your project uses during initialization.

# Auto-detect from project files (pyproject.toml, package.json, Cargo.toml, etc.)
cf init . --detect

# Provide explicit tech stack description
cf init . --tech-stack "Python 3.11 with FastAPI, uv, pytest"
cf init . --tech-stack "TypeScript monorepo with pnpm, Next.js frontend"
cf init . --tech-stack "Rust project using cargo"

# Interactive setup
cf init . --tech-stack-interactive

Example output:

Workspace initialized
  Path: /home/user/projects/my-project
  ID: abc123...
  State: .codeframe/
  Tech Stack: Python with uv, pytest, ruff for linting

Why this matters: The agent uses your tech stack description to determine appropriate commands and patterns. Works with any stack — Python, TypeScript, Rust, Go, monorepos, or mixed environments.
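
The --detect flow can be pictured as a scan for well-known marker files in the project root. The mapping below is an illustrative assumption, not CodeFRAME's actual detection table:

```python
from pathlib import Path

# Hypothetical marker-to-stack mapping; the real detection logic may
# recognize more files and produce richer descriptions.
STACK_MARKERS = {
    "pyproject.toml": "Python (uv/poetry)",
    "package.json": "Node.js / TypeScript",
    "Cargo.toml": "Rust (cargo)",
    "go.mod": "Go modules",
}

def detect_tech_stack(project_dir: str) -> list[str]:
    """Return stack descriptions for known marker files found in the project root."""
    root = Path(project_dir)
    return [desc for marker, desc in STACK_MARKERS.items() if (root / marker).exists()]
```

A project containing both pyproject.toml and package.json would be reported as a mixed Python/Node environment.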


Agent Self-Correction & Observability

Verification self-correction loop — Agent now automatically attempts to fix failing verification gates.

# Execute with verbose output to see self-correction progress
cf work start <task-id> --execute --verbose

# Watch the agent attempt fixes in real-time
[VERIFY] Running final verification (attempt 1/3)
[VERIFY] Gates failed: pytest, ruff
[SELFCORRECT] Attempting to fix verification failures
[SELFCORRECT] Applied 2/2 fixes, re-verifying...

New Capabilities:

  • Self-Correction Loop — Agent analyzes gate errors and generates fix plans using LLM
  • Verbose Mode — --verbose / -v flag shows detailed verification and self-correction progress
  • FAILED Task Status — Tasks can now transition to FAILED state for proper error visibility
  • Project Preferences — Agent loads AGENTS.md or CLAUDE.md for per-project configuration
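
The self-correction behavior above can be sketched as a verify-and-fix loop. The function names here are assumptions for illustration, not CodeFRAME's internal API:

```python
# Illustrative sketch of the self-correction loop: run the verification
# gates, and between failed attempts apply a fix plan (LLM-generated in
# the real system) before re-verifying.
def run_with_self_correction(run_gates, apply_fixes, max_attempts=3):
    """Return True if all gates eventually pass within max_attempts."""
    for attempt in range(1, max_attempts + 1):
        failures = run_gates()          # e.g. ["pytest", "ruff"] when gates fail
        if not failures:
            return True                 # all gates passed
        if attempt < max_attempts:
            apply_fixes(failures)       # attempt automatic fixes, then re-verify
    return False                        # attempts exhausted; task moves to FAILED
```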

Bug Fixes:

  • Fixed fail_run() not updating task status (tasks stuck in IN_PROGRESS)
  • Fixed task state transitions for proper failure recovery
  • Added diagnostic output for debugging agent behavior

Previous: v2 Phase 2 - Parallel Batch Execution

Batch Execution (2026-01-15)

Multi-task batch execution — Run multiple tasks with intelligent parallelization.

# Execute multiple tasks in parallel
cf work batch run task1 task2 task3 --strategy parallel

# Execute all READY tasks with LLM-inferred dependencies
cf work batch run --all-ready --strategy auto

# Automatic retry on failure
cf work batch run --all-ready --retry 3

Batch Capabilities:

  • Parallel Execution — ThreadPoolExecutor-based concurrent task execution
  • Dependency Graph — DAG-based task ordering with cycle detection
  • LLM Dependency Inference — --strategy auto analyzes task descriptions to infer dependencies
  • Automatic Retry — --retry N retries failed tasks up to N times
  • Batch Resume — cf work batch resume <batch-id> re-runs failed/blocked tasks
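
A minimal sketch of ThreadPoolExecutor-based parallel execution with per-task retry, under the assumption that each task is independent (dependency ordering is handled separately by the DAG planner):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch only: `execute` stands in for the real per-task agent run and
# returns True on success. Names are illustrative, not CodeFRAME internals.
def run_batch(tasks, execute, max_workers=4, retries=0):
    """Run each task via `execute`, retrying failures up to `retries` times."""
    def attempt(task):
        for _ in range(retries + 1):
            if execute(task):
                return task, "DONE"
        return task, "FAILED"

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(attempt, t) for t in tasks]
        for fut in as_completed(futures):
            task, status = fut.result()
            results[task] = status
    return results
```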

Modules:

  • codeframe/core/conductor.py — Batch orchestration with worker pool
  • codeframe/core/dependency_graph.py — DAG operations and execution planning
  • codeframe/core/dependency_analyzer.py — LLM-based dependency inference
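
The DAG ordering with cycle detection that dependency_graph.py provides can be sketched with Kahn's algorithm; this is a generic illustration, not the module's actual code:

```python
from collections import deque

# Kahn's algorithm: repeatedly schedule tasks with no unmet prerequisites.
# If tasks remain unscheduled, the graph contains a cycle.
def execution_order(deps):
    """deps maps task -> list of prerequisite tasks. Raises ValueError on cycles."""
    indegree = {t: 0 for t in deps}
    dependents = {t: [] for t in deps}
    for task, prereqs in deps.items():
        for p in prereqs:
            indegree[task] += 1
            dependents[p].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```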

Previous: v2 Agent Implementation

Agent System (2026-01-14)

Autonomous Agent Execution — The full agent loop is now functional via the CLI.

# Execute a task with the AI agent
cf work start <task-id> --execute

# Preview changes without applying (dry run)
cf work start <task-id> --execute --dry-run

Components:

  • LLM Adapter Interface — Pluggable provider system with Anthropic Claude support
  • Task Context Loader — Intelligent codebase scanning with relevance scoring
  • Implementation Planner — LLM-powered task decomposition into executable steps
  • Code Execution Engine — File operations, shell commands, and rollback capability
  • Agent Orchestrator — Full execution loop with blocker detection and verification gates

Key Features:

  • Task-based model selection (Sonnet for planning/execution, Haiku for generation)
  • Automatic blocker creation when agent needs human input
  • Incremental verification with ruff after each file change
  • State persistence for pause/resume across sessions
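
The task-based model selection policy (Sonnet for planning/execution, Haiku for generation) could be expressed as a simple lookup; the model identifiers and the fallback choice here are assumptions:

```python
# Hypothetical phase-to-model table reflecting the policy described above.
# Model IDs are illustrative, not the exact strings CodeFRAME uses.
MODEL_BY_PHASE = {
    "planning": "claude-sonnet",
    "execution": "claude-sonnet",
    "generation": "claude-haiku",
}

def select_model(phase: str) -> str:
    """Pick a model for the given agent phase, defaulting to the planning model."""
    return MODEL_BY_PHASE.get(phase, MODEL_BY_PHASE["planning"])
```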

Previous Updates

Late-Joining User Bug Fixes (2026-01-09)

Phase-Aware Data Source Selection - Components now correctly display data for users who navigate to a project after events have occurred.

  • TaskStats Phase-Awareness - Fixed bug where TaskStats showed 0 tasks during planning phase
  • State Reconciliation Tests - Comprehensive E2E tests validate UI state for late-joining users
  • Duplicate Button Prevention - Fixed duplicate "Generate Tasks" button appearing for late-joining users

Authentication System (Sprint 11)

FastAPI Users Migration - Complete auth system redesign for production security.

  • Migration: BetterAuth → FastAPI Users with JWT tokens
  • Mandatory Auth: Authentication is now required (no bypass mode)
  • WebSocket Auth: Connections require ?token=TOKEN query parameter
  • Session Management: Secure session tokens with SQLite-backed storage

Sprint 10: MVP Complete

Production-Ready Quality System - Comprehensive quality gates, checkpoint recovery, and cost tracking.

  • Quality Gates System - Multi-stage gates: Tests → Type Check → Coverage → Code Review
  • Checkpoint & Recovery - Hybrid snapshot: Git commit + SQLite backup + context JSON
  • Metrics & Cost Tracking - Per-call tracking for every LLM API interaction
  • End-to-End Testing - 85+ E2E tests with full workflow validation

Key Features

CLI-First Agent System (v2)

  • Autonomous Execution — cf work start --execute runs the full agent loop
  • Self-Correction Loop — Agent automatically fixes failing verification gates (up to 3 attempts)
  • Human-in-the-Loop Blockers — Agents pause and ask questions when they need decisions
  • Verification Gates — Automatic ruff/pytest checks after changes
  • Verbose Mode — --verbose flag shows detailed progress and self-correction activity
  • Dry Run Mode — Preview changes without applying them
  • State Persistence — Resume work across sessions
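
A verification gate of the kind described above amounts to running a command and treating a nonzero exit code as failure. This generic sketch is an assumption about the shape of such a runner, not CodeFRAME's implementation:

```python
import subprocess

# Each gate is a (name, argv) pair; a nonzero exit code marks it failed.
# In practice the argv entries would be commands like ["ruff", "check", "."]
# or ["pytest"], chosen from the project's tech stack description.
def run_gates(gates):
    """Return the names of gates whose commands exited nonzero."""
    failed = []
    for name, argv in gates:
        result = subprocess.run(argv, capture_output=True)
        if result.returncode != 0:
            failed.append(name)
    return failed
```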

Multi-Agent Orchestration

  • Multi-Agent Orchestra — Lead agent coordinates backend, frontend, test, and review specialists
  • Async/Await Architecture — Non-blocking agent execution with true concurrency
  • Self-Correction Loops — Agents automatically fix failing tests (up to 3 attempts)
  • WebSocket Agent Broadcasting — Real-time agent status updates to all connected clients

Quality & Review

  • AI Quality Enforcement — Dual-layer quality system preventing test skipping and enforcing 85%+ coverage
  • Quality Gates — Pre-completion checks block bad code (tests, types, coverage, review)
  • Automated Code Review — Security scanning, OWASP pattern detection, and complexity analysis
  • Lint Enforcement — Multi-language linting with trend tracking and automatic fixes

State & Context Management

  • Context-Aware Memory — Tiered HOT/WARM/COLD memory system reduces token usage by 30-50%
  • Session Lifecycle — Auto-save/restore work context across CLI restarts
  • Checkpoint & Recovery — Git + DB snapshots enable project state rollback
  • Phase-Aware Components — UI intelligently selects data sources based on project phase
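
The HOT/WARM/COLD tiering can be pictured as thresholding an importance score; the thresholds below are illustrative assumptions, not CodeFRAME's tuned values:

```python
# Sketch of tier assignment by importance score in [0, 1].
def assign_tier(importance: float) -> str:
    """Map an importance score to a memory tier."""
    if importance >= 0.7:
        return "HOT"    # kept verbatim in the active context window
    if importance >= 0.3:
        return "WARM"   # summarized, re-expandable on demand
    return "COLD"       # archived; retrieved only by explicit lookup
```

Keeping only HOT items verbatim and compressing the rest is what produces the 30-50% token savings claimed above.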

Developer Experience

  • Real-time Dashboard — WebSocket-powered UI with agent status, blockers, and progress tracking
  • Proactive WebSocket Messaging — Backend pushes updates without client polling
  • Multi-Channel Notifications — Desktop notifications, webhooks, and custom routing for agent events
  • Auto-Commit Workflows — Git integration with automatic commits after successful test passes
  • Cost Tracking — Real-time token usage and cost analytics per agent/task with timeseries API

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    CLI / Agent Orchestrator                  │
│  • cf work start --execute                                   │
│  • Context loading → Planning → Execution → Verification    │
│  • Blocker detection and human-in-loop                      │
└─────────────┬──────────────┬──────────────┬────────────┬────┘
              │              │              │            │
      ┌───────▼───┐   ┌──────▼──────┐  ┌───▼────────┐  ┌▼────────┐
      │ Backend   │   │  Frontend   │  │    Test    │  │ Review  │
      │ Worker    │   │  Worker     │  │   Worker   │  │ Worker  │
      │ (async)   │   │  (async)    │  │  (async)   │  │ (async) │
      └─────┬─────┘   └──────┬──────┘  └─────┬──────┘  └────┬────┘
            │                │               │              │
            │  ┌─────────────▼───────────────▼──────────────▼─────┐
            │  │         Blocker Management (Sync/Async)           │
            │  │  • Database-backed queue (SQLite)                 │
            │  │  • Human-in-the-loop questions                    │
            │  └───────────────────────────────────────────────────┘
            │
    ┌───────▼──────────────────────────────────────────────────┐
    │              Context Management Layer                     │
    │  • Tiered memory (HOT/WARM/COLD)                         │
    │  • Importance scoring & tier assignment                   │
    │  • Flash save mechanism                                   │
    └──────────────────────────────────────────────────────────┘

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+ (for frontend, optional)
  • Anthropic API key
  • SQLite 3 (included with Python)

Installation

# Clone repository
git clone https://github.com/frankbria/codeframe.git
cd codeframe

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Set up backend
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync

# Set up environment
export ANTHROPIC_API_KEY="your-api-key-here"

CLI-First Workflow (v2 — Recommended)

# 1. Initialize workspace (with optional tech stack detection)
cd /path/to/your/project
cf init . --detect
# Or explicit: cf init . --tech-stack "Python with FastAPI, uv, pytest"

# 2. Add a PRD (Product Requirements Document)
cf prd add requirements.md

# 3. Generate tasks from PRD
cf tasks generate

# 4. List tasks
cf tasks list

# 5. Start work on a task (with AI agent)
cf work start <task-id> --execute

# 6. Check for blockers (questions the agent needs answered)
cf blocker list
cf blocker answer <blocker-id> "Your answer here"

# 7. Resume work after answering blockers
cf work resume <task-id>

# 8. Review changes and create checkpoint
cf review
cf checkpoint create "Feature complete"

Dashboard Mode (v1)

# Start the dashboard (from project root)
codeframe serve

# Or manually start backend and frontend separately:
# Terminal 1: Backend
uv run uvicorn codeframe.ui.server:app --reload --port 8080

# Terminal 2: Frontend
cd web-ui && npm install && npm run dev

# Access dashboard at http://localhost:3000

CLI Commands

Workspace Management

cf init <path>                           # Initialize workspace for a repo
cf init <path> --detect                  # Initialize + auto-detect tech stack
cf init <path> --tech-stack "description"  # Initialize + explicit tech stack
cf init <path> --tech-stack-interactive  # Initialize + interactive setup
cf status                                # Show workspace status

PRD (Product Requirements)

cf prd add <file.md>        # Add/update PRD
cf prd show                 # Display current PRD

Task Management

cf tasks generate           # Generate tasks from PRD (uses LLM)
cf tasks list               # List all tasks
cf tasks list --status READY  # Filter by status
cf tasks show <id>          # Show task details

Work Execution

cf work start <id>          # Start work (creates run record)
cf work start <id> --execute     # Start with AI agent execution
cf work start <id> --execute --verbose  # Execute with detailed output
cf work start <id> --execute --dry-run  # Preview changes only
cf work stop <id>           # Stop current run
cf work resume <id>         # Resume blocked work

Batch Execution

cf work batch run <id1> <id2> ...     # Execute multiple tasks
cf work batch run --all-ready         # Execute all READY tasks
cf work batch run --strategy parallel # Parallel execution
cf work batch run --strategy auto     # LLM-inferred dependencies
cf work batch run --retry 3           # Auto-retry failed tasks
cf work batch status [batch_id]       # Show batch status
cf work batch cancel <batch_id>       # Cancel running batch
cf work batch resume <batch_id>       # Re-run failed tasks

Blockers

cf blocker list             # List open blockers
cf blocker show <id>        # Show blocker details
cf blocker answer <id> "answer"  # Answer a blocker

Quality & Review

cf review                   # Run verification gates
cf patch export             # Export changes as patch
cf commit                   # Commit changes

Checkpoints

cf checkpoint create <name>  # Create checkpoint
cf checkpoint list          # List checkpoints
cf checkpoint restore <id>  # Restore to checkpoint
cf summary                  # Show session summary

Configuration

Environment Variables

# Required
ANTHROPIC_API_KEY=sk-ant-...           # Anthropic API key

# Optional - Database
DATABASE_PATH=./codeframe.db           # SQLite database path (default: in-memory)

# Optional - Quality Enforcement
MIN_COVERAGE_PERCENT=85                # Minimum test coverage required
CODEFRAME_ENABLE_SKIP_DETECTION=true   # Enable skip detection gate (default: true)

# Optional - Git Integration
AUTO_COMMIT_ENABLED=true               # Enable automatic commits after test passes

# Optional - Notifications
NOTIFICATION_DESKTOP_ENABLED=true      # Enable desktop notifications
NOTIFICATION_WEBHOOK_URL=https://...   # Webhook endpoint for agent events

# Frontend (set at build time for Next.js)
NEXT_PUBLIC_API_URL=http://localhost:8080
NEXT_PUBLIC_WS_URL=ws://localhost:8080/ws
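
Reading these variables with fallbacks might look like the sketch below. Defaults mirror the values documented above where stated; the AUTO_COMMIT_ENABLED default is an assumption:

```python
import os

# Sketch of configuration loading; not CodeFRAME's actual settings module.
def load_config(env=os.environ):
    return {
        "api_key": env.get("ANTHROPIC_API_KEY"),               # required
        "database_path": env.get("DATABASE_PATH"),             # None -> in-memory
        "min_coverage": int(env.get("MIN_COVERAGE_PERCENT", "85")),
        "skip_detection": env.get("CODEFRAME_ENABLE_SKIP_DETECTION", "true") == "true",
        "auto_commit": env.get("AUTO_COMMIT_ENABLED", "false") == "true",  # default assumed
    }
```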

Project Configuration

See CLAUDE.md in project root for project-specific configuration including:

  • Active technologies and frameworks
  • Coding standards and conventions
  • Testing requirements
  • Documentation structure

API Documentation

Core Endpoints

POST   /api/projects                          # Create project
GET    /api/projects/{id}                     # Get project details
POST   /api/projects/{id}/prd                 # Submit PRD

GET    /api/projects/{id}/agents              # List agents
POST   /api/projects/{id}/agents              # Create agent

GET    /api/projects/{id}/blockers            # List blockers
POST   /api/blockers/{id}/answer              # Answer blocker

GET    /api/projects/{id}/tasks               # List tasks
GET    /api/tasks/{id}                        # Get task details
POST   /api/tasks/approve                     # Approve tasks for development

WebSocket

WS     /ws?token=JWT_TOKEN                    # WebSocket connection (auth required)

For detailed API documentation, see /docs (Swagger UI) or /redoc (ReDoc) when the server is running.


Testing

Run Tests

# Run all unit tests
uv run pytest

# Run specific test suite
uv run pytest tests/core/           # Core module tests
uv run pytest tests/agents/         # Agent tests
uv run pytest tests/api/            # API endpoint tests

# Run with coverage
uv run pytest --cov=codeframe --cov-report=html

Test Statistics

  • Total Tests: 3000+
    • Core module tests: ~400
    • Unit tests: ~900 (Python + TypeScript)
    • Integration tests: ~500
    • E2E tests: 85+ (Backend + Playwright)
  • Coverage: 88%+
  • Pass Rate: 100%

Documentation

Detailed documentation is available in the repository.


Contributing

We welcome contributions! To get started:

  1. Fork and clone the repository
  2. Install dependencies: uv sync
  3. Install pre-commit hooks: pre-commit install
  4. Run tests to ensure everything works: uv run pytest

Code Standards

  • Python: Follow PEP 8, use ruff for linting
  • TypeScript: Follow ESLint rules, use Prettier for formatting
  • Type Hints: Required for all Python functions
  • Tests: Required for all new features (85%+ coverage)
  • Documentation: Update README and docstrings for API changes

Pull Request Process

  1. Create a feature branch: git checkout -b feature/my-feature
  2. Write tests first (TDD approach encouraged)
  3. Implement feature with proper error handling
  4. Ensure all tests pass: uv run pytest
  5. Run quality checks: uv run ruff check .
  6. Update documentation if needed
  7. Submit PR with clear description of changes

Roadmap

Completed

  • Phase 0: CLI-first Golden Path workflow
  • Phase 1: Batch execution (serial), status monitoring, cancellation
  • Phase 2: Parallel execution, dependency graphs, LLM inference, auto-retry
  • Autonomous agent execution with blocker detection
  • Verification gates integration
  • Task-based model selection
  • Self-correction loop for verification failures
  • Verbose mode for observability

In Progress (Phase 3: Reliability & Polish)

  • Environment detection: Automatic uv vs pip detection
  • Live streaming: cf work batch follow for real-time terminal output
  • WebSocket adapter: Batch events for dashboard integration
  • Progress estimation: ETA and completion forecasting

Planned Features

  • Per-task model override: cf tasks set provider <id> <provider>
  • LLM Provider Abstraction: Support for OpenAI, Gemini, local models
  • Advanced Git Workflows: PR creation, branch management, merge conflict resolution
  • Custom Agent Types: Plugin system for domain-specific agents
  • Team Collaboration: Multi-user support with role-based access control

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

Key points:

  • Open Source: Free to use, modify, and distribute
  • Copyleft: Derivative works must also be AGPL-3.0
  • Network Use: If you run a modified version as a service, you must release source code
  • Commercial Use: Permitted with AGPL-3.0 compliance

See LICENSE for full details.


Credits & Acknowledgments

Core Team

  • Frank Bria - Creator and Lead Developer

Technologies

  • Anthropic Claude - AI reasoning engine powering all agents
  • FastAPI - High-performance async web framework
  • FastAPI Users - Authentication and user management
  • React + TypeScript - Modern frontend with real-time updates
  • SQLite - Embedded database for persistence
  • Playwright - End-to-end testing framework
  • pytest + jest - Comprehensive testing frameworks

Inspiration

Built on the principles of:

  • Autonomous agent systems (AutoGPT, BabyAGI)
  • Multi-agent orchestration (LangGraph, CrewAI)
  • Human-in-the-loop design (Constitutional AI)
  • Test-driven development (Kent Beck, Robert Martin)

Support & Community


Built with care by humans and AI agents working together
