
A centralized processing engine that decomposes big problems into sub-problems, builds solutions as composable building blocks, and uses a layered agent ecosystem (Thinker, Planner, Dispatcher, Orchestrator, Critic, Analyst, Judge, etc.) to break down, document, verify, and solve iteratively.


iotlodge/problemsolver.ai


ProblemSolver.ai

Recursive Problem Decomposition Engine
A multi-agent system that breaks complex problems into solvable dimensions, validates solutions through adversarial critique, and synthesizes computed answers, all in real time.

Python 3.12 · Next.js 15 · LangGraph · FastAPI · PostgreSQL · License


What is ProblemSolver.ai?

ProblemSolver.ai is a Centralized Processing Engine that applies structured, recursive problem decomposition to any complex question. Instead of asking an LLM to answer in one shot, it orchestrates a pipeline of specialized agents, each with a distinct cognitive role, that collaborate to produce validated, high-confidence solutions.

The system decomposes problems into dimensions (independent facets), phases (ordered execution steps), and tasks (atomic work items), then solves against those structures and validates the result through adversarial review.
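
The three levels can be pictured as a small nested data model. The sketch below is illustrative only: the class names, fields, and the `sub_dimensions` nesting are assumptions made for this example, not the project's actual Pydantic domain models.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Atomic work item (hypothetical shape)."""
    description: str


@dataclass
class Phase:
    """Ordered execution step that groups tasks."""
    order: int
    name: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Dimension:
    """Independent facet of the problem; may nest sub-dimensions (assumed)."""
    name: str
    phases: list[Phase] = field(default_factory=list)
    sub_dimensions: list["Dimension"] = field(default_factory=list)


def count_tasks(dimension: Dimension) -> int:
    """Count atomic tasks in a dimension, including nested sub-dimensions."""
    own = sum(len(phase.tasks) for phase in dimension.phases)
    return own + sum(count_tasks(sub) for sub in dimension.sub_dimensions)


def depth(dimension: Dimension) -> int:
    """Nesting depth: 1 for a dimension without sub-dimensions."""
    return 1 + max((depth(sub) for sub in dimension.sub_dimensions), default=0)


# Illustrative data only
availability = Dimension(
    name="Participant Availability",
    phases=[
        Phase(1, "Collect calendars", [Task("Load busy intervals per person")]),
        Phase(2, "Intersect", [Task("Compute free windows"), Task("Rank windows")]),
    ],
)
plan = Dimension(name="Meeting Schedule", sub_dimensions=[availability])
```

A depth helper like this is one way a planner could compare a plan against a limit such as MAX_DECOMPOSITION_DEPTH before decomposing further.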


Architecture

Agent Pipeline

The orchestrator coordinates five agents through a LangGraph StateGraph with revision loops and quality gates:

                    ┌───────────────────────────────────────────────┐
                    │              ORCHESTRATOR                     │
                    │         (Top-Level StateGraph)                │
                    └───────────────────────────────────────────────┘
                                       │
                                       ▼
                    ┌───────────────────────────────────────────────┐
                    │  THINKER                                      │
                    │  Analyzes the problem, identifies ambiguity,  │
                    │  asks clarifying questions or makes           │
                    │  assumptions in autonomous mode.              │
                    │                                               │
                    │  analyze_input → route_by_mode →              │
                    │    generate_questions | make_assumptions →    │
                    │    refine_understanding                       │
                    └───────────────────────────────────────────────┘
                                       │
                                       ▼
                    ┌───────────────────────────────────────────────┐
                    │  PLANNER                                      │
                    │  Decomposes the problem into dimensions,      │
                    │  phases, and tasks. Evaluates complexity      │
                    │  and recursively decomposes if needed.        │
                    │                                               │
                    │  identify_dimensions → evaluate_complexity →  │
                    │    decompose_deeper (loop) | finalize_plan    │
                    └───────────────────────────────────────────────┘
                                       │
                                       ▼
                    ┌───────────────────────────────────────────────┐
                    │  CRITIC                                       │
                    │  Evaluates the plan for completeness,         │
                    │  feasibility, and quality. Scores across      │
                    │  dimensions and decides: accept, revise,      │
                    │  or reject.                                   │
                    │                                  ◄──── revise │
                    │  evaluate_solution → score_dimensions →       │
                    │    render_decision                            │
                    └───────────────────────────────────────────────┘
                                       │ accept
                                       ▼
                    ┌───────────────────────────────────────────────┐
                    │  SOLVER                                       │
                    │  Executes the validated plan to produce       │
                    │  a computed answer. Synthesizes concrete      │
                    │  results from the decomposed structure.       │
                    └───────────────────────────────────────────────┘
                                       │
                                       ▼
                    ┌───────────────────────────────────────────────┐
                    │  JUDGE                                        │
                    │  Final validation against original problem    │
                    │  constraints. Checks dimension coverage and   │
                    │  constraint satisfaction. Issues verdict      │
                    │  with confidence score.                       │
                    │                                  ◄──── retry  │
                    │  validate_completeness → render_verdict       │
                    └───────────────────────────────────────────────┘
                                       │ accept
                                       ▼
                                   FINALIZE

Revision Loops

The pipeline has two feedback loops that enforce quality:

  • Critic Loop: If the Critic rejects or requests revision, the plan is sent back to the Planner with specific feedback. A degradation guard prevents plan collapse across revision passes. Capped at MAX_CRITIC_ITERATIONS (default 3).
  • Judge Loop: If the Judge's confidence falls below the user-set threshold, or if the verdict is "reject", the entire plan-solve cycle is retried with accumulated feedback. Budget protection prevents runaway iterations.
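
The control flow of the two loops can be sketched in plain Python. This is a simplified stand-in, not the project's LangGraph implementation: `run_pipeline`, the callback signatures, and `MAX_JUDGE_RETRIES` are hypothetical names introduced here, and the degradation guard is omitted. Only `MAX_CRITIC_ITERATIONS` and its default of 3 come from the text above.

```python
from typing import Callable

MAX_CRITIC_ITERATIONS = 3  # default named in the text
MAX_JUDGE_RETRIES = 2      # hypothetical retry budget, not from the README


def run_pipeline(
    plan_fn: Callable[[list[str]], str],
    critic_fn: Callable[[str], tuple[str, str]],
    solve_fn: Callable[[str], str],
    judge_fn: Callable[[str], tuple[str, float]],
    confidence_threshold: float,
) -> dict:
    """Run plan -> critique -> solve -> judge with two capped feedback loops."""
    feedback: list[str] = []
    answer, confidence, attempts = "", 0.0, 0

    for _attempt in range(MAX_JUDGE_RETRIES + 1):
        attempts += 1

        # Critic loop: revise the plan until accepted or the cap is reached.
        # After the cap, the latest plan proceeds as-is (an assumption).
        plan = plan_fn(feedback)
        for _revision in range(MAX_CRITIC_ITERATIONS):
            decision, note = critic_fn(plan)
            if decision == "accept":
                break
            feedback.append(f"critic: {note}")
            plan = plan_fn(feedback)

        # Judge loop: retry the whole plan-solve cycle on low confidence or reject.
        answer = solve_fn(plan)
        verdict, confidence = judge_fn(answer)
        if verdict == "accept" and confidence >= confidence_threshold:
            return {"answer": answer, "confidence": confidence,
                    "attempts": attempts, "accepted": True}
        feedback.append(f"judge: {verdict} at confidence {confidence:.2f}")

    return {"answer": answer, "confidence": confidence,
            "attempts": attempts, "accepted": False}
```

Passing the agents in as plain callables keeps the sketch testable with stubs; in the real system each callable would correspond to an agent sub-graph.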

Real-Time Sub-Graph Visualization

Each agent runs its own internal LangGraph. The orchestrator streams internal node completions through an asyncio.Queue side-channel, which the WebSocket layer drains concurrently. The frontend renders live SVG mini-graphs showing exactly which internal node each agent is executing, such as Thinker's analyze_input or Planner's decompose_deeper.
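
A minimal sketch of the side-channel pattern, using only the standard library. The function names, the event dictionary shape, and the sentinel are assumptions for illustration; in the real system the consumer would forward each event over a WebSocket rather than append to a list.

```python
import asyncio

_DONE = object()  # sentinel marking the end of the stream (assumed convention)


async def run_agent(name: str, nodes: list[str], events: asyncio.Queue) -> None:
    """Pretend to execute an agent's internal nodes, emitting one event per node."""
    for node in nodes:
        await asyncio.sleep(0)  # stand-in for real node work; yields to the consumer
        await events.put({"agent": name, "node": node, "status": "completed"})


async def drain(events: asyncio.Queue, sink: list[dict]) -> None:
    """Consumer: collect events until the sentinel arrives."""
    while True:
        event = await events.get()
        if event is _DONE:
            break
        sink.append(event)


async def stream_pipeline() -> list[dict]:
    """Run two agents while a concurrent task drains their sub-node events."""
    events: asyncio.Queue = asyncio.Queue()
    received: list[dict] = []
    consumer = asyncio.create_task(drain(events, received))

    await run_agent(
        "thinker",
        ["analyze_input", "route_by_mode", "make_assumptions", "refine_understanding"],
        events,
    )
    await run_agent(
        "planner",
        ["identify_dimensions", "evaluate_complexity", "finalize_plan"],
        events,
    )

    await events.put(_DONE)
    await consumer
    return received
```

Calling `asyncio.run(stream_pipeline())` returns the events in emission order, because a single queue preserves FIFO ordering across both producers.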


Tech Stack

| Layer | Technology |
| --- | --- |
| Agent Framework | LangGraph 0.3+ (StateGraph with conditional edges and revision loops) |
| LLM Providers | Anthropic Claude, OpenAI GPT (configurable per-agent) |
| Backend API | FastAPI 0.115+ with async/await throughout |
| Real-Time | WebSocket streaming with per-session event multiplexing |
| Database | PostgreSQL 16 with async SQLAlchemy, Alembic migrations |
| Frontend | Next.js 15, React 19, TypeScript 5.7, Tailwind CSS |
| Agent Toolkit | 15 tools across 5 categories (math, analysis, research, document, system) |
| Package Manager | uv (Python), npm (Node.js) |

Agent Toolkit: 15 Tools

The Solver has access to a registry of executable tools that are injected into agent system prompts and tracked with per-tool metrics:

| Category | Tools | Libraries |
| --- | --- | --- |
| Math | Calculator, Symbolic Math, Unit Converter, Matrix Operations | sympy, numpy, scipy, pint |
| Analysis | Data Profiler, Statistical Analysis, Pattern Detector, Aggregation Engine | pandas, scikit-learn |
| Research | Tavily Web Search, Fact Checker, Domain Knowledge | tavily-python |
| Document | Markdown Generator, Schema Builder, Data Formatter, Code Generator | jinja2, jsonschema |
| System | Resource Estimator, Quality Scorer, Risk Analyzer | - |

Each tool follows a common BaseTool interface with JSON Schema input/output validation, execution timing, success/failure tracking, and per-agent usage metrics.
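
A rough sketch of what such an interface could look like. The class names BaseTool, ToolResult, and ToolMetrics appear in the project structure, but every field, method, and the `UnitConverter` example below are assumptions. To keep the sketch dependency-free, a required-keys check stands in for real JSON Schema validation.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolResult:
    """Outcome of one tool call (hypothetical fields)."""
    success: bool
    output: Any = None
    error: Optional[str] = None
    duration_ms: float = 0.0


@dataclass
class ToolMetrics:
    """Running counters per tool (hypothetical fields)."""
    calls: int = 0
    failures: int = 0
    calls_by_agent: dict[str, int] = field(default_factory=dict)


class BaseTool(ABC):
    name: str = "base"
    required_inputs: tuple[str, ...] = ()  # stand-in for a JSON Schema

    def __init__(self) -> None:
        self.metrics = ToolMetrics()

    @abstractmethod
    def _run(self, **inputs: Any) -> Any:
        """Tool-specific logic implemented by subclasses."""

    def execute(self, agent: str, **inputs: Any) -> ToolResult:
        """Validate inputs, time the call, and record success or failure."""
        self.metrics.calls += 1
        self.metrics.calls_by_agent[agent] = self.metrics.calls_by_agent.get(agent, 0) + 1
        start = time.perf_counter()
        try:
            missing = [key for key in self.required_inputs if key not in inputs]
            if missing:
                raise ValueError(f"missing inputs: {missing}")
            result = ToolResult(success=True, output=self._run(**inputs))
        except Exception as exc:  # any failure becomes a failed ToolResult
            self.metrics.failures += 1
            result = ToolResult(success=False, error=str(exc))
        result.duration_ms = (time.perf_counter() - start) * 1000
        return result


class UnitConverter(BaseTool):
    """Toy tool for illustration: converts hours to minutes."""
    name = "unit_converter"
    required_inputs = ("hours",)

    def _run(self, **inputs: Any) -> Any:
        return inputs["hours"] * 60
```

Wrapping `_run` in a shared `execute` method is what lets timing and metrics stay uniform across tools, since subclasses only supply the tool-specific logic.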


Frontend

The dashboard provides a real-time view of the entire pipeline:

  • Problem Input: Title, description, and mode selection (interactive vs. autonomous)
  • Pipeline Topology: SVG graph showing agent flow with live status indicators
  • Confidence Gauge: Radial gauge with adjustable threshold slider
  • LLM Token Counter: Per-agent token usage breakdown
  • Agent Panels: Thinker (assumptions/constraints), Planner (dimensions/phases), Critic (scores/iterations), Solver (markdown-rendered answers), Judge (verdict/confidence)
  • Sub-Graph Explorer: Expandable SVG mini-graphs of each agent's internal LangGraph nodes
  • Agent Timeline: Chronological event feed with sub-node activity
  • Clarification Modal: Interactive Q&A when Thinker needs user input
  • Dark/Light Mode: Full theme support with CSS custom properties

Project Structure

ProblemSolver.ai/
├── backend/
│   ├── agents/
│   │   ├── base/             # BaseAgent, AgentRegistry, JSON parser
│   │   ├── orchestrator/     # Top-level StateGraph (9 nodes, 3 conditional edges)
│   │   ├── thinker/          # Problem understanding (4 nodes)
│   │   ├── planner/          # Recursive decomposition (4 nodes, loop)
│   │   ├── critic/           # Solution evaluation (3 nodes)
│   │   └── judge/            # Final validation (2 nodes)
│   ├── api/
│   │   ├── main.py           # FastAPI app factory with lifespan
│   │   ├── deps.py           # Dependency injection
│   │   └── routes/           # REST + WebSocket endpoints
│   ├── core/
│   │   ├── config.py         # Pydantic settings
│   │   ├── llm_provider.py   # Anthropic + OpenAI provider abstraction
│   │   ├── prompt_engine.py  # Agent system prompt compilation
│   │   └── subnode_events.py # Async event queue for sub-graph streaming
│   ├── db/
│   │   ├── database.py       # Async SQLAlchemy engine
│   │   ├── models.py         # Session, Problem, Solution tables
│   │   └── repositories/     # Async CRUD repositories
│   ├── models/               # Pydantic domain models
│   ├── services/
│   │   └── toolkit.py        # Tool execution service with timeouts
│   └── tools/
│       ├── base.py           # BaseTool, ToolResult, ToolMetrics
│       ├── registry.py       # ToolRegistry singleton
│       ├── errors.py         # Tool-specific exceptions
│       ├── math/             # Calculator, symbolic, units, matrix
│       ├── analysis/         # Profiler, statistics, patterns, aggregation
│       ├── research/         # Tavily search, fact checker, domain knowledge
│       ├── document/         # Markdown, schema, formatter, code gen
│       └── system/           # Estimator, quality scorer, risk analyzer
├── frontend/
│   ├── src/
│   │   ├── app/              # Next.js app router (layout, page, providers)
│   │   ├── components/
│   │   │   ├── agents/       # ThinkerPanel, PlannerPanel, CriticPanel, SolverPanel, JudgePanel
│   │   │   ├── layout/       # Navbar, ThemeToggle
│   │   │   ├── pipeline/     # PipelineGraph, SubGraphExplorer, AgentTimeline, etc.
│   │   │   └── shared/       # StatusBadge, ProgressBar, ClarificationModal
│   │   ├── hooks/            # usePipeline (state + reducer), useWebSocket
│   │   └── lib/              # Types, constants, sub-graph topology
│   └── package.json
├── tests/                    # 262 tests across agents, tools, models, core, API
├── alembic/                  # Database migrations
├── docker/                   # Docker Compose (PostgreSQL 16)
├── scripts/                  # start.sh, stop.sh, restart.sh, push.sh
├── images/                   # Assets
└── pyproject.toml            # Project config (uv / hatch)

Getting Started

Prerequisites

  • Python 3.12+ with uv package manager
  • Node.js 20+ with npm
  • PostgreSQL 16 (via Docker or local install)
  • Anthropic API key (or OpenAI; the provider is configurable)

1. Clone and configure

git clone https://github.com/iotlodge/problemsolver.ai.git
cd problemsolver.ai/ProblemSolver.ai
cp .env.example .env
# Edit .env: add your ANTHROPIC_API_KEY at minimum

2. Start PostgreSQL

docker compose -f docker/docker-compose.yml up -d

3. Install dependencies and migrate

uv sync
uv run alembic upgrade head

4. Start the backend

uv run uvicorn backend.api.main:app --reload --host 0.0.0.0 --port 8000

5. Start the frontend

cd frontend
npm install
npm run dev

Open http://localhost:3000. The dashboard connects to the backend via WebSocket automatically.

API Documentation

Once running, interactive API docs are available at http://localhost:8000/api/docs (Swagger) and http://localhost:8000/api/redoc (ReDoc).


Configuration

All configuration is through environment variables (.env file):

| Variable | Default | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | - | Anthropic API key (required for Claude) |
| OPENAI_API_KEY | - | OpenAI API key (optional, for GPT models) |
| DEFAULT_LLM_PROVIDER | anthropic | Which LLM provider to use |
| DEFAULT_LLM_MODEL | claude-sonnet-4-5-20250929 | Model identifier |
| DATABASE_URL | postgresql+asyncpg://... | PostgreSQL connection string |
| MAX_DECOMPOSITION_DEPTH | 3 | Max recursive decomposition depth |
| MAX_CRITIC_ITERATIONS | 3 | Max plan revision cycles |
| THINKER_DEFAULT_MODE | interactive | interactive (asks questions) or autonomous (makes assumptions) |
| NEXT_PUBLIC_API_URL | http://localhost:8000 | Backend URL for frontend proxy |
| NEXT_PUBLIC_WS_URL | ws://localhost:8000 | WebSocket URL |
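
To illustrate how these variables and defaults fit together, here is a standard-library sketch that reads a subset of them. The project itself uses Pydantic settings in backend/core/config.py; the `Settings` dataclass and `load_settings` function below are hypothetical and only mirror the documented names and defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Subset of the documented configuration (illustrative only)."""
    default_llm_provider: str
    default_llm_model: str
    max_decomposition_depth: int
    max_critic_iterations: int
    thinker_default_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying documented defaults."""
    mode = env.get("THINKER_DEFAULT_MODE", "interactive")
    if mode not in ("interactive", "autonomous"):
        raise ValueError(f"THINKER_DEFAULT_MODE must be interactive or autonomous, got {mode!r}")
    return Settings(
        default_llm_provider=env.get("DEFAULT_LLM_PROVIDER", "anthropic"),
        default_llm_model=env.get("DEFAULT_LLM_MODEL", "claude-sonnet-4-5-20250929"),
        max_decomposition_depth=int(env.get("MAX_DECOMPOSITION_DEPTH", "3")),
        max_critic_iterations=int(env.get("MAX_CRITIC_ITERATIONS", "3")),
        thinker_default_mode=mode,
    )
```

Accepting the environment as a mapping argument makes the loader easy to exercise with a plain dictionary, for example `load_settings({"THINKER_DEFAULT_MODE": "autonomous"})`.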

Running Tests

# Full suite (262 tests)
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ --cov=backend --cov-report=term-missing

# Specific category
uv run pytest tests/test_agents/ -v      # Agent pipeline tests
uv run pytest tests/test_tools/ -v       # Tool implementation tests
uv run pytest tests/test_models/ -v      # Domain model tests
uv run pytest tests/test_core/ -v        # Prompt engine tests
uv run pytest tests/test_api/ -v         # WebSocket tests

How It Works: Example

Input: "Optimal Meeting Schedule: find availability windows during a single 8-hour workday for 5 people with overlapping constraints"

Pipeline Execution:

  1. Thinker analyzes the problem, identifies ambiguities (time zones? priority ordering? lunch breaks?), and either asks clarifying questions or makes assumptions.

  2. Planner decomposes into dimensions: Temporal Constraints, Participant Availability, Room/Resource Allocation, Priority Optimization. Each dimension gets phases and atomic tasks.

  3. Critic evaluates the plan, scoring completeness, feasibility, and specificity. If the plan is too vague or misses edge cases, it sends revision feedback to the Planner.

  4. Solver executes the validated plan, computing concrete time windows, conflict resolutions, and a recommended schedule.

  5. Judge validates the answer against the original constraints: did it actually address all 5 people? Are the time windows valid? Does it respect the 8-hour boundary? It then issues a verdict with a confidence score.

The entire flow streams to the frontend in real time, with sub-node activity visible in the Sub-Graph Explorer.
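
For a sense of the computation behind the Solver step in this example, the sketch below finds windows in which all five people are free. It is not the project's solver: the interval representation (minutes from the start of the workday, half-open), the function names, and the sample calendars are all made up for illustration.

```python
Interval = tuple[int, int]  # (start, end) in minutes from start of workday, half-open

WORKDAY: Interval = (0, 8 * 60)  # the single 8-hour workday from the example


def free_windows(busy: list[Interval], day: Interval = WORKDAY) -> list[Interval]:
    """Return the gaps in `day` that are not covered by any busy interval."""
    free: list[Interval] = []
    cursor = day[0]
    for start, end in sorted(busy):
        # Clamp each busy interval to the workday boundaries
        start = min(max(start, day[0]), day[1])
        end = min(max(end, day[0]), day[1])
        if start > cursor:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < day[1]:
        free.append((cursor, day[1]))
    return free


def common_windows(calendars: dict[str, list[Interval]], min_minutes: int = 30) -> list[Interval]:
    """Windows where everyone is free: the gaps in the union of all busy intervals."""
    all_busy = [interval for busy in calendars.values() for interval in busy]
    return [(s, e) for s, e in free_windows(all_busy) if e - s >= min_minutes]


# Illustrative calendars for 5 people (invented data)
calendars = {
    "ana": [(0, 60), (240, 300)],
    "ben": [(30, 90)],
    "cho": [(240, 300), (420, 480)],
    "dev": [(120, 150)],
    "eli": [(300, 330)],
}

windows = common_windows(calendars)
# windows == [(90, 120), (150, 240), (330, 420)]
```

The Judge's checks in step 5 map onto simple assertions here: every person's calendar was included, each window lies inside the 0 to 480 minute day, and no window overlaps a busy interval.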


License

Apache License 2.0. See LICENSE for details.


Built by @iotlodge
