ParaMind is an advanced intelligent orchestration engine designed to overcome the limitations of single-agent LLM interactions. By leveraging parallel processing and dynamic task decomposition, ParaMind transforms slow, serial interactions into fast, concurrent workflows.
- Mode A (Data-Parallel): Execute the same prompt across multiple LLM models simultaneously for comparison or consensus.
- Mode B (Task-Parallel): Decompose complex tasks into independent or dependent subtasks (DAG execution).
- Smart Controller: AI-powered decision engine that chooses the optimal execution mode using LLM reasoning or semantic fallback.
- Dependency Management: Automatically handles task dependencies in Mode B, executing tasks in topological order.
- Robustness: Built-in retries, error handling, and partial failure recovery.
- High Performance: Caching layer and parallel execution deliver 2x-5x speedups (up to 100x with cache).
- Modern Web UI: Fast, responsive interface built with FastAPI and Vanilla JS.
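Mode A's fan-out is the core concurrency pattern; a minimal sketch with `asyncio` follows (the `query_model` stub is hypothetical, standing in for a real LLM call):

```python
import asyncio

async def query_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{model}: answer to {prompt!r}"

async def run_mode_a(models: list[str], prompt: str) -> dict[str, str]:
    # Fan the same prompt out to every model concurrently (Mode A).
    answers = await asyncio.gather(*(query_model(m, prompt) for m in models))
    return dict(zip(models, answers))

# asyncio.run(run_mode_a(["llama-3.3-70b", "gpt-4o-mini"], "Define a DAG"))
```

Because `asyncio.gather` awaits all calls at once, total latency is roughly the slowest model's, not the sum.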
The project follows a modern, decoupled client-server architecture.
The "Brain" of the operation. It analyzes user intent using Llama 3.3 70B to generate a JSON execution plan. It features:
- LLM Decision: Uses few-shot prompting to decide between Mode A and Mode B.
- Self-Correction: Automatically fixes invalid JSON output from the LLM.
- Semantic Fallback: Uses regex-based heuristics if the LLM fails.
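The self-correction step can be sketched as a best-effort JSON repair pass; the cleanup rules below (fence stripping, trailing-comma removal) are illustrative, not the controller's actual logic:

```python
import json
import re

def repair_json(raw: str) -> dict:
    """Best-effort repair of common LLM JSON mistakes (illustrative only)."""
    text = raw.strip()
    # Strip markdown code fences the model may have wrapped around the JSON.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    # Remove trailing commas before a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```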
The "Muscle" that executes the plan.
- Concurrency: Built on Python's `asyncio` for non-blocking parallel execution.
- DAG Logic: Uses topological sort to organize Mode B subtasks into "execution layers".
- Dynamic Refinement: Automatically re-runs tasks if the output is too short or apologetic.
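The layering of Mode B subtasks can be sketched as a Kahn-style topological grouping, where every task in a layer depends only on earlier layers (a sketch, not the actual `agents.py` code):

```python
def execution_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into layers; tasks in the same layer can run in parallel."""
    remaining = {task: set(d) for task, d in deps.items()}
    layers = []
    while remaining:
        # Tasks whose dependencies have all been scheduled in earlier layers.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("Cycle detected in task graph")
        layers.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return layers
```

For example, `{"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}` yields three layers, with `b` and `c` running in parallel in the middle layer.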
The "Synthesizer". Combines individual agent outputs into a single, coherent response using a fast summarization model (Llama 3.1 8B).
- Unified Client: Wrapper for Groq and OpenAI APIs.
- Caching: File-based caching (`.cache/`) drastically reduces latency and cost.
- Resilience: Implements retries and rate limiting via `tenacity` and `asyncio.Semaphore`.
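The retry-plus-rate-limit pattern can be sketched in plain `asyncio` (the real client uses `tenacity`; this stand-in is illustrative):

```python
import asyncio

SEM = asyncio.Semaphore(10)  # cap concurrent requests (cf. MAX_CONCURRENT_AGENTS)

async def call_with_retries(fn, *args, attempts: int = 3, base_delay: float = 0.05):
    """Retry with exponential backoff while holding a shared semaphore slot.

    A plain-asyncio stand-in for the tenacity-based client; names are illustrative.
    """
    async with SEM:
        for i in range(attempts):
            try:
                return await fn(*args)
            except Exception:
                if i == attempts - 1:
                    raise  # out of attempts: surface the error
                await asyncio.sleep(base_delay * 2 ** i)  # exponential backoff
```

The semaphore bounds how many coroutines are in flight at once, which is what keeps parallel agents under provider rate limits.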
ParaMind-Parallel_Agent_Orchestration/
├── src/
│ ├── api.py # FastAPI backend entry point
│ ├── controller.py # Mode decision & planning logic
│ ├── agents.py # Parallel & DAG execution engine
│ ├── llm_clients.py # API wrappers with caching/retries
│ ├── aggregator.py # Result synthesis logic
│ └── cache.py # File-based caching implementation
├── ui/
│ ├── index.html # Main Single Page Application (SPA)
│ ├── style.css # Custom CSS styling
│ └── app.js # Frontend logic (Fetch API, DOM manipulation)
├── benchmarks/
│ ├── run_eval.py # Automated benchmark runner
│ ├── analyze_results.py # Result analysis and reporting
│ └── prompts.json # Test cases
├── tests/
│ ├── test_dependencies.py
│ └── test_robustness.py
├── config/ # Configuration files
├── logs/ # Application logs
├── .env # API keys (not in git)
└── README.md
Latest run (Dec 2, 2025):
- Success Rate: 100% (25/25 prompts)
- Mode Accuracy: 100%
- Avg Speedup: >100x (inflated by cache hits, so not representative of cold-cache runs)
- Python 3.11+
- Conda (recommended)
- API keys for OpenAI and Groq
- Clone the repository:
```bash
cd ParaMind-Parallel_Agent_Orchestration
```
- Create conda environment:
```bash
conda create -n paramind python=3.11 -y
conda activate paramind
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Configure API keys:
```bash
cp .env.example .env
# Edit .env and add your API keys
```
Start the FastAPI server:
```bash
uvicorn src.api:app --reload
```
Open your browser at http://localhost:8000.

To run the benchmark suite:
```bash
python benchmarks/run_eval.py
```
- Live Playground: Test custom prompts with adjustable model selection and agent counts.
- Visual Plan Inspector: View the generated execution plan and dependency graph before execution.
- Human-in-the-Loop: (Optional) Review and modify the plan before agents start working.
- Real-time Metrics: See speedup and latency stats update live as agents complete tasks.
The Controller (src/controller.py) doesn't just ask the LLM for a plan; it actively validates the output.
- JSON Repair: Automatically fixes malformed JSON responses.
- Cycle Detection: Prevents circular dependencies in Mode B task graphs.
- Semantic Fallback: If the LLM fails repeatedly, a regex-based heuristic engine takes over to ensure execution never stops.
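A toy version of the semantic fallback: scan the prompt for comparison cues (Mode A) or sequencing cues (Mode B). The keyword lists here are invented for illustration and are not the actual heuristics:

```python
import re

def fallback_mode(prompt: str) -> str:
    """Heuristic mode choice when the LLM planner fails (illustrative keywords)."""
    # Comparison/consensus language suggests fanning out the same prompt (Mode A).
    if re.search(r"\b(compare|consensus|multiple (models|opinions))\b", prompt, re.I):
        return "A"
    # Sequencing language suggests decomposition into subtasks (Mode B).
    if re.search(r"\b(then|after that|steps?)\b", prompt, re.I):
        return "B"
    return "A"  # safe default: single-prompt fan-out
```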
The Executor (src/agents.py) monitors agent outputs in real-time:
- Quality Checks: Detects if a response is too short or apologetic (e.g., "I apologize, but...").
- Auto-Retry: Automatically re-prompts the model with stricter instructions to get a proper response, ensuring high-quality outputs without user intervention.
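The quality check can be sketched as a simple predicate; the markers and length threshold below are illustrative, not the executor's actual values:

```python
APOLOGY_MARKERS = ("i apologize", "i'm sorry", "as an ai")  # illustrative list

def needs_retry(output: str, min_chars: int = 40) -> bool:
    """Flag outputs that are too short or apologetic (thresholds illustrative)."""
    text = output.strip().lower()
    return len(text) < min_chars or any(m in text for m in APOLOGY_MARKERS)
```

A flagged output would then trigger a re-prompt with stricter instructions.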
The backend provides RESTful endpoints for integration:
Analyzes a prompt and returns an execution plan.
- Body: `{"prompt": "your prompt here"}`
- Returns: JSON object with `mode` (A/B) and `plan` details.
Executes a generated plan.
- Body: `{ "mode": "B", "prompt": "original prompt", "plan": { ... } }`
- Returns: Aggregated results and performance metrics.
Returns the latest benchmark performance metrics including success rate and speedup factors.
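The endpoints above can be exercised from any HTTP client; here is a standard-library sketch (the `/plan` and `/execute` paths are assumptions for illustration — check `src/api.py` for the actual routes):

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default address from the uvicorn step

def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a ParaMind endpoint (paths assumed)."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical flow: request a plan, then execute it.
plan_req = post_json("/plan", {"prompt": "Compare GPT and Llama on this question"})
# plan = json.load(urllib.request.urlopen(plan_req))
# exec_req = post_json("/execute", {"mode": plan["mode"], ...})
```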
The benchmark suite (benchmarks/run_eval.py) measures:
- Sequential Baseline: The theoretical time taken if agents ran one after another (sum of latencies).
- Parallel Time: The actual wall-clock time taken (max of latencies).
- Speedup: `Sequential / Parallel`. Values > 1.0x indicate performance gains.
- Mode Accuracy: Whether the Controller correctly identified the optimal execution mode compared to ground truth.
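The speedup metric is a one-liner over per-agent latencies:

```python
def speedup(latencies: list[float]) -> float:
    """Speedup = sequential baseline (sum) over parallel wall-clock (max)."""
    return sum(latencies) / max(latencies)

# Four agents at 2 s each: 8 s sequential vs 2 s wall-clock, a 4.0x speedup.
```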
If you encounter `ModuleNotFoundError: No module named 'fastapi'`, please install the server dependencies manually:
```bash
pip install fastapi uvicorn
```
Ensure your `.env` file is correctly formatted and contains valid keys for `GROQ_API_KEY` and `OPENAI_API_KEY`.
The application is configurable via environment variables in .env:
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | Required for OpenAI models | - |
| `GROQ_API_KEY` | Required for Llama models via Groq | - |
| `MAX_CONCURRENT_AGENTS` | Limit parallel agents to prevent rate limits | 10 |
| `DEFAULT_TIMEOUT` | Timeout for individual agent tasks (seconds) | 30 |
| `LOG_LEVEL` | Logging verbosity (DEBUG, INFO, WARNING) | INFO |
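These variables can be read with `os.getenv`, falling back to the documented defaults (a sketch, not the actual config code):

```python
import os

# Read the documented variables, falling back to their listed defaults.
MAX_CONCURRENT_AGENTS = int(os.getenv("MAX_CONCURRENT_AGENTS", "10"))
DEFAULT_TIMEOUT = float(os.getenv("DEFAULT_TIMEOUT", "30"))
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```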
ParaMind uses a file-based caching system to save costs and speed up development.
- Location: `.cache/` directory.
- Mechanism: MD5 hash of `(model + prompt)` serves as the key.
- Clearing Cache: Simply delete the `.cache/` directory to force fresh LLM calls.
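The cache key can be sketched with `hashlib` (the exact separator between model and prompt is an assumption):

```python
import hashlib

def cache_key(model: str, prompt: str) -> str:
    """MD5 of model + prompt, as described above (separator is an assumption)."""
    return hashlib.md5(f"{model}:{prompt}".encode()).hexdigest()
```

The hex digest doubles as a filename under `.cache/`, so identical (model, prompt) pairs hit the same file.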
Run the test suite to verify system integrity:
# Run all tests
pytest tests/
# Run specific test categories
pytest tests/test_dependencies.py # Verify DAG logic
pytest tests/test_robustness.py # Verify error handling
