While prototyping basic LLM workflows is straightforward, scaling autonomous systems to production introduces complex software engineering constraints:
- Context Expansion & Latency: Compounding token histories increase inference costs and latency.
- State & Concurrency Control: Complex, cyclic workflows require structured state validation and session synchronization to prevent state corruption or infinite execution loops.
- Execution Safety: Dynamic, agent-initiated tool execution demands containerized compute environments to protect underlying systems.
- Telemetry & Tracing: Non-deterministic agent trajectories require standardized execution traces for performance inspection and debugging.
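To make the first constraint concrete, here is a minimal arithmetic sketch of how input tokens compound when an agent loop resends its full history on every call. The function name and the token counts are hypothetical, and the model assumes no context caching or history truncation.

```python
def cumulative_prefill_tokens(system_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens sent across a loop that resends the full history each step.

    Assumption: every call includes the system prompt plus everything appended
    so far, with no caching or summarization.
    """
    total = 0
    history = system_tokens
    for _ in range(steps):
        total += history             # the whole history is sent as input on this call
        history += tokens_per_step   # this step's reasoning and tool output are appended
    return total


if __name__ == "__main__":
    for steps in (10, 20):
        print(steps, cumulative_prefill_tokens(500, 300, steps))
```

Under these assumed numbers, doubling the step count from 10 to 20 raises the total from 18,500 to 67,000 input tokens, which is why long-running loops are the first place to look when costs grow faster than expected.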
This repository provides reference architectures, implementation labs, and capstones for building stateful, multi-agent systems, focusing on runtime optimization, state persistence, and systematic evaluation.
- The Production Agent Gap
- Architectural Paradigm Matrix
- Prerequisites & Environment Setup
- Curriculum Syllabus
- Enterprise Tooling Stack
- Contributing
- License
Modern AI engineering requires a transition from stateless prompting to stateful cyclic workflows and specialized multi-agent teams.
graph LR
A["Stateless Prompting<br/>(Zero Context/Tools)"] -->|"Add Search Context"| B["Static RAG<br/>(One-Shot Search)"]
B -->|"Add Planning Loop"| C["Autonomous Agent (ReAct)<br/>(Dynamic Tool Reasoning)"]
C -->|"Decouple Concerns"| D["Multi-Agent Orchestration<br/>(Specialized Pipelines)"]
By adding a stateful planning loop, persistent memory checkpointers, and isolated worker boundaries, we upgrade fragile chains into resilient systems that can self-reflect, plan, and gracefully recover from execution errors.
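As a rough illustration of what a stateful loop with checkpointing and error recovery can look like, the sketch below persists state after every step, retries a failing step a bounded number of times, and caps the total step count. `FileCheckpointer`, `run_loop`, and the state shape are hypothetical names for this example only; they are not the LangGraph checkpointer API used later in the labs.

```python
import copy
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]  # hypothetical shape: {"step": int, "notes": list, "done": bool}


class FileCheckpointer:
    """Persists the latest state as JSON (illustrative, not a library API)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> Optional[State]:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return None

    def save(self, state: State) -> None:
        # Write to a temp file, then rename over the old checkpoint, so a crash
        # mid-write does not corrupt the previous checkpoint.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.path)


def run_loop(step_fn: Callable[[State], State], checkpointer: FileCheckpointer,
             max_steps: int = 10, max_retries: int = 2) -> State:
    """Resume from the last checkpoint, then step until done or out of budget."""
    state = checkpointer.load() or {"step": 0, "notes": [], "done": False}
    while not state["done"] and state["step"] < max_steps:
        for attempt in range(max_retries + 1):
            try:
                # Work on a copy so a failed attempt cannot half-mutate the state.
                new_state = step_fn(copy.deepcopy(state))
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise
        new_state["step"] = state["step"] + 1
        checkpointer.save(new_state)
        state = new_state
    return state


def demo() -> State:
    failures_left = {"count": 1}

    def flaky_step(state: State) -> State:
        # Simulate one transient tool failure on the second step.
        if state["step"] == 1 and failures_left["count"] > 0:
            failures_left["count"] -= 1
            raise RuntimeError("transient tool error")
        state["notes"].append(f"finished step {state['step']}")
        state["done"] = state["step"] >= 2
        return state

    with tempfile.TemporaryDirectory() as tmp_dir:
        checkpointer = FileCheckpointer(Path(tmp_dir) / "thread-1.json")
        return run_loop(flaky_step, checkpointer)


if __name__ == "__main__":
    print(demo())
```

The step budget (`max_steps`) guards against infinite loops, and the per-step checkpoint means a restarted process continues from the last completed step instead of replaying the whole trajectory.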
| Paradigm | Primary Limitation | State Scope | Concurrency Safety | Cost Profile | Production Reliability |
|---|---|---|---|---|---|
| Stateless Prompting | Lacks iteration, cannot correct intermediate logic errors. | None (Stateless) | N/A (Atomic) | Low (Fixed) | High (But limited capability) |
| Static RAG | Passive lookup; fails if search terms are suboptimal. | Single Query | N/A | Low-Medium | Medium-Low (Prone to retrieval noise) |
| Autonomous Agent (ReAct) | Attention degradation, tool hallucination, infinite loops. | Local Loop | Low (Requires session locks) | High (Compounding prefill tokens) | Low (Hard to predict/assert) |
| Multi-Agent Systems | High coordination latency, complex state merges. | Decoupled / Shared Graph | High (Via isolated actor queues) | Optimized (Via cached context & small scopes) | High (Deterministic boundaries & isolated fallbacks) |
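The "session locks" entry in the table can be sketched with a per-session `asyncio.Lock`. This is a single-process illustration under assumed names (`SessionLocks`, `handle_turn`, and an in-memory `store`); a multi-worker deployment would need a distributed lock instead, such as the Redis-based locking covered in Chapter 12.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List


class SessionLocks:
    """Hands out one asyncio.Lock per session id (in-process only)."""

    def __init__(self) -> None:
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_session(self, session_id: str) -> asyncio.Lock:
        return self._locks[session_id]


async def handle_turn(locks: SessionLocks, store: Dict[str, List[str]],
                      session_id: str, message: str) -> None:
    # Serialize the read-modify-write cycle for a single session.
    async with locks.for_session(session_id):
        history = list(store.get(session_id, []))
        await asyncio.sleep(0)  # stands in for a model or tool call that yields control
        history.append(message)
        store[session_id] = history


def run_demo() -> List[str]:
    async def main() -> List[str]:
        locks = SessionLocks()
        store: Dict[str, List[str]] = {}
        # Five concurrent turns against the same session.
        await asyncio.gather(
            *(handle_turn(locks, store, "session-a", f"m{i}") for i in range(5))
        )
        return store["session-a"]

    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

Without the lock, each coroutine would read the same stale history before any of them wrote back, and all but one update would be lost.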
To run the labs and capstone projects, your development environment must satisfy:
- Python 3.13+ (leveraging modern typing and async primitives)
- Docker Desktop (required for sandbox execution labs and project containment)
- LLM Provider API Keys: Google Gemini (via `GEMINI_API_KEY`), Anthropic (via `ANTHROPIC_API_KEY`), OpenAI (via `OPENAI_API_KEY`)
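Before running a lab, it can help to check which of these variables are set. The helper below is a small sketch; `missing_api_keys` is a hypothetical name, and whether a given lab needs all three keys or only one is an assumption to verify against that lab's instructions.

```python
import os
from typing import List, Mapping, Optional

# Environment variable names taken from the prerequisites list above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of provider keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All provider API keys are set.")
```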
# 1. Clone the repository
git clone https://github.com/FirdowsRahaman/agentic-workflows-masterclass.git
cd agentic-workflows-masterclass
# 2. Establish python environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 3. Initialize verification run (Vanilla ReAct Loop)
cd labs/lab-01-vanilla-react
python agent.py
This repository follows a theory-first, implementation-focused learning path.
Deep dive into architectural patterns, mathematical mechanics, and engineering design.
- Level 1: Foundations of Agentic Reasoning
  - Chapter 1: Defining the Agent Paradigm - Autonomy boundaries, agentic loops, and context constraints.
  - Chapter 2: Moving Beyond Static RAG - Resolving retrieval failure modes via active retrieval loops.
  - Chapter 3: Cognitive Design Patterns - Plan-and-Execute, ReAct, and Self-Reflection frameworks.
  - Chapter 4: Tool Use & Stateless Core - Structured tool output, timeout handles, and error recovery loops.
- Level 2: Advanced Orchestration Frameworks
  - Chapter 5: Stateful Agent Workflows - Cyclic state graphs, node transitions, and thread checkpointers.
  - Chapter 6: Multi-Agent Collaboration Patterns - Supervisor routing, peer-to-peer choreography, and state boundaries.
  - Chapter 7: Human-in-the-Loop (HITL) - Interrupt conditions, manual state overrides, and time-travel debugging.
  - Chapter 8: Persistent Memory & Context Management - Episodic vs. semantic memory, associative indexing, and GraphRAG.
- Level 3: Enterprise-Grade AgentOps
  - Chapter 9: Agent Guardrails & Security - Prompt injection defense, PII scrubbing, and sandboxed execution rules.
  - Chapter 10: Observability & Tracing - Telemetry tracing, span collection, and path auditing (Phoenix/LangSmith).
  - Chapter 11: Evaluation Pipelines - LLM-as-a-judge patterns, golden evaluation datasets, and deterministic assertions.
  - Chapter 12: Production Deployment & Serving - Server-Sent Events (SSE), distributed session locking with Redis, and message queues.
  - Chapter 13: Context Caching & KV-Cache Optimization - KV attention math, prefix matching constraints, and token billing economics.
  - Chapter 14: The Future of Agents - Vision/voice loops, local runtimes (Ollama/SLMs), and desktop-automation environments.
Implement core components from scratch to master the low-level mechanics.
- 🧪 Lab 1: The Vanilla ReAct Loop - Implement a complete planning, tool execution, and self-reflection loop in pure Python without framework wrappers.
- 🧪 Lab 2: Stateful LangGraph - Design cyclic state graphs with node updates, state mergers, and thread-level state restoration.
- 🧪 Lab 3: Supervisor Team - Implement a supervisor router that coordinates task delegation, output aggregation, and worker control loops.
- 🧪 Lab 4: Human Interruption - Pause state graphs before high-risk operations to allow manual validation and state modifications.
- 🧪 Lab 5: Dual-Core Memory Engine - Build a memory manager coordinating short-term conversational threads alongside long-term semantic embeddings in SQLite.
- 🧪 Lab 6: Evals Pipeline - Configure automated evaluation suites that run test assertions and compute similarity/completeness scores via LLM judges.
- 🧪 Lab 7: Corrective RAG (CRAG) Engine - Build an advanced RAG pipeline with document grading, query expansion, and web search fallback routes.
- 🧪 Lab 8: Collaborative Agents with Google ADK - Orchestrate sequential collaborative workflows using Google's Agent Development Kit, exposing custom tools and native MCP server interfaces.
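To give a feel for what Lab 1 builds, here is a minimal sketch of a ReAct-style loop in pure Python. It is not the lab's `agent.py`: the `Action: tool[input]` text format, the `add` tool, and the `scripted_model` stand-in (used here in place of a real LLM call so the example runs offline) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict

# Assumed reply format: "Action: tool_name[input]" or "Final Answer: text".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def add(expr: str) -> str:
    """Hypothetical tool: adds two comma-separated numbers."""
    a, b = (float(x) for x in expr.split(","))
    return str(a + b)


TOOLS: Dict[str, Callable[[str], str]] = {"add": add}


def react_loop(model: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)          # Thought plus Action, or a Final Answer
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            observation = "error: reply contained neither an Action nor a Final Answer"
        else:
            name, arg = action.groups()
            tool = TOOLS.get(name)
            try:
                observation = tool(arg) if tool else f"error: unknown tool {name!r}"
            except Exception as exc:       # feed tool failures back to the model
                observation = f"error: {exc}"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("step budget exhausted without a final answer")


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call: acts once, then answers with the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should add the numbers.\nAction: add[2, 3]"
    last_observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: The tool returned the sum.\nFinal Answer: {last_observation}"


if __name__ == "__main__":
    print(react_loop(scripted_model, "What is 2 + 3?"))
```

Errors are returned to the model as observations rather than raised, which gives it a chance to correct course on the next step, and `max_steps` bounds the loop if it never does.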
Deploy production-ready solutions modeled after enterprise workloads.
- 💻 Autonomous Sandbox Dev Team: Build PM, Coder, and QA Tester agents that collaborate, write Python scripts, and run automated unit tests inside secure, isolated Docker sandboxes.
- 🏥 Clinical Support Agent with HITL: Deploy a clinical triage assistant with clinical guidelines RAG, safety policy guardrails, and human escalation checkpoints.
- Collaborative Market Research Engine: Build a multi-agent data aggregator, research compiler, and Neo4j GraphRAG explorer that builds structured reports from unstructured documents.
- 🛡️ Compliance & Risk Auditor: Architect a regulatory compliance engine that audits PDF financial files, validates outputs against strict Pydantic schemas, and triggers safety filters via Guardrails AI.
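The idea of validating agent output against a strict schema, as in the Compliance & Risk Auditor, can be sketched with the standard library alone. This is a stand-in for the Pydantic models the capstone uses, not a reproduction of them: `AuditFinding`, its fields, and the allowed risk levels are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict

ALLOWED_RISK_LEVELS = {"low", "medium", "high"}  # assumed vocabulary
EXPECTED_FIELDS = {"document_id", "risk_level", "amount_usd"}


@dataclass(frozen=True)
class AuditFinding:
    document_id: str
    risk_level: str
    amount_usd: float


def parse_finding(raw: Dict[str, Any]) -> AuditFinding:
    """Reject any agent output that does not match the schema exactly."""
    if set(raw) != EXPECTED_FIELDS:
        raise ValueError(f"unexpected or missing fields: {sorted(set(raw) ^ EXPECTED_FIELDS)}")
    document_id = raw["document_id"]
    if not isinstance(document_id, str) or not document_id:
        raise ValueError("document_id must be a non-empty string")
    risk_level = raw["risk_level"]
    if not isinstance(risk_level, str) or risk_level not in ALLOWED_RISK_LEVELS:
        raise ValueError(f"risk_level must be one of {sorted(ALLOWED_RISK_LEVELS)}")
    amount = raw["amount_usd"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
        raise ValueError("amount_usd must be a non-negative number")
    return AuditFinding(document_id, risk_level, float(amount))


if __name__ == "__main__":
    print(parse_finding({"document_id": "doc-1", "risk_level": "high", "amount_usd": 1200}))
```

Failing closed on extra or missing fields keeps malformed model output from flowing into downstream steps; a rejected payload can be sent back to the agent for another attempt or escalated to a human reviewer.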
- Agent Development Kit (ADK): Google's open-source framework for building multi-agent systems and workflows.
- LangGraph: Stateful orchestration library of choice for complex state management.
- GenAI SDK: Official library for Google Gemini models.
- Phoenix: AI observability and tracing platform.
We welcome contributions!
- Fork the repo and create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
Distributed under the MIT License. See LICENSE for details.