Production-grade multi-agent AI supply chain orchestration powered by Claude Opus 4.6 — now with LangChain, LangGraph, and LangSmith
A production-grade supply chain management platform with two complementary agent orchestration layers:
| Layer | Framework | Description |
|---|---|---|
| v1 — Anthropic SDK | agents/ |
7 specialized agents using Claude tool-use agentic loop |
| v2 — LangGraph Pipeline | langchain_agents/ |
Stateful StateGraph with LCEL chains + LangSmith tracing |
Both layers share the same FastAPI backend, PostgreSQL, Redis, Kafka, and AWS infrastructure. The LangGraph pipeline adds LangSmith observability, structured evaluation datasets, and LCEL-based report generation on top of the core platform.
| Feature | Technology |
|---|---|
| Multi-Agent Orchestration (v1) | Claude Opus 4.6 + Anthropic SDK agentic loop |
| Multi-Agent Pipeline (v2) | LangGraph StateGraph + Claude via langchain-anthropic |
| LCEL Chains | Task classification + executive report generation |
| LangSmith Tracing | Full distributed trace per pipeline run |
| LangSmith Evaluation | Golden-set datasets + 6 custom evaluators + LLM-as-judge |
| Prompt Hub Sync | Versioned prompts pushed to LangSmith Hub on merge |
| API Framework | FastAPI + Pydantic v2 |
| Database | Aurora PostgreSQL 15 (Multi-AZ) |
| Caching | ElastiCache Redis 7 |
| Event Streaming | Amazon MSK (Managed Kafka) |
| ML Platform | MLflow + XGBoost + scikit-learn |
| Container Orchestration | EKS 1.29 (HPA 3–20 replicas) |
| Infrastructure | Terraform + AWS (VPC, EKS, RDS, MSK, ECR) |
| CI/CD | GitHub Actions (7 workflows incl. LangSmith eval) |
| Monitoring | Prometheus + Grafana + Jaeger + OpenTelemetry |
| Security | OWASP Top 10, Trivy, Semgrep, CodeQL, Gitleaks |
- Python 3.11+
- Docker & Docker Compose
- AWS CLI (for deployment)
- LangSmith account (for v2 tracing — free tier available)
# 1. Clone and configure
git clone https://github.com/your-org/supply-chain-ai
cd supply-chain-ai
cp .env.example .env
# 2. Add required API keys to .env
echo "ANTHROPIC_API_KEY=sk-ant-your-key-here" >> .env
echo "LANGCHAIN_API_KEY=ls__your-langsmith-key" >> .env
echo "LANGCHAIN_TRACING_V2=true" >> .env
echo "LANGCHAIN_PROJECT=supply-chain-ai" >> .env
# 3. Start all services
docker-compose up -d
# 4. Run database migrations
docker-compose exec api alembic upgrade head
# 5. Verify it's working
curl http://localhost:8000/healthfrom langchain_agents.pipelines.supply_chain_pipeline import SupplyChainPipeline
pipeline = SupplyChainPipeline()
# Synchronous
result = pipeline.run({
"task_description": "Three SKUs below reorder point and two delayed shipments",
"task_type": "comprehensive_analysis",
"priority": "high",
"data": {"warehouse_id": "WH-001"},
})
print(result.final_report)
print(result.langsmith_run_url) # Link to trace in LangSmith UI
# Streaming — yields state updates from each node as they complete
for event in pipeline.stream({"task_type": "inventory_check", "data": {}}):
node_name, update = list(event.items())[0]
print(f"Node '{node_name}' completed")
# Async
import asyncio
result = asyncio.run(pipeline.arun({"task_type": "anomaly_detection", "data": {}}))# Install dev dependencies
pip install -r requirements-dev.txt
# Unit tests (fast, no external deps)
pytest tests/unit/ -v --cov=app --cov=agents --cov-report=html
# Integration tests (requires Docker Compose)
pytest tests/integration/ -v
# E2E smoke tests (requires running app)
API_BASE_URL=http://localhost:8000 pytest tests/e2e/ -v -m smoke
# Load tests
locust -f tests/load/locustfile.py --host=http://localhost:8000Direct Claude API integration with a custom agentic loop (run_agentic_loop), Prometheus metrics, and Slack notifications.
| Agent | Role | Tools |
|---|---|---|
| OrchestratorAgent | Routes tasks, coordinates agents | analyze_task, route_to_agent, aggregate_results |
| InventoryAgent | Stock management, reordering | check_inventory_level, calculate_reorder_quantity, reserve_inventory |
| OrderAgent | Order validation, fraud detection | validate_order, check_fraud_indicators, calculate_order_total |
| SupplierAgent | Vendor evaluation, RFQs | get_supplier_performance, evaluate_supplier, generate_rfq |
| LogisticsAgent | Carrier selection, tracking | select_carrier, track_shipment, handle_exception |
| DemandForecastAgent | ML demand prediction | get_demand_forecast, analyze_demand_trends, generate_replenishment_plan |
| AnomalyDetectionAgent | Fraud & anomaly detection | detect_order_anomalies, score_transaction, classify_anomaly |
StateGraph-based orchestration with conditional fan-out, parallel node execution, and full LangSmith observability.
START
→ triage_node (LCEL analysis chain → TaskClassificationOutput)
→ [conditional fan-out based on task_type]
inventory_node ──┐
order_node ──────┤
supplier_node ───┤ (parallel; all use ChatAnthropic.bind_tools())
logistics_node ──┤
forecast_node ───┤
anomaly_node ────┘
→ aggregator_node (LCEL synthesis chain)
→ report_node (LCEL report chain → markdown executive report)
→ notification_node (Slack / logging)
END
| Node | Description | LangChain Tools Used |
|---|---|---|
triage_node |
Classifies task, sets next_nodes |
LCEL analysis_chain |
inventory_node |
Stock health, reorder recommendations | INVENTORY_TOOLS (4 tools) |
order_node |
Backlog analysis, fulfillment KPIs | ORDER_TOOLS (4 tools) |
supplier_node |
Scorecards, risk flags, lead times | SUPPLIER_TOOLS (4 tools) |
logistics_node |
Delayed shipments, route optimization | LOGISTICS_TOOLS (4 tools) |
forecast_node |
XGBoost demand forecast, seasonality | FORECAST_TOOLS (4 tools) |
anomaly_node |
IsolationForest anomaly detection | FORECAST_TOOLS (anomaly subset) |
aggregator_node |
Merges agent_results, LCEL synthesis |
— |
report_node |
Generates markdown executive report | report_chain |
notification_node |
Dispatches critical alerts to Slack | — |
| Chain | File | Input → Output |
|---|---|---|
build_analysis_chain() |
chains/analysis_chain.py |
{task_description, context} → TaskClassificationOutput (structured) |
build_parallel_analysis_chain() |
chains/analysis_chain.py |
Same → {classification, risk_summary} (parallel) |
build_report_chain() |
chains/report_chain.py |
{task_type, priority, summary, action_items} → str (markdown) |
build_streaming_report_chain() |
chains/report_chain.py |
Same → streaming token chunks |
build_summary_chain() |
chains/report_chain.py |
{summary, priority} → str (Slack-ready paragraph, Haiku) |
Every pipeline run is automatically traced in LangSmith when LANGCHAIN_TRACING_V2=true. Each trace captures:
- Full message history for every agent node
- Tool calls and responses
- Token counts per node
- End-to-end latency breakdown
- A shareable run URL (
result.langsmith_run_url)
Three golden-set datasets in LangSmith (created via python -m langchain_agents.evaluation.datasets):
| Dataset | Size | Purpose |
|---|---|---|
supply-chain-classification |
8 examples | Task type + priority accuracy |
supply-chain-inventory |
3 examples | Reorder recommendation quality |
supply-chain-anomaly |
3 examples | Anomaly severity classification |
| Evaluator | Type | Metric |
|---|---|---|
task_type_accuracy |
Exact match | Task type classification |
priority_accuracy |
Exact + partial match | Priority level (adjacent = 0.5) |
report_completeness |
Section detection | Required markdown sections present |
action_items_quality |
Heuristic | Count + specificity + priority ordering |
pipeline_latency_sla |
Threshold | Pass/fail at 60-second SLA |
anomaly_severity_match |
Exact match | Anomaly severity level |
llm_judge_quality |
LLM-as-judge | Claude Haiku scores report 0–10 |
Prompts from llmops/prompts/*.json are automatically pushed to LangSmith Hub on every merge to main via the sync-prompts job in langsmith-eval.yml.
# Create datasets in LangSmith
python -m langchain_agents.evaluation.datasets
# Run classification chain eval
python -m langchain_agents.evaluation.evaluators classification
# Run full pipeline eval
python -m langchain_agents.evaluation.evaluators pipeline| Workflow | Trigger | Steps |
|---|---|---|
| CI | Push/PR | Lint → Unit Tests → Integration Tests |
| Security Scan | Push/Schedule | OWASP → Semgrep → Gitleaks → Bandit → CodeQL |
| Image Scan | Push | Trivy → Snyk → Docker Scout → Hadolint |
| CD Production | Tag v*.*.* |
Build → Scan → Deploy → Verify |
| Terraform | Infra changes | Validate → Plan → Apply |
| MLOps | Weekly/Dispatch | Data Validate → Train → Evaluate → Register |
| LangSmith Eval | Push/PR/Daily | Dataset Upsert → Classification Eval → Pipeline Eval → Prompt Sync |
Once running, access the interactive API docs:
- Swagger UI: http://localhost:8000/api/docs
- ReDoc: http://localhost:8000/api/redoc
- OpenAPI JSON: http://localhost:8000/api/openapi.json
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin/admin |
| Prometheus | http://localhost:9090 | — |
| Jaeger | http://localhost:16686 | — |
| MLflow | http://localhost:5000 | — |
| Kafka UI | http://localhost:8080 | — |
| LangSmith | https://smith.langchain.com | Your account |
| PgAdmin | http://localhost:5050 | admin@supply-chain.local/admin |
cd infrastructure/terraform
terraform init
terraform workspace new production
terraform plan -var-file=environments/production.tfvars
terraform apply -var-file=environments/production.tfvarsexport IMAGE_TAG=v2.0.0
export ECR_REGISTRY=123456789.dkr.ecr.us-east-1.amazonaws.com
docker build -t $ECR_REGISTRY/supply-chain-api:$IMAGE_TAG .
docker push $ECR_REGISTRY/supply-chain-api:$IMAGE_TAG
./scripts/deploy.sh production $IMAGE_TAGOWASP Top 10 protections:
- A01 Broken Access Control: JWT authentication, RBAC (5 roles)
- A02 Cryptographic Failures: TLS everywhere, bcrypt passwords, KMS encryption
- A03 Injection: Parameterized queries via SQLAlchemy ORM; prompt injection guardrails
- A04 Insecure Design: Threat modeling, security headers middleware
- A05 Security Misconfiguration: Hardened containers (readOnlyRootFilesystem, runAsNonRoot)
- A06 Vulnerable Components: Automated dependency scanning (OWASP, Snyk)
- A07 Auth Failures: JWT with expiry, refresh tokens, bcrypt
- A08 Software Integrity: Image signing, SBOM
- A09 Logging Failures: Structured logging (structlog), audit trails, LangSmith traces
- A10 SSRF: URL validation, private network restrictions
- Prompt Registry: Versioned prompts (JSON) with A/B testing, synced to LangSmith Hub
- Cost Tracking: Per-agent token consumption and cost attribution with 80% budget alerts
- Guardrails: 15 prompt injection patterns, PII redaction, sliding-window rate limiter
- Drift Detection: Automated data drift monitoring (PSI / Z-score) with retraining triggers
- Auto-Remediation: 5 built-in actions (scale pods, clear Redis, restart Celery, notify, retrain)
- LangSmith Tracing: Distributed traces with node-level latency, token counts, tool calls
- LangSmith Evaluation: CI-gated quality checks — classification accuracy ≥ 80%, report completeness ≥ 75%
MIT — see LICENSE
.
├── .github
│ ├── dependabot.yml
│ └── workflows
│ ├── cd-production.yml
│ ├── ci.yml
│ ├── image-scan.yml
│ ├── langsmith-eval.yml ← NEW: LangSmith CI evaluation
│ ├── mlops.yml
│ ├── security-scan.yml
│ └── terraform.yml
├── agents ← v1: Anthropic SDK agents
│ ├── __init__.py
│ ├── anomaly_detection_agent.py
│ ├── base_agent.py
│ ├── demand_forecast_agent.py
│ ├── inventory_agent.py
│ ├── logistics_agent.py
│ ├── orchestrator_agent.py
│ ├── order_agent.py
│ └── supplier_agent.py
├── aiops
│ ├── __init__.py
│ ├── automation
│ │ └── auto_remediation.py
│ ├── log_analysis
│ ├── monitoring
│ │ └── drift_detector.py
│ └── runbooks
├── alembic
│ ├── env.py
│ └── versions
│ └── 001_initial_schema.py
├── alembic.ini
├── app
│ ├── api
│ │ └── v1
│ │ ├── endpoints
│ │ │ ├── agents.py
│ │ │ ├── analytics.py
│ │ │ ├── auth.py
│ │ │ ├── inventory.py
│ │ │ ├── orders.py
│ │ │ ├── shipments.py
│ │ │ └── suppliers.py
│ │ └── router.py
│ ├── core
│ │ ├── config.py
│ │ ├── database.py
│ │ ├── kafka_client.py
│ │ ├── logging.py
│ │ ├── redis_client.py
│ │ ├── security.py
│ │ └── slack_notifier.py
│ ├── main.py
│ ├── middleware
│ │ ├── request_id.py
│ │ └── security.py
│ ├── models
│ │ ├── inventory.py
│ │ ├── order.py
│ │ ├── shipment.py
│ │ ├── supplier.py
│ │ └── user.py
│ ├── schemas
│ │ ├── agent.py
│ │ ├── inventory.py
│ │ ├── order.py
│ │ └── supplier.py
│ ├── services
│ │ └── order_service.py
│ └── tasks
│ ├── celery_app.py
│ └── supply_chain_tasks.py
├── architecture
│ └── aws_supply_chain_agentic_architecture.drawio.svg
├── docker-compose.yml
├── Dockerfile
├── docs
│ ├── api
│ │ └── openapi.yaml
│ └── architecture
│ ├── supply-chain-architecture.drawio
│ ├── supply-chain-architecture.drawio.svg
│ └── supply-chain-architecture-v2.drawio.svg ← NEW: LangGraph diagram
├── infrastructure
│ ├── kubernetes
│ │ ├── configmaps/app-config.yaml
│ │ ├── deployments
│ │ │ ├── api-deployment.yaml
│ │ │ └── worker-deployment.yaml
│ │ ├── helm/supply-chain-api/Chart.yaml
│ │ ├── hpa/api-hpa.yaml
│ │ ├── ingress/ingress.yaml
│ │ ├── jobs/db-migration-job.yaml
│ │ ├── namespaces/namespace.yaml
│ │ ├── network-policies/network-policy.yaml
│ │ ├── pdb/pdb.yaml
│ │ ├── rbac/rbac.yaml
│ │ └── services/api-service.yaml
│ └── terraform
│ ├── environments
│ │ ├── development.tfvars
│ │ ├── production.tfvars
│ │ └── staging.tfvars
│ ├── main.tf
│ ├── modules
│ │ ├── ecr/{main,outputs,variables}.tf
│ │ ├── eks/{main,outputs,variables}.tf
│ │ ├── elasticache/{main,outputs,variables}.tf
│ │ ├── msk/{main,outputs,variables}.tf
│ │ ├── rds/{main,outputs,variables}.tf
│ │ └── vpc/{main,outputs,variables}.tf
│ ├── outputs.tf
│ └── variables.tf
├── langchain_agents ← NEW: v2 LangGraph pipeline
│ ├── __init__.py
│ ├── state.py SupplyChainState TypedDict
│ ├── chains
│ │ ├── __init__.py
│ │ ├── analysis_chain.py LCEL task classification (structured output)
│ │ └── report_chain.py LCEL executive report + streaming
│ ├── evaluation
│ │ ├── __init__.py
│ │ ├── datasets.py LangSmith golden-set dataset creation
│ │ └── evaluators.py 6 evaluators + LLM-as-judge
│ ├── graphs
│ │ ├── __init__.py
│ │ ├── nodes.py 9 node functions (triage → specialists → agg → report)
│ │ └── supply_chain_graph.py StateGraph assembly + compile_graph()
│ ├── pipelines
│ │ ├── __init__.py
│ │ └── supply_chain_pipeline.py SupplyChainPipeline (run/arun/stream + @traceable)
│ └── tools
│ ├── __init__.py
│ ├── forecast_tools.py @tool: demand forecast + anomaly detection
│ ├── inventory_tools.py @tool: stock levels, reorder, turnover
│ ├── logistics_tools.py @tool: tracking, routing, freight booking
│ ├── order_tools.py @tool: order status, processing, fulfillment
│ └── supplier_tools.py @tool: scorecards, risks, lead times
├── llmops
│ ├── cost_tracker.py
│ ├── guardrails.py
│ ├── prompt_registry.py
│ └── prompts
│ └── orchestrator.json
├── ml
│ ├── anomaly_detection/train.py
│ └── demand_forecast/train.py
├── monitoring
│ ├── alertmanager/alertmanager.yml
│ ├── grafana/provisioning/
│ └── prometheus
│ ├── prometheus.yml
│ └── rules
│ ├── api-alerts.yml
│ ├── infrastructure-alerts.yml
│ └── ml-alerts.yml
├── pyproject.toml
├── pytest.ini
├── requirements.txt
├── requirements-dev.txt
├── scripts
│ ├── deploy.sh
│ ├── init.sql
│ └── seed_data.py
├── security
│ ├── .gitleaks.toml
│ └── .owasp-suppressions.xml
├── LICENSE
└── tests
├── conftest.py
├── e2e/test_smoke.py
├── integration/
├── load/locustfile.py
└── unit
├── test_agents.py
└── test_models.py