Optimize product discoverability for LLM shopping flows with a closed learning loop: simulate -> validate -> update beliefs -> distill memory -> optimize again.
This project is a multi-tenant agentic commerce system for:
- intent-aware product discoverability optimization,
- experiment/simulation-based screening,
- real-world validation capture,
- Bayesian-style belief updates,
- memory distillation and reuse.
The core moat is not a single score. It is the feedback loop that continuously updates decisions using observed evidence.
Current extension:
- Agent operator mode (v0): governed backend orchestration with run/action persistence, runtime step controls, centralized capability registry, and policy enforcement.
See `docs/agentic-layer.md`.
This app is a screening + validation + learning platform.
- Screening: synthetic LLM judge signals (fast iteration).
- Validation: observed reality signals plus provider-integrated synthetic checks.
- Learning: belief revision + memory distillation + calibration profiles.
It does not claim guaranteed production ranking outcomes from lab scores alone.
- Run simulation/experiments to generate candidate improvements.
- Validate with synthetic and/or observed signals.
- Update scoped beliefs (`client_id`, `brand_id`, `product_id`) via Bayesian-style weighting.
- Distill high-quality memory artifacts.
- Use those artifacts in future query generation and copy optimization.
- Recalibrate policy weights from drift between synthetic vs observed outcomes.
- Convert posterior into decision action (`promote_variant`, `iterate_variant`, `reject_hypothesis`).
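The belief-update and decision steps above can be sketched roughly like this. The function names, the precision-weighted update, and the thresholds are illustrative assumptions, not the project's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """Scoped belief about a variant's lift (per client_id/brand_id/product_id)."""
    mean: float    # current estimated lift
    weight: float  # accumulated evidence weight

def update_belief(belief: Belief, observation: float, obs_weight: float) -> Belief:
    """Bayesian-style weighted average of the prior mean and new evidence."""
    total = belief.weight + obs_weight
    new_mean = (belief.mean * belief.weight + observation * obs_weight) / total
    return Belief(mean=new_mean, weight=total)

def decide(belief: Belief, promote_at: float = 0.1, reject_at: float = -0.05) -> str:
    """Map the posterior onto one of the three decision actions."""
    if belief.mean >= promote_at:
        return "promote_variant"
    if belief.mean <= reject_at:
        return "reject_hypothesis"
    return "iterate_variant"
```

Here a strong positive observation pulls the posterior above the promote threshold, while a neutral posterior keeps the variant in iteration.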
Loop APIs:
- `GET /loop/state`
- `POST /loop/step`
- `POST /beliefs/update`
- `GET /calibration/profile`
- `GET /memory/artifacts`
- `POST /memory/distill`
Operational endpoints:
- `POST /admin/ops/loop-maintenance`
- `GET /admin/ops/loop-maintenance/history`
- `domain/` -> pure business logic/types.
- `application/services/` -> orchestrators grouped by capability:
  - `application/services/admin/`
  - `application/services/conversation/`
  - `application/services/evidence/`
  - `application/services/experiment/`
  - `application/services/loop/`
  - `application/services/query_battery/`
  - `application/services/simulation/`
  - `application/services/validation_service.py`
- `infrastructure/` -> DB + LLM adapters + protocol adapters.
- `api/` -> FastAPI routes (composition root at `api/composition.py`).
- `web/` -> Next.js app.
- `shared/` -> schema, config, common utilities.
Layer rule enforced by architecture checks:
- Application layer must not import infrastructure directly.
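A minimal sketch of how such a boundary check can be implemented with Python's `ast` module. The function name and the banned prefix are illustrative; the repo's actual `make arch-check` tooling may work differently:

```python
import ast

def forbidden_imports(source: str, banned_prefix: str = "infrastructure") -> list[str]:
    """Return names of modules imported from the banned layer in one source file."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names if a.name.startswith(banned_prefix)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.startswith(banned_prefix):
                hits.append(node.module)
    return hits
```

Running this over every file under `application/` and failing CI on any non-empty result is enough to enforce the rule.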
- Intent inference + alignment scoring.
- Evidence analysis and protocol readiness (UCP/ACP).
- Simulation sandbox (`run -> optimize -> retest`).
- Query battery generation (top-down / bottom-up / hybrid).
- Canonical intent spec and controlled onboarding.
- Experiment orchestration with variants/runs/metrics.
- Retrieval-backed frozen protocol snapshots (`snapshot_version`) for fair variant comparison.
- Baseline-first gating for candidate runs in retrieval-backed mode.
- Hypothesis persistence and linkage (`hypothesis_id`) across variants and runs.
- Versioned decision policy inputs/outputs persisted per metrics row (`decision_policy_version`, `decision_inputs`, `decision_outputs`).
- Variant generation paths for experiments:
  - manual variant authoring,
  - simulation revision prefill,
  - closed-loop evidence generation (experiment + simulation + validation),
  - cold-start copy generation (bottom-up / top-down / both).
- Validation system with two signals:
  - Synthetic validation (LLM judge signal: in-app BYOK, provider run, manual fallback).
  - Observed reality validation (manual observed outcomes).
- Belief revisions, decision events, calibration profiles.
- Memory artifacts with quality/support gating and provenance tracking.
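Quality/support gating for memory artifacts can be illustrated with a simple filter. The field names (`quality`, `support_count`) and the thresholds are assumptions for illustration, not the actual schema:

```python
def gate_artifacts(artifacts, min_quality=0.7, min_support=3):
    """Keep only artifacts that pass both the quality and evidence-support gates."""
    return [
        a for a in artifacts
        if a["quality"] >= min_quality and a["support_count"] >= min_support
    ]
```

An artifact with strong quality but a single supporting observation is held back until more evidence accumulates, which is the point of support gating.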
- Agent operator mode:
  - plan-first run creation (`plan_only` default),
  - approved action execution in `auto_execute_safe`,
  - runtime safety with run lock + heartbeat refresh,
  - centralized capability specs + policy checks,
  - operator UI in `Agent runs` with approvals, timeline deep-links, and action explainability,
  - immutable run event history (`agent_events`) for audit/replay.
Experiment protocol transparency APIs:
- `GET /experiments/{experiment_id}/execution-state`
- `GET /experiments/{experiment_id}/retrieval-snapshots`
- `GET /experiments/{experiment_id}/hypotheses`
Agent operator APIs:
- `POST /agent-runs`
- `GET /agent-runs`
- `GET /agent-runs/{run_id}`
- `GET /agent-runs/{run_id}/events` - run-level event feed with keyset pagination and deep-link recovery
  - supports:
    - `event_type=all|failed|policy|executed`
    - `status=all|proposed|approved|executing|executed|failed|rejected`
    - `capability_name`
    - `since`, `until`
    - `before`, `after`
    - `event_id`, `around` (center timeline around a specific event)
- `POST /agent-runs/{run_id}/start`
- `POST /agent-runs/{run_id}/pause`
- `POST /agent-runs/{run_id}/cancel`
- `POST /agent-runs/{run_id}/step`
- `POST /agent-runs/actions/{action_id}/decision`
- `POST /agent-runs/tick` (bounded autonomous worker tick)
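The keyset pagination behind the events feed can be illustrated with a small in-memory helper. The exact `after`/`limit` semantics shown here are a sketch of the pattern, not the API's precise contract:

```python
def page_events(events, after=None, limit=2):
    """Keyset pagination over id-ordered events: return rows with id > after.

    Returns (page, next_cursor); next_cursor is None on the final page.
    """
    rows = [e for e in events if after is None or e["id"] > after]
    page = rows[:limit]
    next_cursor = page[-1]["id"] if len(rows) > limit else None
    return page, next_cursor
```

Unlike offset pagination, the cursor stays stable while new events are appended, which is what makes deep-link recovery to an `event_id` outside the current window reliable.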
Agent Runs operator UX (current):
- compact left-rail + main workspace layout
- next recommended action panel
- inline guardrail reasons on blocked approvals
- timeline presets (`All activity`, `Policy failures (24h)`, `Variant execution (7d)`, `Validation focus (7d)`)
- timeline deep-link state in URL (`run_id`, filters, `event_id`)
- per-event deep-link copy with feedback, and automatic deep-link recovery when the event is outside the current page window
Agent runtime worker/scheduler:
- one-off tick: `make agent-runtime-tick`
- interval scheduler: `make agent-runtime-scheduler` (or `python -m scripts.run_agent_runtime_scheduler --interval-seconds 30`)
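A bounded worker tick, in the spirit of `POST /agent-runs/tick`, can be sketched as follows. The function signature and `max_actions` bound are hypothetical, illustrating only the bounded-batch pattern:

```python
def run_tick(pending_actions, execute, max_actions=5):
    """One bounded tick: execute at most max_actions approved actions.

    Returns the actions left for the next tick, so repeated ticks (manual
    or scheduled) drain the queue without any single tick running unbounded.
    """
    batch, rest = pending_actions[:max_actions], pending_actions[max_actions:]
    for action in batch:
        execute(action)
    return rest
```

Bounding each tick keeps autonomous execution interruptible: a pause or cancel between ticks takes effect before the next batch starts.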
Validation lives in the dedicated Validation flow/page and splits into:
- Synthetic validation signal
  - Uses the configured provider/model (BYOK).
  - Fast consistency and copy-vs-copy checks.
  - Supports execution modes:
    - `in_app_byok` (fully implemented)
    - `provider_openai_mcp` (fully implemented)
    - `provider_gemini_function` (UI/API contract present, backend execution pending)
    - `manual_fallback` (structured paste-back)
- Observed reality signal
  - Manual observed capture of what actually surfaced.
  - Tracks validation progress and agreement with lab winners.
This separation keeps experiment UX focused on design/run/analyze while validation remains centralized.
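The "agreement with lab winners" metric reduces to a simple rate; a sketch under the assumption that each validation yields a (lab winner, observed winner) pair:

```python
def agreement_rate(pairs):
    """Fraction of validations where the observed winner matches the lab winner.

    Returns None when no observed outcomes have been captured yet.
    """
    if not pairs:
        return None
    matches = sum(1 for lab, observed in pairs if lab == observed)
    return matches / len(pairs)
```

Sustained drift between this rate and synthetic-judge confidence is exactly the signal the calibration profiles are meant to absorb.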
cp .env.example .env.local
uv sync --extra dev
uv run uvicorn api.main:app --reload --port 8000

Note: `make lint` and `make format` require dev dependencies (`uv sync --extra dev`) because ruff is installed via the dev extra.
cd web
cp ../.env.example .env.local
pnpm install
pnpm dev

Frontend tests:
cd web
pnpm install
pnpm test

If `pnpm test` fails with `vitest: command not found`, run `pnpm install` again in `web/` to pull the new test dependencies.
Open:
- App: http://localhost:3000
- Validation: http://localhost:3000/validation
Use one DB file for migrations + seeds + backend runtime.
Canonical DB bootstrap/migration sources:
- schema: `shared/db/schema.sql`
- migrations: `shared/db/migrations/*.sql`
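Ordered, idempotent migration application against SQLite can be sketched as below. The `schema_migrations` tracking table and the `(name, sql)` input shape are assumptions for illustration; the repo's `make db-migrate` may track applied state differently:

```python
import sqlite3

def apply_migrations(conn, migrations):
    """Apply (name, sql) migrations in sorted name order, skipping applied ones."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    for name, sql in sorted(migrations):
        if name in applied:
            continue  # already recorded: rerunning is a no-op
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
    conn.commit()
```

Sorting by filename is why numeric prefixes on migration files matter: it makes the apply order deterministic across machines.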
rm -f ./tmp/local.db
DATABASE_PATH=./tmp/local.db make db-init
DATABASE_PATH=./tmp/local.db make seed-demo
DATABASE_PATH=./tmp/local.db make seed-canonical
DATABASE_PATH=./tmp/local.db make seed-demo-acme
DATABASE_PATH=./tmp/local.db make db-validate-migrate
DATABASE_PATH=./tmp/local.db make db-migrate
DATABASE_PATH=./tmp/local.db uv run uvicorn api.main:app --reload --port 8000

Useful helpers:
make db-path
make db-reset
make loop-maintenance
make agent-runtime-tick
make agent-runtime-scheduler

Admin -> Operational controls -> Model gateway:
- Set per-provider keys and models.
- Separate chat/generation and validation model settings.
- Activate provider centrally.
Default model presets:
- OpenRouter: `openai/gpt-oss-120b`
- OpenAI: `gpt-5.2-2025-12-11`
- Claude (Anthropic): `claude-sonnet-4-5-20250929`
- Gemini: `gemini-3-flash-preview`
Health endpoint:
`GET /health/llm`
Provider-run synthetic validation is feature-flagged.
Required env vars:
- `ENABLE_PROVIDER_VALIDATION_INTEGRATIONS=true`
- `BACKEND_PUBLIC_URL` (public backend base used to build the callback URL)
- `VALIDATION_CALLBACK_SIGNING_SECRET` (HMAC signing key for callback verification)
Optional env vars:
- `VALIDATION_CALLBACK_TTL_SECONDS` (default `900`)
- `OPENAI_MCP_LAUNCH_URL` (default `https://chatgpt.com/`)
- `GEMINI_FUNCTION_LAUNCH_URL` (default `https://gemini.google.com/`)
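The signed-callback pattern implied by `VALIDATION_CALLBACK_SIGNING_SECRET` and the TTL can be sketched like this. The signed-payload format (`"{timestamp}.{body}"`) is an assumption; only the general HMAC-SHA256 + expiry idea follows from the env vars:

```python
import hashlib
import hmac
import time

def sign_callback(secret: bytes, payload: bytes, ts: int) -> str:
    """HMAC-SHA256 over timestamp-prefixed payload, hex-encoded."""
    return hmac.new(secret, f"{ts}.".encode() + payload, hashlib.sha256).hexdigest()

def verify_callback(secret: bytes, payload: bytes, ts: int, signature: str,
                    ttl_seconds: int = 900, now=None) -> bool:
    """Reject expired callbacks first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    if now - ts > ttl_seconds:
        return False
    expected = sign_callback(secret, payload, ts)
    return hmac.compare_digest(expected, signature)
```

Binding the timestamp into the signed bytes prevents an attacker from replaying an old signature with a fresher timestamp; `hmac.compare_digest` avoids timing side channels.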
Current status:
- OpenAI ChatGPT MCP launch/callback flow: implemented.
- Gemini function-call launch mode: API contract is present; execution currently returns `501 Not Implemented`.
Loop maintenance can run:
- manually from Admin (`Run loop maintenance`),
- via CLI (`make loop-maintenance`),
- on schedule (template workflow): `.github/workflows/loop-maintenance-template.yml`
Install dev tools first (includes ruff):
uv sync --extra dev

If you see `No module named ruff` when running `make lint` or `make format`, run `uv sync --extra dev` again in the repo root.
make lint
make format
make arch-check
make test
make web-lint

Architecture checks are part of CI and enforce layer boundaries.
- `docs/agentic-layer.md`
- `docs/app-architecture.md`
- `docs/app-workflows.md`
- `docs/experiment-flow-detailed.md`
- `docs/architecture-learning-loop.md`
- `docs/user-guide-complete.md`
- `docs/external-integrations.md`
- `docs/deployment.md`
- `docs/debug/incidents-fixed.md`
- `docs/debug/open-risks.md`
Apache 2.0