
Phase D — migrate aegis-oss to @stackbilt/llm-providers (remove all bolted-in LLM logic) #24

@stackbilt-admin

Description


Summary

Multi-session epic to remove all bolted-in LLM inference logic from aegis-oss and the downstream AEGIS daemon, consuming `@stackbilt/llm-providers` as the canonical routing layer instead. Per the architectural rule, no Stackbilt repo (public or private) may contain its own bolted-in LLM inference, routing, failover, or provider-specific orchestration logic; llm-providers is the source of truth (SoT).

This is a four-session epic. aegis-oss is the contract repo per `project_dependency_model.md` — migration lands here first, then the daemon inherits via `@stackbilt/aegis-core`.

Current migration state (as of 2026-04-10, daemon v1.96.2)

Already on llm-providers (done):

  • `kernel/executors/cerebras.ts` — uses `CerebrasProvider`
  • `kernel/executors/groq.ts` — uses `GroqProvider` (both plain and tool-use variants)
  • `kernel/resilience.ts` — re-exports `CircuitBreakerManager`, `CostTracker`, `CreditLedger`, `ExhaustionRegistry` from llm-providers

Not yet migrated (the real Phase D scope):

  1. Anthropic — `web/src/claude.ts` / `executeClaudeChat` still uses raw Anthropic SDK. `AnthropicProvider` exists in llm-providers.
  2. Workers AI / GPT-OSS — `executeGptOss` / `executeWorkersAi` use raw `env.ai?.run()`. `CloudflareProvider` exists in llm-providers.
  3. Dispatch routing policy — the daemon downstream has a Cerebras remap (`if plan.executor === 'claude' → cerebras_mid`) that intercepts semantic executor names inside the dispatch switch. This is policy, not LLM logic, and should become a thin routing adapter above the llm-providers factory instead of an intercept inside the executor switch.

Architectural concern: executor naming abstraction

Current dispatch uses semantic executor names (`claude_opus`, `cerebras_reasoning`, `cerebras_mid`, `claude_code`) that encode strategy + tier + capability. llm-providers routes by provider + model + fallback chain. Phase D needs a lightweight routing policy adapter in aegis-oss that maps semantic names → (provider, model, fallback chain) tuples. This adapter becomes part of the canonical contract; the daemon inherits it and can optionally supply custom presets for daemon-specific strategies.

This is the design work that makes Phase D non-mechanical.
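
The adapter described above can be sketched as a plain lookup from semantic executor names to routing tuples. Everything below — `RoutingTarget`, the preset table, the provider and model strings — is an illustrative assumption, not the actual llm-providers API:

```typescript
// Sketch of the executor routing adapter: semantic names → routing tuples.
// RoutingTarget and all preset values are hypothetical; the real contract
// would live in aegis-oss and feed the llm-providers factory.
interface RoutingTarget {
  provider: string;   // llm-providers provider id, e.g. "anthropic" (assumed)
  model: string;      // provider-specific model id (placeholder values)
  fallback: string[]; // ordered fallback chain of semantic executor names
}

const DEFAULT_PRESETS: Record<string, RoutingTarget> = {
  claude_opus:        { provider: "anthropic", model: "claude-opus-model",  fallback: ["cerebras_reasoning"] },
  cerebras_reasoning: { provider: "cerebras",  model: "reasoning-model",    fallback: ["cerebras_mid"] },
  cerebras_mid:       { provider: "cerebras",  model: "mid-tier-model",     fallback: [] },
};

// Resolve a semantic executor name to its routing target, throwing on
// unknown names so dispatch misconfiguration surfaces early.
function resolveExecutor(
  name: string,
  presets: Record<string, RoutingTarget> = DEFAULT_PRESETS,
): RoutingTarget {
  const target = presets[name];
  if (!target) throw new Error(`Unknown executor: ${name}`);
  return target;
}
```

With this shape, the daemon inherits `DEFAULT_PRESETS` and can pass its own table to `resolveExecutor`, replacing the in-switch Cerebras remap with data rather than control flow.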

Dependencies on llm-providers

  • `llm-providers#26` — streaming support; Session D.1 streaming preservation is gated on it.

Dependencies on edge-auth

  • `Stackbilt-dev/edge-auth#82` — canonical `ResourceQuotaProvider` contract. Independent of Phase D execution, but the QuotaHook wiring in Session D.3+ will consume it.

Plan

Session D.1 — aegis-oss: routing adapter + Anthropic migration

  • Design the executor routing adapter (semantic names → provider/model/fallback tuples)
  • Port `executeClaudeChat` to use `AnthropicProvider` from llm-providers
  • Preserve: MCP integration, streaming (gated on llm-providers#26), cost tracking, circuit breakers
  • Update canonical dispatch tests to pass against the new adapter
  • Publish `@stackbilt/aegis-core` with the migrated code

Session D.2 — aegis-oss: Workers AI migration

  • Port `executeGptOss` / `executeWorkersAi` to `CloudflareProvider`
  • Retire raw `env.ai?.run()` usage
  • Update tests
  • Publish new `@stackbilt/aegis-core` version
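
The port above can be reduced to a dependency-injection seam: the executor stops touching `env.ai?.run()` and only sees a provider abstraction. The `ChatProvider` interface and model id below are hypothetical stand-ins; the real `CloudflareProvider` API in `@stackbilt/llm-providers` may differ:

```typescript
// Hypothetical shape of a llm-providers chat provider; the real
// CloudflareProvider contract may expose a different method signature.
interface ChatProvider {
  chat(opts: { model: string; prompt: string }): Promise<string>;
}

// Before: executeWorkersAi called env.ai?.run() directly.
// After: it only depends on the provider abstraction, so circuit
// breakers, cost tracking, and failover stay inside llm-providers.
async function executeWorkersAi(
  provider: ChatProvider,
  prompt: string,
  model = "@cf/openai/gpt-oss-120b", // illustrative model id
): Promise<string> {
  return provider.chat({ model, prompt });
}
```

A stubbed `ChatProvider` also makes the canonical dispatch tests trivial to run without a live binding.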

Session D.3 — daemon (Stackbilt-dev/aegis): inherit via dependency model

  • Remove daemon's Cerebras remap intercept
  • Delete daemon's `web/src/claude.ts` and `web/src/kernel/executors/workers-ai.ts`
  • Keep daemon-specific Cerebras tier presets if they differ from aegis-oss defaults (inject as custom presets on the factory)
  • Restore canonical dispatch tests — they should start passing once the bolted-in logic is gone
  • Bump daemon to consume the new `@stackbilt/aegis-core`
  • Deploy
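
The "inject as custom presets" bullet above can be as small as a spread-merge over the canonical defaults, so the daemon declares only what differs. All names and model ids here are illustrative:

```typescript
// Daemon-side preset override: start from the canonical aegis-oss
// defaults and overwrite only daemon-specific tiers. Values are placeholders.
type Presets = Record<string, { provider: string; model: string }>;

const defaults: Presets = {
  cerebras_mid: { provider: "cerebras",  model: "mid-tier-model" },
  claude_opus:  { provider: "anthropic", model: "claude-opus-model" },
};

// Object spread: later keys win, so overrides shadow defaults per entry.
function withOverrides(base: Presets, overrides: Presets): Presets {
  return { ...base, ...overrides };
}

const daemonPresets = withOverrides(defaults, {
  cerebras_mid: { provider: "cerebras", model: "daemon-tuned-mid-model" },
});
```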

Session D.4 — validation + policy enforcement

  • Integration test end-to-end: chat streaming, tool-use, failover scenarios
  • Grep across aegis-oss + daemon + foodfiles + img-forge for direct `@anthropic-ai/sdk` / `groq-sdk` / raw `env.ai?.run()` imports — delete any that remain
  • Add a lint rule (or CI check) that fails on raw provider SDK imports outside of `@stackbilt/llm-providers`
  • Close this epic
  • Close corresponding daemon kernel shadow (`dispatch.ts +366L` in `project_daemon_kernel_shadow`)
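
The lint/CI check could start as a simple source scan before graduating to an ESLint `no-restricted-imports` rule. The pattern list below mirrors the grep targets named above; the function name is a placeholder:

```typescript
// Minimal CI check: flag raw provider SDK usage outside
// @stackbilt/llm-providers. Patterns mirror the Session D.4 grep targets.
const FORBIDDEN: RegExp[] = [
  /from\s+["']@anthropic-ai\/sdk["']/, // raw Anthropic SDK import
  /from\s+["']groq-sdk["']/,           // raw Groq SDK import
  /env\.ai\?\.run\(/,                  // raw Workers AI binding call
];

// Returns the offending pattern sources found in one file's text;
// a CI wrapper would walk the repo and fail on any non-empty result.
function findForbiddenImports(source: string): string[] {
  return FORBIDDEN.filter((re) => re.test(source)).map((re) => re.source);
}
```

Wiring this into CI (walk the tree, skip `node_modules` and the llm-providers package itself, exit nonzero on a match) keeps the architectural rule enforced after Phase D closes.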

Related daemon work

  • 1.96.1 (2026-04-10) — Phase A kernel shadow cleanup. Closed 4 of ~10 shadows. Phase D.3 closes the dispatch.ts shadow as a side effect.
  • 1.96.2 (2026-04-10) — AI Gateway account-ID shadow collapse (5th shadow). Fixed a latent bug where the wrong CF account ID was hardcoded; now pulls from `env.CF_ACCOUNT_ID` inherited via Phase A.

Internal memory references (AEGIS context)

  • `project_phase_d_llm_providers.md` — full scoping with gap analysis and 4-session breakdown
  • `project_resource_quota_seam.md` — fractal multi-tenant quota architecture
  • `project_daemon_kernel_shadow.md` — broader daemon→aegis-oss shadow cleanup context
  • `feedback_no_bolted_llm_logic.md` — the architectural rule triggering this work

Priority

Design-heavy, multi-session. Not a cc-taskrunner candidate — this needs dedicated Claude Code sessions for each phase. Track here, execute as sessions become available.

🤖 Filed by AEGIS during Phase D scoping session
