model-traffic-manager

Azure-native AI traffic routing for teams that need one stable endpoint across Azure OpenAI accounts, regions, and deployments.

model-traffic-manager is a policy-driven router for chat/completions, embeddings, and selected backend-facing shared services. It focuses on secretless auth, explainable routing, health-aware failover, and live AKS validation without turning into a generic AI gateway or tenant control plane.

One stable endpoint for AI traffic on Azure and AKS
Managed Identity first, with explicit API-key fallback only when required
Explainable failover, cooldown, and circuit behavior you can actually debug
Inbound auth with router-owned API bearer tokens or Microsoft Entra ID

If this project helps your team ship reliable Azure AI traffic faster, sponsorship is welcome. Support directly funds documentation, live validation coverage, and roadmap time for the next production-grade capabilities.

Why teams use it

Route across multiple Azure AI / Azure OpenAI accounts and regions without teaching every backend how to fail over.
Keep outbound auth secretless by default when downstreams support Managed Identity.
Protect router entrypoints with either opaque API bearer tokens or Entra ID access tokens for app-to-app callers.
Expose routing decisions, rejected candidates, and caller-safe observability fields for support and audit trails.
Validate the real deployment path with opt-in Azure and AKS live suites instead of relying only on mocks.

Product focus

This repository is intentionally narrow:

it is an internal AI traffic router for Azure-native backends
it is not a generic AI platform, SaaS control plane, prompt workspace, or tenant orchestrator
it favors explicit configuration, predictable routing, and operator supportability over framework magic

That focus is the product. The goal is to give platform teams a small, explainable router they can trust in production.

Who this is for

platform teams running Azure OpenAI or Azure AI workloads across more than one region, account, or deployment
backend teams that want one stable AI endpoint instead of embedding failover and auth logic into every service
organizations that care about Managed Identity, explainable routing, and AKS-backed validation before production rollout
teams building internal chatbot systems or AI backends that need a strong infrastructure layer before they build the larger product around it

Who this is not for

teams looking for a no-code prompt workspace, agent builder, or multi-tenant SaaS control plane
teams that want a generic API gateway with every protocol and every provider built in
workloads that do not need Azure-native identity, failover visibility, or operator-focused routing control

Architecture at a glance

flowchart LR
    A[Backend services] -->|Bearer token or Entra ID| B[model-traffic-manager on AKS]
    B --> C[Routing policy and health state]
    C --> D[Azure OpenAI account / region A]
    C --> E[Azure OpenAI account / region B]
    C --> F[Azure OpenAI account / region C]
    B --> G[Metrics, traces, runtime events]
    B --> H[Optional Redis shared runtime state]

Current capabilities

POST /v1/chat/completions/{deployment_id} with tiered multi-upstream failover
POST /v1/embeddings/{deployment_id} with tiered multi-upstream failover
POST /v1/shared-services/{service_id} for selected backend-facing shared services
startup-time YAML validation with typed deployment and shared-service registries
outbound auth modes none, api_key, and managed_identity
inbound auth modes api_bearer_token and entra_id
weighted round robin within the lowest healthy tier
cooldown, quota-aware classification, circuit opening, and half-open recovery
in-memory and Redis-backed runtime state for shared health and limiter coordination
deployment-level request-rate limiting and concurrency limiting
request correlation, runtime decision events, Prometheus /metrics, and OpenTelemetry trace foundations
live validation suites for smoke, live chat, live embeddings, load balancing, shared services, inbound auth, observability, and Redis-backed multi-replica behavior

Quick start

make bootstrap
make check
make run

Local setup notes:

copy or reference values from .env.example
keep the router YAML in configs/example.router.yaml or point MODEL_TRAFFIC_MANAGER_CONFIG_PATH to another file
use MODEL_TRAFFIC_MANAGER_RUNTIME_STATE_BACKEND=redis together with MODEL_TRAFFIC_MANAGER_REDIS_URL when you want shared runtime state locally

Useful local endpoints after startup:

GET /health/live
GET /health/ready
GET /deployments
GET /shared-services
POST /v1/shared-services/{service_id}
POST /v1/chat/completions/{deployment_id}
POST /v1/embeddings/{deployment_id}

Validation

The repository keeps a layered validation model:

local quality gate: make check
Azure-backed integration without AKS: make integration-azure-local
AKS smoke and live suites: make e2e-aks-local
real model-response validation on AKS: make e2e-aks-live-model-local
live inbound auth validation on AKS: make e2e-aks-live-inbound-auth-local
live request-flow observability validation on AKS: make e2e-aks-live-observability-local

The higher-level suites are intentionally opt-in because they provision temporary Azure resources and can consume real model quota.

Documentation

Official docs explain the product, runtime behavior, configuration model, routing, and operations guidance.
Internal docs track delivery planning, task decomposition, and internal changelog history.
AGENTS.md is the repository working agreement for maintainers and AI-assisted workflows.

Community and support

If model-traffic-manager saves your team time, reduces routing risk, or helps you standardize Azure AI traffic, support it through the repository sponsor button or GitHub Sponsors. Sponsorship helps fund documentation, reliability work, live AKS validation, and the larger product roadmap around production chatbot infrastructure.

License

This repository is licensed under Apache-2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github		.github
_docs		_docs
app		app
configs		configs
docker		docker
docs		docs
infra		infra
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
SUPPORT.md		SUPPORT.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

model-traffic-manager

Why teams use it

Product focus

Who this is for

Who this is not for

Architecture at a glance

Current capabilities

Quick start

Validation

Documentation

Community and support

License

About

Uh oh!

Releases 3

Sponsor this project

Uh oh!

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

model-traffic-manager

Why teams use it

Product focus

Who this is for

Who this is not for

Architecture at a glance

Current capabilities

Quick start

Validation

Documentation

Community and support

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 3

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages