JANUS

A decision guardrail for autonomous enterprise agents. Everyone else builds a librarian — this is the guardrail in front of the action.

Before an AI agent executes a proposed action, JANUS intercepts it and hands it to a team of six reasoning agents on a Microsoft Agent Framework workflow. They retrieve analogous past org decisions and their outcomes through Foundry IQ, trace what happened, ground a cited lesson, simulate three futures, and return a recommendation — approve, modify, or reject — for a human to sign off. Knowledge tools answer questions when asked; by then the agent has already decided. JANUS doesn't wait. It is decision support with a human in the loop, and by construction it never executes anything on its own.

Built for the Microsoft Agents League @ AI Skills Fest 2026 — Reasoning Agents track. Foundry IQ (Azure AI Search agentic retrieval) is the real, load-bearing intelligence layer. Repo: github.com/vighriday/janus

At a glance

A multi-agent reasoning team — six single-responsibility agents (Guard, Retriever, Tracer, Lesson, Simulator, Decision) collaborate over typed edges on a Microsoft Agent Framework workflow, following a Planner→Executor + Critic/Verifier pattern. Any agent can abstain and halt the team.
Real Foundry IQ agentic retrieval — query planning, reranker scores, and a [ref_id] citation on every claim. Below the reranker floor, it abstains rather than guess.
A real human-in-the-loop gate — the Microsoft Agent Framework workflow pauses server-side at request_info; a second HTTP request resumes the same workflow object. Double-clicks are idempotent.
A provable causal flip — move one input across the 70% concentration knee and the recommendation changes, driven by the simulated tail, not a script.
Measured safety, not asserted — a committed red-team probe: 84.6% injection block rate (direct + indirect/XPIA), zero false positives on clean inputs.
Grounded and evaluated — a 22-case scorecard: 4.68 / 5 groundedness (100% pass), 4.36 / 5 relevance (91% pass).
Keyless and deployable — DefaultAzureCredential throughout; azd up provisions the whole stack from Bicep.

Why this exists

Enterprises are starting to hand operational decisions to autonomous agents. Those agents inherit the company's documents and data — but not the lessons it learned the hard way, the constraints that only became visible after something broke. So they confidently repeat mistakes the organization already paid for, with no memory that the bill was already settled once.

A retrieval tool would let you ask whether this has gone wrong before. JANUS doesn't wait to be asked — it checks the proposed action against what actually happened last time, before the agent acts.

Everyone else builds a librarian. This is a guardrail.

What it does — a team of reasoning agents

JANUS is a multi-agent reasoning system: six single-responsibility agents collaborate over typed message edges on a Microsoft Agent Framework workflow, each owning one reasoning step and handing its typed output to the next. The roster follows the reasoning patterns the track rewards — a Planner→Executor decomposition with built-in Critic/Verifier checks — and several agents can refuse and halt the team (abstain, block) rather than push a weak answer through.

Agent	Pattern	What it reasons about
GuardAgent	Verifier	Content Safety Prompt Shields screen the action and the documents about to be read (direct + indirect/XPIA injection).
RetrieverAgent	Executor	Foundry IQ agentic retrieval plans subqueries, ranks precedents with reranker scores, returns `[ref_id]` citations. Below the floor → abstain.
TracerAgent	Executor	For each cited precedent, walks its decision → outcome links in the in-process decision graph.
LessonAgent	Executor + Critic	Synthesises one grounded principle, every claim cited — then self-verifies it with Content Safety Groundedness. No support → abstain.
SimulatorAgent	Executor	Three futures (approve / modify / reject). It proposes only bounded levers; a seeded Monte Carlo computes every number, and a DoWhy `do()` intervention quantifies the causal effect.
DecisionAgent	Planner / HITL	Composes a trust score from the upstream agents' signals, then pauses the whole team for a human to approve or override. Nothing proceeds to execution.

The headline beat — why the recommendation flips

The intercepted action carries one bounded lever: single-vendor dependency concentration. The cost model has a non-linear knee at ~70% — below it, a vendor disruption is absorbable; above it, the seeded Monte Carlo's loss distribution grows a fat tail that dominates the approve branch.

So the flip is mechanical, not theatrical:

Lever	Simulated tail	Recommendation
dependency 100%	tail risk above threshold	MODIFY (cap below the knee)
dependency 60%	tail risk collapses	APPROVE

Same seed, same code path, one input moved across the knee. The model only proposes which levers exist; every number is computed by the Monte Carlo, and a DoWhy do() intervention isolates the causal effect of the lever. That's the difference between a guardrail that reasons and a demo that animates.

Architecture

flowchart TD
    A([Proposed action<br/>from an autonomous agent]) --> G

    subgraph spine["Six reasoning agents · Microsoft Agent Framework workflow"]
        direction TB
        G["GuardAgent<br/>verifier"]
        R["RetrieverAgent<br/>query plan · ranked precedents · citations"]
        T["TracerAgent<br/>decision → outcome graph"]
        L["LessonAgent<br/>cited principle · self-verifies grounding"]
        SIM["SimulatorAgent<br/>3 futures · seeded Monte Carlo · DoWhy do()"]
        H{{"DecisionAgent<br/>trust score · human gate · never auto-executes"}}
        G --> R --> T --> L --> SIM --> H
    end

    CS["Azure AI Content Safety<br/>Prompt Shields · Groundedness"]
    IQ["Foundry IQ<br/>Azure AI Search agentic retrieval"]
    AOAI["Azure OpenAI<br/>gpt-4o-mini · embeddings"]

    CS -. screens .-> G
    IQ == "the IQ integration" ==> R
    AOAI -. extracts .-> L
    CS -. grounds .-> L

    R -- "no precedent" --> AB([ABSTAIN<br/>escalate to a human])
    L -- "insufficient evidence" --> AB
    H --> OUT([Recommendation<br/>for human review])

    classDef azure fill:#0a3d62,stroke:#4a90d9,color:#fff;
    classDef iq fill:#1b4332,stroke:#22c55e,color:#fff;
    classDef gate fill:#3d2c00,stroke:#f59e0b,color:#fff;
    classDef abstain fill:#3d1414,stroke:#ef4444,color:#fff;
    class CS,AOAI azure;
    class IQ iq;
    class H gate;
    class AB abstain;

Layer	Choice
Orchestration	Microsoft Agent Framework — six reasoning agents as typed executors over a Workflows graph, collaborating along typed message edges, with a real `request_info` human-in-the-loop pause
Retrieval (the IQ layer)	Foundry IQ / Azure AI Search agentic retrieval — query planning, reranker scores, `[ref_id]` citations
Decision graph	In-process NetworkX (at this scale a graph server is pure friction)
Simulation	Seeded NumPy Monte Carlo over a transparent cost model + a DoWhy `do()` causal contrast
Safety	Azure AI Content Safety (Groundedness + Prompt Shields) + a 22-case eval scorecard + a red-team ASR probe
Backend	FastAPI + Pydantic v2 on uv
Frontend	Next.js 15 + Tailwind v4 + React Flow, hand-built SVG charts, over Server-Sent Events
Observability	OpenTelemetry → Arize Phoenix (local) + Azure Monitor (production)
Auth	Keyless throughout — `DefaultAzureCredential` / user-assigned managed identity

Full detail and rationale: docs/ARCHITECTURE.md · product spec: docs/PRD.md · decisions: docs/DECISIONS.md.

Roadmap (labeled as roadmap throughout)

Work IQ and Fabric IQ plug into the same knowledge base as additional sources, with no pipeline change.
Durable, crash-resumable workflow state; probabilistic (Bayesian) simulation; groundedness reasoning-mode (gated by a model-deprecation issue); the cloud AI Red Teaming Agent. One real integration shown working beats three half-wired.

Safety & reliability

Never executes. JANUS has no execution capability by construction — it emits a recommendation. "Never auto-executes" is structural, not a flag.
Abstains on thin evidence. No precedent above the reranker floor, or a lesson the model can't ground, → abstain with a depressed trust score. It never fabricates a precedent.
Real human gate. The workflow pauses server-side at the approval gate; the console resumes the same run over a second request. Double-clicks are idempotent.
Measured, not asserted. A committed red-team probe (data/eval/redteam.json) fires direct and indirect/XPIA injections at the shields: 84.6% block rate, zero false positives on clean inputs (a service error counts as not blocked, so that's a lower bound). The groundedness scorecard (data/eval/scorecard.json) scores 4.68 / 5 groundedness and 4.36 / 5 relevance across 22 cases, including the abstain ones.

Data

The decision corpus is entirely synthetic — a fictional logistics company, fictional vendors, generated transcripts, postmortems, and policies. No real organizational data, no PII, no secrets are committed. Credentials are resolved at runtime through managed identity; nothing sensitive lives in this repo.

Running it locally

Requires Python 3.11+ (managed with uv), Node 20+, and an Azure subscription with Azure AI Search, Azure OpenAI, and Azure AI Content Safety. No Docker required for the core demo — the decision graph runs in-process.

# 1. configure — copy the template and fill in your endpoints
cp .env.example .env

# 2. backend (FastAPI) on :8000
cd api && uv run uvicorn janus.app:app --port 8000

# 3. console (Next.js) on :3100, in a second terminal
cd web && npm install && npm run dev

Open the console at http://localhost:3100. The backend authenticates to Azure with DefaultAzureCredential, so az login is enough locally — there are no keys in the repo or in .env.

To index the corpus into a Foundry IQ knowledge base and run the evidence scripts:

cd api
uv run python -m janus.scripts.index_corpus   # build the knowledge base
uv run python -m janus.scripts.run_eval       # groundedness + relevance scorecard
uv run python -m janus.scripts.red_team       # injection block-rate probe

Deploying to Azure

azd up

provisions, from the Bicep in infra/, a Container Apps environment hosting both services behind a user-assigned managed identity with the keyless data-plane role assignments, Key Vault, Application Insights, and a container registry — one reproducible command. The existing AI service endpoints are passed in via azd env set (see infra/main.parameters.json). The deployed app exists for credibility; the demo is recorded on localhost so a cold start is never the first impression.

Repository layout

api/     FastAPI service — the JANUS pipeline, clients, simulation, scripts
web/     Next.js console
infra/   Bicep + azure.yaml for `azd up`
data/    synthetic decision corpus + the eval scorecard + the red-team artifact
docs/    PRD, architecture, decisions, changelog

License

MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github/workflows		.github/workflows
api		api
data		data
docs		docs
infra		infra
web		web
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.markdownlint.json		.markdownlint.json
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
azure.yaml		azure.yaml
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

JANUS

At a glance

Why this exists

What it does — a team of reasoning agents

The headline beat — why the recommendation flips

Architecture

Roadmap (labeled as roadmap throughout)

Safety & reliability

Data

Running it locally

Deploying to Azure

Repository layout

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

JANUS

At a glance

Why this exists

What it does — a team of reasoning agents

The headline beat — why the recommendation flips

Architecture

Roadmap (labeled as roadmap throughout)

Safety & reliability

Data

Running it locally

Deploying to Azure

Repository layout

License

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages