microsoft · ashragrawal · Jun 18, 2026 · Jun 22, 2026
@@ -26,6 +26,7 @@ Agents provisioned before this release need `Agent365.Observability.OtelWrite` g
 - Log separator written at the start of each CLI invocation now redacts values for secret-bearing options (e.g. `--idp-client-secret`) so they are not written to the log file in plain text.
 - Authentication context (tenant and user) is now logged at the `Information` level whenever the resolved sign-in identity changes, giving operators a clear audit trail in the log file of who the CLI is acting as, without exposing credentials.
 - `a365 develop-mcp evaluate` command for evaluating MCP server tool schema quality — runs deterministic and semantic checks (via GitHub Copilot or Claude Code CLIs), computes maturity scoring, and generates an interactive HTML report
+- `--eval-engine azure-openai` on `develop-mcp evaluate` — scores semantic checks with your own Azure OpenAI deployment instead of a local coding agent, authenticating with Microsoft Entra ID only (`DefaultAzureCredential`; no API key). Configured via `A365_EVAL_AZURE_OPENAI_ENDPOINT` and `A365_EVAL_AZURE_OPENAI_DEPLOYMENT`.
 - `--skip-sp-provisioning` option on `setup all` — skips the interactive in-line provisioning of missing resource service principals (issue #429). Default: setup prompts per-resource and runs `az ad sp create --id <appId>` using the operator's `az login`. With this flag, missing SPs are excluded from the consent URL and listed in the Action Required block with the `az` command and a per-SP consent URL. Implicitly enabled when stdin is redirected (CI / pipe scenarios).
 - Action Required block in the `setup all` summary now lists missing service principals — each entry shows the resource, pending scopes, the `az ad sp create` command, and the per-SP consent URL needed to complete provisioning.
 - `logs export [command] [--output <dir>]` — exports a redacted copy of a CLI diagnostic log safe to share with Microsoft support. Redacts JWT tokens, email addresses, OS-path usernames, and tenant-specific GUIDs; replaces identical values with consistent aliases so log correlation is preserved. Preserves diagnostic IDs that aren't sensitive but are useful for debugging — `TraceId`, `CorrelationId`, Microsoft Graph `request-id` and `client-request-id` values, and well-known public Microsoft / Agent 365 resource appIds (such as the Microsoft Graph appId `00000003-0000-0000-c000-000000000000`). Omit `[command]` to export all available logs at once.

@@ -37,7 +37,9 @@ Microsoft's role is supplying the `a365` CLI that orchestrates steps 1–2. Step
 
 By default the evaluation uses **Claude Haiku 4.5**, invoked through whichever coding agent CLI you have installed (GitHub Copilot CLI or Claude Code CLI). You can override the engine with `--eval-engine github-copilot` or `--eval-engine claude-code`.
 
-To run a **different model** without waiting for a CLI update, set an environment variable before the run: `A365_EVAL_COPILOT_MODEL` for GitHub Copilot (needs an exact model ID, e.g. `claude-haiku-4.5`) or `A365_EVAL_CLAUDE_MODEL` for Claude Code (accepts an alias, e.g. `haiku`).
+You can also score with **your own Azure OpenAI deployment** via `--eval-engine azure-openai`. Set `A365_EVAL_AZURE_OPENAI_ENDPOINT` and `A365_EVAL_AZURE_OPENAI_DEPLOYMENT`, and sign in to Entra ID (e.g. `az login`). This engine authenticates with **Microsoft Entra ID only** (`DefaultAzureCredential`) — **API-key authentication is not supported**. Unlike the coding-agent engines it needs no local CLI: the `a365` CLI calls your Azure OpenAI deployment directly, under your own Azure subscription, billing, and access policies.
+
+To run a **different model** without waiting for a CLI update, set an environment variable before the run: `A365_EVAL_COPILOT_MODEL` for GitHub Copilot (needs an exact model ID, e.g. `claude-haiku-4.5`) or `A365_EVAL_CLAUDE_MODEL` for Claude Code (accepts an alias, e.g. `haiku`). For Azure OpenAI, the model is whatever you name in `A365_EVAL_AZURE_OPENAI_DEPLOYMENT` (`gpt-5.4` recommended — it is the model this engine was tested against).
 
 The CLI that runs the model is **yours**, installed from npm by you, authenticated with your credentials. Microsoft does not host or resell the model API call — it is made directly by your CLI to the AI provider under your own terms of service and billing. The `a365` CLI specifies the model flag but does not mediate the API call.
 

@@ -32,15 +32,17 @@ Wait for the answer. Store as `needsAuth`:
 1. Auto — try GitHub Copilot first, fall back to Claude Code
 2. GitHub Copilot only
 3. Claude Code only
-4. None — generate the checklist and let me (or my own LLM) score it manually (bring-your-own-LLM)
+4. Azure OpenAI — score with your own Azure OpenAI deployment via Entra ID (no local coding agent; typically the fastest option)
+5. None — generate the checklist and let me (or my own LLM) score it manually (bring-your-own-LLM)
 
 Wait for the answer. Store as `evalEngine`:
 - If **1 (Auto)**: `evalEngine = "auto"`
 - If **2 (GitHub Copilot)**: `evalEngine = "github-copilot"`
 - If **3 (Claude Code)**: `evalEngine = "claude-code"`
-- If **4 (None)**: `evalEngine = "none"`
+- If **4 (Azure OpenAI)**: `evalEngine = "azure-openai"` — also confirm the user has set `A365_EVAL_AZURE_OPENAI_ENDPOINT` and `A365_EVAL_AZURE_OPENAI_DEPLOYMENT` and is signed in to Entra ID (see Step 2).
+- If **5 (None)**: `evalEngine = "none"`
 
-> **Note:** Auto and the named engines require the corresponding CLI to be installed on the workstation (`copilot` or `claude`). If neither is installed and `evalEngine` is `auto` or a named engine, the pipeline will stop after writing the checklist and print BYO-LLM instructions — that is expected, not an error.
+> **Note:** The local engines (`auto`, `github-copilot`, `claude-code`) require the corresponding CLI on the workstation (`copilot` or `claude`); if none is installed the pipeline stops after writing the checklist and prints BYO-LLM instructions — that is expected, not an error. The `azure-openai` engine needs **no local CLI** — instead it requires the `A365_EVAL_AZURE_OPENAI_ENDPOINT` and `A365_EVAL_AZURE_OPENAI_DEPLOYMENT` environment variables and an Entra ID sign-in (e.g. `az login`); see Step 2.
 
 After all three questions are answered, create all todos for the path and mark Todo 1 in-progress:
 
@@ -133,6 +135,27 @@ Tell the user one of the following is required, and that without one of them the
 - Install Claude Code (see above), or
 - Switch to `evalEngine = "none"` and score the checklist with their own LLM after the run.
 
+### If `evalEngine = "azure-openai"`
+
+No local coding agent is used — the `a365` CLI calls your Azure OpenAI deployment directly. Verify two things:
+
+1. **Endpoint and deployment are configured.** The CLI reads these from the environment (they are not flags):
+
+```bash
+echo "$A365_EVAL_AZURE_OPENAI_ENDPOINT"     # e.g. https://<resource>.services.ai.azure.com/openai/v1
+echo "$A365_EVAL_AZURE_OPENAI_DEPLOYMENT"   # deployment name, e.g. gpt-5.4 (recommended)
+```
+
+If either is empty, ask the user for their Azure OpenAI endpoint (include the `/openai/v1` path) and deployment name, then set both variables.
+
+2. **The workstation is signed in to Entra ID.** The engine authenticates with `DefaultAzureCredential` and requests a token for `https://ai.azure.com/.default`. The simplest sign-in is:
+
+```bash
+az login
+```
+
+> **Authentication — Microsoft Entra ID only.** The `azure-openai` engine supports **Entra ID authentication only**, via `DefaultAzureCredential` (Azure CLI sign-in, managed identity, a service principal supplied through environment variables, or Visual Studio / VS Code sign-in). **API-key authentication is NOT supported** — no key is read from the environment or any config file. The model call is made directly by the `a365` CLI to *your* Azure OpenAI deployment, under your Azure subscription, billing, and access policies.
+
 ### Step 2 completion
 
 > Mark Todo 2 as **completed**. Mark Todo 3 as **in-progress**. Proceed to Step 3.
@@ -188,8 +211,11 @@ These settings can come from environment variables instead of (or alongside) the
 | `A365_MCP_AUTH_TOKEN` | Bearer token for the MCP server, used when `--auth-token` is not passed. **Preferred over the flag** — it keeps the token out of process listings (`ps` / Task Manager) and shell history. If you pass `--auth-token`, the CLI prints a one-time warning recommending this variable. |
 | `A365_EVAL_COPILOT_MODEL` | Override the GitHub Copilot model (exact model ID, e.g. `claude-haiku-4.5`). |
 | `A365_EVAL_CLAUDE_MODEL` | Override the Claude Code model (alias, e.g. `haiku`). |
+| `A365_EVAL_AZURE_OPENAI_ENDPOINT` | **Required for `--eval-engine azure-openai`.** Azure OpenAI endpoint, including the API path (e.g. `https://<resource>.services.ai.azure.com/openai/v1`). |
+| `A365_EVAL_AZURE_OPENAI_DEPLOYMENT` | **Required for `--eval-engine azure-openai`.** Deployment (model) name to score with. `gpt-5.4` is recommended — it is the model this engine was tested against. |
+| `A365_EVAL_AZURE_OPENAI_MAX_CONCURRENCY` | Parallel check-scoring calls for the `azure-openai` engine. Default `100`. |
 
-The model defaults to Claude Haiku 4.5; override only to move to a newer model without waiting for a CLI release.
+For the coding-agent engines the model defaults to Claude Haiku 4.5; override only to move to a newer model without waiting for a CLI release. The `azure-openai` engine uses whatever deployment you name in `A365_EVAL_AZURE_OPENAI_DEPLOYMENT` (`gpt-5.4` recommended) and authenticates with Entra ID only (no API key — see Step 2).
 
 ### What you will see
 
@@ -268,6 +294,8 @@ If any step results in an error, stop and analyze the error message carefully.
 | `Unauthorized` from `tools/list` | Wrong or expired bearer token | Re-acquire the token (Question 2). |
 | `Could not parse server URL` | URL doesn't include the protocol | Add `http://` or `https://` to the URL. |
 | `No coding agent detected` after `[2/5]` | Neither `copilot` nor `claude` is on PATH | Install one (Step 2) or pass `--eval-engine none`. |
+| `Azure OpenAI is not available` / engine not selected | `azure-openai` chosen but `A365_EVAL_AZURE_OPENAI_ENDPOINT` / `A365_EVAL_AZURE_OPENAI_DEPLOYMENT` are unset | Set both env vars (Step 2). |
+| `Azure OpenAI authentication failed` | Not signed in to Entra ID (or no usable credential) for the `azure-openai` engine | Run `az login` (or configure a managed identity / service-principal credential). |
 | `Failed to write report to <path>` | Output dir not writable | Choose a different `--output-dir` or fix permissions. |
 | Telemetry warning at debug level | Non-blocking — the marker call failed | Ignore. The evaluation runs regardless. |
 

@@ -32,6 +32,7 @@
     <PackageVersion Include="Azure.Identity" Version="1.13.2" />
     <PackageVersion Include="Azure.ResourceManager" Version="1.13.2" />
     <PackageVersion Include="Azure.ResourceManager.AppService" Version="1.2.0" />
+    <PackageVersion Include="OpenAI" Version="2.8.0" />
     <!-- Microsoft Graph -->
     <PackageVersion Include="Microsoft.Graph" Version="5.36.0" />
     <!-- Transitive pin: resolves NU1107 conflict between Microsoft.Extensions.Http 9.x (needs >= 9.0.8)

@@ -56,10 +56,11 @@ private static Command CreateEvaluateSubcommand(ILogger logger, IEvaluationPipel
         var command = new Command(
             "evaluate",
             "Evaluate MCP server tool schema quality and generate an HTML report. " +
-            "Uses a locally installed coding agent (GitHub Copilot or Claude Code) to score semantic checks. " +
+            "Scores semantic checks with a locally installed coding agent (GitHub Copilot or Claude Code), " +
+            "or with your own Azure OpenAI deployment via --eval-engine azure-openai. " +
             "If no agent is detected, the command stops after writing the checklist so you can score it manually with your own LLM, " +
             "or pass --eval-engine none to skip agent probing entirely. " +
-            "Only run this against MCP servers you trust: the server's tool names and descriptions are sent to a locally running coding agent.");
+            "Only run this against MCP servers you trust: the server's tool names and descriptions are sent to the scoring engine.");
 
         // Use a required option (not a positional argument) for consistency with other
         // develop-mcp subcommands and Azure CLI conventions.
@@ -78,9 +79,10 @@ private static Command CreateEvaluateSubcommand(ILogger logger, IEvaluationPipel
         var evalEngineOption = new Option<string>(
             "--eval-engine",
             getDefaultValue: () => "auto",
-            "Which local coding agent scores semantic checks. " +
-            "auto: try github-copilot then claude-code. " +
-            "github-copilot or claude-code: use only that engine. " +
+            "Which engine scores semantic checks. " +
+            "auto: try github-copilot then claude-code (local coding agents). " +
+            "github-copilot or claude-code: use only that local agent. " +
+            "azure-openai: use your own Azure OpenAI deployment as the judge via Entra ID (set A365_EVAL_AZURE_OPENAI_ENDPOINT and A365_EVAL_AZURE_OPENAI_DEPLOYMENT). " +
             "none: skip automatic scoring and expect the checklist to be pre-scored (bring-your-own-LLM).");
 
         var authTokenOption = new Option<string?>(

@@ -46,6 +46,9 @@
     <PackageReference Include="Azure.ResourceManager" />
     <PackageReference Include="Azure.ResourceManager.AppService" />
 
+    <!-- OpenAI SDK (Azure OpenAI BYO-LLM judge for evaluate) -->
+    <PackageReference Include="OpenAI" />
+
     <!-- Microsoft Graph -->
     <PackageReference Include="Microsoft.Graph" />
   </ItemGroup>

@@ -56,5 +56,6 @@ public enum EvalEngine
     Auto,
     GitHubCopilot,
     ClaudeCode,
+    AzureOpenAI,
     None
 }
@@ -401,6 +401,9 @@ private static void ConfigureServices(IServiceCollection services, LogLevel mini
         // adding its ICodingAgentLauncher implementation file plus one line here.
         services.AddSingleton<ICodingAgentLauncher, GitHubCopilotLauncher>();
         services.AddSingleton<ICodingAgentLauncher, ClaudeCodeLauncher>();
+        // Azure OpenAI judge (BYO model via Entra ID). Explicit-only (AutoDetectable=false),
+        // so it never participates in --eval-engine auto despite being registered here.
+        services.AddSingleton<ICodingAgentLauncher, AzureOpenAiLauncher>();
         services.AddSingleton<IChecklistEvaluator, ChecklistEvaluator>();
         services.AddSingleton<IEvaluationAnalyzer, EvaluationAnalyzer>();
         services.AddSingleton<IReportGenerator, ReportGenerator>();