diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 427a071..b5e653d 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -29,11 +29,12 @@ { "name": "microsoft-365-agents-toolkit", "source": "./plugins/microsoft-365-agents-toolkit", - "version": "1.3.1", + "version": "1.5.0", "description": "Toolkit for building and evaluating Microsoft 365 Copilot declarative agents — scaffolding, JSON manifest development, capability configuration, and eval workflows.", "skills": [ "./skills/install-atk", "./skills/declarative-agent-developer", + "./skills/teams-app-developer", "./skills/ui-widget-developer", "./skills/m365-agent-evaluator" ] diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json index c8b90ae..a01d9a0 100644 --- a/.github/plugin/marketplace.json +++ b/.github/plugin/marketplace.json @@ -29,11 +29,12 @@ { "name": "microsoft-365-agents-toolkit", "source": "./plugins/microsoft-365-agents-toolkit", - "version": "1.3.1", + "version": "1.5.0", "description": "Toolkit for building and evaluating Microsoft 365 Copilot declarative agents — scaffolding, JSON manifest development, capability configuration, and eval workflows.", "skills": [ "./plugins/microsoft-365-agents-toolkit/skills/install-atk", "./plugins/microsoft-365-agents-toolkit/skills/declarative-agent-developer", + "./plugins/microsoft-365-agents-toolkit/skills/teams-app-developer", "./plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer", "./plugins/microsoft-365-agents-toolkit/skills/m365-agent-evaluator" ] diff --git a/plugins/microsoft-365-agents-toolkit/.github/plugin/plugin.json b/plugins/microsoft-365-agents-toolkit/.github/plugin/plugin.json index 376f9c1..53366b4 100644 --- a/plugins/microsoft-365-agents-toolkit/.github/plugin/plugin.json +++ b/plugins/microsoft-365-agents-toolkit/.github/plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "microsoft-365-agents-toolkit", "description": "Toolkit for building 
and evaluating Microsoft 365 Copilot declarative agents — scaffolding, JSON manifest development, capability configuration, and eval workflows.", - "version": "1.4.0", + "version": "1.5.0", "author": { "name": "Microsoft" } diff --git a/plugins/microsoft-365-agents-toolkit/README.md b/plugins/microsoft-365-agents-toolkit/README.md index 3292999..750585f 100644 --- a/plugins/microsoft-365-agents-toolkit/README.md +++ b/plugins/microsoft-365-agents-toolkit/README.md @@ -47,6 +47,7 @@ npx -y --package @microsoft/m365-copilot-eval@latest runevals --prompts-file eva |-------|-------------| | [**install-atk**](./skills/install-atk/SKILL.md) | Install or update the ATK CLI and VS Code extension | | [**declarative-agent-developer**](./skills/declarative-agent-developer/SKILL.md) | Scaffolding, JSON manifest authoring, capability configuration, security patterns, deployment via ATK CLI | +| [**teams-app-developer**](./skills/teams-app-developer/SKILL.md) | Build, test, and deploy code-based Teams apps (Custom Engine Agents, bots, tabs, message extensions) via the ATK CLI | | [**ui-widget-developer**](./skills/ui-widget-developer/SKILL.md) | Build MCP servers with OpenAI Apps SDK widget rendering for Copilot Chat | | [**m365-agent-evaluator**](./skills/m365-agent-evaluator/SKILL.md) | Generate, run, and analyze evaluation suites for M365 Copilot declarative agents | diff --git a/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/SKILL.md b/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/SKILL.md index 1240df9..2f5cdb5 100644 --- a/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/SKILL.md +++ b/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/SKILL.md @@ -89,6 +89,7 @@ This skill triggers when building MCP servers with OAI app or widget rendering f | **Existing M365 agent, new MCP server** | MCP server + widgets + mcpPlugin.json | Start at [Implementation](#implementation) | | **Existing MCP server, add Copilot 
widgets** | Widget support added to existing server | Start at [Copilot Widget Protocol](references/copilot-widget-protocol.md#adaptation-checklist-existing-mcp-server) | | **Language choice** (non-TypeScript) | Protocol requirements | See [Copilot Widget Protocol](references/copilot-widget-protocol.md) for what to implement, [MCP Server Pattern (TypeScript)](references/mcp-server-pattern.md) as a reference | +| **Deploy to Azure** (remote hosting, not just devtunnel) | Provision + Deploy wired in `m365agents.yml` with Bicep, managed identity, App Insights | Follow [Azure Provision & Deploy](references/azure-provision-deploy.md) | --- @@ -372,6 +373,8 @@ Core requirements: ## DevTunnels Setup > **Local testing only.** DevTunnels are for development and testing on your machine. Before sharing the agent more broadly, deploy both the MCP server and widget assets to a hosted environment (e.g., Azure App Service, Azure Static Web Apps, or another hosting provider) and update the agent manifest URLs accordingly. +> +> **Going to Azure?** To make the Agents Toolkit **Provision** and **Deploy** actions create Azure resources (Bicep) and push the MCP server to Azure App Service, see [references/azure-provision-deploy.md](references/azure-provision-deploy.md). It rewrites `m365agents.yml`, adds `infra/` Bicep, and fixes the `env/.env.` files. DevTunnels expose your localhost MCP server to M365 Copilot using **named tunnels** for stable URLs. See [references/devtunnels.md](references/devtunnels.md) for setup scripts, command reference, and troubleshooting. @@ -413,6 +416,25 @@ On first run, provision the agent once the tunnel is up (see AGENT PROVISIONING 3. **Provision + test** — see AGENT PROVISIONING rule for when this is needed; bump `version` in manifest.json if Copilot doesn't reflect changes +## Deploy to Azure (remote hosting) + +The devtunnel workflow above is for **local** development. 
To host the MCP server on Azure +so it has a stable public URL (and the agent works without a developer machine running), +follow [references/azure-provision-deploy.md](references/azure-provision-deploy.md). + +That reference makes the Agents Toolkit **Provision** and **Deploy** buttons do real Azure +work by adding an `arm/deploy` step and a `deploy:` stage to `m365agents.yml`, plus Bicep +under `infra/`. It provisions, with an identity-first, monitoring-ready posture: + +- a resource group (created — with a location choice — or reused), with its id saved to env +- an App Service (Linux, Node 22) on a **B1** plan by default, with larger SKUs offered +- a **user-assigned managed identity** instead of secrets (Key Vault reference offered when + a secret is unavoidable) +- **Application Insights** + Log Analytics for monitoring and debugging + +Use it whenever the user asks to deploy/host the MCP server on Azure, fix a Provision or +Deploy that does nothing, or wire `m365agents.yml` to Azure infrastructure. + ## Best Practices See [references/best-practices.md](references/best-practices.md) for detailed guidance. diff --git a/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/references/azure-provision-deploy.md b/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/references/azure-provision-deploy.md new file mode 100644 index 0000000..17ff3c2 --- /dev/null +++ b/plugins/microsoft-365-agents-toolkit/skills/ui-widget-developer/references/azure-provision-deploy.md @@ -0,0 +1,728 @@ +--- +name: azure-provision-deploy +description: | + Make the Agents Toolkit "Provision" and "Deploy" actions do real Azure work for an + existing Microsoft 365 Agents Toolkit project. Fixes the two failure modes seen in + practice: (1) Provision fails or does nothing because m365agents.yml has no Azure + infrastructure step, and (2) there is no Deploy stage, so there is no way to push the + MCP server to Azure. 
The skill rewrites m365agents.yml, adds Bicep under infra/, and + fixes the env/.env. + env/.env..user files so Provision creates resources and + Deploy ships code to Azure App Service. Provisioning creates (or reuses) a resource + group, an App Service (Linux, Node 22) with a user-assigned managed identity instead of + secrets, optional Storage, and Application Insights for monitoring. +when_to_use: + - "make the Provision and Deploy buttons work in Agents Toolkit" + - "Provision does nothing / Provision fails with no Azure resources" + - "no way to deploy the MCP server to Azure" + - "wire m365agents.yml to Azure with Bicep" + - "add provision + deploy lifecycle to an M365 agent project" + - "deploy MCP server to Azure App Service from the Agents Toolkit extension" + - "create or reuse a resource group and store its id in env" + - "use managed identity instead of secrets for the MCP server" + - "add Application Insights / monitoring to the MCP server deployment" + - "choose the App Service plan SKU (B1 default) for the MCP server" +schema_version: m365agents v1.11 +--- + +# Provision & Deploy an M365 Agents Toolkit project to Azure + +The objective of this skill is simple: after running it, the user can open the +**Microsoft 365 Agents Toolkit** VS Code extension, click **Provision** to create +Azure resources, then click **Deploy** to push the MCP server code to **Azure App +Service** — and end up with a working remote environment. + +It fixes the two real-world failures: + +1. **"Not able to provision."** `m365agents.yml` has no `arm/deploy` step (or the Bicep + CLI is missing), so the Provision action creates nothing or errors out. +2. **"No information to deploy to Azure."** There is no `deploy:` stage and no App + Service target, so there is nowhere for the code to go. + +> **How it runs:** prefer the extension buttons (**Provision** / **Deploy** in the +> Agents Toolkit lifecycle tree). 
The exact same lifecycle can be run from a terminal +> as a fallback: +> ```bash +> atk provision --env dev # or: npx teamsapp provision --env dev +> atk deploy --env dev # or: npx teamsapp deploy --env dev +> ``` +> +> **Headless terminal (no browser)?** Interactive sign-in (`atk account login azure`) +> can hang because it can't open a browser. Add `--interactive false` to +> `provision`/`deploy` to reuse cached tokens. `arm/deploy` and `azureAppService/zipDeploy` +> still need an Azure credential — run them from the VS Code Provision/Deploy buttons +> (which handle login), or, when only Azure auth is missing, provision the Bicep with +> `az deployment group create` and deploy with `az webapp deploy` using an already +> signed-in Azure CLI. + +## What this skill is NOT + +- It is **not** a fresh `azd init` or a brand-new scaffold — it patches an **existing** + M365 Agents Toolkit project whose `m365agents.yml` already has at least + `teamsApp/create`, `teamsApp/zipAppPackage`, `teamsApp/update`, and + `teamsApp/extendToM365`. +- It does **not** stop to ask clarifying questions — it inspects the repo and infers the + answers. Only stop if a critical fact (Azure resource type or build commands) genuinely + cannot be inferred. +- It does **not** author the declarative agent itself (manifest, instructions). That is + the `declarative-agent-developer` skill's job. + +## Provisioning model — what "good" looks like + +Aim for a deployment the user can trust and operate. Provision in this order and favour +identity over secrets: + +1. **Resource group** (Step 2) — create a new one (offer the user a location) or reuse an + existing one. Persist both the name and the resource id in `env/.env.` so later + stages and re-runs target the same group. +2. **Hosting** (Step 3) — an Azure App Service (Linux, Node 22) plus a **user-assigned + managed identity**, and Storage only if the server needs it. 
Authenticate to Azure + resources through the managed identity — **do not** issue connection strings or client + secrets unless a third-party dependency genuinely requires one. +3. **Monitoring** (Step 3) — always wire up **Application Insights** (backed by a Log + Analytics workspace) so the user can watch the live service and debug failures. + +Default the App Service plan to **B1 (Linux)**: it is the cheapest SKU that supports +`alwaysOn`, which avoids the worst cold-start stalls. Offer the user a larger plan +(`S1`, `P1v3`, `P2v3`) when they need lower latency or more throughput, and warn that +free/shared tiers (`F1`) cannot keep the server warm — expect slow first requests there. + +## Step 1 — Inspect the repo + +Read these files (in parallel where possible) and remember what you find: + +| File | What to extract | +| --- | --- | +| `m365agents.yml` | current `provision` actions; whether an `arm/deploy` step and a `deploy:` stage exist | +| `m365agents.local.yml` | local debug commands (hints at dev script names) | +| `env/.env.dev` (and other non-local envs) | which envs exist; existing keys and values | +| `env/.env.dev.user` | secret keys (must start with `SECRET_`) | +| `.gitignore` | confirm `env/.env.*.user` and `env/.env.` are ignored | +| `appPackage/manifest.json` and any `*-plugin.json` | which `${{VARS}}` are referenced (e.g. `MCP_SERVER_URL`) | +| `infra/*.bicep` | existing infra, if any (App Service / Functions / Storage) | +| `package.json` (root) | npm workspaces? build scripts? | +| `src/**/package.json` | per-project build commands; production deps | +| Any `widgets/build.*` or `vite.config.*` | ad-hoc build scripts that must run before zip | + +From this, infer: + +- **Target compute** — Azure App Service (Linux, Node.js) is the default. Only choose + Functions or SWA if the project clearly requires it (`host.json`, + `staticwebapp.config.json`). 
+- **What to build** — every workspace with a `build` script that produces runtime output + (server `dist/`, widget `assets/*.html`, etc.). +- **What to ship** — the runtime closure: `dist/`, prod-only `node_modules/`, + `package.json`, plus built assets. Not source, tests, or markdown. +- **Identity & access** — does the server call Azure resources (Storage, Key Vault, + etc.)? If so, plan a **managed identity + role assignments** instead of keys or + connection strings. +- **Runtime app settings** — scan the server source for `process.env.*`. Each one must + become an App Service setting. +- **Manifest `${{VARS}}`** — those must exist in `env/.env.` by the time + `teamsApp/zipAppPackage` runs. + +## Step 2 — Choose or create the resource group + +All resources land in one resource group. Decide which group **before** generating Bicep, +and record it so every later step and re-run is consistent. + +1. Confirm the signed-in subscription and capture its id → `AZURE_SUBSCRIPTION_ID`: + ```bash + az account show --query "{name:name, id:id}" -o table + ``` + +2. Ask the user (AskUserQuestion) whether to **reuse an existing group** or **create a + new one**. + + - **Reuse** — list candidates and let them pick: + ```bash + az group list --query "[].{name:name, location:location}" -o table + ``` + - **Create** — offer a location choice first (never hard-code one): + ```bash + az account list-locations --query "[].name" -o tsv # full list + ``` + Suggest a few common ones (e.g. `eastus`, `westus2`, `westeurope`, + `australiaeast`), let the user choose, then create the group: + ```bash + az group create -n -l + ``` + +3. Capture the group's name **and** resource id and write both to `env/.env.` + (Step 7): + ```bash + az group show -n --query "{name:name, id:id}" -o table + ``` + - `AZURE_RESOURCE_GROUP_NAME=` + - `AZURE_RESOURCE_GROUP_ID=` + +> `arm/deploy` (Step 6) deploys *into* this group. 
Storing the resource id means re-runs, +> teardown, and other tooling all act on the same group without re-prompting. + +## Step 3 — Generate `infra/azure.bicep` + +The template provisions, in one deployment: a **user-assigned managed identity**, optional +**Storage** (identity-auth, no keys), **Log Analytics + Application Insights**, and the +**App Service** (Linux, Node 22). Prefer the managed identity for every Azure dependency +and keep secrets out. + +```bicep +@description('Base name used for all resources') +param baseName string = '' + +@description('Environment suffix, e.g. dev/test/prod') +param envSuffix string = 'dev' + +@description('Location for all resources') +param location string = resourceGroup().location + +@description('App Service Plan SKU. B1 (Basic, Linux) is the recommended default — it is the cheapest tier that supports alwaysOn. Scale up (S1, P1v3, P2v3) for lower cold-start latency and more throughput.') +@allowed([ + 'B1' // recommended default — Basic, alwaysOn capable + 'B2' + 'B3' + 'S1' // Standard — better performance, deployment slots + 'P1v3' // Premium v3 — lowest cold start, production workloads + 'P2v3' +]) +param appServicePlanSku string = 'B1' + +@description('Linux Node runtime. 
Must match the MCP server (Node 22).') +param nodeVersion string = 'NODE|22-lts' + +var miName = '${baseName}-mi-${envSuffix}' +var planName = '${baseName}-plan-${envSuffix}' +var siteName = '${baseName}-app-${envSuffix}' +var laName = '${baseName}-logs-${envSuffix}' +var aiName = '${baseName}-ai-${envSuffix}' +var storageName = toLower(replace('${baseName}st${envSuffix}', '-', '')) + +// ---- User-assigned managed identity (preferred over secrets) ---- +resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = { + name: miName + location: location +} + +// ---- Storage (only if the server needs it) — managed-identity auth, keys disabled ---- +resource storage 'Microsoft.Storage/storageAccounts@2023-05-01' = { + name: storageName + location: location + sku: { + name: 'Standard_LRS' + } + kind: 'StorageV2' + properties: { + allowSharedKeyAccess: false // force Entra ID / managed-identity auth + minimumTlsVersion: 'TLS1_2' + } +} + +// Grant the app's identity data-plane access to storage (no connection strings) +resource storageBlobRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = { + name: guid(storage.id, uami.id, 'Storage Blob Data Contributor') + scope: storage + properties: { + // Storage Blob Data Contributor + roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') + principalId: uami.properties.principalId + principalType: 'ServicePrincipal' + } +} + +// ---- Monitoring: Log Analytics + Application Insights (always provisioned) ---- +resource logs 'Microsoft.OperationalInsights/workspaces@2023-09-01' = { + name: laName + location: location + properties: { + sku: { + name: 'PerGB2018' + } + retentionInDays: 30 + } +} + +resource appInsights 'Microsoft.Insights/components@2020-02-02' = { + name: aiName + location: location + kind: 'web' + properties: { + Application_Type: 'web' + WorkspaceResourceId: logs.id + } +} + +// ---- Compute ---- +resource plan 
'Microsoft.Web/serverfarms@2023-12-01' = { + name: planName + location: location + sku: { + name: appServicePlanSku + } + kind: 'linux' + properties: { + reserved: true // required for Linux plans + } +} + +resource site 'Microsoft.Web/sites@2023-12-01' = { + name: siteName + location: location + identity: { + type: 'UserAssigned' + userAssignedIdentities: { + '${uami.id}': {} + } + } + properties: { + serverFarmId: plan.id + httpsOnly: true + siteConfig: { + linuxFxVersion: nodeVersion + // Strategy B (compiled JS + prebuilt node_modules, no Oryx build). For a + // TypeScript server run via tsx, use Strategy A instead: REMOVE this + // appCommandLine and set SCM_DO_BUILD_DURING_DEPLOYMENT='true' (see the + // "Linux startup, Oryx, and node_modules" callout below). + appCommandLine: 'node server/dist/index.js' // adjust to the staged layout + alwaysOn: appServicePlanSku != 'F1' && appServicePlanSku != 'D1' + ftpsState: 'Disabled' + minTlsVersion: '1.2' + appSettings: [ + { name: 'WEBSITE_NODE_DEFAULT_VERSION', value: '~22' } + { name: 'SCM_DO_BUILD_DURING_DEPLOYMENT', value: 'false' } // Strategy A: set 'true' and drop appCommandLine + // Application Insights — connection string only, no instrumentation secret + { name: 'APPLICATIONINSIGHTS_CONNECTION_STRING', value: appInsights.properties.ConnectionString } + { name: 'ApplicationInsightsAgent_EXTENSION_VERSION', value: '~3' } + // Managed identity: the server uses DefaultAzureCredential with this client id + { name: 'AZURE_CLIENT_ID', value: uami.properties.clientId } + { name: 'STORAGE_ACCOUNT_NAME', value: storage.name } + // ... 
one entry per process.env.* the server reads + ] + } + } +} + +// Outputs: emit anything the toolkit / yaml needs downstream (never secrets) +output appServiceResourceId string = site.id +output mcpServerUrl string = 'https://${site.properties.defaultHostName}' +output managedIdentityClientId string = uami.properties.clientId +``` + +### CRITICAL — Linux startup, Oryx, and `node_modules` + +App Service for Linux uses **Oryx**. How you start the app and whether Oryx builds +must be a **single coherent choice** — mixing them is the most common cause of a +container that builds fine but then crash-loops with `ERR_MODULE_NOT_FOUND` / +`Cannot find package '...'`. + +When Oryx builds (`SCM_DO_BUILD_DURING_DEPLOYMENT=true`) it often **compresses +`node_modules` into `node_modules.tar.gz` and replaces it with a symlink**. That +tarball is only expanded by **Oryx's default startup script**. If you set a custom +`appCommandLine`, that script is replaced and **the extraction never runs** — so +`node_modules` is empty at runtime even though the build "succeeded". A tell-tale +symptom is the runtime log showing a tool being fetched from `/root/.npm/_npx/...` +(npx hit the network because the local copy wasn't there). + +Pick **one** of these — do not mix: + +**Strategy A (recommended for Node/TS) — let Oryx install + start:** +- `SCM_DO_BUILD_DURING_DEPLOYMENT=true` +- **No** `appCommandLine`. Put a `start` script in the deployed `package.json`; + Oryx's default startup extracts `node_modules` and runs `npm start`. +- Ship **source + a slim production `package.json`** — do **not** ship + `node_modules` (avoids Windows long-path zip corruption and Linux/native-module + mismatches). Drop `package-lock.json` from the stage unless it matches the slim + `package.json`, or Oryx's `npm ci` will fail. +- TypeScript: add `tsx` as a **runtime dependency** and use `"start": "tsx main.ts"`. 
+ Never start with `npx tsx ...` — `npx` fetches tsx over the network instead of + using the local copy. + +**Strategy B — fully self-contained, no Oryx build:** +- `SCM_DO_BUILD_DURING_DEPLOYMENT=false` +- Ship a **prebuilt** artifact (compiled `dist/` **and** a real, uncompressed + `node_modules` directory) in the zip. +- A custom `appCommandLine` (e.g. `node dist/index.js`) is fine here, because Oryx + isn't compressing anything, so `node_modules` stays a real directory. + +If you remember nothing else: **a custom `appCommandLine` on top of an Oryx build += empty `node_modules` at runtime.** The Bicep above uses Strategy B; for a +TypeScript server run via `tsx`, prefer Strategy A — set +`SCM_DO_BUILD_DURING_DEPLOYMENT=true`, remove `appCommandLine`, and rely on the +`start` script instead. + +> **No storage needed?** Delete the `storage` account, the `storageBlobRole` assignment, +> and the `STORAGE_ACCOUNT_NAME` app setting. Keep the managed identity, Log Analytics, +> and Application Insights regardless. + +### CRITICAL — Bicep output → env-var naming + +After `arm/deploy` runs, the toolkit writes each Bicep `output` into +`env/.env.` by **uppercasing the camelCase identifier and stripping +underscores**: + +| Bicep output name | Resulting env var | +| ------------------------- | ------------------------- | +| `appServiceResourceId` | `APPSERVICERESOURCEID` | +| `mcpServerUrl` | `MCPSERVERURL` | +| `managedIdentityClientId` | `MANAGEDIDENTITYCLIENTID` | +| `storageAccountName` | `STORAGEACCOUNTNAME` | + +**Always** reference these in `m365agents.yml` exactly as written — e.g. +`${{APPSERVICERESOURCEID}}`. Using `APP_SERVICE_RESOURCE_ID` fails with +`Unresolved placeholders`. + +If the manifest already references something like `${{MCP_SERVER_URL}}` (with +underscores), either keep that variable written manually in `env/.env.` and +emit the same value as a no-underscore output, or rename the manifest var to the +no-underscore form. 
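The naming rule above is mechanical, so a placeholder name can be checked before it goes into `m365agents.yml`. A minimal sketch, assuming the transformation is exactly the one described here (uppercase the identifier, remove underscores); `bicepOutputToEnvVar` is a hypothetical helper for illustration and is not part of the toolkit:

```javascript
// Hypothetical helper: predicts the env var name that, per the rule described
// above, arm/deploy writes for a Bicep output (uppercase, underscores removed).
function bicepOutputToEnvVar(outputName) {
  return outputName.replace(/_/g, "").toUpperCase();
}

// Output names taken from the Bicep template in Step 3.
const outputs = ["appServiceResourceId", "mcpServerUrl", "managedIdentityClientId"];
for (const name of outputs) {
  // Print the placeholder form expected in m365agents.yml.
  console.log(name + " -> ${{" + bicepOutputToEnvVar(name) + "}}");
}
```

Comparing the printed placeholders with the ones used in the YAML is a quick way to catch an underscore variant before Deploy reports `Unresolved placeholders`.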
+ +## Step 4 — Generate `infra/azure.parameters.json` + +With managed identity there are usually **no secrets** to pass. Keep the parameter file +small and identity-first: + +```json +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "baseName": { "value": "" }, + "envSuffix": { "value": "${{TEAMSFX_ENV}}" }, + "appServicePlanSku": { "value": "B1" } + } +} +``` + +**Prefer managed identity over secrets.** Only reach for a stored secret when a +dependency genuinely cannot use Azure identity. When a secret *is* required, present the +user with **both** options below and let them choose (AskUserQuestion) — never silently +bake a raw secret into app settings. + +### When the project needs a secret — offer two options + +**Option A — Move to a managed identity (recommended).** +Replace the secret entirely with the user-assigned managed identity already provisioned in +Step 3. Nothing to rotate, nothing to leak, no expiry-driven downtime. + +- Best choice whenever the dependency supports Entra ID auth, and the **required choice if + the solution needs OBO** (on-behalf-of) — OBO flows should ride on the identity, not a + static secret. +- This usually requires **code changes**: swap secret/connection-string auth for + `DefaultAzureCredential` (passing `AZURE_CLIENT_ID` from the injected app setting), and + add the matching role assignment in Bicep (as shown for Storage in Step 3). +- Outcome: no secret on the server to manage. + +**Option B — Store the secret in Key Vault, referenced by the App Service.** +Use this when the secret is unavoidable (e.g. a third-party API key with no identity +support). Do **not** put the raw value in `appSettings` — store it in Key Vault and let +App Service resolve it at runtime via a **Key Vault reference**: + +``` +@Microsoft.KeyVault(SecretUri=https://.vault.azure.net/secrets/) +``` + +For this to resolve you MUST: + +1. 
Provision a **Key Vault** in `infra/azure.bicep` and store the secret in it. +2. Ensure the App Service has a **managed identity** (the Step 3 user-assigned identity is + fine — it's already attached to the site). +3. **Authorize that identity to read secrets** from the vault — either an RBAC role + assignment (`Key Vault Secrets User`, role id + `4633458b-17de-408a-b874-0445c86b69e6`) when the vault uses Azure RBAC, or a Key Vault + access policy granting `get` on secrets. +4. Set the app setting to the `@Microsoft.KeyVault(...)` reference string instead of the + literal secret. App Service substitutes the live value at startup. + +Sketch for the Bicep additions (App Service path): + +```bicep +@description('Name of the secret to seed into Key Vault (value passed securely)') +@secure() +param thirdPartyApiKey string + +resource vault 'Microsoft.KeyVault/vaults@2023-07-01' = { + name: '${baseName}-kv-${envSuffix}' + location: location + properties: { + sku: { family: 'A', name: 'standard' } + tenantId: subscription().tenantId + enableRbacAuthorization: true // use RBAC, not access policies + enableSoftDelete: true + } +} + +resource apiKeySecret 'Microsoft.KeyVault/vaults/secrets@2023-07-01' = { + parent: vault + name: 'thirdPartyApiKey' + properties: { value: thirdPartyApiKey } +} + +// Authorize the app's managed identity to read secrets +resource kvSecretsUser 'Microsoft.Authorization/roleAssignments@2022-04-01' = { + name: guid(vault.id, uami.id, 'Key Vault Secrets User') + scope: vault + properties: { + roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4633458b-17de-408a-b874-0445c86b69e6') + principalId: uami.properties.principalId + principalType: 'ServicePrincipal' + } +} + +// ...then in the site's appSettings: +// { name: 'THIRD_PARTY_API_KEY', value: '@Microsoft.KeyVault(SecretUri=${apiKeySecret.properties.secretUri})' } +``` + +Only the *seed* value (`thirdPartyApiKey`) flows through the `@secure()` param from 
+`env/.env..user` (key prefixed `SECRET_`) during Provision. At runtime the server +reads the resolved value from its app setting — it never handles the Key Vault credential +itself. Every non-secret interpolation still comes from `env/.env.`. + +> Whichever option is chosen, secrets must never appear in committed files, Bicep outputs, +> or `appSettings` as literals. + +## Step 5 — Generate a stage script (only for non-trivial layouts) + +If `azureAppService/zipDeploy` cannot simply zip a single `dist/` folder (e.g. a +Node.js app that needs prod-only `node_modules` plus extra asset folders), create +`infra/stage.mjs`: + +```js +// installs --omit=dev in server/, copies dist + node_modules + package.json +// + assets/ into a clean stage directory referenced by zipDeploy. +``` + +Layout the script must produce, matching the Bicep `appCommandLine`: + +``` +deploy-stage/ + server/ + package.json + dist/ + node_modules/ # production only + assets/ # only if the server serves static HTML +``` + +Skip this whole step if a single `dist/` folder is enough. + +### Strategy A variant — TypeScript run via `tsx` (no compile step) + +When the server runs TypeScript directly with `tsx` (no `dist/` compile), the stage +should ship **source + a slim production `package.json`** and let Oryx install deps. +Do **not** copy `node_modules` and do **not** set a custom `appCommandLine`: + +```js +// infra/stage.mjs — produces ./deploy-stage for azureAppService/zipDeploy. +// Run AFTER `npm run build` (which produces any prebuilt assets, e.g. a widget HTML). 
+import fs from "node:fs"; +import path from "node:path"; + +const root = process.cwd(); +const stage = path.join(root, "deploy-stage"); +fs.rmSync(stage, { recursive: true, force: true }); +fs.mkdirSync(stage, { recursive: true }); + +// Server entry + source (+ any prebuilt assets the server reads at runtime) +for (const f of ["main.ts", "server.ts"]) fs.copyFileSync(path.join(root, f), path.join(stage, f)); +fs.cpSync(path.join(root, "src"), path.join(stage, "src"), { recursive: true }); +// e.g. a prebuilt widget shell consumed by an MCP resource handler: +// fs.mkdirSync(path.join(stage, "dist"), { recursive: true }); +// fs.copyFileSync(path.join(root, "dist/widget.html"), path.join(stage, "dist/widget.html")); + +// Slim production package.json: runtime deps only (incl. tsx), no build script, +// so Oryx installs prod deps and `npm start` launches the server. +const pkg = JSON.parse(fs.readFileSync(path.join(root, "package.json"), "utf-8")); +fs.writeFileSync(path.join(stage, "package.json"), JSON.stringify({ + name: pkg.name, version: pkg.version, type: pkg.type, private: true, + engines: { node: ">=22 <23" }, + scripts: { start: "tsx main.ts" }, // tsx MUST be in dependencies, not devDependencies + dependencies: pkg.dependencies, +}, null, 2) + "\n"); +``` + +Pair this with `SCM_DO_BUILD_DURING_DEPLOYMENT=true`, **no** `appCommandLine`, and a +`deploy:` stage whose `zipDeploy` `artifactFolder` is `deploy-stage` (see Step 6). + +## Step 6 — Rewrite `m365agents.yml` + +This is the file that actually wires the **Provision** and **Deploy** buttons. The +`arm/deploy` step is what makes Provision create Azure resources; the `deploy:` stage is +what makes Deploy ship code. 
+ +```yaml +# yaml-language-server: $schema=https://aka.ms/m365-agents-toolkits/v1.11/yaml.schema.json +version: v1.11 +environmentFolderPath: ./env + +provision: + - uses: arm/deploy # <- makes "Provision" create Azure resources + with: + subscriptionId: ${{AZURE_SUBSCRIPTION_ID}} + resourceGroupName: ${{AZURE_RESOURCE_GROUP_NAME}} + bicepCliVersion: v0.30.23 # download Bicep CLI if not on PATH + templates: + - path: ./infra/azure.bicep + parameters: ./infra/azure.parameters.json + deploymentName: -${{TEAMSFX_ENV}} + - uses: teamsApp/create + with: + name: ${{APP_NAME_SUFFIX}} + writeToEnvironmentFile: + teamsAppId: TEAMS_APP_ID + - uses: teamsApp/zipAppPackage + with: + manifestPath: ./appPackage/manifest.json + outputZipPath: ./appPackage/build/appPackage.${{TEAMSFX_ENV}}.zip + outputFolder: ./appPackage/build + - uses: teamsApp/update + with: + appPackagePath: ./appPackage/build/appPackage.${{TEAMSFX_ENV}}.zip + - uses: teamsApp/extendToM365 + with: + appPackagePath: ./appPackage/build/appPackage.${{TEAMSFX_ENV}}.zip + writeToEnvironmentFile: + titleId: M365_TITLE_ID + appId: M365_APP_ID + +deploy: # <- makes "Deploy" push code to Azure + - uses: cli/runNpmCommand + with: + workingDirectory: src/ + args: install --no-audit --no-fund + - uses: cli/runNpmCommand + with: + workingDirectory: src/ + args: run build + # repeat for every workspace with runtime-consumed build output (widgets, etc.) + - uses: script + with: + run: node infra/stage.mjs # only if Step 5 produced a stage script + - uses: azureAppService/zipDeploy + with: + artifactFolder: + ignoreFile: .deployignore + resourceId: ${{APPSERVICERESOURCEID}} # NOTE: no underscores +``` + +## Step 7 — Update env files + +`env/.env.dev` (committed, no secrets): +``` +TEAMSFX_ENV=dev +APP_NAME_SUFFIX=dev +AZURE_SUBSCRIPTION_ID= +AZURE_RESOURCE_GROUP_NAME= +AZURE_RESOURCE_GROUP_ID= +TEAMS_APP_TENANT_ID= +# Any var the manifest already references (MCP_SERVER_URL etc.) 
+# Bicep outputs are appended here automatically after Provision +# (APPSERVICERESOURCEID, MCPSERVERURL, MANAGEDIDENTITYCLIENTID, ...) +``` + +`env/.env.dev.user` (gitignored, secrets only — must start with `SECRET_`): +``` +# Empty by design when using managed identity. Add SECRET_* keys only for +# unavoidable third-party secrets that Azure identity cannot cover. +``` + +`env/.env.dev.sample` (committed, doc-only): +- mirror the keys with empty values, plus a comment for the `.user` keys. + +> Replace `dev` with the actual environment name for any non-local env (e.g. `test`, +> `prod`). The `local` env is handled by `m365agents.local.yml` and is out of scope here. + +## Step 8 — `.gitignore` and `.deployignore` + +`.gitignore` must ignore: +``` +env/.env.*.user +env/.env. +``` + +Create `.deployignore` so `zipDeploy` skips dev-only files: +``` +.git/ +.github/ +.vscode/ +*.md +src/ # source — only the staged dist is shipped +tests/ +*.test.* +node_modules/.cache/ +``` + +## Step 9 — Verify, don't just edit + +After writing the files, run from the extension (or the CLI fallback) and confirm +each step actually succeeds: + +1. **Provision** — `atk provision --env dev` (or `npx teamsapp provision --env dev`) + must succeed end-to-end and create the App Service, managed identity, and + Application Insights in the chosen resource group: + ```bash + az resource list -g -o table + ``` +2. **Deploy** — `atk deploy --env dev` (or `npx teamsapp deploy --env dev`) must + succeed end-to-end. +3. **Health check** — `curl -I https://.azurewebsites.net/` + must eventually return 2xx (App Service may need ~30s to warm up after the first + deploy; on B1 the first request after idle can be slow). +4. **Monitoring** — confirm Application Insights is receiving telemetry (open the + resource in the portal, or verify `APPLICATIONINSIGHTS_CONNECTION_STRING` is present + in the App Service settings) so the user has live diagnostics. 
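
The Provision check in item 1 can also be scripted rather than read off a table. The sketch below is a minimal illustration, not part of the toolkit. It assumes you switch the `az resource list` output to `-o json` and pass the parsed array in. The function name `missingResourceTypes`, the choice of a user-assigned identity type, and the sample resource names are all hypothetical.

```javascript
// ARM resource types that Step 9 expects to exist after Provision.
// Assumption: the managed identity is user-assigned.
const EXPECTED_TYPES = [
  "Microsoft.Web/sites",                              // App Service
  "Microsoft.ManagedIdentity/userAssignedIdentities", // managed identity
  "Microsoft.Insights/components",                    // Application Insights
];

// Returns the expected types that are absent from a parsed
// `az resource list ... -o json` array. Comparison is case-insensitive
// because ARM does not guarantee the casing of the `type` field.
function missingResourceTypes(resources, expected = EXPECTED_TYPES) {
  const present = new Set(resources.map((r) => String(r.type).toLowerCase()));
  return expected.filter((t) => !present.has(t.toLowerCase()));
}

// Hypothetical sample shaped like the CLI's JSON output (names invented).
const sample = [
  { name: "example-app", type: "Microsoft.Web/sites" },
  { name: "example-plan", type: "Microsoft.Web/serverFarms" },
  { name: "example-insights", type: "microsoft.insights/components" },
];

// The identity is missing from the sample, so it is the only type reported.
console.log(missingResourceTypes(sample));
```

An empty result suggests Provision created everything the checklist names. A non-empty result points at the Bicep resource to inspect first.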
+ +If anything fails, consult the troubleshooting table before changing unrelated things. + +## Troubleshooting cheatsheet + +| Symptom | Root cause | Fix | +| --- | --- | --- | +| **Provision does nothing / no Azure resources appear** | `m365agents.yml` has no `arm/deploy` step | Add the `arm/deploy` action (Step 6) pointing at `infra/azure.bicep`. | +| **Deploy button missing or "nothing to deploy"** | No `deploy:` stage in `m365agents.yml` | Add the `deploy:` stage with `cli/runNpmCommand` build steps + `azureAppService/zipDeploy`. | +| `InvalidYamlSchemaError ... Unable to parse yaml file` | Action key not in v1.11 schema, extra property like `writeToEnvironmentFile` on `arm/deploy`, or `script.shell: pwsh` (must be a path) | Validate against `https://aka.ms/m365-agents-toolkits/v1.11/yaml.schema.json`. `arm/deploy` has no `writeToEnvironmentFile` — outputs are auto-saved. Drop `shell:` from `script`. | +| `CompileBicepError ... spawn bicep ENOENT` | Bicep CLI not on PATH | Add `bicepCliVersion: v0.30.23` (or any released version) under `arm/deploy.with`. | +| `Unresolved placeholders ["FOO_BAR"]` during deploy | Variable name in yaml doesn't match what `arm/deploy` wrote | Bicep outputs become `UPPERCASENOUNDERSCORE`. Use `${{APPSERVICERESOURCEID}}`, not `${{APP_SERVICE_RESOURCE_ID}}`. | +| `MissingEnvironmentVariablesError ... SECRET_X` | Secret missing from `env/.env..user` | Secrets must be prefixed `SECRET_` and live in the `.user` file. | +| App Service responds 404 / `Cannot GET /mcp` | Wrong `appCommandLine` or wrong `artifactFolder` layout | Make `appCommandLine` match the staged layout (`server/dist/index.js`). 
| +| **Container builds OK but crash-loops** with `Cannot find package '...'` / `ERR_MODULE_NOT_FOUND`; runtime log shows a tool fetched from `/root/.npm/_npx/...` | A **custom `appCommandLine` bypassed Oryx's `node_modules` extraction** (Oryx compressed it to `node_modules.tar.gz` + a `/node_modules` symlink that only the default startup script expands) | Remove the custom `appCommandLine` and use a `start` script (Strategy A), **or** disable the Oryx build and ship a real uncompressed `node_modules` (Strategy B). Never start with `npx ` — ship the tool (e.g. `tsx`) as a runtime dependency. | +| `az webapp deploy` / `zipDeploy` reports **"Site failed to start within 10 mins"** but the site is actually up | The deploy poller can time out seconds before the container's warm-up probe succeeds | Check `GET /health` and the docker log (`Site started ...` with the new deployment id) before assuming failure — only re-deploy if health is genuinely down. | +| Deployed `node_modules` is **missing files** / native modules fail to load on Linux | `node_modules` was built on Windows and zipped (long-path truncation or OS-specific binaries) | Don't ship `node_modules`; use Strategy A and let Oryx install on Linux. | +| Oryx build fails running `npm run build` (missing vite/tsc entry, etc.) | Oryx auto-runs a `build` script if present, but the stage doesn't include build inputs | Ship a **slim `package.json` with no `build` script** (Strategy A) so Oryx only runs `npm install`; prebuild assets locally before staging. | +| App Service 401/500 on a specific route | Runtime app settings missing (e.g. `AAD_APP_CLIENT_SECRET`) | Add to Bicep `appSettings`; re-provision; secret comes from the `${{SECRET_*}}` parameter. | +| `arm/deploy` leaks a secret in deployment outputs | `output ... = ...storageConnectionString` | Don't output it, or annotate `#disable-next-line outputs-should-not-contain-secrets`. 
| +| `AuthorizationFailed` / `RoleAssignmentUpdateNotPermitted` during Provision | The Bicep `roleAssignments` resource needs the deployer to have **Owner** or **User Access Administrator** on the resource group | Grant that role on the RG (or have an owner run Provision once); role assignments can't be created by Contributor alone. | +| Storage returns **403** at runtime with the managed identity | Role assignment not yet propagated, or code still uses keys while `allowSharedKeyAccess: false` | Use `DefaultAzureCredential` with `AZURE_CLIENT_ID` from the env; wait ~1–2 min for the role to propagate; never re-enable shared keys. | +| Application Insights shows **no telemetry** | `APPLICATIONINSIGHTS_CONNECTION_STRING` or `ApplicationInsightsAgent_EXTENSION_VERSION` missing from `appSettings` | Add both app settings in the Bicep `site` resource and re-provision. | +| First request very slow / server seems asleep | `F1`/shared plan (no `alwaysOn`) or cold container | Default to **B1** (alwaysOn capable); scale to `S1`/`P1v3` for lower cold start. | +| App setting shows the literal `@Microsoft.KeyVault(...)` string / value not resolved | App Service identity can't read the vault, wrong `SecretUri`, or vault uses access policies while you granted RBAC (or vice-versa) | Confirm the managed identity has `Key Vault Secrets User` (RBAC vaults) or a `get` access policy; verify the `SecretUri` matches the secret; re-provision. | + +## Conventions to remember + +1. **Provision = `arm/deploy`; Deploy = `azureAppService/zipDeploy`.** If either button + "doesn't work," the corresponding stage is usually missing from `m365agents.yml`. +2. **Bicep output naming**: camelCase becomes UPPERCASE-NO-UNDERSCORES. Always. +3. **Identity over secrets**: prefer a user-assigned managed identity + role assignments. 
+ When a secret is unavoidable, offer the user **two** options — (A) move to managed + identity (required for OBO), or (B) store it in Key Vault and reference it from app + settings via `@Microsoft.KeyVault(SecretUri=...)` with the identity authorized on the + vault. Never bake a raw secret into `appSettings`. +4. **One resource group**: create or reuse a group up front; store its name and id in + `env/.env.` so every run targets the same group. +5. **Always add Application Insights**: every App Service ships with App Insights + Log + Analytics wired in for monitoring and debugging. +6. **Default plan B1 (Linux)**: cheapest SKU with `alwaysOn`; offer `S1`/`P1v3` for + performance, and warn that `F1` cannot stay warm. +7. **Provision order**: `arm/deploy` first, then `teamsApp/create`, + `teamsApp/zipAppPackage`, `teamsApp/update`, `teamsApp/extendToM365`. The Teams package + step runs **after** Bicep so the manifest can interpolate Bicep outputs + (e.g. `${{MCPSERVERURL}}`). +8. **Stage before zipDeploy**: never zip the source repo. Always stage a clean folder. +9. **Idempotence**: `arm/deploy` with the same `deploymentName` is a safe upsert — re-run + Provision freely. +10. **Linux startup is one coherent choice** (Strategy A *or* B, never mixed): + **A)** Oryx build (`SCM_DO_BUILD_DURING_DEPLOYMENT=true`) + a `start` script + **no** + `appCommandLine`, shipping source + a slim `package.json` (no `node_modules`); or + **B)** no Oryx build (`=false`) + a prebuilt, uncompressed `node_modules` + a custom + `appCommandLine`. A custom `appCommandLine` on top of an Oryx build leaves + `node_modules` empty at runtime. For TypeScript, ship `tsx` as a **runtime + dependency** and use `"start": "tsx main.ts"` (never `npx tsx`). +11. **Trust `/health`, not the deploy poller**: `zipDeploy` may report a start timeout + seconds before the container is actually ready — verify `GET /health` and the docker + log before re-deploying.
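
Convention 2 and the `Unresolved placeholders` troubleshooting row can be checked mechanically before running Deploy. The following is a rough sketch under stated assumptions: it treats the YAML and env file as plain strings, looks only at `KEY=value` lines in one env file (it ignores the `.user` file and the process environment), and guesses a fix by comparing names with underscores removed. The helper names `findPlaceholders`, `envKeys`, and `unresolvedPlaceholders` are hypothetical, as is the sample input.

```javascript
// Collects every distinct ${{NAME}} placeholder found in a text.
function findPlaceholders(text) {
  const names = new Set();
  for (const m of text.matchAll(/\$\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g)) names.add(m[1]);
  return [...names];
}

// Parses the keys of KEY=value lines, skipping blanks and comments.
function envKeys(envText) {
  return new Set(
    envText
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line && !line.startsWith("#") && line.includes("="))
      .map((line) => line.slice(0, line.indexOf("=")).trim())
  );
}

// Lists placeholders with no matching env key. When a key differs only by
// underscores, it is offered as a suggestion (the Bicep output naming case).
function unresolvedPlaceholders(yamlText, envText) {
  const known = envKeys(envText);
  const squash = (s) => s.replace(/_/g, "").toUpperCase();
  return findPlaceholders(yamlText)
    .filter((name) => !known.has(name))
    .map((name) => ({
      name,
      suggestion: [...known].find((k) => squash(k) === squash(name)) ?? null,
    }));
}

// Hypothetical inputs: the YAML uses the underscored name by mistake.
const yamlText = "resourceId: ${{APP_SERVICE_RESOURCE_ID}}\nname: ${{APP_NAME_SUFFIX}}";
const envText = "# after Provision\nAPP_NAME_SUFFIX=dev\nAPPSERVICERESOURCEID=/subscriptions/example";

console.log(unresolvedPlaceholders(yamlText, envText));
```

For the sample above, the only entry reported is `APP_SERVICE_RESOURCE_ID`, with `APPSERVICERESOURCEID` as the suggested replacement.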