
fix: resolve KV reference timing bug on backend appSettings #196

Open
bradystroud wants to merge 1 commit into `main` from `fix-kv-reference-resolution`

@bradystroud

Symptom

Every `/v1/embeddings` call from staging returned 401. App Insights showed the literal string `@Microsoft.KeyVault(SecretUri=...)` being sent as the API key, which means Azure was passing the unresolved KV reference straight through. The Postgres connection string failed the same way (`Couldn't set @microsoft.keyvault(secreturi` from `Npgsql.NpgsqlConnectionStringBuilder`).

Root cause

`backendAppService` had its `appSettings` inlined in the resource body. Inferred deploy order:

  1. KV created (no policies)
  2. Secrets written
  3. App Service created with KV-reference appSettings; it tries to resolve them, fails because no access policy exists yet, and caches the failure
  4. `keyVaultAccessPolicy` adds the policy

The App Service caches the initial KV-reference resolution failure as the literal string and won't re-resolve without an `appSettings` write. Restarts don't help.

This explains why the access policy looked correct in the portal but resolution still failed.

Fix

Split `appSettings` into a separate `Microsoft.Web/sites/config` child resource that explicitly `dependsOn` `keyVaultAccessPolicy`. The compiled ARM `dependsOn` graph now orders the deployment correctly:

```
KV -> secrets -> App Service -> access policy -> appSettings
```
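The ordering argument can be modeled as a topological sort over `dependsOn` edges. This is a simplified sketch and not ARM's actual scheduler. The node names are hypothetical shorthand for the resources in this PR, and the model ignores the fact that ARM deploys independent resources in parallel. It checks whether the KV references get resolved after the access policy exists, once for the inlined (before) layout and once for the split (after) layout.

```python
from graphlib import TopologicalSorter

# Hypothetical shorthand for the resources in this PR.
# Each entry maps a resource to the set of resources it dependsOn.
BEFORE = {
    "secrets": {"kv"},
    # appSettings inlined in the site body: KV references resolve when the site is created
    "appService+appSettings": {"secrets"},
    # the policy grants the site's managed identity, so it is ordered after the site
    "accessPolicy": {"kv", "appService+appSettings"},
}

AFTER = {
    "secrets": {"kv"},
    "appService": {"secrets"},
    "accessPolicy": {"kv", "appService"},
    # separate Microsoft.Web/sites/config 'appsettings' child with an explicit dependsOn
    "appSettings": {"appService", "accessPolicy"},
}


def deploy_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid deployment order for a dependsOn graph."""
    return list(TopologicalSorter(graph).static_order())


def resolved_after_policy(graph: dict[str, set[str]], settings_node: str) -> bool:
    """True if the node that resolves the KV references comes after the access policy."""
    order = deploy_order(graph)
    return order.index(settings_node) > order.index("accessPolicy")


if __name__ == "__main__":
    print(deploy_order(AFTER))  # ['kv', 'secrets', 'appService', 'accessPolicy', 'appSettings']
    print(resolved_after_policy(BEFORE, "appService+appSettings"))  # False
    print(resolved_after_policy(AFTER, "appSettings"))  # True
```

Both graphs are chains, so each has exactly one valid order. Only the split layout places reference resolution after the policy grant.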

Also:

  • Removed the misleading `accessPolicies: []` from the parent KV (policy is owned by `keyVaultAccessPolicy`)
  • Bumped `Microsoft.Web/sites` API version 2020-12-01 -> 2024-04-01

Test plan

  • `az bicep build` clean
  • Verified compiled ARM `dependsOn` graph: `appsettings` waits on `accessPolicies/add`
  • Stage CI deploy succeeds
  • Send a chat message, verify `/v1/embeddings` returns 200 in App Insights
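The `dependsOn` check in the test plan could be scripted against the JSON template that `az bicep build` emits. What follows is a minimal sketch, and the function names are hypothetical rather than part of any tool. It handles both the classic array form of `resources` and the symbolic-name object form. It also assumes that a dependency on the policy appears either as a `resourceId('Microsoft.KeyVault/vaults/accessPolicies', ...)` expression or as a symbolic name that points at that resource type.

```python
import json
from typing import Any


def _resources(template: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Normalize `resources` to {key: resource}.

    Classic templates use a list; symbolic-name templates use an object keyed
    by symbolic name, and their dependsOn entries refer to those keys.
    """
    res = template.get("resources", [])
    if isinstance(res, dict):
        return dict(res)
    return {f"#{i}": r for i, r in enumerate(res)}


def appsettings_waits_on_access_policy(template: dict[str, Any]) -> bool:
    """True if a sites/config 'appsettings' resource dependsOn a KV accessPolicies resource."""
    resources = _resources(template)
    for res in resources.values():
        if res.get("type", "").lower() != "microsoft.web/sites/config":
            continue
        if "appsettings" not in str(res.get("name", "")).lower():
            continue
        for dep in res.get("dependsOn", []):
            # A symbolic name maps to a resource; otherwise dep is a resourceId(...) expression.
            dep_text = resources[dep].get("type", "") if dep in resources else dep
            if "microsoft.keyvault/vaults/accesspolicies" in dep_text.lower():
                return True
    return False


def check_file(path: str) -> bool:
    """Load a compiled ARM template (e.g. the output of `az bicep build`) and run the check."""
    with open(path, encoding="utf-8") as f:
        return appsettings_waits_on_access_policy(json.load(f))
```

For example, run `az bicep build --file main.bicep` and then call `check_file("main.json")`. Here `main.bicep` is an assumed file name. A CI step could fail the build when the check returns `False`.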

🤖 Generated with Claude Code

Symptom: every call to OpenAI returned 401, with App Insights showing
the literal string '@Microsoft.KeyVault(SecretUri=...)' being sent as
the API key. Same root cause behind the Postgres KeyNotFoundException
on the connection string.

Cause: backendAppService had appSettings inlined in its body, so ARM
created the App Service (and tried to resolve the KV references) before
keyVaultAccessPolicy granted the identity 'get' on secrets. The App
Service caches that initial failure as the literal reference string,
and restarts alone never re-evaluate it; only an appSettings write
does.

Fix: split the appSettings into a separate Microsoft.Web/sites/config
'appsettings' child resource that explicitly dependsOn
keyVaultAccessPolicy. The compiled ARM template now orders:
KV -> secrets -> App Service -> access policy -> appSettings.
While we're here, also drop the misleading 'accessPolicies: []' on the
parent KV (the policy is owned by the keyVaultAccessPolicy resource)
and bump backendAppService API version to 2024-04-01.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>