Skip to content

bug: reserve _provider_* policy key prefix and enforce at gateway boundaries #1982

Description

@johntmyers

Agent Diagnostic

Investigation reviewing PR #1914 and the discussion in #1914 (comment).

Skills/tools used: code search across crates/openshell-policy, crates/openshell-server/src/grpc, and crates/openshell-sandbox; git history on the original composition PR (#1037).

Findings:

  • crates/openshell-policy/src/compose.rs:15-34 defines provider_rule_name() which prefixes provider-derived rules with _provider_. The doc comment at compose.rs:41-42 declares this key namespace reserved for derived data.
  • crates/openshell-policy/src/compose.rs:62-75 (unique_provider_rule_key) collision-renames provider rules to _provider_<name>_2, _3, ... when a key already exists in the source policy. This is the mechanism by which a stale baked-in _provider_* rule shadows the live one.
  • crates/openshell-policy/src/merge.rs:400-416 and merge.rs:642-656 already filter _provider_* keys out of the endpoint-overlap fallback, confirming the prefix is treated as reserved internally.
  • crates/openshell-server/src/grpc/validation.rs:619-628 (validate_policy_safety) does not reject user-authored policy with _provider_* network policy keys. There is no enforcement at the gateway boundary for either SetSandboxPolicy, CreateSandbox, or the supervisor-originated enrichment write-back path.

Net: the prefix is reserved by convention and by the composition layer, but not by any input validation. That asymmetry produces the bug.

Description

Actual behavior: A user-authored or sandbox-enriched policy can contain network_policies keys with the _provider_* prefix. When provider profile composition runs, the composer detects the collision and re-keys the provider rule as _provider_<name>_2. The original (now stale) _provider_<name> from the user policy remains in the effective policy. Non-additive provider profile updates (rename / remove endpoint) silently no-op because the live _2 rule sits behind the stale baseline rule.

This breaks the core update flow that PR #1914 introduces: openshell provider profile update succeeds, but for any sandbox whose persisted policy already contains the prior _provider_* key, the change does not take effect.

Expected behavior: The _provider_* key namespace is reserved for derived data. Any user-supplied policy or sandbox-originated write that attempts to set keys under that namespace is rejected (or filtered) at the gateway boundary, so provider composition always produces a single authoritative _provider_<name> rule per profile.

Reproduction Steps

Path 1 — policy get --full round-trip (user-driven)

openshell provider create --name test --type github --credential FOO=bar
openshell sandbox create --provider test -- true
openshell policy get <sandbox> --full > current-policy.yaml
openshell policy set <sandbox> -f current-policy.yaml
openshell provider profile update -f updated-profile.yaml  # remove or rename an endpoint
openshell policy get <sandbox> --full
# Observed: _provider_test (stale, from user policy) AND _provider_test_2 (current)

Path 2 — supervisor enrichment write-back (no user action)

Reproducer from pimlock — happens automatically as part of normal sandbox startup:

cat > custom-policy.yaml <<END
version: 1
filesystem_policy:
  include_workdir: true
  read_only: []
  read_write: []
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  sandbox_only:
    endpoints:
      - host: sandbox.example.com
        port: 443
        protocol: rest
        access: read-only
        enforcement: enforce
    binaries: []
END

openshell provider create --name test --type github --credential FOO=bar
openshell sandbox create --policy custom-policy.yaml --provider test -- true
openshell policy get --full
# Observed: both _provider_test and _provider_test_2 present

Full script with pauses for live-state exploration: https://gist.github.com/pimlock/d10de2a4ca5611a9a9ae78fce7e0a489?permalink_comment_id=6212304#gistcomment-6212304

Root cause of path 2: the supervisor enriches the effective policy (including provider-injected rules) and syncs it back to the gateway. The gateway treats the synced policy as authoritative user policy, baking the _provider_* keys into the source. On the next provider poll, composition collision-renames the new provider rule to _provider_test_2.

Proposed Fix

  1. Validation: extend validate_policy_safety (or add a sibling validator) to reject network_policies keys matching ^_provider_ on every user-facing write path — at minimum SetSandboxPolicy and CreateSandbox. Error message should explain that the prefix is reserved for provider composition and direct users to policy get (without --full) for round-trippable output.
  2. Supervisor enrichment write-back: strip _provider_* keys before the supervisor pushes enriched policy back to the gateway. Provider-derived rules are gateway-authoritative; the sandbox should never be the source of truth for them. Belt-and-braces: the gateway's enrichment ingestion path should also drop any _provider_* keys it receives from a sandbox.
  3. No change to policy get --full: --full continues to render the effective policy including provider contributions — that's the purpose of the flag. policy get without --full already excludes _provider_* (merge.rs:416), so the documented round-trip workflow stays intact.
  4. Docs: brief note in docs/sandboxes/manage-providers.mdx about the reservation.

Beta-stage, hard reject is fine.

Environment

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:gatewayGateway server and control-plane workarea:policyPolicy engine and policy lifecycle workarea:providersstate:agent-readyApproved for agent implementationstate:pr-openedPR has been opened for this issue

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions