
feat(claude-code): OTLP governance receiver + Loki store #404

Merged: Shion1305 merged 3 commits into main from feat/claude-code-otlp-governance on Jun 1, 2026

@Shion1305
Owner

What

Adds a Claude Code OTLP receiver for governance: an OpenTelemetry Collector
published on the internal Gateway at cc.i.shion1305.com that persists Claude Code
log events to a new single-binary Loki store and exposes metrics for Prometheus to scrape.

Claude Code ──OTLP/HTTP + Bearer──► cc.i.shion1305.com (internal Gateway, WireGuard-only)
                                       └► OTel Collector (bearertokenauth)
                                            ├─ logs    → Loki (90d, longhorn-hdd 100Gi)
                                            └─ metrics → Prometheus (ServiceMonitor)
                                                          Grafana ← Loki datasource
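
As a minimal client-side sketch, the function below exports the environment variables that Claude Code's telemetry documentation describes, pointed at this endpoint. The function name claude_code_otlp_env is hypothetical, and the base URL assumes the HTTPRoute forwards the collector's default /v1/logs and /v1/metrics paths unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure a Claude Code session to export telemetry
# to the governance collector. Call it in the shell that launches `claude`.
claude_code_otlp_env() {
  local token="$1"
  export CLAUDE_CODE_ENABLE_TELEMETRY=1
  export OTEL_LOGS_EXPORTER=otlp
  export OTEL_METRICS_EXPORTER=otlp
  # OTLP/HTTP only: the endpoint sits behind an HTTPRoute, not a GRPCRoute.
  export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
  # Base URL; the exporters append /v1/logs and /v1/metrics themselves.
  export OTEL_EXPORTER_OTLP_ENDPOINT=https://cc.i.shion1305.com
  # Checked server-side by the collector's bearertokenauth extension.
  export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ${token}"
  # Opt-in only: include prompt text in user_prompt events.
  if [ "${2:-}" = "--with-prompts" ]; then
    export OTEL_LOG_USER_PROMPTS=1
  fi
}
```

Usage, assuming your own Vault login can read claude-code/otlp: run `claude_code_otlp_env "$(vault kv get -field=token claude-code/otlp)"` and then start `claude` from the same shell. Prompt logging stays off unless `--with-prompts` is passed as the second argument.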

Changes

  • apps/claude-code-app.yaml + claude-code/: opentelemetry-collector chart
    (contrib image, since core lacks bearertokenauth). OTLP/HTTP only: the HTTPRoute
    carries HTTP, and gRPC would need a GRPCRoute. Logs go to Loki, metrics to
    Prometheus. Also adds an HTTPRoute + ReferenceGrant on the internal Gateway, an
    ESO SecretStore/ExternalSecret for the bearer token, a ServiceMonitor, and a README.
  • apps/loki-app.yaml + loki/values.yaml: Loki 7.0.0, SingleBinary mode,
    filesystem storage, 90-day retention on longhorn-hdd (100Gi).
  • grafana/datasource-loki.yaml: cross-namespace Loki datasource (stable uid: loki).
  • vault/scripts/setup-eso-policies.sh: eso-claude-code policy + role
    (dedicated claude-code/ KV mount).
  • aqua.yaml: pin the hashicorp/vault CLI to v2.0.0 (separate, unrelated change).
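
One way to check the bearer-token gate after deploy is to POST an empty OTLP/JSON logs export with and without the token. This is a sketch: otlp_logs_status is a hypothetical helper, and the expected status codes (401 without a valid token, 200 with one) reflect typical collector behavior rather than anything verified against this deployment.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the bearertokenauth-protected OTLP/HTTP receiver.
OTLP_ENDPOINT="${OTLP_ENDPOINT:-https://cc.i.shion1305.com}"

# POST an empty OTLP/JSON logs payload and print only the HTTP status code.
# Pass the bearer token as $1, or nothing to exercise the unauthenticated path.
otlp_logs_status() {
  local token="${1:-}"
  # Rebuild the function's positional params as the curl argument list.
  set -- -sS -o /dev/null -w '%{http_code}' -X POST \
    -H 'Content-Type: application/json' \
    --data '{"resourceLogs":[]}'
  if [ -n "$token" ]; then
    set -- "$@" -H "Authorization: Bearer ${token}"
  fi
  curl "$@" "${OTLP_ENDPOINT}/v1/logs"
}
```

From a WireGuard-connected machine, `otlp_logs_status` would be expected to print 401, and `otlp_logs_status "$(vault kv get -field=token claude-code/otlp)"` to print 200.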

Networking

No custom NetworkPolicy required — the cluster-wide allow-from-infra policies
already permit Envoy Gateway → collector and Prometheus scrapes, and the Kyverno
cross-namespace isolation generator allows same-namespace collector ↔ Loki.
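
To spot-check this isolation, a throwaway curl Pod can probe Loki's /ready endpoint from inside and outside the collector's namespace. The namespace claude-code, the Service name loki, and port 3100 in the usage below are assumptions about how the charts are deployed, and probe_from_ns is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a one-shot curl Pod in namespace $1 against URL $2
# and print the HTTP status. 000 means the connection failed or timed out,
# which is how a default-deny ingress policy typically shows up.
probe_from_ns() {
  local ns="$1" url="$2"
  kubectl run "netcheck-$$" -n "$ns" --rm -i --quiet --restart=Never \
    --image=curlimages/curl --command -- \
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$url"
}
```

Under those assumptions, `probe_from_ns claude-code http://loki.claude-code.svc:3100/ready` should print 200, while `probe_from_ns default http://loki.claude-code.svc:3100/ready` should print 000 (kubectl may also report a non-zero Pod exit, since curl fails on timeout).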

What it captures

Claude Code's OTel event stream (user_prompt, tool_result, tool_decision,
api_request, api_error) plus metrics (token usage, cost, session count, active
time). Full assistant responses are not captured; prompt content is included only
when the client sets OTEL_LOG_USER_PROMPTS=1.
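
For ad-hoc review outside Grafana, the captured events can be pulled with logcli. This is a sketch that assumes the collector writes through Loki's native OTLP endpoint, where the resource attribute service.name (assumed to be claude-code) becomes the service_name stream label and log attributes such as event.name land in structured metadata as event_name. Both label names should be checked against the actual streams, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for querying Claude Code telemetry in Loki.

# Build the LogQL query, optionally filtered to one event type (e.g. api_request).
cc_logql() {
  local event="${1:-}"
  local q='{service_name="claude-code"}'
  if [ -n "$event" ]; then
    q="$q | event_name=\"$event\""
  fi
  printf '%s\n' "$q"
}

# Run the query over the last $2 (default 1h); logcli reads the URL from LOKI_ADDR.
cc_logs() {
  if [ -z "${LOKI_ADDR:-}" ]; then
    echo "set LOKI_ADDR, e.g. http://localhost:3100 via kubectl port-forward" >&2
    return 1
  fi
  logcli query --since="${2:-1h}" --limit=50 "$(cc_logql "${1:-}")"
}
```

Usage, with Loki port-forwarded locally: `export LOKI_ADDR=http://localhost:3100`, then `cc_logs api_request 24h`.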

Manual steps before it goes Healthy

  1. Vault (the policy and role are created by the script; the mount and token are set up out-of-band):
    vault secrets enable -path=claude-code kv-v2
    vault kv put claude-code/otlp token="$(openssl rand -hex 32)"
    The collector Pod stays pending until the claude-code-otlp-token Secret syncs.
  2. Confirm cc.i.shion1305.com resolves to the internal Gateway (10.130.5.21;
    the *.i.shion1305.com wildcard + cert should cover it).
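
Both manual steps can be verified with a short script along these lines. It is a sketch: the claude-code namespace is an assumption (override it with NS), the helper names are hypothetical, and check_cert needs OpenSSL 1.1.1 or later for `x509 -ext`.

```shell
#!/usr/bin/env bash
# Hypothetical post-setup checks for the manual steps above.
HOST="cc.i.shion1305.com"
GATEWAY_IP="10.130.5.21"
NS="${NS:-claude-code}"  # assumption: namespace of the collector and its Secret

# Step 2: the name resolves to the internal Gateway. dig +short may print a
# CNAME chain first, so match the IP as a whole, fixed-string line.
check_dns() {
  dig +short "$HOST" | grep -qxF "$GATEWAY_IP"
}

# Step 2: the served certificate covers the name (wildcard or exact SAN).
check_cert() {
  openssl s_client -connect "${HOST}:443" -servername "$HOST" </dev/null 2>/dev/null \
    | openssl x509 -noout -ext subjectAltName 2>/dev/null \
    | grep -Eq 'DNS:(\*|cc)\.i\.shion1305\.com'
}

# Step 1: ESO has synced the token into the Secret the collector depends on.
check_secret() {
  kubectl -n "$NS" get secret claude-code-otlp-token -o name >/dev/null 2>&1
}

# Run all checks; return non-zero if any failed.
run_checks() {
  local rc=0 check
  for check in check_dns check_cert check_secret; do
    if "$check"; then
      echo "ok    $check"
    else
      echo "FAIL  $check"
      rc=1
    fi
  done
  return "$rc"
}
```

Run `run_checks` from a WireGuard-connected machine with a kubeconfig for the cluster.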

Validation

scripts/render-validate.sh passes for all apps including the new claude-code
and loki.

Shion1305 added 3 commits June 2, 2026 04:39
Deploy an OpenTelemetry Collector that receives Claude Code telemetry over
OTLP at cc.i.shion1305.com (internal Gateway, WireGuard-only) and persists
conversation/usage data for governance. The endpoint is protected by a bearer
token validated by the collector's bearertokenauth extension; the token is
sourced from Vault (dedicated claude-code/ KV mount) via ESO.

- Collector (opentelemetry-collector chart, contrib image): OTLP/HTTP only,
  logs -> Loki, metrics -> Prometheus via ServiceMonitor.
- Loki (single-binary, filesystem): 90-day retention on longhorn-hdd (100Gi),
  queryable in Grafana through a new Loki datasource.
- eso-claude-code policy/role added to setup-eso-policies.sh.

No custom NetworkPolicy needed: existing allow-from-infra + the Kyverno
cross-namespace isolation generator already cover Gateway->collector,
Prometheus scrape, and same-namespace collector<->Loki.

Signed-off-by: Shion Ichikawa <shion1305@gmail.com>

Roll the vault CLI tool back from v2.0.1 to v2.0.0.

Signed-off-by: Shion Ichikawa <shion1305@gmail.com>

Cross-namespace ingress is default-denied (Kyverno generator); egress is open.
The only cluster-wide ingress allows are node/apiserver, envoy-gateway-system,
and the grafana namespace. Spell out the "who needs to reach this app on
ingress?" rule so new apps don't silently rely on, or miss, a cross-namespace
allow.

Signed-off-by: Shion Ichikawa <shion1305@gmail.com>
Shion1305 merged commit 5cd28af into main on Jun 1, 2026
1 check passed
Shion1305 deleted the feat/claude-code-otlp-governance branch on June 1, 2026 19:48