OA Skill

Observability Agent: read-only data gateway for logs, events, and metrics. Supports both K8s clusters and bare metal/VM servers (standalone mode). This document is served by OA at GET /skill.md.

Operating Modes

OA runs in one of two modes, auto-detected from whether the KUBERNETES_SERVICE_HOST environment variable is set.

Mode	Detection	Targets	Log Source	Events	Metrics Source
K8s	`KUBERNETES_SERVICE_HOST` present	Pods (namespace/selector)	K8s container logs API	K8s Events	Pod annotation-based scrape
Standalone	`KUBERNETES_SERVICE_HOST` absent	Services (`OA_SERVICES` env)	File tail + journalctl	None	Direct URL scrape

Base

Auth: Authorization: Bearer <JWT> (required on protected API requests)
No-auth endpoints: /healthz, /livez, /readyz, /skill.md, /.well-known/skill.md

Auth/JWT

OA verifies JWTs using an HS256 shared secret.

OA_JWT_SECRET (required, HS256 shared secret, min 32 chars)

JWT rules:

Algorithm: HS256
exp claim required (recommended 5–15 min)
Missing or invalid JWT → 401
Scoped JWT missing namespace/service/capability scope → 403

The client (AI agent) signs an HS256 JWT with the secret from the OA_JWT_SECRET environment variable and sends it with each request. Keep the secret in runtime memory only; never expose it in logs, files, or output.

Authorization claims:

{
  "sub": "agent-01",
  "allowedNamespaces": ["prod", "monitoring"],
  "allowedServices": ["validator-*"],
  "capabilities": ["pods", "logs", "events", "metrics"],
  "admin": false
}

K8s pod discovery requires pods capability and namespace scope.
K8s namespace scopes support exact names and * wildcards; allowedNamespaces: ["*"] permits all namespaces and ns=*.
K8s selector bundles require pods capability because selector targeting performs pod discovery internally.
Bundle create/status/download enforce the bundle target scope and requested capabilities.
Standalone allowedServices entries can use * wildcards; allowedServices: ["*"] permits all configured services.
Non-admin discovery responses are redacted.
Legacy JWTs with no authorization scope claims keep full access for compatibility.
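
The signing step described above can be sketched with the Python standard library alone. This is a minimal illustration, not OA's own client code: the helper names `b64url` and `sign_jwt` and the `now` parameter are hypothetical, and the 600-second default lifetime is simply a value inside the recommended 5–15 minute range.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(secret: str, claims: dict, ttl_seconds: int = 600,
             now: Optional[int] = None) -> str:
    """Return a compact HS256 JWT carrying `claims` plus an `exp` claim."""
    if len(secret) < 32:
        raise ValueError("OA_JWT_SECRET must be at least 32 characters")
    issued = int(time.time()) if now is None else now
    payload = dict(claims, exp=issued + ttl_seconds)  # exp is required by OA
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"
```

A caller would read the secret from the environment at runtime, for example `sign_jwt(os.environ["OA_JWT_SECRET"], {"sub": "agent-01", "capabilities": ["logs"]})`, and send the result as `Authorization: Bearer <token>`.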

Primary Workflow (bundle-first)

Create bundle: POST /v1/bundles
Poll status: GET /v1/bundles/{bundleId} every 1–2 s, for up to 30 s, until the bundle is done
Download: GET /v1/bundles/{bundleId}/download → ndjson.gz
Analyze: stream-parse the NDJSON records, then have the AI agent analyze them
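
The polling step can be sketched as a small helper that is independent of any HTTP library. The function name `poll_until`, its parameters, and the idea of passing an `is_done` predicate are all assumptions made for illustration; this document does not specify the shape of the status response, so the caller decides what "done" looks like.

```python
import time
from typing import Callable


def poll_until(fetch_status: Callable[[], dict],
               is_done: Callable[[dict], bool],
               interval_s: float = 1.5,
               timeout_s: float = 30.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch_status every interval_s seconds until is_done or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        # Stop if the next wait would push us past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"bundle not done after {timeout_s} s")
        sleep(interval_s)
```

Here `fetch_status` would wrap `GET /v1/bundles/{bundleId}` with the bearer token attached. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.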

Target Discovery

K8s Mode: Pod Search

GET /v1/pods?ns=<namespace>&q=<substring>

ns: namespace (* = all namespaces; requires admin or allowedNamespaces: ["*"])
selector: label selector
q: pod name substring search

Response: namespace, name, labels, containers[], ready, phase. Admin responses also include podIP, annotations, and nodeName.
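
A minimal sketch of building the pod search URL with correct query encoding. The function name `pods_search_url` and the `base_url` argument are hypothetical; only the path and the `ns`, `selector`, and `q` parameters come from this document.

```python
from typing import Optional
from urllib.parse import urlencode


def pods_search_url(base_url: str, ns: str,
                    selector: Optional[str] = None,
                    q: Optional[str] = None) -> str:
    """Build GET /v1/pods with only the query parameters that were supplied."""
    params = {"ns": ns}
    if selector:
        params["selector"] = selector  # label selector, e.g. "app=web"
    if q:
        params["q"] = q  # pod name substring
    return f"{base_url.rstrip('/')}/v1/pods?{urlencode(params)}"
```

Label selectors contain `=` and `,`, which `urlencode` percent-encodes so they are not mistaken for query separators.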

Standalone Mode: Service List

GET /v1/services

Returns registered services configured via OA_SERVICES env, filtered by JWT service scope.

Admin response example:

{
  "items": [
    { "name": "solana-validator", "logs": ["/var/log/solana/validator.log"], "journal": null, "metrics": "http://localhost:9090/metrics" },
    { "name": "rpc-node", "logs": ["/var/log/solana/rpc.log"], "journal": null, "metrics": null }
  ]
}

Bundle Request

timeWindow (relative / absolute)

OA supports two time window modes. Use only one at a time. In standalone mode, timeWindow is a journal-only selector.

Relative:

{ "timeWindow": { "sinceSeconds": 600 } }

Absolute (UTC, ISO8601Z):

{
  "timeWindow": {
    "start": "2026-02-09T00:00:00Z",
    "end": "2026-02-09T00:10:00Z"
  }
}

Rules:

Using both sinceSeconds and start/end → 400
In standalone mode, time windows apply only to journal sources; file sources use tailLines
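
A client can check these rules before sending a request and avoid a round trip that ends in 400. The sketch below is a hypothetical pre-flight check, not OA's validation code: the function name `time_window_mode` is invented, and the requirement that `start` precede `end` is an assumption rather than something this document states.

```python
from datetime import datetime

_ISO_Z = "%Y-%m-%dT%H:%M:%SZ"  # UTC, ISO8601 with a trailing Z


def time_window_mode(tw: dict) -> str:
    """Return "relative" or "absolute", or raise ValueError for a bad window."""
    relative = "sinceSeconds" in tw
    absolute = "start" in tw or "end" in tw
    if relative and absolute:
        raise ValueError("sinceSeconds and start/end are mutually exclusive")
    if relative:
        if not isinstance(tw["sinceSeconds"], int) or tw["sinceSeconds"] <= 0:
            raise ValueError("sinceSeconds must be a positive integer")
        return "relative"
    if "start" in tw and "end" in tw:
        start = datetime.strptime(tw["start"], _ISO_Z)
        end = datetime.strptime(tw["end"], _ISO_Z)
        if start >= end:  # assumption: an empty or inverted window is an error
            raise ValueError("start must be earlier than end")
        return "absolute"
    raise ValueError("provide sinceSeconds, or both start and end")
```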

K8s Mode: selector-based (multiple Pods)

{
  "timeWindow": { "sinceSeconds": 600 },
  "target": {
    "namespace": "*",
    "selector": "app=web,tier=backend"
  },
  "include": {
    "logs": { "enabled": true, "tailLines": 2000, "previous": true, "timestamps": true },
    "events": { "enabled": true },
    "metrics": { "enabled": true }
  },
  "limits": {
    "maxPods": 20,
    "maxTotalLogLines": 50000,
    "metricsTimeoutMs": 2000
  }
}

K8s Mode: direct Pod targeting (single/specific Pods)

{
  "timeWindow": { "sinceSeconds": 600 },
  "target": {
    "pods": [
      { "namespace": "default", "pod": "my-app-pod-0" }
    ]
  },
  "include": {
    "logs": { "enabled": true, "tailLines": 2000, "previous": true, "timestamps": true },
    "events": { "enabled": true },
    "metrics": { "enabled": true }
  }
}

selector and pods[] are mutually exclusive. Providing both → 400.
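
The two targeting styles can be kept apart on the client with a small builder. This is a sketch under assumptions: the function name `k8s_target` and the `(namespace, pod)` tuple input are hypothetical, and the fallback to `"*"` follows the Defaults table. It does not check whether OA accepts a namespace-only target.

```python
from typing import List, Optional, Tuple


def k8s_target(namespace: Optional[str] = None,
               selector: Optional[str] = None,
               pods: Optional[List[Tuple[str, str]]] = None) -> dict:
    """Build the "target" object for a K8s bundle request."""
    if selector and pods:
        # OA answers 400 for this combination, so fail before sending.
        raise ValueError("selector and pods[] are mutually exclusive")
    if pods:
        return {"pods": [{"namespace": ns, "pod": name} for ns, name in pods]}
    target = {"namespace": namespace or "*"}
    if selector:
        target["selector"] = selector
    return target
```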

Standalone Mode: service-based

{
  "target": {
    "kind": "services",
    "services": ["solana-validator", "rpc-node"]
  },
  "include": {
    "logs": { "enabled": true, "tailLines": 2000, "includePatterns": ["ERROR"], "excludePatterns": ["healthcheck"] },
    "metrics": { "enabled": true }
  },
  "limits": {
    "maxTotalLogLines": 50000,
    "metricsTimeoutMs": 2000
  }
}

Standalone rules:

Use target.kind: "services" with a required target.services array of names registered in OA_SERVICES, or target.kind: "all" for every registered service
If kind is omitted and a services array is present, kind defaults to "services"
events is ignored in standalone requests
previous and timestamps are ignored in standalone requests
File logs are collected via tail -n <include.logs.tailLines> from paths configured per service
Journal logs are collected via journalctl; they use timeWindow when supplied, otherwise include.logs.tailLines
When logs are enabled, timeWindow is accepted only when selected standalone services include a configured journal source; file logs are never time-filtered
OA applies include/exclude filters first, then globally merges the matching records by parsed timestamp, and only then applies the final maxTotalLogLines budget
Untimestamped records are ranked by the previous timestamp seen from the same source, or by source read order when that source has produced no timestamp yet
Diagnostic skipped/error records are emitted outside this returned-line budget and counted as diagnosticRecords in log_summary
Clients cannot request arbitrary file paths or journal units; only registered OA_SERVICES entries are available
OA uses the current process OS permissions and does not elevate privileges

Standalone log API constraints:

Field	Applies to	Behavior
`include.logs.tailLines`	File logs, journal logs without `timeWindow`	Passed to `tail -n` for files and `journalctl -n` for journals
`timeWindow.sinceSeconds`	Journal logs only	Relative journal window; rejected when logs are enabled and selected services have no journal source
`timeWindow.start` / `timeWindow.end`	Journal logs only	Absolute journal window; both fields required together; rejected when logs are enabled and selected services have no journal source
`limits.maxTotalLogLines`	Standalone log lines	Final returned-line budget after filtering and global merge; diagnostic skipped/error records are outside this cap
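
To make the merge-then-budget order concrete, here is an illustrative sketch of timestamp inheritance and global merging. It is not OA's implementation and rests on several assumptions that this document does not confirm: records with no earlier timestamp in their source sort before all timestamped records, ties are broken by source name and read order, and the budget keeps the newest records. The name `merge_sources` and the `(timestamp, line)` tuple shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[Optional[str], str]  # (ISO8601 UTC timestamp or None, line)


def merge_sources(sources: Dict[str, List[Record]],
                  max_total: int) -> List[Tuple[str, str]]:
    """Merge already-filtered records from all sources and apply the budget."""
    ranked = []
    for name, records in sources.items():
        last_ts = ""  # empty string sorts before any ISO timestamp
        for index, (ts, line) in enumerate(records):
            if ts is not None:
                last_ts = ts  # later untimestamped lines inherit this value
            ranked.append((last_ts, name, index, line))
    # ISO8601 UTC strings in one format compare correctly as plain strings.
    ranked.sort(key=lambda r: (r[0], r[1], r[2]))
    kept = ranked[-max_total:] if max_total > 0 else []  # assumption: keep newest
    return [(name, line) for _, name, _, line in kept]
```

A continuation line such as a stack trace frame has no timestamp of its own, so inheriting the previous timestamp keeps it next to the line it belongs to after the merge.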

K8s selector bundle note:

Selector targets list matching pods internally before collecting logs/events/metrics.
Scoped tokens therefore need both pods capability and the requested data-source capabilities for selector bundles.

Log Line Filters (includePatterns / excludePatterns)

include.logs.includePatterns (string[]): keeps only lines that contain at least one of the substrings (like grep). Standalone only.
include.logs.excludePatterns (string[]): removes lines that contain any of the substrings (like grep -v). Works in both standalone and K8s mode.
Standalone applies both filters before the final maxTotalLogLines budget.

Example:

{
  "include": {
    "logs": {
      "enabled": true,
      "includePatterns": ["ERROR", "panic"],
      "excludePatterns": ["GET /healthz", "healthcheck"]
    }
  }
}
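
The filter semantics can be sketched as a single predicate. This is an illustration only: the function name `keep_line` is hypothetical, and case-sensitive matching is an assumption based on the comparison with plain grep.

```python
from typing import Sequence


def keep_line(line: str,
              include_patterns: Sequence[str] = (),
              exclude_patterns: Sequence[str] = ()) -> bool:
    """Return True if the line survives the include and exclude filters."""
    # With include patterns set, the line must contain at least one of them.
    if include_patterns and not any(p in line for p in include_patterns):
        return False
    # Any matching exclude pattern drops the line.
    return not any(p in line for p in exclude_patterns)
```

With the example above, `ERROR GET /healthz failed` is dropped because it matches an exclude pattern even though it also matches an include pattern.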

NDJSON Record Types

Common Records

type	Description	Key Fields
`meta`	Bundle metadata	bundleId, createdAt, params

K8s Mode Records

type	Description	Key Fields
`log`	Container log	namespace, pod, container, ts, line, previous?, skipped?, reason?
`event`	K8s event	namespace, reason, message, ts, involvedObject
`metrics_text`	Pod metrics	namespace, pod, port, path, ts, ok/skipped/error, content

Standalone Mode Records

type	Description	Key Fields
`log`	File log	service, file, ts, line, skipped?, reason?
`log`	Journal log	service, journal, journalScope?, journalUser?, ts, line, skipped?, reason?
`log_error`	User journal error	service, journal, journalScope, journalUser, ts, reason, error
`log_summary`	Log budget/source summary	ts, lineLimited, matchedLogRecords, returnedLogRecords, diagnosticRecords, sources[]
`metrics_text`	Service metrics	service, url, ts, ok/skipped/error, content

Standalone log skip reasons:

file_not_found: log file does not exist
read_error: file read failed (permissions, etc.)
journalctl_not_found: journalctl binary not found
journal_permission_denied: journalctl reported insufficient journal permissions
journal_read_error: journalctl execution failed (permissions, etc.)

Standalone metrics status:

Status	Meaning	Fields
Success	Scrape OK	`ok: true`, `content: "# HELP ..."`
Normal skip	No metrics URL configured	`skipped: true`, `reason: "no_metrics_url"`
Timeout	Response timed out	`ok: false`, `error: "timeout after 2000ms"`
Failure	Connection failed	`ok: false`, `error: "fetch_failed: ECONNREFUSED"`

K8s Previous Logs

If a pod has not restarted, previous=true logs may not exist and K8s may return 400/404. This is normal and must not fail the bundle. OA writes a skip record in this case:

{"type":"log","namespace":"ns","pod":"p","container":"c","ts":"...","previous":true,"skipped":true,"reason":"no_previous_container"}
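
A sketch of stream-parsing a downloaded bundle and recognizing this skip record so that it is not reported as a failure. The function names `iter_records` and `is_expected_previous_skip` are hypothetical; the record fields are the ones shown above.

```python
import gzip
import io
import json
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO) -> Iterator[dict]:
    """Yield one dict per NDJSON line from a gzip-compressed binary stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8") as text:
        for raw in text:
            raw = raw.strip()
            if raw:  # tolerate blank lines
                yield json.loads(raw)


def is_expected_previous_skip(record: dict) -> bool:
    """True for the normal "pod never restarted" skip record."""
    return (record.get("type") == "log"
            and record.get("skipped") is True
            and record.get("reason") == "no_previous_container")
```

Because `iter_records` is a generator, the bundle is decompressed and parsed line by line instead of being loaded into memory at once.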

K8s Metrics — 3 States

Status	Meaning	Fields
Success	Scrape OK	`ok: true`, `content: "# HELP ..."`
Normal skip	No annotation (pod does not expose metrics)	`skipped: true`, `reason: "annotation_missing"`
Failure	Annotation present but connection failed (anomaly signal)	`ok: false`, `error: "timeout after 2000ms"`
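
The three states map directly onto a small classifier for `metrics_text` records. The function name `classify_metrics` and the returned labels are hypothetical; treating any record that is neither skipped nor `ok: true` as a failure is an assumption.

```python
def classify_metrics(record: dict) -> str:
    """Map a metrics_text record to "ok", "skipped", or "failed"."""
    if record.get("skipped"):
        return "skipped"  # normal: the target does not expose metrics
    if record.get("ok") is True:
        return "ok"
    return "failed"  # anomaly signal: scrape was attempted but did not succeed
```

The same function applies to standalone records, where the normal skip reason is `no_metrics_url` instead of `annotation_missing`.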

Analysis Guide (for AI Agents)

Priority

Events (K8s only): OOMKilled, CrashLoopBackOff, FailedScheduling
Logs: panic, fatal, segfault, timeout, connection refused
Metrics: ok:false is an anomaly signal (service down or network issue); skipped:true is normal

Analysis Method

Group recurring errors by signature + count occurrences
Record first/last occurrence timestamps
Drill down: in K8s use narrower selector / single pod; in standalone use single service, lower tailLines, or a shorter journal timeWindow
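
The first two steps can be sketched as follows. This is one possible approach, not a prescribed one: the signature rule (replace hex literals and digit runs with `#`), the case-insensitive keyword match, and the names `signature` and `group_errors` are all assumptions. It expects every log record to carry a `ts` string in a single ISO8601 UTC format.

```python
import re
from typing import Dict, Iterable, Sequence

_VARIABLE = re.compile(r"0x[0-9a-fA-F]+|\d+")
KEYWORDS = ("panic", "fatal", "segfault", "timeout", "connection refused")


def signature(line: str) -> str:
    """Collapse variable parts so repeated errors share one signature."""
    return _VARIABLE.sub("#", line).strip()


def group_errors(records: Iterable[dict],
                 keywords: Sequence[str] = KEYWORDS) -> Dict[str, dict]:
    """Count matching log lines per signature with first/last timestamps."""
    groups: Dict[str, dict] = {}
    for rec in records:
        if rec.get("type") != "log" or rec.get("skipped"):
            continue
        line = rec.get("line", "")
        if not any(k in line.lower() for k in keywords):
            continue
        ts = rec["ts"]
        entry = groups.setdefault(signature(line),
                                  {"count": 0, "first": ts, "last": ts})
        entry["count"] += 1
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
    return groups
```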

Target Interpretation UX

K8s Mode

User Input	Action
"Analyze backend logs"	`GET /v1/pods?q=backend` → bundle all matching pods
"Only my-app pod 0"	`target.pods: [{namespace: "default", pod: "my-app-pod-0"}]`
"All cluster error logs"	`namespace: "*"`, logs only; scan the returned lines for ERROR/WARN across the cluster

Standalone Mode

User Input	Action
"Analyze solana validator logs"	`GET /v1/services` → `target.services: ["solana-validator"]`
"Check all service status"	`target.kind: "all"`
"Only rpc-node metrics"	`target.services: ["rpc-node"]`, logs disabled, metrics only

Defaults

K8s Mode

Field	Default
sinceSeconds	600 (10 min)
tailLines	2000
namespace	`*` (all)
containers	all
previous	true
timestamps	true (forced true in absolute time mode)

Standalone Mode

Field	Default
timeWindow	none (journal sources use `tailLines` unless a `timeWindow` is requested)
tailLines	2000

Limits

Common

Field	Value
maxTotalLogLines	50,000
sinceSecondsMax	3,600 (1 hour)
metricsTimeoutMs	2,000
bundle TTL	60 min auto-delete

K8s Mode

Field	Value
maxPods	20
maxMetricsPods	20

Standalone Configuration

Standalone mode defines services via the OA_SERVICES env:

export OA_JWT_SECRET="replace-with-at-least-32-random-chars"
export OA_SERVICES='[
  {"name":"solana-validator","logs":["/var/log/solana/validator.log"],"metrics":"http://localhost:9090/metrics"},
  {"name":"rpc-node","logs":["/var/log/solana/rpc.log"]}
]'
node dist/index.js

Service definition fields:

Field	Required	Description
`name`	Yes	Unique service identifier
`logs`	No	Array of log file paths to collect
`journal`	No	systemd unit name (journalctl log collection)
`journalScope`	No	`system` (default) or `user`
`journalUser`	No	Username or UID required when `journalScope` is `user`
`metrics`	No	Prometheus metrics URL
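
An operator can sanity-check an OA_SERVICES value against the table above before starting OA. The sketch below is a hypothetical pre-flight check, not OA's own parser: the function name `parse_services` is invented, and how OA itself reacts to an invalid definition is not described in this document.

```python
import json
from typing import List


def parse_services(raw: str) -> List[dict]:
    """Parse an OA_SERVICES JSON string and check the documented field rules."""
    services = json.loads(raw)
    if not isinstance(services, list):
        raise ValueError("OA_SERVICES must be a JSON array")
    seen = set()
    for svc in services:
        name = svc.get("name")
        if not name:
            raise ValueError("every service needs a name")
        if name in seen:
            raise ValueError(f"duplicate service name: {name}")
        seen.add(name)
        scope = svc.get("journalScope", "system")  # "system" is the default
        if scope not in ("system", "user"):
            raise ValueError(f"{name}: journalScope must be system or user")
        if scope == "user" and not svc.get("journalUser"):
            raise ValueError(f"{name}: journalUser is required for user scope")
    return services
```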

Standalone permission model:

File and journal readability depends on the OS permissions of the OA process.
OA does not create users, join system groups, run sudo, or bypass systemd journal permissions.
Full system and user journal visibility is possible only when the existing process account can already read those journals.
User journal permission and journalUser resolution failures are emitted as log_error records instead of empty log output.
Metrics URLs are operator-provided trusted configuration and may point at localhost or private networks for compatibility.

Standalone time windows:

File log requests read the latest configured line budget with tail; sinceSeconds and absolute windows do not seek or filter file contents.
Journal requests use either timeWindow (--since/--until) or the configured line budget, not both.
timeWindow on a standalone request is rejected when logs are enabled and the selected services have no configured journal source.

Notes

Always prefer the bundle API (raw endpoints are for small-scale debugging)
Use multiple smaller bundles to drill down rather than one large time range
metrics_text with ok:false is an anomaly signal by itself
skipped:true is normal (the service/pod does not expose metrics)
