feat: add instrument-app skill for orq.ai observability#12

Merged
Baukebrenninkmeijer merged 10 commits into main from feat/RES-545-instrument-app-skill on Apr 7, 2026
Conversation


@arianpasquali (Collaborator) commented Mar 24, 2026

Summary

  • Adds new setup-observability skill that guides users through instrumenting LLM applications with orq.ai tracing
  • Covers both integration modes: AI Router (proxy, zero-code traces) and Observability (OpenTelemetry/OpenInference)
  • 5-phase workflow: assess current state → choose mode → implement → verify baseline → enrich traces
  • Includes 3 resource files: framework integrations table (28 frameworks), @traced decorator guide, baseline checklist
  • Updates README with the new skill in the skills table and a mention of orq-ai/claude-plugins

Closes RES-545

Test plan

  • Install as plugin: claude --plugin-dir . and confirm setup-observability skill appears
  • Test trigger: ask "Help me add orq.ai tracing to my app" — should activate the skill
  • Verify resource file links resolve correctly from SKILL.md
  • Validate @traced code examples against latest orq.ai Python SDK
  • Verify all framework docs links are correct

New skill that guides users through instrumenting LLM applications with
orq.ai tracing — covering AI Router proxy, OpenTelemetry integrations,
the @traced decorator, and trace enrichment with metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
linear Bot commented Mar 24, 2026

RES-545 Create instrument-app skill for orq.ai observability

Goal

Create an instrument-app skill that guides users through adding orq.ai observability to their LLM applications. This fills a gap in our skill ecosystem: existing skills cover build → evaluate → optimize, but nothing helps users set up tracing in the first place. Traces are the prerequisite for analyze-trace-failures, so this skill completes the pipeline.

Inspired by Langfuse's instrumentation skill, adapted for orq.ai's two integration modes.

What Was Done

Created 4 files:

  • skills/instrument-app/SKILL.md — Main skill with a 5-phase workflow:
    1. Assess current state (detect framework, SDK, existing instrumentation)
    2. Choose integration mode (AI Router vs Observability vs both)
    3. Implement integration (framework-specific setup with code examples)
    4. Verify baseline (traces appearing, model/tokens captured, span hierarchy)
    5. Enrich traces (session_id, user_id, tags, @traced for custom spans)
  • skills/instrument-app/resources/framework-integrations.md — Table of all 28 supported frameworks with integration modes (AI Router / Observability / Control Tower), decision guide, and quick setup patterns for Python and Node.js
  • skills/instrument-app/resources/traced-decorator-guide.md — Full @traced decorator reference: 6 span types (agent, llm, tool, retrieval, embedding, function), parameters, sync/async examples, and common mistakes
  • skills/instrument-app/resources/baseline-checklist.md — Verification checklist (model name, tokens, trace names, span hierarchy, sensitive data) plus a "what to add next" inference table based on code patterns

Updated README.md with instrument-app at the top of the skills table.

Key Design Decisions

  • Covers both integration modes: AI Router (proxy, zero-code traces) and Observability (OpenTelemetry/OpenInference for framework-level spans)
  • Code inference approach (like Langfuse): detect framework from imports, infer session/user patterns from code rather than asking the user about everything
  • Framework details in resource file: 28 frameworks is too much for the main SKILL.md — kept it scannable with details in resources
  • Anti-patterns table: common mistakes like wrong import order (the #1 cause of "traces not appearing"), generic trace names, logging PII
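
The wrong-import-order anti-pattern called out above can be demonstrated with a small stdlib simulation. This is not orq.ai SDK code — the module and function names below are hypothetical stand-ins — but it shows the underlying mechanism: instrumentation monkey-patches the library, so any reference bound before `instrument()` runs keeps the unpatched function and produces no trace.

```python
import types

# Stand-in "library" module (hypothetical; real instrumentation patches e.g. the openai module).
lib = types.ModuleType("fake_llm_lib")
lib.complete = lambda prompt: f"raw:{prompt}"

spans = []

def instrument(module):
    # Instrumentation typically wraps the library's call sites.
    original = module.complete
    def traced_complete(prompt):
        spans.append(prompt)           # record a "span"
        return original(prompt)
    module.complete = traced_complete

# Wrong order: a reference bound before instrument() keeps the unpatched function.
early_ref = lib.complete
instrument(lib)
early_ref("hello")       # bypasses the wrapper: no span recorded
lib.complete("world")    # goes through the wrapper: span recorded
print(spans)
```

Only the call made through the patched module attribute shows up, which is exactly the silent failure the anti-patterns table warns about.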

What's Left

  • Test as plugin: claude --plugin-dir . and confirm skill triggers
  • Validate @traced code examples against latest orq.ai Python SDK
  • Verify all framework docs links are correct
  • Consider adding a Node.js/TypeScript equivalent for @traced once SDK supports it

@arianpasquali arianpasquali marked this pull request as draft March 24, 2026 11:05
The claude-plugins repo is still a work in progress, deferring the
mention until it's ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename skill to better reflect what it does. Update README skills table
and add "Instrument an Existing App" workflow example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arianpasquali arianpasquali marked this pull request as ready for review March 24, 2026 16:54

currentlycodinng commented Mar 25, 2026

Code Review Findings

Bugs

  1. Wrong OpenTelemetry import path (SKILL.md:159, resources/framework-integrations.md:95)
    Both files use:

    from opentelemetry.instrumentation.openai import OpenAIInstrumentor

    Should be:

    from openinference.instrumentation.openai import OpenAIInstrumentor

    This would cause an ImportError for anyone copying the example.

  2. Missing smoke test (tests/skills.md)
    All other skills have an entry in the smoke test file. setup-observability is missing both a test section and a Critical Files entry.

- Fix import: opentelemetry.instrumentation.openai → openinference.instrumentation.openai
- Rename heading from "Instrument App" to "Setup Observability"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@currentlycodinng

All good now, can be merged.

@Baukebrenninkmeijer

PR Review Summary

PR: feat: add instrument-app skill for orq.ai observability
Author: arianpasquali | Files: 6 changed (+569 −5)


Critical Issues (3)

  1. Wrong import path for @traced decorator (traced-decorator-guide.md)
    All code examples use from orq_ai_sdk.tracing import traced, but the official orq.ai docs show the correct path as from orq_ai_sdk.traced import traced. Users will get an ImportError.

  2. user_id as a top-level @traced parameter may not exist (SKILL.md Phase 5 + traced-decorator-guide.md)
    The examples show @traced(name="chat", user_id=current_user.id) but the official SDK docs don't list user_id as a direct parameter. If it needs to go in attributes={"user_id": ...}, all examples are wrong.

  3. orq_traced_input() / orq_traced_output(): verify these exist (traced-decorator-guide.md + baseline-checklist.md)
    These helper functions are used in multiple examples but aren't referenced in the official docs. If they don't exist in the SDK, the code examples will fail.
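
If item 2 is confirmed and user_id must live in `attributes` rather than as a top-level parameter, the corrected call shape would look like the sketch below. This is illustrative only — the decorator here is a minimal stand-in, not the orq.ai SDK, and the parameter names are assumptions pending verification against the official docs.

```python
def traced(name, attributes=None):
    # Stand-in for the real decorator, to show the call shape only.
    def wrap(fn):
        fn.span = {"name": name, "attributes": attributes or {}}
        return fn
    return wrap

# Metadata goes in attributes={...} instead of a top-level user_id kwarg.
@traced(name="chat", attributes={"user_id": "user-42"})
def chat():
    return "hi"

print(chat.span["attributes"])
```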

Important Issues (5)

  1. PR title / skill name mismatch — PR title says "instrument-app" but the skill is named setup-observability everywhere in the code. Clarify which is correct.

  2. Import order fragility with auto-formatters (SKILL.md Phase 3)
    The instrument() → import OpenAI ordering will be silently broken by isort/ruff. The skill warns about this but should recommend # noqa: E402 or an isort:skip comment to prevent automated reordering.

  3. Silent overwrite of existing OTEL config (SKILL.md Phase 3)
    Setting OTEL_EXPORTER_OTLP_ENDPOINT via os.environ[...] silently replaces any existing OTEL configuration (Datadog, Jaeger, etc.). Should warn users to check for existing config first.

  4. orq* wildcard in allowed-tools frontmatter (SKILL.md line 4)
    Verify the skill runner supports glob patterns here. If not, tools won't be available and the skill will silently lack orq capabilities.

  5. Hardcoded "my-app" service name (SKILL.md OTEL config example)
    Users copying this will all share the same service name, making traces indistinguishable. Add a comment emphasizing this must be changed.
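
For item 3, the "check before overwrite" guard could be as simple as the sketch below. The endpoint URLs are placeholders, not values from this PR; the point is only that the skill's setup code should inspect `OTEL_EXPORTER_OTLP_ENDPOINT` before assigning it.

```python
import os

def set_otlp_endpoint(endpoint: str) -> bool:
    # Refuse to clobber an existing exporter config (e.g. Datadog, Jaeger);
    # the caller should warn the user instead of silently overwriting.
    existing = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if existing and existing != endpoint:
        return False
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    return True

os.environ.pop("OTEL_EXPORTER_OTLP_ENDPOINT", None)              # clean slate for the demo
first = set_otlp_endpoint("https://orq-collector.example/otel")  # placeholder URL
second = set_otlp_endpoint("https://other-backend.example:4318") # simulates a pre-existing config
print(first, second)
```

The first call succeeds; the second is refused because a different endpoint is already configured.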

Suggestions (6)

  1. LangChain example uses model="gpt-4o" without the provider/model prefix — contradicts SKILL.md's own instruction to use openai/gpt-4o format.

  2. "10x more metadata" claim in Constraints section is unsubstantiated — soften to "significantly more metadata" or define what's counted.

  3. Three dashboard links all point to https://my.orq.ai/ — use deep-link paths or consolidate into one link.

  4. Node.js anti-pattern ("No graceful shutdown") is orphaned — no Node.js OTEL examples exist in the skill. Either add Node.js setup or remove this row.

  5. @ in anchor URL (#custom-tracing-using-the-@traced-decorator) may not resolve correctly in all Markdown renderers.

  6. PII capture defaults undocumented — The skill shows capture_input=False for sensitive functions but never states that the default is True, meaning users who don't set these flags will silently send all function inputs to orq.ai.
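
Suggestion 6's risk is easy to show with a toy decorator. This is not the orq.ai SDK — `traced_stub` and its parameters are hypothetical — but it mirrors the described default: `capture_input=True` unless the caller opts out, so unflagged functions silently export their inputs.

```python
recorded = []

def traced_stub(name, capture_input=True):
    # capture_input defaults to True, mirroring the documented risk:
    # callers who never set the flag silently export all function inputs.
    def wrap(fn):
        def inner(*args, **kwargs):
            if capture_input:
                recorded.append((name, args))   # inputs leave the process
            return fn(*args, **kwargs)
        return inner
    return wrap

@traced_stub("login")                            # default: inputs captured
def login(password):
    return "ok"

@traced_stub("login_safe", capture_input=False)  # explicit opt-out
def login_safe(password):
    return "ok"

login("hunter2")
login_safe("hunter2")
print(recorded)
```

Only the opted-out function keeps its password argument out of the recorded spans, which is why the default must be documented.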

Strengths

  • Well-structured 5-phase workflow (Assess → Choose → Implement → Verify → Enrich) with a verification gate before enrichment
  • Excellent Constraints section with rationale ("Why these constraints")
  • Framework integrations table covers 28 frameworks with accurate AI Router/Observability columns
  • Baseline checklist with "Auto with AI Router?" columns saves users from unnecessary manual work
  • Good troubleshooting section covering the most common failure modes
  • Cross-file consistency is generally strong

Recommended Action

  1. Must fix before merge: items 1–3 (broken import paths / nonexistent APIs)
  2. Should fix: items 4–8 (naming, safety, silent failures)
  3. Nice to have: items 9–14 (polish)

🤖 Generated with Claude Code

@arianpasquali arianpasquali marked this pull request as draft March 26, 2026 08:19
arianpasquali and others added 2 commits March 26, 2026 13:14
- Fix @traced import path: orq_ai_sdk.tracing → orq_ai_sdk.traced (verified against official docs)
- Fix LangChain model format: gpt-4o → openai/gpt-4o (provider/model format)
- Replace hardcoded service.name=my-app with <your-app-name> placeholder
- Soften unsubstantiated "10x more metadata" claim
- Add warning about overwriting existing OTEL config (Datadog, Jaeger, etc.)
- Add auto-formatter guidance (isort/noqa) for critical import ordering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arianpasquali arianpasquali marked this pull request as ready for review March 26, 2026 14:42
@Baukebrenninkmeijer

Review Finding: get_evaluator_llm / get_evaluator_python MCP tools don't exist

These tools are referenced in 5 files but don't exist on orq-remote-mcp (which only has create_llm_eval and create_python_eval):

  • skills/build-evaluator/SKILL.md
  • skills/build-agent/resources/api-reference.md
  • skills/generate-synthetic-dataset/resources/api-reference.md
  • skills/run-experiment/resources/api-reference.md
  • tests/mcp-tools.md (test cases 18-19 will fail)

The closest equivalent is evaluator_get on orq-mcp-global, which retrieves any evaluator by ID. Consider replacing these references with evaluator_get or an HTTP API fallback (GET /v2/evaluators/<ID>).
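
The HTTP fallback mentioned above can be sketched with the stdlib; the evaluator ID and API key below are placeholders, and the base URL follows the `api.orq.ai` host noted later in this thread (an assumption to verify against the docs).

```python
import urllib.request

def build_get_evaluator_request(evaluator_id: str, api_key: str) -> urllib.request.Request:
    # GET /v2/evaluators/<ID> — Bearer auth is assumed, per common API convention.
    url = f"https://api.orq.ai/v2/evaluators/{evaluator_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

req = build_get_evaluator_request("eval_placeholder_id", "ORQ_API_KEY_HERE")
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) is left out here since it requires a live key.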

@Baukebrenninkmeijer left a comment

Some tools don't have the right name, but looks good otherwise imo.

@Baukebrenninkmeijer

Review: Context Gaps for AI Coding Assistants

Evaluated whether the skill files provide enough context for an AI coding assistant to generate working code without hallucinating. The AI Router happy path is solid, but several gaps would cause failures:


1. @traced SDK initialization is missing (HIGH)

The skill shows from orq_ai_sdk.traced import traced and jumps straight to decorating functions. But @traced requires the orq SDK client to be initialized first (Orq(api_key=...)). Without this, an AI assistant will generate code where @traced silently does nothing (no API key → no trace export).

Suggested fix — show the full init-to-traced flow in traced-decorator-guide.md:

from orq_ai_sdk import Orq
from orq_ai_sdk.traced import traced
import os

client = Orq(api_key=os.getenv("ORQ_API_KEY"))

@traced(name="my-operation", type="function")
def my_function():
    ...

2. How to enrich AI Router traces with metadata is not covered (HIGH)

Phase 5 says "add session_id, user_id, tags" but only shows how via @traced attributes and raw OTEL span attributes. For users on the AI Router path (the recommended default), there's no explanation of how to attach custom metadata to the automatic traces. The skill should either show the mechanism (extra headers? context attributes?) or explicitly acknowledge the gap.

3. No Node.js/TypeScript support for @traced or OTEL (MEDIUM)

The skill claims to support "Python / Node.js / both" (SKILL.md:93), and the AI Router examples cover both languages. But:

  • @traced is Python-only — no Node.js equivalent exists in the docs
  • The OTEL setup example is Python-only
  • No npm install commands for instrumentor packages

An AI assistant asked to "add tracing to my Express app" would hallucinate a Node.js @traced equivalent. Should either add Node.js OTEL examples or explicitly scope @traced / Observability mode as Python-only.

4. No pip install / npm install commands (MEDIUM)

The skill says "install the framework's OpenInference instrumentor package" but never gives package names. An assistant would have to guess pip install openinference-instrumentation-openai. Should include at least the common ones or a pattern like pip install openinference-instrumentation-{framework}.

5. import os missing from framework-integrations.md snippets (LOW)

Python snippets at lines 46-54 and 66-75 use os.getenv() without import os. Minor but would cause NameError if copied verbatim. (SKILL.md correctly includes it.)

6. "Control Tower" column in framework table is unexplained (LOW)

framework-integrations.md has a "Control Tower" column (yes for LangGraph, OpenAI Agents, Vercel AI SDK) but it's never defined anywhere in the skill. Readers/assistants won't know what it means.
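
For item 5, the fix is a one-line addition; a minimal corrected shape of the affected snippets (env var name taken from this thread) looks like:

```python
# With the import in place, the copied snippet no longer raises NameError.
import os

api_key = os.getenv("ORQ_API_KEY", "")  # empty-string fallback when unset
print(isinstance(api_key, str))
```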


TL;DR: AI Router setup path works well. The main gaps are around @traced initialization (silent failure risk), trace enrichment for AI Router users, and Node.js coverage beyond AI Router.

@Baukebrenninkmeijer
Copy link
Copy Markdown
Collaborator

Review Finding: Node.js/TypeScript coverage gap in setup-observability skill

The skill is heavily Python-focused. Node.js is fully supported in the orq.ai docs but the skill doesn't give an AI assistant enough context to generate correct Node.js instrumentation. Specific gaps:

1. No Node.js OTEL setup example

The skill only shows Python for Observability mode. The docs have a complete Node.js equivalent:

import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OpenAIInstrumentation } from '@opentelemetry/instrumentation-openai';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter(),
  instrumentations: [new OpenAIInstrumentation()],
});
sdk.start();

Required packages: @opentelemetry/sdk-node, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/api, plus framework-specific instrumentors.

2. @traced is Python-only — not called out

The decorator guide (traced-decorator-guide.md) never states it's Python-only. An AI assistant will try to generate a Node.js @traced equivalent that doesn't exist, leading to hallucinated code.

3. Node.js instrumentor packages differ from Python

  • Python: openinference-instrumentation-openai (OpenInference ecosystem)
  • Node.js: @opentelemetry/instrumentation-openai (native OpenTelemetry ecosystem)

The skill only mentions OpenInference packages, so an assistant will use the wrong package for Node.js.

4. Node.js requires explicit shutdown

process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});

The anti-patterns table mentions this (line 241) but the setup workflow doesn't include it as a step.

Suggestion

Either add a Node.js section to framework-integrations.md and SKILL.md covering the differences, or explicitly scope the Observability code examples as Python-only and link to the docs for Node.js setup.

… context

- Replace non-existent `get_evaluator_llm`/`get_evaluator_python` with `evaluator_get` across 4 skills
- Add SDK init prerequisite to @traced guide (silent failure without Orq client)
- Document capture_input/capture_output defaults as True (PII risk)
- Add missing `import os` to framework-integrations code snippets
- Explain Control Tower column in framework integrations table
- Scope @traced and OTEL examples as Python-only, add Node.js pointers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Baukebrenninkmeijer
Copy link
Copy Markdown
Collaborator

Re-review: Remaining Issues (after latest commits)

Most feedback from earlier comments has been addressed — @traced is now Python-scoped, SDK init prerequisite added, import os fixed, Control Tower explained, Node.js callouts added. Nice work.

Three items still open:

1. tests/mcp-tools.md:34-35: get_evaluator_llm / get_evaluator_python still referenced

These tools don't exist on orq-remote-mcp yet (only evaluator_get on orq-mcp-global). The test cases:

18. `get_evaluator_llm(key=orq-skills-test-llm-eval)` → verify returns prompt and model
19. `get_evaluator_python(key=orq-skills-test-py-eval)` → verify returns code

Note: There are new tools on staging (get_python_eval / get_llm_eval) that should cover this functionality. Please verify these exist on staging and update the tool names to match whatever lands in production.

2. tests/skills.md:92-157: compare-agents test scenarios for a non-existent skill

5 test scenarios + Critical Files references (lines 153-156) for skills/compare-agents/ which doesn't exist in the repo. These should ship with the PR that adds that skill, not this one.

3. my.orq.ai vs api.orq.ai in curl examples (pre-existing, low priority)

Several api-reference files use https://my.orq.ai/v2/... as the API base URL in curl examples (e.g., build-agent/resources/api-reference.md). The actual API base per docs is https://api.orq.ai/v2/.... Pre-existing — not introduced by this PR — but worth a cleanup pass.
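
The cleanup pass for item 3 is mechanical; a sketch of the host swap (paths and example ID are illustrative):

```python
def fix_api_base(url: str) -> str:
    # Swap the dashboard host for the documented API host; other URLs pass through unchanged.
    return url.replace("https://my.orq.ai/v2/", "https://api.orq.ai/v2/")

print(fix_api_base("https://my.orq.ai/v2/evaluators"))
```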

arianpasquali and others added 3 commits April 7, 2026 11:13
- Replace non-existent get_evaluator_llm/get_evaluator_python with evaluator_get in mcp-tools tests
- Remove compare-agents test scenarios (should ship with compare-agents PR, not this one)
- Remove compare-agents from Critical Files list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Baukebrenninkmeijer Baukebrenninkmeijer merged commit 815185c into main Apr 7, 2026