feat: add instrument-app skill for orq.ai observability#12

Merged
Baukebrenninkmeijer merged 10 commits into main from feat/RES-545-instrument-app-skill on Apr 7, 2026
Conversation


@arianpasquali (Collaborator) commented Mar 24, 2026

Summary

  • Adds new setup-observability skill that guides users through instrumenting LLM applications with orq.ai tracing
  • Covers both integration modes: AI Router (proxy, zero-code traces) and Observability (OpenTelemetry/OpenInference)
  • 5-phase workflow: assess current state → choose mode → implement → verify baseline → enrich traces
  • Includes 3 resource files: framework integrations table (28 frameworks), @traced decorator guide, baseline checklist
  • Updates README with the new skill in the skills table and a mention of orq-ai/claude-plugins

Closes RES-545

Test plan

  • Install as plugin: claude --plugin-dir . and confirm setup-observability skill appears
  • Test trigger: ask "Help me add orq.ai tracing to my app" — should activate the skill
  • Verify resource file links resolve correctly from SKILL.md
  • Validate @traced code examples against latest orq.ai Python SDK
  • Verify all framework docs links are correct

New skill that guides users through instrumenting LLM applications with
orq.ai tracing — covering AI Router proxy, OpenTelemetry integrations,
the @traced decorator, and trace enrichment with metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
linear Bot commented Mar 24, 2026

RES-545 Create instrument-app skill for orq.ai observability

Goal

Create an instrument-app skill that guides users through adding orq.ai observability to their LLM applications. This fills a gap in our skill ecosystem: existing skills cover build → evaluate → optimize, but nothing helps users set up tracing in the first place. Traces are the prerequisite for analyze-trace-failures, so this skill completes the pipeline.

Inspired by Langfuse's instrumentation skill, adapted for orq.ai's two integration modes.

What Was Done

Created 4 files:

  • skills/instrument-app/SKILL.md — Main skill with a 5-phase workflow:
    1. Assess current state (detect framework, SDK, existing instrumentation)
    2. Choose integration mode (AI Router vs Observability vs both)
    3. Implement integration (framework-specific setup with code examples)
    4. Verify baseline (traces appearing, model/tokens captured, span hierarchy)
    5. Enrich traces (session_id, user_id, tags, @traced for custom spans)
  • skills/instrument-app/resources/framework-integrations.md — Table of all 28 supported frameworks with integration modes (AI Router / Observability / Control Tower), decision guide, and quick setup patterns for Python and Node.js
  • skills/instrument-app/resources/traced-decorator-guide.md — Full @traced decorator reference: 6 span types (agent, llm, tool, retrieval, embedding, function), parameters, sync/async examples, and common mistakes
  • skills/instrument-app/resources/baseline-checklist.md — Verification checklist (model name, tokens, trace names, span hierarchy, sensitive data) plus a "what to add next" inference table based on code patterns

Updated README.md with instrument-app at the top of the skills table.

Key Design Decisions

  • Covers both integration modes: AI Router (proxy, zero-code traces) and Observability (OpenTelemetry/OpenInference for framework-level spans)
  • Code inference approach (like Langfuse): detect framework from imports, infer session/user patterns from code rather than asking the user about everything
  • Framework details in resource file: 28 frameworks is too much for the main SKILL.md — kept it scannable with details in resources
  • Anti-patterns table: common mistakes like wrong import order (the #1 cause of "traces not appearing"), generic trace names, logging PII
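
The wrong-import-order anti-pattern called out above can be demonstrated with a small stdlib simulation. This is not orq.ai SDK code — the module and function names below are hypothetical stand-ins — but it shows the underlying mechanism: instrumentation monkey-patches the library, so any reference bound before `instrument()` runs keeps the unpatched function and produces no trace.

```python
import types

# Stand-in "library" module (hypothetical; real instrumentation patches e.g. the openai module).
lib = types.ModuleType("fake_llm_lib")
lib.complete = lambda prompt: f"raw:{prompt}"

spans = []

def instrument(module):
    # Instrumentation typically wraps the library's call sites.
    original = module.complete
    def traced_complete(prompt):
        spans.append(prompt)           # record a "span"
        return original(prompt)
    module.complete = traced_complete

# Wrong order: a reference bound before instrument() keeps the unpatched function.
early_ref = lib.complete
instrument(lib)
early_ref("hello")       # bypasses the wrapper: no span recorded
lib.complete("world")    # goes through the wrapper: span recorded
print(spans)
```

Only the call made through the patched module attribute shows up, which is exactly the silent failure the anti-patterns table warns about.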

What's Left

  • Test as plugin: claude --plugin-dir . and confirm skill triggers
  • Validate @traced code examples against latest orq.ai Python SDK
  • Verify all framework docs links are correct
  • Consider adding a Node.js/TypeScript equivalent for @traced once SDK supports it

@arianpasquali arianpasquali marked this pull request as draft March 24, 2026 11:05
The claude-plugins repo is still a work in progress, deferring the
mention until it's ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename skill to better reflect what it does. Update README skills table
and add "Instrument an Existing App" workflow example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arianpasquali arianpasquali marked this pull request as ready for review March 24, 2026 16:54

currentlycodinng commented Mar 25, 2026

Code Review Findings

Bugs

  1. Wrong OpenTelemetry import path (SKILL.md:159, resources/framework-integrations.md:95)
    Both files use:

    from opentelemetry.instrumentation.openai import OpenAIInstrumentor

    Should be:

    from openinference.instrumentation.openai import OpenAIInstrumentor

    This would cause an ImportError for anyone copying the example.

  2. Missing smoke test (tests/skills.md)
    All other skills have an entry in the smoke test file. setup-observability is missing both a test section and a Critical Files entry.

- Fix import: opentelemetry.instrumentation.openai → openinference.instrumentation.openai
- Rename heading from "Instrument App" to "Setup Observability"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@currentlycodinng

All good now, can be merged.

@Baukebrenninkmeijer

PR Review Summary

PR: feat: add instrument-app skill for orq.ai observability
Author: arianpasquali | Files: 6 changed (+569 −5)


Critical Issues (3)

  1. Wrong import path for @traced decorator (traced-decorator-guide.md)
    All code examples use from orq_ai_sdk.tracing import traced, but the official orq.ai docs show the correct path as from orq_ai_sdk.traced import traced. Users will get an ImportError.

  2. user_id as a top-level @traced parameter may not exist (SKILL.md Phase 5 + traced-decorator-guide.md)
    The examples show @traced(name="chat", user_id=current_user.id) but the official SDK docs don't list user_id as a direct parameter. If it needs to go in attributes={"user_id": ...}, all examples are wrong.

  3. orq_traced_input() / orq_traced_output(): verify these exist (traced-decorator-guide.md + baseline-checklist.md)
    These helper functions are used in multiple examples but aren't referenced in the official docs. If they don't exist in the SDK, the code examples will fail.
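
If item 2 is confirmed and user_id must live in `attributes` rather than as a top-level parameter, the corrected call shape would look like the sketch below. This is illustrative only — the decorator here is a minimal stand-in, not the orq.ai SDK, and the parameter names are assumptions pending verification against the official docs.

```python
def traced(name, attributes=None):
    # Stand-in for the real decorator, to show the call shape only.
    def wrap(fn):
        fn.span = {"name": name, "attributes": attributes or {}}
        return fn
    return wrap

# Metadata goes in attributes={...} instead of a top-level user_id kwarg.
@traced(name="chat", attributes={"user_id": "user-42"})
def chat():
    return "hi"

print(chat.span["attributes"])
```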

Important Issues (5)

  1. PR title / skill name mismatch — PR title says "instrument-app" but the skill is named setup-observability everywhere in the code. Clarify which is correct.

  2. Import order fragility with auto-formatters (SKILL.md Phase 3)
    The instrument() → import OpenAI ordering will be silently broken by isort/ruff. The skill warns about this but should recommend # noqa: E402 or an isort:skip comment to prevent automated reordering.

  3. Silent overwrite of existing OTEL config (SKILL.md Phase 3)
    Setting OTEL_EXPORTER_OTLP_ENDPOINT via os.environ[...] silently replaces any existing OTEL configuration (Datadog, Jaeger, etc.). Should warn users to check for existing config first.

  4. orq* wildcard in allowed-tools frontmatter (SKILL.md line 4)
    Verify the skill runner supports glob patterns here. If not, tools won't be available and the skill will silently lack orq capabilities.

  5. Hardcoded "my-app" service name (SKILL.md OTEL config example)
    Users copying this will all share the same service name, making traces indistinguishable. Add a comment emphasizing this must be changed.
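
For item 3, the "check before overwrite" guard could be as simple as the sketch below. The endpoint URLs are placeholders, not values from this PR; the point is only that the skill's setup code should inspect `OTEL_EXPORTER_OTLP_ENDPOINT` before assigning it.

```python
import os

def set_otlp_endpoint(endpoint: str) -> bool:
    # Refuse to clobber an existing exporter config (e.g. Datadog, Jaeger);
    # the caller should warn the user instead of silently overwriting.
    existing = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if existing and existing != endpoint:
        return False
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    return True

os.environ.pop("OTEL_EXPORTER_OTLP_ENDPOINT", None)              # clean slate for the demo
first = set_otlp_endpoint("https://orq-collector.example/otel")  # placeholder URL
second = set_otlp_endpoint("https://other-backend.example:4318") # simulates a pre-existing config
print(first, second)
```

The first call succeeds; the second is refused because a different endpoint is already configured.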

Suggestions (6)

  1. LangChain example uses model="gpt-4o" without the provider/model prefix — contradicts SKILL.md's own instruction to use openai/gpt-4o format.

  2. "10x more metadata" claim in Constraints section is unsubstantiated — soften to "significantly more metadata" or define what's counted.

  3. Three dashboard links all point to https://my.orq.ai/ — use deep-link paths or consolidate into one link.

  4. Node.js anti-pattern ("No graceful shutdown") is orphaned — no Node.js OTEL examples exist in the skill. Either add Node.js setup or remove this row.

  5. @ in anchor URL (#custom-tracing-using-the-@traced-decorator) may not resolve correctly in all Markdown renderers.

  6. PII capture defaults undocumented — The skill shows capture_input=False for sensitive functions but never states that the default is True, meaning users who don't set these flags will silently send all function inputs to orq.ai.
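
Suggestion 6's risk is easy to show with a toy decorator. This is not the orq.ai SDK — `traced_stub` and its parameters are hypothetical — but it mirrors the described default: `capture_input=True` unless the caller opts out, so unflagged functions silently export their inputs.

```python
recorded = []

def traced_stub(name, capture_input=True):
    # capture_input defaults to True, mirroring the documented risk:
    # callers who never set the flag silently export all function inputs.
    def wrap(fn):
        def inner(*args, **kwargs):
            if capture_input:
                recorded.append((name, args))   # inputs leave the process
            return fn(*args, **kwargs)
        return inner
    return wrap

@traced_stub("login")                            # default: inputs captured
def login(password):
    return "ok"

@traced_stub("login_safe", capture_input=False)  # explicit opt-out
def login_safe(password):
    return "ok"

login("hunter2")
login_safe("hunter2")
print(recorded)
```

Only the opted-out function keeps its password argument out of the recorded spans, which is why the default must be documented.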

Strengths

  • Well-structured 5-phase workflow (Assess → Choose → Implement → Verify → Enrich) with a verification gate before enrichment
  • Excellent Constraints section with rationale ("Why these constraints")
  • Framework integrations table covers 28 frameworks with accurate AI Router/Observability columns
  • Baseline checklist with "Auto with AI Router?" columns saves users from unnecessary manual work
  • Good troubleshooting section covering the most common failure modes
  • Cross-file consistency is generally strong

Recommended Action

  1. Must fix before merge: items 1–3 (broken import paths / nonexistent APIs)
  2. Should fix: items 4–8 (naming, safety, silent failures)
  3. Nice to have: items 9–14 (polish)

🤖 Generated with Claude Code

@arianpasquali arianpasquali marked this pull request as draft March 26, 2026 08:19
arianpasquali and others added 2 commits March 26, 2026 13:14
- Fix @traced import path: orq_ai_sdk.tracing → orq_ai_sdk.traced (verified against official docs)
- Fix LangChain model format: gpt-4o → openai/gpt-4o (provider/model format)
- Replace hardcoded service.name=my-app with <your-app-name> placeholder
- Soften unsubstantiated "10x more metadata" claim
- Add warning about overwriting existing OTEL config (Datadog, Jaeger, etc.)
- Add auto-formatter guidance (isort/noqa) for critical import ordering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arianpasquali arianpasquali marked this pull request as ready for review March 26, 2026 14:42
@Baukebrenninkmeijer

Review Finding: get_evaluator_llm / get_evaluator_python MCP tools don't exist

These tools are referenced in 5 files but don't exist on orq-remote-mcp (which only has create_llm_eval and create_python_eval):

  • skills/build-evaluator/SKILL.md
  • skills/build-agent/resources/api-reference.md
  • skills/generate-synthetic-dataset/resources/api-reference.md
  • skills/run-experiment/resources/api-reference.md
  • tests/mcp-tools.md (test cases 18-19 will fail)

The closest equivalent is evaluator_get on orq-mcp-global, which retrieves any evaluator by ID. Consider replacing these references with evaluator_get or an HTTP API fallback (GET /v2/evaluators/<ID>).
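
The HTTP fallback mentioned above can be sketched with the stdlib; the evaluator ID and API key below are placeholders, and the base URL follows the `api.orq.ai` host noted later in this thread (an assumption to verify against the docs).

```python
import urllib.request

def build_get_evaluator_request(evaluator_id: str, api_key: str) -> urllib.request.Request:
    # GET /v2/evaluators/<ID> — Bearer auth is assumed, per common API convention.
    url = f"https://api.orq.ai/v2/evaluators/{evaluator_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

req = build_get_evaluator_request("eval_placeholder_id", "ORQ_API_KEY_HERE")
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) is left out here since it requires a live key.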

@Baukebrenninkmeijer left a comment

Some tools don't have the right name, but looks good otherwise imo.

@Baukebrenninkmeijer

Review: Context Gaps for AI Coding Assistants

Evaluated whether the skill files provide enough context for an AI coding assistant to generate working code without hallucinating. The AI Router happy path is solid, but several gaps would cause failures:


1. @traced SDK initialization is missing (HIGH)

The skill shows from orq_ai_sdk.traced import traced and jumps straight to decorating functions. But @traced requires the orq SDK client to be initialized first (Orq(api_key=...)). Without this, an AI assistant will generate code where @traced silently does nothing (no API key → no trace export).

Suggested fix — show the full init-to-traced flow in traced-decorator-guide.md:

from orq_ai_sdk import Orq
from orq_ai_sdk.traced import traced
import os

client = Orq(api_key=os.getenv("ORQ_API_KEY"))

@traced(name="my-operation", type="function")
def my_function():
    ...

2. How to enrich AI Router traces with metadata is not covered (HIGH)

Phase 5 says "add session_id, user_id, tags" but only shows how via @traced attributes and raw OTEL span attributes. For users on the AI Router path (the recommended default), there's no explanation of how to attach custom metadata to the automatic traces. The skill should either show the mechanism (extra headers? context attributes?) or explicitly acknowledge the gap.

3. No Node.js/TypeScript support for @traced or OTEL (MEDIUM)

The skill claims to support "Python / Node.js / both" (SKILL.md:93), and the AI Router examples cover both languages. But:

  • @traced is Python-only — no Node.js equivalent exists in the docs
  • The OTEL setup example is Python-only
  • No npm install commands for instrumentor packages

An AI assistant asked to "add tracing to my Express app" would hallucinate a Node.js @traced equivalent. Should either add Node.js OTEL examples or explicitly scope @traced / Observability mode as Python-only.

4. No pip install / npm install commands (MEDIUM)

The skill says "install the framework's OpenInference instrumentor package" but never gives package names. An assistant would have to guess pip install openinference-instrumentation-openai. Should include at least the common ones or a pattern like pip install openinference-instrumentation-{framework}.

5. import os missing from framework-integrations.md snippets (LOW)

Python snippets at lines 46-54 and 66-75 use os.getenv() without import os. Minor but would cause NameError if copied verbatim. (SKILL.md correctly includes it.)

6. "Control Tower" column in framework table is unexplained (LOW)

framework-integrations.md has a "Control Tower" column (yes for LangGraph, OpenAI Agents, Vercel AI SDK) but it's never defined anywhere in the skill. Readers/assistants won't know what it means.
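
For item 5, the fix is a one-line addition; a minimal corrected shape of the affected snippets (env var name taken from this thread) looks like:

```python
# With the import in place, the copied snippet no longer raises NameError.
import os

api_key = os.getenv("ORQ_API_KEY", "")  # empty-string fallback when unset
print(isinstance(api_key, str))
```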


TL;DR: AI Router setup path works well. The main gaps are around @traced initialization (silent failure risk), trace enrichment for AI Router users, and Node.js coverage beyond AI Router.

@Baukebrenninkmeijer
Copy link
Copy Markdown
Collaborator

Review Finding: Node.js/TypeScript coverage gap in setup-observability skill

The skill is heavily Python-focused. Node.js is fully supported in the orq.ai docs but the skill doesn't give an AI assistant enough context to generate correct Node.js instrumentation. Specific gaps:

1. No Node.js OTEL setup example

The skill only shows Python for Observability mode. The docs have a complete Node.js equivalent:

import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OpenAIInstrumentation } from '@opentelemetry/instrumentation-openai';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter(),
  instrumentations: [new OpenAIInstrumentation()],
});
sdk.start();

Required packages: @opentelemetry/sdk-node, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/api, plus framework-specific instrumentors.

2. @traced is Python-only — not called out

The decorator guide (traced-decorator-guide.md) never states it's Python-only. An AI assistant will try to generate a Node.js @traced equivalent that doesn't exist, leading to hallucinated code.

3. Node.js instrumentor packages differ from Python

  • Python: openinference-instrumentation-openai (OpenInference ecosystem)
  • Node.js: @opentelemetry/instrumentation-openai (native OpenTelemetry ecosystem)

The skill only mentions OpenInference packages, so an assistant will use the wrong package for Node.js.

4. Node.js requires explicit shutdown

process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});

The anti-patterns table mentions this (line 241) but the setup workflow doesn't include it as a step.

Suggestion

Either add a Node.js section to framework-integrations.md and SKILL.md covering the differences, or explicitly scope the Observability code examples as Python-only and link to the docs for Node.js setup.

… context

- Replace non-existent `get_evaluator_llm`/`get_evaluator_python` with `evaluator_get` across 4 skills
- Add SDK init prerequisite to @traced guide (silent failure without Orq client)
- Document capture_input/capture_output defaults as True (PII risk)
- Add missing `import os` to framework-integrations code snippets
- Explain Control Tower column in framework integrations table
- Scope @traced and OTEL examples as Python-only, add Node.js pointers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Baukebrenninkmeijer
Copy link
Copy Markdown
Collaborator

Re-review: Remaining Issues (after latest commits)

Most feedback from earlier comments has been addressed — @traced is now Python-scoped, SDK init prerequisite added, import os fixed, Control Tower explained, Node.js callouts added. Nice work.

Three items still open:

1. tests/mcp-tools.md:34-35: get_evaluator_llm / get_evaluator_python still referenced

These tools don't exist on orq-remote-mcp yet (only evaluator_get on orq-mcp-global). The test cases:

18. `get_evaluator_llm(key=orq-skills-test-llm-eval)` → verify returns prompt and model
19. `get_evaluator_python(key=orq-skills-test-py-eval)` → verify returns code

Note: There are new tools on staging (get_python_eval / get_llm_eval) that should cover this functionality. Please verify these exist on staging and update the tool names to match whatever lands in production.

2. tests/skills.md:92-157: compare-agents test scenarios for a non-existent skill

5 test scenarios + Critical Files references (lines 153-156) for skills/compare-agents/ which doesn't exist in the repo. These should ship with the PR that adds that skill, not this one.

3. my.orq.ai vs api.orq.ai in curl examples (pre-existing, low priority)

Several api-reference files use https://my.orq.ai/v2/... as the API base URL in curl examples (e.g., build-agent/resources/api-reference.md). The actual API base per docs is https://api.orq.ai/v2/.... Pre-existing — not introduced by this PR — but worth a cleanup pass.
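
The cleanup pass for item 3 is mechanical; a sketch of the host swap (paths and example ID are illustrative):

```python
def fix_api_base(url: str) -> str:
    # Swap the dashboard host for the documented API host; other URLs pass through unchanged.
    return url.replace("https://my.orq.ai/v2/", "https://api.orq.ai/v2/")

print(fix_api_base("https://my.orq.ai/v2/evaluators"))
```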

arianpasquali and others added 3 commits April 7, 2026 11:13
- Replace non-existent get_evaluator_llm/get_evaluator_python with evaluator_get in mcp-tools tests
- Remove compare-agents test scenarios (should ship with compare-agents PR, not this one)
- Remove compare-agents from Critical Files list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Baukebrenninkmeijer Baukebrenninkmeijer merged commit 815185c into main Apr 7, 2026