From 9c75ee42513fea3d6fd93cae6603844296d20be5 Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Tue, 24 Mar 2026 12:02:55 +0100
Subject: [PATCH 1/9] feat: add instrument-app skill for orq.ai observability
 (RES-545)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New skill that guides users through instrumenting LLM applications with
orq.ai tracing — covering AI Router proxy, OpenTelemetry integrations,
the @traced decorator, and trace enrichment with metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
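Note on Phase 1 of the new SKILL.md ("Assess Current State"): the project scan it describes can be sketched with the standard library alone. This is a minimal illustration, not the skill's implementation. The marker strings are copied from the skill text; `MARKERS`, `SKIP_DIRS`, `is_candidate`, and `scan_project` are hypothetical names, and plain substring matching is an assumption that will give false positives on real projects.

```python
from pathlib import Path

# Marker strings copied from Phase 1 of SKILL.md; the category keys are illustrative.
MARKERS = {
    "framework": ["openai", "langchain", "crewai", "autogen", "llamaindex",
                  "pydantic_ai", "smolagents", "agno", "dspy"],
    "orq": ["ORQ_API_KEY", "api.orq.ai"],
    "tracing": ["opentelemetry", "OTEL_", "TracerProvider", "@traced",
                "BatchSpanProcessor"],
}

# Directories that usually hold third-party code (an assumption, adjust as needed).
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__"}


def is_candidate(path: Path) -> bool:
    """Source files plus .env-style files are worth scanning."""
    return path.suffix in {".py", ".ts", ".js"} or path.name.startswith(".env")


def scan_project(root: Path) -> dict[str, set[str]]:
    """Return, per category, every marker found in candidate files under root."""
    found: dict[str, set[str]] = {category: set() for category in MARKERS}
    for path in root.rglob("*"):
        relative_parts = path.relative_to(root).parts
        if any(part in SKIP_DIRS for part in relative_parts):
            continue
        if not path.is_file() or not is_candidate(path):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for category, markers in MARKERS.items():
            found[category].update(m for m in markers if m in text)
    return found
```

Calling `scan_project(Path("."))` yields one set per category, which maps directly onto the "Summarize findings" step: an empty `tracing` set suggests no instrumentation exists yet, and an empty `orq` set suggests orq.ai is not configured.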
 README.md                                     |   4 +-
 skills/instrument-app/SKILL.md                | 248 ++++++++++++++++++
 .../resources/baseline-checklist.md           |  74 ++++++
 .../resources/framework-integrations.md       | 104 ++++++++
 .../resources/traced-decorator-guide.md       | 122 +++++++++
 5 files changed, 551 insertions(+), 1 deletion(-)
 create mode 100644 skills/instrument-app/SKILL.md
 create mode 100644 skills/instrument-app/resources/baseline-checklist.md
 create mode 100644 skills/instrument-app/resources/framework-integrations.md
 create mode 100644 skills/instrument-app/resources/traced-decorator-guide.md
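Note on Phase 2 of SKILL.md ("Choose Integration Mode"): the four-row recommendation table can be read as a small decision function. The sketch below is one possible encoding under stated assumptions: `recommend_mode`, its parameters, and the returned strings are hypothetical, and the Observability-only set contains just the four frameworks that table row names.

```python
# Frameworks that the Phase 2 table in SKILL.md lists as Observability-only.
OBSERVABILITY_ONLY = {"BeeAI", "Haystack", "LiteLLM", "Google AI"}


def recommend_mode(framework: str, keep_direct_calls: bool,
                   wants_framework_spans: bool) -> str:
    """Map the four rows of the Phase 2 table to a recommendation string."""
    if framework in OBSERVABILITY_ONLY:
        return "observability"  # framework has no AI Router integration
    if keep_direct_calls:
        return "observability"  # user does not want to change LLM calls
    if wants_framework_spans:
        return "both"           # routing plus orchestration spans
    return "ai-router"          # default: fastest path to traces
```

The order of the checks mirrors the skill's constraint to prefer AI Router whenever nothing rules it out; the skill still asks for user confirmation before acting on the result.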

diff --git a/README.md b/README.md
index b1e7e02..5f59b8d 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,8 @@ Each skill encodes best practices from prompt engineering, agent design, evaluat
 
 Built on the [Agent Skills](https://agentskills.io/home#adoption) standard format, so it works with any compatible agent (Claude Code, Cursor, Gemini CLI, and others).
 
+**Using Claude Code?** Check out [orq-ai/claude-plugins](https://github.com/orq-ai/claude-plugins) — it bundles orq-skills with **orq-trace** (automatic session tracing) and **orq-mcp** (workspace MCP server) in a single install.
+
 ## Setup
 
 ### Prerequisites
@@ -52,7 +54,6 @@ claude --plugin-dir .
 
 > **Note:** Commands (`/orq:quickstart`, `/orq:workspace`, etc.) and agents are only available when installed as a Claude Code plugin.
 
-
 ### Verify
 
 Run the interactive onboarding to confirm everything works:
@@ -93,6 +94,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 <!-- BEGIN_SKILLS_TABLE -->
 | Skill | What It Does | Documentation |
 |-------|-------------|---------------|
+| **instrument-app** | Instrument LLM applications with orq.ai observability — AI Router proxy, OpenTelemetry, `@traced` decorator, and trace enrichment | [SKILL.md](skills/instrument-app/SKILL.md) |
 | **build-agent** | Design, create, and configure an orq.ai Agent with tools, instructions, knowledge bases, and memory | [SKILL.md](skills/build-agent/SKILL.md) |
 | **build-evaluator** | Create validated LLM-as-a-Judge evaluators following evaluation best practices | [SKILL.md](skills/build-evaluator/SKILL.md) |
 | **analyze-trace-failures** | Read production traces, identify what's failing, build failure taxonomies, and categorize issues | [SKILL.md](skills/analyze-trace-failures/SKILL.md) |
diff --git a/skills/instrument-app/SKILL.md b/skills/instrument-app/SKILL.md
new file mode 100644
index 0000000..d4b692d
--- /dev/null
+++ b/skills/instrument-app/SKILL.md
@@ -0,0 +1,248 @@
+---
+name: instrument-app
+description: Instrument LLM applications with orq.ai observability. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
+allowed-tools: Bash, Read, Write, Edit, Grep, Glob, WebFetch, Task, AskUserQuestion, orq*
+---
+
+# Instrument App
+
+You are an **orq.ai observability engineer**. Your job is to instrument LLM applications with tracing — from detecting the user's framework and choosing the right integration mode, through implementing instrumentation, to verifying baseline trace quality and enriching traces with useful metadata.
+
+## Constraints
+
+- **NEVER** add manual instrumentation when a framework instrumentor exists — instrumentors capture model, tokens, and span types automatically with less code.
+- **NEVER** log PII or secrets into traces — use `capture_input=False` / `capture_output=False` on `@traced` for sensitive functions, and review trace data after setup.
+- **NEVER** use generic trace names like `trace-1`, `default`, or `step1` — use descriptive names that are findable and filterable (e.g., `chat-response`, `classify-intent`).
+- **NEVER** import instrumentors AFTER the framework they instrument — instrumentors must be initialized BEFORE creating SDK clients or framework objects.
+- **ALWAYS** verify traces appear in the orq.ai UI before adding enrichment — confirm the baseline works first.
+- **ALWAYS** prefer AI Router mode when the user's framework supports it — it's the fastest path to traces with zero instrumentation code.
+- **ALWAYS** set `service.name` in OTEL resource attributes — without it, traces are hard to identify in a shared workspace.
+
+**Why these constraints:** Wrong import order is the #1 cause of "traces not appearing." Generic names make traces unfindable at scale. Logging PII creates compliance risk. Framework instrumentors capture far more metadata (model, tokens, span types) than manual tracing, with less code.
+
+## Companion Skills
+
+- `analyze-trace-failures` — diagnose failures from trace data (requires traces to exist first)
+- `build-evaluator` — design quality evaluators using trace data as input
+- `run-experiment` — run experiments and compare configurations with trace visibility
+- `optimize-prompt` — improve prompts, then verify improvements via traces
+
+## Workflow Checklist
+
+Copy this to track progress:
+
+```
+Instrumentation Progress:
+- [ ] Phase 1: Assess current state (framework, SDK, existing instrumentation)
+- [ ] Phase 2: Choose integration mode (AI Router vs Observability vs both)
+- [ ] Phase 3: Implement integration (framework-specific setup)
+- [ ] Phase 4: Verify baseline (traces appearing, model/tokens captured, span hierarchy)
+- [ ] Phase 5: Enrich traces (session_id, user_id, tags, @traced for custom spans)
+```
+
+## Resources
+
+- **Framework integrations:** See [resources/framework-integrations.md](resources/framework-integrations.md)
+- **@traced decorator guide:** See [resources/traced-decorator-guide.md](resources/traced-decorator-guide.md)
+- **Baseline checklist:** See [resources/baseline-checklist.md](resources/baseline-checklist.md)
+
+---
+
+## orq.ai Documentation
+
+**Observability:** [Traces](https://docs.orq.ai/docs/observability/traces) · [Trace Automations](https://docs.orq.ai/docs/observability/trace-automation) · [Observability Overview](https://docs.orq.ai/docs/observability/overview)
+
+**Frameworks:** [Framework Integrations](https://docs.orq.ai/docs/proxy/frameworks/overview) · [OpenAI SDK](https://docs.orq.ai/docs/proxy/frameworks/openai) · [LangChain](https://docs.orq.ai/docs/proxy/frameworks/langchain) · [CrewAI](https://docs.orq.ai/docs/proxy/frameworks/crewai) · [Vercel AI](https://docs.orq.ai/docs/proxy/frameworks/vercel-ai)
+
+**AI Router:** [Getting Started](https://docs.orq.ai/docs/router/getting-started) · [API Keys](https://docs.orq.ai/docs/router/api-keys) · [OpenAI-Compatible API](https://docs.orq.ai/docs/proxy/openai-compatible-api) · [Supported Models](https://docs.orq.ai/docs/proxy/supported-models)
+
+**Integrations:** [Integration Overview](https://docs.orq.ai/docs/integrations/overview) · [OpenTelemetry Tracing](https://docs.orq.ai/docs/integrations/overview#opentelemetry-tracing)
+
+### Key Concepts
+
+- **AI Router** (`https://api.orq.ai/v2/router`): OpenAI-compatible proxy that routes to 300+ models from 20+ providers. Traces are generated automatically for every call.
+- **Observability** (`https://api.orq.ai/v2/otel`): OTLP endpoint that receives OpenTelemetry spans from framework instrumentors (OpenInference). Captures agent steps, tool calls, chain execution.
+- **`@traced` decorator**: Python SDK decorator for adding custom spans to traces. Supports typed spans: `agent`, `llm`, `tool`, `retrieval`, `embedding`, `function`.
+- Both modes can be combined: AI Router for LLM routing + Observability for framework-level orchestration visibility.
+
+## Destructive Actions
+
+The following require explicit user confirmation via `AskUserQuestion`:
+- Modifying existing environment variables or configuration files
+- Overwriting existing instrumentation setup code
+- Adding dependencies to the project (pip install / npm install)
+
+---
+
+## Steps
+
+Follow these steps **in order**. Do NOT skip steps.
+
+### Phase 1: Assess Current State
+
+1. **Scan the project** to understand the LLM stack. Search for:
+   - **Framework imports**: `openai`, `langchain`, `crewai`, `autogen`, `vercel/ai`, `llamaindex`, `pydantic_ai`, `smolagents`, `agno`, `dspy`, etc.
+   - **Existing orq.ai usage**: `orq.ai`, `ORQ_API_KEY`, `api.orq.ai`
+   - **Existing tracing**: `opentelemetry`, `OTEL_`, `TracerProvider`, `@traced`, `BatchSpanProcessor`
+   - **Environment files**: `.env`, `.env.example`, config files with API keys or base URLs
+
+2. **Summarize findings** to the user:
+   - Framework(s) detected
+   - Whether orq.ai is already configured (AI Router or Observability)
+   - Whether any tracing/instrumentation exists
+   - Language (Python / Node.js / both)
+
+### Phase 2: Choose Integration Mode
+
+3. **Recommend the integration mode** based on findings. Use [resources/framework-integrations.md](resources/framework-integrations.md) for the decision guide:
+
+   | Situation | Recommendation |
+   |-----------|---------------|
+   | No tracing yet, framework supports AI Router | **AI Router** — fastest path, traces are automatic |
+   | Already calling providers directly, don't want to change LLM calls | **Observability only** — add OTEL instrumentors |
+   | Want multi-provider routing AND framework-level span detail | **Both** — AI Router for routing, OTEL for orchestration spans |
+   | Framework only supports Observability (BeeAI, Haystack, LiteLLM, Google AI) | **Observability only** |
+
+4. **Confirm with the user** before proceeding. Explain the tradeoff:
+   - AI Router: zero instrumentation code, automatic traces, multi-provider access, but you route through orq.ai
+   - Observability: keep your existing LLM calls, add tracing on top, more setup but no routing change
+
+### Phase 3: Implement Integration
+
+5. **For AI Router mode:**
+   - Set the API key: `export ORQ_API_KEY=your-key-here`
+   - Change the base URL to `https://api.orq.ai/v2/router`
+   - Use `provider/model` format for model names (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4-5-20250929`)
+   - That's it — traces appear automatically
+
+   **Python (OpenAI SDK):**
+   ```python
+   from openai import OpenAI
+   import os
+
+   client = OpenAI(
+       base_url="https://api.orq.ai/v2/router",
+       api_key=os.getenv("ORQ_API_KEY"),
+   )
+   ```
+
+   **Node.js (OpenAI SDK):**
+   ```typescript
+   import OpenAI from "openai";
+
+   const client = new OpenAI({
+       baseURL: "https://api.orq.ai/v2/router",
+       apiKey: process.env.ORQ_API_KEY,
+   });
+   ```
+
+   For framework-specific setup (LangChain, CrewAI, etc.), refer to the framework's docs page linked in [resources/framework-integrations.md](resources/framework-integrations.md).
+
+6. **For Observability mode:**
+   - Set OTEL environment variables:
+     ```bash
+     export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
+     export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer $ORQ_API_KEY"
+     export OTEL_RESOURCE_ATTRIBUTES="service.name=my-app,service.version=1.0.0"
+     export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/json"
+     ```
+   - Install the framework's OpenInference instrumentor package
+   - Initialize the instrumentor BEFORE creating SDK clients
+   - Refer to the framework's docs page for the exact instrumentor and setup
+
+   **Python (OpenAI example):**
+   ```python
+   from opentelemetry import trace
+   from opentelemetry.sdk.trace import TracerProvider
+   from opentelemetry.sdk.trace.export import BatchSpanProcessor
+   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+   from openinference.instrumentation.openai import OpenAIInstrumentor
+
+   # Initialize BEFORE creating OpenAI client
+   tracer_provider = TracerProvider()
+   tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
+   trace.set_tracer_provider(tracer_provider)
+   OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
+   ```
+
+7. **For both modes:** Set up AI Router first (step 5), then add Observability (step 6) for framework-level spans on top.
+
+### Phase 4: Verify Baseline
+
+8. **Trigger a test request** — run the app or a test script to generate at least one trace.
+
+9. **Check traces in orq.ai** — direct the user to open [Traces](https://my.orq.ai) in the orq.ai dashboard.
+
+10. **Verify baseline requirements** using [resources/baseline-checklist.md](resources/baseline-checklist.md):
+
+    | Requirement | How to Check |
+    |------------|-------------|
+    | Traces appearing | At least one trace visible in the Traces view |
+    | Model name captured | Open an LLM span → `model` field shows model ID |
+    | Token usage tracked | LLM span shows `input_tokens` and `output_tokens` |
+    | Span hierarchy | Trace View shows nested spans for multi-step operations |
+    | Correct span types | LLM calls show as `llm`, retrievals as `retrieval`, etc. |
+    | No sensitive data | Spot-check span inputs/outputs for PII or secrets |
+
+11. **Fix any gaps** before moving to enrichment. Common fixes:
+    - Traces not appearing → check import order, API key, OTEL endpoint
+    - Flat hierarchy → ensure instrumentor is initialized before client creation
+    - Missing tokens → check if provider/framework supports token reporting
+
+12. **Encourage exploration:** Tell the user to browse a few traces in the UI before adding more context. This helps them form opinions about what data is useful vs missing.
+
+### Phase 5: Enrich Traces
+
+13. **Infer additional context needs from the code.** Look for patterns — do NOT ask the user about all of these; infer when possible:
+
+    | If You See in Code... | Suggest Adding |
+    |----------------------|----------------|
+    | Conversation history, chat endpoints, message arrays | `session_id` to group conversations |
+    | User authentication, `user_id` variables | `user_id` for per-user filtering |
+    | Multiple distinct features or endpoints | `feature` tag for per-feature analytics |
+    | Customer/tenant identifiers | `customer_id` or tier tag |
+    | Feedback collection, ratings | Score annotations |
+
+14. **Add `@traced` for custom spans** where the user has application logic not captured by framework instrumentors. See [resources/traced-decorator-guide.md](resources/traced-decorator-guide.md) for the full reference.
+
+    Priority targets for `@traced`:
+    - The top-level orchestration function (type: `agent`)
+    - Data preprocessing / postprocessing (type: `function`)
+    - Custom tool implementations (type: `tool`)
+    - RAG retrieval logic (type: `retrieval`)
+
+15. **Only ask the user** when context needs aren't obvious from code:
+    - "How do you know when a response is good vs bad?" → determines scoring approach
+    - "What would you want to filter by in a dashboard?" → surfaces non-obvious tags
+    - "Are there different user segments you'd want to compare?" → customer tiers, plans
+
+16. **Guide to relevant UI features** based on what was added:
+    - Traces view: see individual requests
+    - Timeline view: identify latency bottlenecks
+    - Thread view: see conversation flows (if session_id added)
+    - Trace automations: set up automatic quality monitoring
+
+---
+
+## Anti-Patterns
+
+| Anti-Pattern | What to Do Instead |
+|---|---|
+| Manual tracing when framework instrumentor exists | Use the framework instrumentor — it captures model, tokens, spans automatically |
+| Instrumentor imported AFTER framework client creation | Initialize instrumentor BEFORE creating SDK clients |
+| Generic trace names (`default`, `trace-1`) | Use descriptive names: `chat-response`, `classify-intent`, `fetch-orders` |
+| Logging PII/secrets in trace inputs | Use `capture_input=False` on `@traced`, review trace data post-setup |
+| No `service.name` in OTEL attributes | Always set `service.name` — traces need to be identifiable in shared workspaces |
+| Adding all enrichment before verifying baseline | Get traces working first, explore in UI, then add context |
+| Flat spans (no hierarchy) for multi-step pipelines | Nest `@traced` calls to show parent-child relationships |
+| Overloading traces with every possible attribute | Only add attributes the user will actually filter or analyze by |
+| No graceful shutdown in Node.js | Call `sdk.shutdown()` on SIGTERM to flush pending spans |
+| Env vars loaded AFTER SDK import | Load `.env` / set env vars BEFORE importing orq or OTEL packages |
+
+## Open in orq.ai
+
+After completing this skill, direct the user to:
+- **Traces:** [my.orq.ai](https://my.orq.ai/) — inspect trace hierarchy, timing, and captured data
+- **AI Router:** [my.orq.ai](https://my.orq.ai/) — manage providers, models, and API keys
+- **Trace Automations:** [my.orq.ai](https://my.orq.ai/) — set up automatic monitoring rules
+- **Next step:** Use `analyze-trace-failures` to diagnose issues from the traces you're now capturing
diff --git a/skills/instrument-app/resources/baseline-checklist.md b/skills/instrument-app/resources/baseline-checklist.md
new file mode 100644
index 0000000..0819cdc
--- /dev/null
+++ b/skills/instrument-app/resources/baseline-checklist.md
@@ -0,0 +1,74 @@
+# Baseline Instrumentation Checklist
+
+Verify these requirements after setting up instrumentation. Framework integrations handle most automatically — only manual instrumentation needs all checks.
+
+## Requirements
+
+| # | Requirement | Why | Auto with AI Router? | Auto with Framework Instrumentor? |
+|---|------------|-----|:---:|:---:|
+| 1 | **Model name captured** | Enables model comparison, cost attribution, filtering by model | yes | yes |
+| 2 | **Token usage tracked** | Enables cost calculation and usage analytics | yes | yes |
+| 3 | **Descriptive trace names** | Makes traces findable — `chat-response` not `trace-1` | partial | partial |
+| 4 | **Proper span hierarchy** | Shows which step is slow or failing in multi-step operations | n/a | yes |
+| 5 | **Correct span types** | Enables type-specific analytics (LLM latency, retrieval quality) | yes | yes |
+| 6 | **Sensitive data masked** | Prevents PII/secrets from leaking into trace storage | no | no |
+| 7 | **Trace input/output set explicitly** | Makes traces readable; avoids logging irrelevant function args | partial | partial |
+
+### How to Verify Each
+
+**1. Model name** — Open a trace in [Traces](https://my.orq.ai) → click an LLM span → confirm `model` field shows the model ID (e.g., `openai/gpt-4o`).
+
+**2. Token usage** — Same LLM span → check `input_tokens` and `output_tokens` are populated. If zero, the instrumentor may not support the provider or streaming mode.
+
+**3. Trace names** — In the Traces list view, scan the Name column. Look for generic names (`default`, `trace-1`, `LLMChain`) and rename with descriptive alternatives. For `@traced`, set the `name` parameter. For frameworks, check how to customize trace/chain names in the framework docs.
+
+**4. Span hierarchy** — Open a trace → switch to Trace View. Multi-step operations should show nested spans (parent → child). Flat traces with all spans at the same level indicate missing nesting. For `@traced`, ensure child functions are called within the parent's traced scope.
+
+**5. Span types** — In Trace View, check that LLM calls show as `llm` type, retrievals as `retrieval`, tool calls as `tool`, etc. Framework instrumentors set these automatically. For `@traced`, set the `type` parameter correctly.
+
+**6. Sensitive data** — Review a few traces for PII (names, emails, tokens, API keys) in span inputs/outputs. Use `capture_input=False` / `capture_output=False` on `@traced` for sensitive functions. For framework instrumentors, check if they offer input/output filtering.
+
+**7. Trace input/output** — Open a trace → check the top-level input shows the user's actual request (not internal state). For `@traced` with `capture_input=True`, only the function args are logged — ensure they represent meaningful input. Use `attributes` for metadata instead of polluting input.
+
+## After Baseline Passes
+
+Encourage the user to explore traces in the orq.ai UI before adding more context:
+
+> "Your traces are appearing in orq.ai. Open a few in [Traces](https://my.orq.ai) — look at the span hierarchy, timing, and captured data. What's useful? What's missing? This helps us decide what additional context to add."
+
+## Additional Context (Add After Baseline)
+
+Only add these when relevant — infer from the user's code when possible:
+
+| If You See in Code... | Suggest Adding | Why |
+|----------------------|----------------|-----|
+| Conversation history, chat endpoints, message arrays | `session_id` | Groups messages from the same conversation |
+| User authentication, `user_id` variables | `user_id` on traces | Enables per-user filtering and cost attribution |
+| Multiple distinct features or endpoints | `feature` tag via attributes | Enables per-feature analytics |
+| Customer/tenant identifiers | `customer_id` or tier tag | Cost/quality breakdown by segment |
+| Feedback collection, ratings | Score annotations | Enables quality trend monitoring |
+| Environment variables like `NODE_ENV`, `FLASK_ENV` | `environment` tag | Separates dev/staging/prod traces |
+
+### How to Add Context
+
+**With `@traced`:**
+```python
+@traced(
+    name="chat-response",
+    type="agent",
+    attributes={
+        "session_id": session_id,
+        "user_id": user_id,
+        "feature": "customer-support",
+    }
+)
+```
+
+**With OpenTelemetry span attributes:**
+```python
+from opentelemetry import trace
+
+span = trace.get_current_span()
+span.set_attribute("session_id", session_id)
+span.set_attribute("user_id", user_id)
+```
diff --git a/skills/instrument-app/resources/framework-integrations.md b/skills/instrument-app/resources/framework-integrations.md
new file mode 100644
index 0000000..f323aa5
--- /dev/null
+++ b/skills/instrument-app/resources/framework-integrations.md
@@ -0,0 +1,104 @@
+# Framework Integrations
+
+## Which Integration Mode?
+
+| Mode | What It Does | When to Use |
+|------|-------------|-------------|
+| **AI Router** | Route LLM calls through `https://api.orq.ai/v2/router` — traces generated automatically | You want multi-provider access, fallbacks, caching, cost tracking with zero instrumentation code |
+| **Observability** | Send OpenTelemetry traces from your existing setup to `https://api.orq.ai/v2/otel` | You already call providers directly and want to add tracing without changing your LLM calls |
+| **Both** | AI Router for routing + Observability for framework-level spans | You want full pipeline visibility: framework orchestration spans + LLM call traces |
+
+**Rule of thumb:** If the user's framework is marked "yes" in the AI Router column below, start there: it's the fastest path to traces. Add Observability on top only if they need framework-level span detail (agent steps, tool calls, chain execution).
+
+## Supported Frameworks
+
+| Framework | AI Router | Observability | Control Tower | Docs |
+|-----------|:---------:|:-------------:|:-------------:|------|
+| Agno | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/agno) |
+| AutoGen | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/autogen) |
+| AWS Strands | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/aws-strands) |
+| Azure AI Agents | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/azure-ai-agents) |
+| BeeAI | | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/beeai) |
+| CrewAI | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/crewai) |
+| DSPy | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/dspy) |
+| Google AI | | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/google-ai) |
+| Haystack | | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/haystack) |
+| Instructor | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/instructor) |
+| LangChain | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/langchain) |
+| LangGraph | yes | | yes | [docs](https://docs.orq.ai/docs/proxy/frameworks/langgraph) |
+| LiteLLM | | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/litellm) |
+| LiveKit | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/livekit) |
+| LlamaIndex | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/llamaindex) |
+| LlamaIndex Agents | yes | | | [docs](https://docs.orq.ai/docs/proxy/frameworks/llamaindex-agents) |
+| Mastra | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/mastra) |
+| OpenAI SDK | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/openai) |
+| OpenAI Agents | yes | yes | yes | [docs](https://docs.orq.ai/docs/proxy/frameworks/openai-agents) |
+| OpenClaw | | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/openclaw) |
+| Pydantic AI | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/pydantic-ai) |
+| Semantic Kernel | yes | | | [docs](https://docs.orq.ai/docs/proxy/frameworks/semantic-kernel) |
+| SmolAgents | yes | yes | | [docs](https://docs.orq.ai/docs/proxy/frameworks/smolagents) |
+| Vercel AI SDK | yes | yes | yes | [docs](https://docs.orq.ai/docs/proxy/frameworks/vercel-ai) |
+
+## AI Router Quick Setup Pattern
+
+All AI Router integrations follow the same pattern — point your SDK's base URL to orq.ai:
+
+**Python (OpenAI SDK):**
+```python
+import os
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://api.orq.ai/v2/router", api_key=os.getenv("ORQ_API_KEY")
+)
+```
+
+**Node.js (OpenAI SDK):**
+```typescript
+import OpenAI from "openai";
+
+const client = new OpenAI({
+    baseURL: "https://api.orq.ai/v2/router",
+    apiKey: process.env.ORQ_API_KEY,
+});
+```
+
+**LangChain:**
+```python
+import os
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(
+    model="openai/gpt-4o",  # provider/model format, as the AI Router expects
+    api_key=os.getenv("ORQ_API_KEY"), base_url="https://api.orq.ai/v2/router"
+)
+```
+
+## Observability (OpenTelemetry) Quick Setup Pattern
+
+All observability integrations use OpenInference instrumentors with OTLP export to orq.ai:
+
+**Environment variables (all frameworks):**
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer $ORQ_API_KEY"
+export OTEL_RESOURCE_ATTRIBUTES="service.name=my-app,service.version=1.0.0"
+export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/json"
+```
+
+**Python (OpenAI instrumentor):**
+```python
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+from openinference.instrumentation.openai import OpenAIInstrumentor
+
+tracer_provider = TracerProvider()
+tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
+trace.set_tracer_provider(tracer_provider)
+
+OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
+```
+
+**Key:** Each framework has its own OpenInference instrumentor package. See the framework-specific docs page for the exact package name and import.
diff --git a/skills/instrument-app/resources/traced-decorator-guide.md b/skills/instrument-app/resources/traced-decorator-guide.md
new file mode 100644
index 0000000..43a68e8
--- /dev/null
+++ b/skills/instrument-app/resources/traced-decorator-guide.md
@@ -0,0 +1,122 @@
+# The `@traced` Decorator
+
+The `@traced` decorator from the orq.ai Python SDK adds custom spans to your traces for application logic that isn't automatically captured by framework instrumentors.
+
+**Docs:** [Custom Tracing using the @traced decorator](https://docs.orq.ai/docs/observability/traces#custom-tracing-using-the-@traced-decorator)
+
+## When to Use
+
+| Scenario | Use `@traced` | Use Framework Instrumentor |
+|----------|:------------:|:--------------------------:|
+| LLM calls via OpenAI/LangChain/etc. | | yes |
+| Custom business logic between LLM calls | yes | |
+| Data preprocessing / postprocessing | yes | |
+| Tool implementations in an agent | yes | |
+| RAG retrieval logic | yes | |
+| Orchestration / routing functions | yes | |
+
+**Rule:** Use framework instrumentors for LLM calls (they capture model, tokens, etc. automatically). Use `@traced` for everything else that you want visible in the trace.
+
+## Span Types
+
+| Type | When to Use |
+|------|-------------|
+| `agent` | Orchestration workflows, agent execution loops |
+| `llm` | Direct LLM API calls (prefer framework instrumentors when available) |
+| `tool` | External tool invocations, API calls, database queries |
+| `retrieval` | Knowledge lookups, vector search, document fetching |
+| `embedding` | Embedding operations |
+| `function` | General processing steps, data transformation, validation |
+
+## Parameters
+
+```python
+@traced(
+    name="operation_name",       # Descriptive name shown in trace UI
+    type="function",             # Span type (see table above)
+    capture_input=True,          # Whether to capture function input args
+    capture_output=True,         # Whether to capture function return value
+    attributes={                 # Custom key-value metadata
+        "custom_key": "value"
+    }
+)
+```
+
+| Parameter | Default | Notes |
+|-----------|---------|-------|
+| `name` | function name | Use descriptive names: `"fetch-user-context"` not `"step1"` |
+| `type` | `"function"` | Pick the semantic type that matches the operation |
+| `capture_input` | `True` | Set `False` if inputs contain PII or secrets |
+| `capture_output` | `True` | Set `False` if outputs contain sensitive data |
+| `attributes` | `{}` | Add searchable metadata: user tier, feature name, etc. |
+
+## Examples
+
+### Sync Function
+```python
+from orq_ai_sdk.tracing import traced
+
+@traced(name="extract-keywords", type="function")
+def extract_keywords(text: str) -> list[str]:
+    keywords = [word for word in text.lower().split() if len(word) > 3]  # placeholder logic
+    return keywords
+```
+
+### Async Function
+```python
+from orq_ai_sdk.tracing import traced
+
+@traced(name="fetch-context", type="retrieval")
+async def fetch_context(query: str) -> list[dict]:
+    results = await vector_db.search(query)
+    return results
+```
+
+### Agent Orchestration
+```python
+from orq_ai_sdk.tracing import traced
+
+@traced(name="support-agent", type="agent")
+async def run_support_agent(user_message: str) -> str:
+    context = await fetch_context(user_message)  # traced as retrieval
+    response = generate_response(context)       # traced by framework instrumentor
+    log_interaction(user_message, response)     # traced as function
+    return response
+```
+
+### Hiding Sensitive Data
+```python
+@traced(
+    name="process-payment",
+    type="tool",
+    capture_input=False,   # Don't capture credit card details
+    capture_output=False,  # Don't capture payment tokens
+    attributes={"service": "payments"}
+)
+def process_payment(card_number: str, amount: float) -> dict:
+    ...
+```
+
+### Adding Custom Attributes
+```python
+@traced(
+    name="classify-intent",
+    type="function",
+    attributes={
+        "feature": "routing",
+        "version": "2.1",
+    }
+)
+def classify_intent(message: str) -> str:
+    ...
+```
+
+## Common Mistakes
+
+| Mistake | Problem | Fix |
+|---------|---------|-----|
+| Using `@traced` for LLM calls when instrumentor exists | Misses model/token metadata | Use framework instrumentor for LLM calls |
+| Generic names like `"step1"`, `"process"` | Hard to find in trace UI | Use descriptive names: `"classify-intent"`, `"fetch-user-orders"` |
+| `capture_input=True` on functions with secrets | Leaks API keys, tokens, PII into traces | Set `capture_input=False` and use `attributes` for safe metadata |
+| Wrong span type | Breaks trace analytics (e.g., retrieval latency dashboard) | Match type to semantic meaning of the operation |
+| Forgetting to trace orchestration function | Top-level agent loop invisible in traces | Wrap the entry point with `@traced(type="agent")` |

From 12055d49ceafcaf82341a7032f8b73156c80578a Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Tue, 24 Mar 2026 12:06:22 +0100
Subject: [PATCH 2/9] docs: remove claude-plugins mention from README

The claude-plugins repo is still a work in progress, so defer the
mention until it's ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index 5f59b8d..614e6a2 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,6 @@ Each skill encodes best practices from prompt engineering, agent design, evaluat
 
 Built on the [Agent Skills](https://agentskills.io/home#adoption) standard format, so it works with any compatible agent (Claude Code, Cursor, Gemini CLI, and others).
 
-**Using Claude Code?** Check out [orq-ai/claude-plugins](https://github.com/orq-ai/claude-plugins) — it bundles orq-skills with **orq-trace** (automatic session tracing) and **orq-mcp** (workspace MCP server) in a single install.
 
 ## Setup
 

From 9f9ffc27bf95d6f0ea1c1e6a978c750418ca116a Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Tue, 24 Mar 2026 17:41:19 +0100
Subject: [PATCH 3/9] refactor: rename instrument-app to setup-observability

Rename skill to better reflect what it does. Update README skills table
and add "Instrument an Existing App" workflow example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 README.md                                      | 18 +++++++++++++-----
 .../SKILL.md                                   |  4 ++--
 .../resources/baseline-checklist.md            |  0
 .../resources/framework-integrations.md        |  0
 .../resources/traced-decorator-guide.md        |  0
 5 files changed, 15 insertions(+), 7 deletions(-)
 rename skills/{instrument-app => setup-observability}/SKILL.md (98%)
 rename skills/{instrument-app => setup-observability}/resources/baseline-checklist.md (100%)
 rename skills/{instrument-app => setup-observability}/resources/framework-integrations.md (100%)
 rename skills/{instrument-app => setup-observability}/resources/traced-decorator-guide.md (100%)

diff --git a/README.md b/README.md
index 614e6a2..3e020a2 100644
--- a/README.md
+++ b/README.md
@@ -93,7 +93,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 <!-- BEGIN_SKILLS_TABLE -->
 | Skill | What It Does | Documentation |
 |-------|-------------|---------------|
-| **instrument-app** | Instrument LLM applications with orq.ai observability — AI Router proxy, OpenTelemetry, `@traced` decorator, and trace enrichment | [SKILL.md](skills/instrument-app/SKILL.md) |
+| **setup-observability** | Set up orq.ai observability for existing LLM applications — AI Router proxy, OpenTelemetry, `@traced` decorator, and trace enrichment | [SKILL.md](skills/setup-observability/SKILL.md) |
 | **build-agent** | Design, create, and configure an orq.ai Agent with tools, instructions, knowledge bases, and memory | [SKILL.md](skills/build-agent/SKILL.md) |
 | **build-evaluator** | Create validated LLM-as-a-Judge evaluators following evaluation best practices | [SKILL.md](skills/build-evaluator/SKILL.md) |
 | **analyze-trace-failures** | Read production traces, identify what's failing, build failure taxonomies, and categorize issues | [SKILL.md](skills/analyze-trace-failures/SKILL.md) |
@@ -106,7 +106,15 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 
 ## Workflows
 
-### 1. Build a New Agent
+### 1. Instrument an Existing App
+
+```
+"Add orq.ai tracing to my app"               → setup-observability
+/orq:traces --last 1h                          # Verify traces are flowing
+"Analyze these traces for failures"            → analyze-trace-failures
+```
+
+### 2. Build a New Agent
 
 ```
 "I need a customer support agent"             → build-agent
@@ -115,7 +123,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 "Run an experiment to get a baseline"          → run-experiment
 ```
 
-### 2. Debug Production Issues
+### 3. Debug Production Issues
 
 ```
 /orq:traces --status error --last 24h          # Find errors
@@ -124,7 +132,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 "Re-run the experiment to verify the fix"      → run-experiment
 ```
 
-### 3. Improve an Existing Agent
+### 4. Improve an Existing Agent
 
 ```
 /orq:analytics --group-by deployment           # Spot high error rates
@@ -135,7 +143,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 "Optimize the prompt based on results"         → optimize-prompt
 ```
 
-### 4. Improve an existing Prompt
+### 5. Improve an Existing Prompt
 
 ```
 "My prompt isn't performing well, help me improve it" → optimize-prompt
diff --git a/skills/instrument-app/SKILL.md b/skills/setup-observability/SKILL.md
similarity index 98%
rename from skills/instrument-app/SKILL.md
rename to skills/setup-observability/SKILL.md
index d4b692d..9578116 100644
--- a/skills/instrument-app/SKILL.md
+++ b/skills/setup-observability/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: instrument-app
-description: Instrument LLM applications with orq.ai observability. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
+name: setup-observability
+description: Set up orq.ai observability for LLM applications. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
 allowed-tools: Bash, Read, Write, Edit, Grep, Glob, WebFetch, Task, AskUserQuestion, orq*
 ---
 
diff --git a/skills/instrument-app/resources/baseline-checklist.md b/skills/setup-observability/resources/baseline-checklist.md
similarity index 100%
rename from skills/instrument-app/resources/baseline-checklist.md
rename to skills/setup-observability/resources/baseline-checklist.md
diff --git a/skills/instrument-app/resources/framework-integrations.md b/skills/setup-observability/resources/framework-integrations.md
similarity index 100%
rename from skills/instrument-app/resources/framework-integrations.md
rename to skills/setup-observability/resources/framework-integrations.md
diff --git a/skills/instrument-app/resources/traced-decorator-guide.md b/skills/setup-observability/resources/traced-decorator-guide.md
similarity index 100%
rename from skills/instrument-app/resources/traced-decorator-guide.md
rename to skills/setup-observability/resources/traced-decorator-guide.md

From 96161ba76f494791509ba5a545d57ad3b8712c0a Mon Sep 17 00:00:00 2001
From: currentlycodinng <148545995+currentlycodinng@users.noreply.github.com>
Date: Wed, 25 Mar 2026 12:22:24 +0100
Subject: [PATCH 4/9] fix: correct OpenInference import path and stale heading
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix import: opentelemetry.instrumentation.openai → openinference.instrumentation.openai
- Rename heading from "Instrument App" to "Setup Observability"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 skills/setup-observability/SKILL.md                        | 4 ++--
 .../resources/framework-integrations.md                    | 2 +-
 tests/skills.md                                            | 7 +++++++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/skills/setup-observability/SKILL.md b/skills/setup-observability/SKILL.md
index 9578116..60a08fa 100644
--- a/skills/setup-observability/SKILL.md
+++ b/skills/setup-observability/SKILL.md
@@ -4,7 +4,7 @@ description: Set up orq.ai observability for LLM applications. Use when setting
 allowed-tools: Bash, Read, Write, Edit, Grep, Glob, WebFetch, Task, AskUserQuestion, orq*
 ---
 
-# Instrument App
+# Setup Observability
 
 You are an **orq.ai observability engineer**. Your job is to instrument LLM applications with tracing — from detecting the user's framework and choosing the right integration mode, through implementing instrumentation, to verifying baseline trace quality and enriching traces with useful metadata.
 
@@ -156,7 +156,7 @@ Follow these steps **in order**. Do NOT skip steps.
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
-   from opentelemetry.instrumentation.openai import OpenAIInstrumentor
+   from openinference.instrumentation.openai import OpenAIInstrumentor
 
    # Initialize BEFORE creating OpenAI client
    tracer_provider = TracerProvider()
diff --git a/skills/setup-observability/resources/framework-integrations.md b/skills/setup-observability/resources/framework-integrations.md
index f323aa5..b76610c 100644
--- a/skills/setup-observability/resources/framework-integrations.md
+++ b/skills/setup-observability/resources/framework-integrations.md
@@ -92,7 +92,7 @@ from opentelemetry import trace
 from opentelemetry.sdk.trace import TracerProvider
 from opentelemetry.sdk.trace.export import BatchSpanProcessor
 from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
-from opentelemetry.instrumentation.openai import OpenAIInstrumentor
+from openinference.instrumentation.openai import OpenAIInstrumentor
 
 tracer_provider = TracerProvider()
 tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
diff --git a/tests/skills.md b/tests/skills.md
index 4449b22..3e74459 100644
--- a/tests/skills.md
+++ b/tests/skills.md
@@ -6,6 +6,12 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 
 ---
 
+## `setup-observability`
+
+- Ask: "Help me add orq.ai tracing to my app"
+- Verify: scans project for framework imports and existing instrumentation
+- Verify: recommends integration mode (AI Router vs Observability) based on findings
+
 ## `build-agent`
 
 - Ask: "Build a simple FAQ agent for a pizza restaurant"
@@ -46,6 +52,7 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 
 ## Critical Files
 
+- `skills/setup-observability/SKILL.md`
 - `skills/build-agent/SKILL.md`
 - `skills/build-evaluator/SKILL.md`
 - `skills/generate-synthetic-dataset/SKILL.md`

From df3cdefbe1afe75e4b3bedfe87f1157c1c5b2bbe Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Thu, 26 Mar 2026 13:14:43 +0100
Subject: [PATCH 5/9] fix: correct hallucinated code examples in
 setup-observability skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix @traced import path: orq_ai_sdk.tracing → orq_ai_sdk.traced (verified against official docs)
- Fix LangChain model format: gpt-4o → openai/gpt-4o (provider/model format)
- Replace hardcoded service.name=my-app with <your-app-name> placeholder
- Soften unsubstantiated "10x more metadata" claim
- Add warning about overwriting existing OTEL config (Datadog, Jaeger, etc.)
- Add auto-formatter guidance (isort/noqa) for critical import ordering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 skills/setup-observability/SKILL.md                       | 8 +++++---
 .../resources/framework-integrations.md                   | 4 ++--
 .../resources/traced-decorator-guide.md                   | 6 +++---
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/skills/setup-observability/SKILL.md b/skills/setup-observability/SKILL.md
index 60a08fa..71b12df 100644
--- a/skills/setup-observability/SKILL.md
+++ b/skills/setup-observability/SKILL.md
@@ -18,7 +18,7 @@ You are an **orq.ai observability engineer**. Your job is to instrument LLM appl
 - **ALWAYS** prefer AI Router mode when the user's framework supports it — it's the fastest path to traces with zero instrumentation code.
 - **ALWAYS** set `service.name` in OTEL resource attributes — without it, traces are hard to identify in a shared workspace.
 
-**Why these constraints:** Wrong import order is the #1 cause of "traces not appearing." Generic names make traces unfindable at scale. Logging PII creates compliance risk. Framework instrumentors capture 10x more metadata than manual tracing with less code.
+**Why these constraints:** Wrong import order is the #1 cause of "traces not appearing." Generic names make traces unfindable at scale. Logging PII creates compliance risk. Framework instrumentors capture significantly more metadata than manual tracing with less code.
 
 ## Companion Skills
 
@@ -139,11 +139,11 @@ Follow these steps **in order**. Do NOT skip steps.
    For framework-specific setup (LangChain, CrewAI, etc.), refer to the framework's docs page linked in [resources/framework-integrations.md](resources/framework-integrations.md).
 
 6. **For Observability mode:**
-   - Set OTEL environment variables:
+   - Set OTEL environment variables. **Warning:** If the project already has OpenTelemetry configured (e.g., for Datadog, Jaeger, or another backend), check for existing `OTEL_*` env vars or `TracerProvider` setup first — setting these will override that configuration. Confirm with the user before overwriting.
      ```bash
      export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
      export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer $ORQ_API_KEY"
-     export OTEL_RESOURCE_ATTRIBUTES="service.name=my-app,service.version=1.0.0"
+     export OTEL_RESOURCE_ATTRIBUTES="service.name=<your-app-name>,service.version=1.0.0"
      export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/json"
      ```
    - Install the framework's OpenInference instrumentor package
@@ -165,6 +165,8 @@ Follow these steps **in order**. Do NOT skip steps.
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    ```
 
+   > **Note:** The import order above is critical: instrumentors must be initialized before framework clients. If the project uses an auto-formatter (isort, Ruff), add `# isort:skip_file` at the top of the file (or `# isort:skip` on individual imports) to prevent reordering, and add `# noqa: E402` on late imports to silence the import-position lint warning.
+
 7. **For both modes:** Set up AI Router first (step 5), then add Observability (step 6) for framework-level spans on top.
 
 ### Phase 4: Verify Baseline
diff --git a/skills/setup-observability/resources/framework-integrations.md b/skills/setup-observability/resources/framework-integrations.md
index b76610c..f617229 100644
--- a/skills/setup-observability/resources/framework-integrations.md
+++ b/skills/setup-observability/resources/framework-integrations.md
@@ -68,7 +68,7 @@ const client = new OpenAI({
 from langchain_openai import ChatOpenAI
 
 llm = ChatOpenAI(
-    model="gpt-4o",
+    model="openai/gpt-4o",
     api_key=os.getenv("ORQ_API_KEY"),
     base_url="https://api.orq.ai/v2/router",
 )
@@ -82,7 +82,7 @@ All observability integrations use OpenInference instrumentors with OTLP export
 ```bash
 export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
 export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer $ORQ_API_KEY"
-export OTEL_RESOURCE_ATTRIBUTES="service.name=my-app,service.version=1.0.0"
+export OTEL_RESOURCE_ATTRIBUTES="service.name=<your-app-name>,service.version=1.0.0"
 export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/json"
 ```
 
diff --git a/skills/setup-observability/resources/traced-decorator-guide.md b/skills/setup-observability/resources/traced-decorator-guide.md
index 43a68e8..f68a7b5 100644
--- a/skills/setup-observability/resources/traced-decorator-guide.md
+++ b/skills/setup-observability/resources/traced-decorator-guide.md
@@ -54,7 +54,7 @@ The `@traced` decorator from the orq.ai Python SDK adds custom spans to your tra
 
 ### Sync Function
 ```python
-from orq_ai_sdk.tracing import traced
+from orq_ai_sdk.traced import traced
 
 @traced(name="extract-keywords", type="function")
 def extract_keywords(text: str) -> list[str]:
@@ -64,7 +64,7 @@ def extract_keywords(text: str) -> list[str]:
 
 ### Async Function
 ```python
-from orq_ai_sdk.tracing import traced
+from orq_ai_sdk.traced import traced
 
 @traced(name="fetch-context", type="retrieval")
 async def fetch_context(query: str) -> list[dict]:
@@ -74,7 +74,7 @@ async def fetch_context(query: str) -> list[dict]:
 
 ### Agent Orchestration
 ```python
-from orq_ai_sdk.tracing import traced
+from orq_ai_sdk.traced import traced
 
 @traced(name="support-agent", type="agent")
 def run_support_agent(user_message: str) -> str:

From bbde329255412d16338ae92e9c0dbdd2b355da8c Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Thu, 26 Mar 2026 13:15:55 +0100
Subject: [PATCH 6/9] test: add setup-observability smoke tests and resolve
 merge conflict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 commands/workspace.md                         |   8 ++
 skills/build-agent/resources/api-reference.md |   4 +-
 skills/build-evaluator/SKILL.md               |   7 +-
 .../resources/api-reference.md                |   2 +
 .../run-experiment/resources/api-reference.md |   8 +-
 tests/mcp-tools.md                            |  16 +--
 tests/skills.md                               | 101 +++++++++++++++++-
 7 files changed, 125 insertions(+), 21 deletions(-)

diff --git a/commands/workspace.md b/commands/workspace.md
index ca5dea0..39fcb0a 100644
--- a/commands/workspace.md
+++ b/commands/workspace.md
@@ -20,6 +20,7 @@ Show a quick overview of the user's orq.ai workspace — agents, deployments, pr
 - `experiments` — show only experiments
 - `projects` — show only projects
 - `knowledge` — show only knowledge bases
+- `evaluator` — show only evaluators
 
 If empty, show all sections.
 
@@ -35,6 +36,7 @@ Use the `search_entities` MCP tool and `get_analytics_overview` MCP tool to fetc
 - **Experiments:** `search_entities` with `type: "experiment"`
 - **Projects:** `search_entities` with `type: "project"`
 - **Knowledge:** `search_entities` with `type: "knowledge"`
+- **Evaluators:** `search_entities` with `type: "evaluator"`
 
 Fetch only the sections needed based on arguments. Always fetch analytics overview regardless of section filter.
 
@@ -91,6 +93,12 @@ Manage your workspace at **[Workspace → my.orq.ai](https://my.orq.ai/)**.
 
 - **product-docs** — 120 documents
 - **faq-database** — 45 documents
+
+
+### Evaluators (2)
+
+- **coherence** — active
+- **toxicity** — active
 ```
 
 #### Formatting rules
diff --git a/skills/build-agent/resources/api-reference.md b/skills/build-agent/resources/api-reference.md
index a26e33b..4b25422 100644
--- a/skills/build-agent/resources/api-reference.md
+++ b/skills/build-agent/resources/api-reference.md
@@ -17,10 +17,12 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | `create_agent` | Create a new agent with configuration |
 | `get_agent` | Get agent details — verify configuration after creation or updates |
 | `update_agent` | Update agent configuration (instructions, model, tools) — iterate without recreating |
-| `search_entities` | Find agents, knowledge bases (`type: "knowledge"`), memory stores (`type: "memory_store"`) |
+| `search_entities` | Find agents, knowledge bases (`type: "knowledge"`), memory stores (`type: "memory_store"`), evaluators (`type: "evaluator"`) |
 | `search_directories` | Discover workspace project structure and paths — useful for KB `path` selection |
 | `list_models` | List available models for agent configuration |
 | `create_llm_eval` | Create evaluators for quality comparison |
+| `get_evaluator_llm` | Retrieve an LLM evaluator by key or ID |
+| `get_evaluator_python` | Retrieve a Python evaluator by key or ID |
 | `list_traces` | Inspect traces for latency/cost data |
 
 ## HTTP API
diff --git a/skills/build-evaluator/SKILL.md b/skills/build-evaluator/SKILL.md
index 182850b..cc662e2 100644
--- a/skills/build-evaluator/SKILL.md
+++ b/skills/build-evaluator/SKILL.md
@@ -81,6 +81,8 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 |------|---------|
 | `create_llm_eval` | Create an LLM evaluator with your judge prompt |
 | `create_python_eval` | Create a Python evaluator for code-based checks |
+| `get_evaluator_llm` | Retrieve an LLM evaluator by key or ID (not supported for jury-mode evaluators) |
+| `get_evaluator_python` | Retrieve a Python evaluator by key or ID |
 | `list_models` | List available judge models |
 
 **HTTP API fallback** (for operations not yet in MCP):
@@ -92,11 +94,6 @@ curl -s https://my.orq.ai/v2/evaluators \
   -H "Authorization: Bearer $ORQ_API_KEY" \
   -H "Content-Type: application/json" | jq
 
-# Get evaluator details
-curl -s https://my.orq.ai/v2/evaluators/<ID> \
-  -H "Authorization: Bearer $ORQ_API_KEY" \
-  -H "Content-Type: application/json" | jq
-
 # Test-invoke an evaluator against a sample output
 curl -s https://my.orq.ai/v2/evaluators/<ID>/invoke \
   -H "Authorization: Bearer $ORQ_API_KEY" \
diff --git a/skills/generate-synthetic-dataset/resources/api-reference.md b/skills/generate-synthetic-dataset/resources/api-reference.md
index 522f828..32f66f0 100644
--- a/skills/generate-synthetic-dataset/resources/api-reference.md
+++ b/skills/generate-synthetic-dataset/resources/api-reference.md
@@ -21,6 +21,8 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | `search_entities` | Find existing datasets (`type: "dataset"`) |
 | `update_datapoint` | Modify existing datapoints (curation) |
 | `delete_datapoints` | Remove datapoints from a dataset (curation) |
+| `get_evaluator_llm` | Retrieve an LLM evaluator to understand dataset requirements |
+| `get_evaluator_python` | Retrieve a Python evaluator to understand dataset requirements |
 
 ## HTTP API
 
diff --git a/skills/run-experiment/resources/api-reference.md b/skills/run-experiment/resources/api-reference.md
index b560f3a..8e32d6f 100644
--- a/skills/run-experiment/resources/api-reference.md
+++ b/skills/run-experiment/resources/api-reference.md
@@ -15,6 +15,9 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | Tool | Purpose |
 |------|---------|
 | `create_llm_eval` | Create an LLM evaluator |
+| `create_python_eval` | Create a Python evaluator for code-based checks |
+| `get_evaluator_llm` | Retrieve an LLM evaluator by key or ID |
+| `get_evaluator_python` | Retrieve a Python evaluator by key or ID |
 | `list_traces` | List and filter traces for error analysis |
 | `list_spans` | List spans within a trace |
 | `get_span` | Get detailed span information |
@@ -39,11 +42,6 @@ curl -s https://my.orq.ai/v2/evaluators \
   -H "Authorization: Bearer $ORQ_API_KEY" \
   -H "Content-Type: application/json" | jq
 
-# Get evaluator details
-curl -s https://my.orq.ai/v2/evaluators/<ID> \
-  -H "Authorization: Bearer $ORQ_API_KEY" \
-  -H "Content-Type: application/json" | jq
-
 # Invoke an evaluator
 curl -s https://my.orq.ai/v2/evaluators/<ID>/invoke \
   -H "Authorization: Bearer $ORQ_API_KEY" \
diff --git a/tests/mcp-tools.md b/tests/mcp-tools.md
index fbd6dca..9c7e172 100644
--- a/tests/mcp-tools.md
+++ b/tests/mcp-tools.md
@@ -8,7 +8,7 @@ Tests the orq.ai MCP server tools directly. Requires `setup.md` to have run firs
 
 ## Read-only tools (safe, no cleanup needed)
 
-1. `search_entities` — all 8 types (agent, dataset, prompt, experiment, knowledge, memory_store, deployment, project)
+1. `search_entities` — all 9 types (agent, dataset, prompt, experiment, knowledge, memory_store, deployment, project, evaluator)
 2. `search_directories` — list project dirs
 3. `list_models(modelType=chat)` → verify non-empty
 4. `list_registry_keys` → verify returns array
@@ -27,21 +27,23 @@ Tests the orq.ai MCP server tools directly. Requires `setup.md` to have run firs
 14. `delete_datapoints` → delete 1, verify 2 remain
 15. `delete_dataset` → delete `orq-skills-test-crud-dataset` (only this test resource)
 
-## Evaluator creation *(manual cleanup required — no MCP delete tool)*
+## Evaluator tools *(manual cleanup required — no MCP delete tool)*
 
 16. `create_llm_eval` → key: `orq-skills-test-llm-eval`, with simple judge prompt
 17. `create_python_eval` → key: `orq-skills-test-py-eval`
+18. `get_evaluator_llm(key=orq-skills-test-llm-eval)` → verify returns prompt and model
+19. `get_evaluator_python(key=orq-skills-test-py-eval)` → verify returns code
 
 ## Agent tools
 
-18. `get_agent(key=orq-skills-test-echo)` → verify config matches what we created
-19. `create_agent` → key: `orq-skills-test-crud-agent` *(manual cleanup required — no MCP delete tool)*
-20. `update_agent(key=orq-skills-test-crud-agent)` → update instructions (only our test agent)
+20. `get_agent(key=orq-skills-test-echo)` → verify config matches what we created
+21. `create_agent` → key: `orq-skills-test-crud-agent` *(manual cleanup required — no MCP delete tool)*
+22. `update_agent(key=orq-skills-test-crud-agent)` → update instructions (only our test agent)
 
 ## Experiment tools
 
-21. `create_experiment` → key: `orq-skills-test-experiment` with seeded dataset + evaluator *(manual cleanup required — no MCP delete tool)*
-22. `list_experiment_runs`
+23. `create_experiment` → key: `orq-skills-test-experiment` with seeded dataset + evaluator *(manual cleanup required — no MCP delete tool)*
+24. `list_experiment_runs`
 
 ---
 
diff --git a/tests/skills.md b/tests/skills.md
index 3e74459..33a15ba 100644
--- a/tests/skills.md
+++ b/tests/skills.md
@@ -8,9 +8,50 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 
 ## `setup-observability`
 
-- Ask: "Help me add orq.ai tracing to my app"
-- Verify: scans project for framework imports and existing instrumentation
-- Verify: recommends integration mode (AI Router vs Observability) based on findings
+### Scenario 1: Python OpenAI app — AI Router path
+
+- Provide: a small Python file using `openai.OpenAI()` with no existing tracing
+- Ask: "Add orq.ai tracing to my app"
+- Verify Phase 1: scans the project, identifies OpenAI SDK, reports no existing tracing
+- Verify Phase 2: recommends **AI Router** mode (framework supports it, fastest path)
+- Verify Phase 3: changes `base_url` to `https://api.orq.ai/v2/router`, uses `provider/model` format (e.g., `openai/gpt-4o`)
+- Verify: does NOT use `from orq_ai_sdk.tracing import traced` (wrong import path)
+- Verify: does NOT hardcode `service.name=my-app`
+
+### Scenario 2: LangChain app — Observability path
+
+- Provide: a Python file using `langchain_openai.ChatOpenAI()` calling a provider directly
+- Ask: "I want to add tracing but keep my existing LLM calls"
+- Verify Phase 2: recommends **Observability** mode (user wants to keep existing calls)
+- Verify Phase 3: sets OTEL env vars, installs OpenInference instrumentor
+- Verify: instrumentor is initialized BEFORE framework client creation
+- Verify: warns about existing OTEL config if any `OTEL_*` vars already exist
+
+### Scenario 3: Verify code correctness
+
+- Ask: "Show me how to use the @traced decorator"
+- Verify: import path is `from orq_ai_sdk.traced import traced` or `from orq_ai_sdk import traced`
+- Verify: parameters shown are `name`, `type`, `capture_input`, `capture_output`, `attributes`
+- Verify: does NOT show `user_id` as a direct `@traced` parameter (should be in `attributes={}`)
+- Verify: does NOT use `orq_traced_input()` or `orq_traced_output()` (these don't exist)
+- Verify: `capture_input` / `capture_output` defaults documented as `True`
+
+### Scenario 4: Sensitive data handling
+
+- Provide: a Python function that takes `card_number` and `user_email` as arguments
+- Ask: "Add tracing to this function"
+- Verify: uses `capture_input=False` and/or `capture_output=False`
+- Verify: explains that defaults are `True` (all inputs/outputs sent to orq.ai unless disabled)
+
+### Scenario 5: Existing OTEL configuration
+
+- Provide: a project with existing `OTEL_EXPORTER_OTLP_ENDPOINT` pointing to Datadog
+- Ask: "Add orq.ai observability"
+- Verify: detects existing OTEL configuration in Phase 1
+- Verify: warns about overwriting before setting new env vars
+- Verify: asks user for confirmation before proceeding
+
+---
 
 ## `build-agent`
 
@@ -48,14 +89,68 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 - Ask: "Run an experiment using orq-skills-test-dataset with orq-skills-test-eval-length"
 - Verify: calls `create_experiment` with correct references
 
+## `compare-agents`
+
+### Scenario 1: orq.ai vs external agent (Python)
+
+- Ask: "Compare my orq.ai agent orq-skills-test-echo against a simple Python function that reverses the input"
+- Verify Phase 1: identifies two agents — orq.ai (uses `search_entities` to find `orq-skills-test-echo`) and generic Python
+- Verify Phase 1: asks or confirms language preference (Python)
+- Verify Phase 2: delegates to `generate-synthetic-dataset` or creates dataset via `create_dataset` + `create_datapoints` with `orq-skills-test-` prefix
+- Verify Phase 3: delegates to `build-evaluator` or creates evaluator via `create_llm_eval`
+- Verify Phase 4: generates a Python script with:
+  - `from evaluatorq import evaluatorq, job, DataPoint, EvaluationResult`
+  - One `@job("OrqAgent")` using `orq.agents.responses.create()` (NOT `agents.invoke()`)
+  - One `@job("ReverseAgent")` wrapping the Python function
+  - An evaluator scorer invoking the orq.ai judge by ID
+  - An `evaluatorq()` call wiring jobs + data + evaluators
+- Verify: script uses A2A message format `{"role": "user", "parts": [{"kind": "text", "text": ...}]}` (NOT OpenAI-style)
+- Verify: does NOT hardcode datapoints inline if a dataset was created on the platform
+
+### Scenario 2: orq.ai vs orq.ai
+
+- Ask: "Compare two versions of my agent — orq-skills-test-echo with model gpt-4o-mini vs the same agent"
+- Verify: generates two orq.ai job patterns with different job names (e.g., `OrqAgent-A`, `OrqAgent-B`)
+- Verify: uses the same `agent_key` for both (since it's the same agent)
+- Verify: warns about same-model comparison if both use the same model
+
+### Scenario 3: TypeScript preference
+
+- Ask: "I want to benchmark a LangGraph agent against my orq.ai agent, using TypeScript"
+- Verify Phase 4: generates TypeScript, not Python
+- Verify: imports from `@orq-ai/evaluatorq`
+- Verify: uses `wrapLangGraphAgent` from `@orq-ai/evaluatorq/langchain` for the LangGraph job
+- Verify: uses `job()` function (not `@job` decorator)
+
+### Scenario 4: Skill boundary — redirects
+
+- Ask: "Create a dataset for testing my agents"
+- Verify: redirects to `generate-synthetic-dataset` (does NOT handle dataset creation itself)
+- Ask: "Run an experiment with my orq.ai deployment"
+- Verify: redirects to `run-experiment` (no external agents involved)
+
+### Scenario 5: Dataset bias prevention
+
+- Provide: two agents — one with a mock weather tool returning "Sunny, 22C", one with a real API
+- Ask: "Compare these agents on weather queries"
+- Verify: does NOT write expected outputs matching the mock data
+- Verify: expected outputs describe correctness criteria (e.g., "should include current temperature from a real source")
+
 ---
 
 ## Critical Files
 
 - `skills/setup-observability/SKILL.md`
+- `skills/setup-observability/resources/traced-decorator-guide.md`
+- `skills/setup-observability/resources/framework-integrations.md`
+- `skills/setup-observability/resources/baseline-checklist.md`
 - `skills/build-agent/SKILL.md`
 - `skills/build-evaluator/SKILL.md`
 - `skills/generate-synthetic-dataset/SKILL.md`
 - `skills/optimize-prompt/SKILL.md`
 - `skills/analyze-trace-failures/SKILL.md`
 - `skills/run-experiment/SKILL.md`
+- `skills/compare-agents/SKILL.md`
+- `skills/compare-agents/resources/job-patterns.md`
+- `skills/compare-agents/resources/evaluatorq-api.md`
+- `skills/compare-agents/resources/gotchas.md`

From 22507aa7d8b4fd24a9a26071dd2385de56559142 Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Thu, 2 Apr 2026 11:49:06 +0200
Subject: [PATCH 7/9] =?UTF-8?q?fix:=20address=20PR=20review=20feedback=20?=
 =?UTF-8?q?=E2=80=94=20correct=20MCP=20tool=20names,=20add=20missing=20con?=
 =?UTF-8?q?text?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Replace non-existent `get_evaluator_llm`/`get_evaluator_python` with `evaluator_get` across 4 skills
- Add SDK init prerequisite to @traced guide (silent failure without Orq client)
- Document capture_input/capture_output defaults as True (PII risk)
- Add missing `import os` to framework-integrations code snippets
- Explain Control Tower column in framework integrations table
- Scope @traced and OTEL examples as Python-only, add Node.js pointers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 skills/build-agent/resources/api-reference.md |  3 +--
 skills/build-evaluator/SKILL.md               |  3 +--
 .../resources/api-reference.md                |  3 +--
 .../run-experiment/resources/api-reference.md |  3 +--
 skills/setup-observability/SKILL.md           |  4 ++--
 .../resources/framework-integrations.md       |  5 ++++
 .../resources/traced-decorator-guide.md       | 24 ++++++++++++++++---
 7 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/skills/build-agent/resources/api-reference.md b/skills/build-agent/resources/api-reference.md
index 4b25422..e22a2d2 100644
--- a/skills/build-agent/resources/api-reference.md
+++ b/skills/build-agent/resources/api-reference.md
@@ -21,8 +21,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | `search_directories` | Discover workspace project structure and paths — useful for KB `path` selection |
 | `list_models` | List available models for agent configuration |
 | `create_llm_eval` | Create evaluators for quality comparison |
-| `get_evaluator_llm` | Retrieve an LLM evaluator by key or ID |
-| `get_evaluator_python` | Retrieve a Python evaluator by key or ID |
+| `evaluator_get` | Retrieve any evaluator by ID |
 | `list_traces` | Inspect traces for latency/cost data |
 
 ## HTTP API
diff --git a/skills/build-evaluator/SKILL.md b/skills/build-evaluator/SKILL.md
index cc662e2..2674c90 100644
--- a/skills/build-evaluator/SKILL.md
+++ b/skills/build-evaluator/SKILL.md
@@ -81,8 +81,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 |------|---------|
 | `create_llm_eval` | Create an LLM evaluator with your judge prompt |
 | `create_python_eval` | Create a Python evaluator for code-based checks |
-| `get_evaluator_llm` | Retrieve an LLM evaluator by key or ID (not supported for jury-mode evaluators) |
-| `get_evaluator_python` | Retrieve a Python evaluator by key or ID |
+| `evaluator_get` | Retrieve any evaluator by ID |
 | `list_models` | List available judge models |
 
 **HTTP API fallback** (for operations not yet in MCP):
diff --git a/skills/generate-synthetic-dataset/resources/api-reference.md b/skills/generate-synthetic-dataset/resources/api-reference.md
index 32f66f0..d706e3e 100644
--- a/skills/generate-synthetic-dataset/resources/api-reference.md
+++ b/skills/generate-synthetic-dataset/resources/api-reference.md
@@ -21,8 +21,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | `search_entities` | Find existing datasets (`type: "dataset"`) |
 | `update_datapoint` | Modify existing datapoints (curation) |
 | `delete_datapoints` | Remove datapoints from a dataset (curation) |
-| `get_evaluator_llm` | Retrieve an LLM evaluator to understand dataset requirements |
-| `get_evaluator_python` | Retrieve a Python evaluator to understand dataset requirements |
+| `evaluator_get` | Retrieve any evaluator by ID to understand dataset requirements |
 
 ## HTTP API
 
diff --git a/skills/run-experiment/resources/api-reference.md b/skills/run-experiment/resources/api-reference.md
index 8e32d6f..412358d 100644
--- a/skills/run-experiment/resources/api-reference.md
+++ b/skills/run-experiment/resources/api-reference.md
@@ -16,8 +16,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 |------|---------|
 | `create_llm_eval` | Create an LLM evaluator |
 | `create_python_eval` | Create a Python evaluator for code-based checks |
-| `get_evaluator_llm` | Retrieve an LLM evaluator by key or ID |
-| `get_evaluator_python` | Retrieve a Python evaluator by key or ID |
+| `evaluator_get` | Retrieve any evaluator by ID |
 | `list_traces` | List and filter traces for error analysis |
 | `list_spans` | List spans within a trace |
 | `get_span` | Get detailed span information |
diff --git a/skills/setup-observability/SKILL.md b/skills/setup-observability/SKILL.md
index 71b12df..33be0de 100644
--- a/skills/setup-observability/SKILL.md
+++ b/skills/setup-observability/SKILL.md
@@ -150,7 +150,7 @@ Follow these steps **in order**. Do NOT skip steps.
    - Initialize the instrumentor BEFORE creating SDK clients
    - Refer to the framework's docs page for the exact instrumentor and setup
 
-   **Python (OpenAI example):**
+   **Python (OpenAI example):** *(Node.js uses `@opentelemetry/sdk-node` — see [Integration Overview](https://docs.orq.ai/docs/integrations/overview) for Node.js setup)*
    ```python
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
@@ -205,7 +205,7 @@ Follow these steps **in order**. Do NOT skip steps.
     | Customer/tenant identifiers | `customer_id` or tier tag |
     | Feedback collection, ratings | Score annotations |
 
-14. **Add `@traced` for custom spans** where the user has application logic not captured by framework instrumentors. See [resources/traced-decorator-guide.md](resources/traced-decorator-guide.md) for the full reference.
+14. **Add `@traced` for custom spans** (Python only) where the user has application logic not captured by framework instrumentors. For Node.js, use OpenTelemetry span APIs directly. See [resources/traced-decorator-guide.md](resources/traced-decorator-guide.md) for the full Python reference.
 
     Priority targets for `@traced`:
     - The top-level orchestration function (type: `agent`)
diff --git a/skills/setup-observability/resources/framework-integrations.md b/skills/setup-observability/resources/framework-integrations.md
index f617229..99358be 100644
--- a/skills/setup-observability/resources/framework-integrations.md
+++ b/skills/setup-observability/resources/framework-integrations.md
@@ -7,6 +7,7 @@
 | **AI Router** | Route LLM calls through `https://api.orq.ai/v2/router` — traces generated automatically | You want multi-provider access, fallbacks, caching, cost tracking with zero instrumentation code |
 | **Observability** | Send OpenTelemetry traces from your existing setup to `https://api.orq.ai/v2/otel` | You already call providers directly and want to add tracing without changing your LLM calls |
 | **Both** | AI Router for routing + Observability for framework-level spans | You want full pipeline visibility: framework orchestration spans + LLM call traces |
+| **Control Tower** | Full agent lifecycle management — deploy, monitor, and control agents from the orq.ai dashboard | Framework has native orq.ai integration for agent orchestration (currently: LangGraph, OpenAI Agents, Vercel AI SDK) |
 
 **Rule of thumb:** If the user's framework is in the AI Router column, start there — it's the fastest path to traces. Add Observability on top only if they need framework-level span detail (agent steps, tool calls, chain execution).
 
@@ -46,6 +47,7 @@ All AI Router integrations follow the same pattern — point your SDK's base URL
 **Python (OpenAI SDK):**
 ```python
 from openai import OpenAI
+import os
 
 client = OpenAI(
     base_url="https://api.orq.ai/v2/router",
@@ -66,6 +68,7 @@ const client = new OpenAI({
 **LangChain:**
 ```python
 from langchain_openai import ChatOpenAI
+import os
 
 llm = ChatOpenAI(
     model="openai/gpt-4o",
@@ -102,3 +105,5 @@ OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
 ```
 
 **Key:** Each framework has its own OpenInference instrumentor package. See the framework-specific docs page for the exact package name and import.
+
+> **Node.js/TypeScript:** The Observability examples above are Python-only. For Node.js OTEL setup, use `@opentelemetry/sdk-node` with `@opentelemetry/exporter-trace-otlp-http` and framework-specific instrumentors from the `@opentelemetry/instrumentation-*` namespace (not OpenInference). See the [orq.ai Integration Overview](https://docs.orq.ai/docs/integrations/overview) for Node.js setup.
diff --git a/skills/setup-observability/resources/traced-decorator-guide.md b/skills/setup-observability/resources/traced-decorator-guide.md
index f68a7b5..6559036 100644
--- a/skills/setup-observability/resources/traced-decorator-guide.md
+++ b/skills/setup-observability/resources/traced-decorator-guide.md
@@ -1,9 +1,27 @@
 # The `@traced` Decorator
 
-The `@traced` decorator from the orq.ai Python SDK adds custom spans to your traces for application logic that isn't automatically captured by framework instrumentors.
+The `@traced` decorator from the orq.ai **Python** SDK adds custom spans to your traces for application logic that isn't automatically captured by framework instrumentors. Node.js/TypeScript does not have a `@traced` equivalent — use OpenTelemetry span APIs directly for custom spans in Node.js.
 
 **Docs:** [Custom Tracing using the @traced decorator](https://docs.orq.ai/docs/observability/traces#custom-tracing-using-the-@traced-decorator)
 
+## Prerequisites
+
+The orq.ai SDK client must be initialized before `@traced` will export spans:
+
+```python
+import os
+
+from orq_ai_sdk import Orq
+from orq_ai_sdk.traced import traced
+
+client = Orq(api_key=os.getenv("ORQ_API_KEY"))
+
+@traced(name="my-operation", type="function")
+def my_function():
+    ...
+```
+
+If `Orq(api_key=...)` is never initialized, the `@traced` decorator silently does nothing: no error is raised and no spans are exported.
+
 ## When to Use
 
 | Scenario | Use `@traced` | Use Framework Instrumentor |
@@ -46,8 +64,8 @@ The `@traced` decorator from the orq.ai Python SDK adds custom spans to your tra
 |-----------|---------|-------|
 | `name` | function name | Use descriptive names: `"fetch-user-context"` not `"step1"` |
 | `type` | `"function"` | Pick the semantic type that matches the operation |
-| `capture_input` | `True` | Set `False` if inputs contain PII or secrets |
-| `capture_output` | `True` | Set `False` if outputs contain sensitive data |
+| `capture_input` | `True` | **Default captures all function args.** Set `False` if inputs contain PII or secrets |
+| `capture_output` | `True` | **Default captures all return values.** Set `False` if outputs contain sensitive data |
 | `attributes` | `{}` | Add searchable metadata: user tier, feature name, etc. |
 
 ## Examples

From 85c4cb0e15b1cd2021c68f09fe25d34e8ed28577 Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Tue, 7 Apr 2026 11:13:05 +0200
Subject: [PATCH 8/9] fix: address remaining PR review items

- Replace non-existent get_evaluator_llm/get_evaluator_python with evaluator_get in mcp-tools tests
- Remove compare-agents test scenarios (should ship with compare-agents PR, not this one)
- Remove compare-agents from Critical Files list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 tests/mcp-tools.md |  4 ++--
 tests/skills.md    | 51 ----------------------------------------------
 2 files changed, 2 insertions(+), 53 deletions(-)

diff --git a/tests/mcp-tools.md b/tests/mcp-tools.md
index 9c7e172..a7fe1b6 100644
--- a/tests/mcp-tools.md
+++ b/tests/mcp-tools.md
@@ -31,8 +31,8 @@ Tests the orq.ai MCP server tools directly. Requires `setup.md` to have run firs
 
 16. `create_llm_eval` → key: `orq-skills-test-llm-eval`, with simple judge prompt
 17. `create_python_eval` → key: `orq-skills-test-py-eval`
-18. `get_evaluator_llm(key=orq-skills-test-llm-eval)` → verify returns prompt and model
-19. `get_evaluator_python(key=orq-skills-test-py-eval)` → verify returns code
+18. `evaluator_get(id=<llm-eval-id>)` → verify returns prompt and model
+19. `evaluator_get(id=<py-eval-id>)` → verify returns code
 
 ## Agent tools
 
diff --git a/tests/skills.md b/tests/skills.md
index 33a15ba..269e1b5 100644
--- a/tests/skills.md
+++ b/tests/skills.md
@@ -89,53 +89,6 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 - Ask: "Run an experiment using orq-skills-test-dataset with orq-skills-test-eval-length"
 - Verify: calls `create_experiment` with correct references
 
-## `compare-agents`
-
-### Scenario 1: orq.ai vs external agent (Python)
-
-- Ask: "Compare my orq.ai agent orq-skills-test-echo against a simple Python function that reverses the input"
-- Verify Phase 1: identifies two agents — orq.ai (uses `search_entities` to find `orq-skills-test-echo`) and generic Python
-- Verify Phase 1: asks or confirms language preference (Python)
-- Verify Phase 2: delegates to `generate-synthetic-dataset` or creates dataset via `create_dataset` + `create_datapoints` with `orq-skills-test-` prefix
-- Verify Phase 3: delegates to `build-evaluator` or creates evaluator via `create_llm_eval`
-- Verify Phase 4: generates a Python script with:
-  - `from evaluatorq import evaluatorq, job, DataPoint, EvaluationResult`
-  - One `@job("OrqAgent")` using `orq.agents.responses.create()` (NOT `agents.invoke()`)
-  - One `@job("ReverseAgent")` wrapping the Python function
-  - An evaluator scorer invoking the orq.ai judge by ID
-  - A `evaluatorq()` call wiring jobs + data + evaluators
-- Verify: script uses A2A message format `{"role": "user", "parts": [{"kind": "text", "text": ...}]}` (NOT OpenAI-style)
-- Verify: does NOT hardcode datapoints inline if a dataset was created on the platform
-
-### Scenario 2: orq.ai vs orq.ai
-
-- Ask: "Compare two versions of my agent — orq-skills-test-echo with model gpt-4o-mini vs the same agent"
-- Verify: generates two orq.ai job patterns with different job names (e.g., `OrqAgent-A`, `OrqAgent-B`)
-- Verify: uses the same `agent_key` for both (since it's the same agent)
-- Verify: warns about same-model comparison if both use the same model
-
-### Scenario 3: TypeScript preference
-
-- Ask: "I want to benchmark a LangGraph agent against my orq.ai agent, using TypeScript"
-- Verify Phase 4: generates TypeScript, not Python
-- Verify: imports from `@orq-ai/evaluatorq`
-- Verify: uses `wrapLangGraphAgent` from `@orq-ai/evaluatorq/langchain` for the LangGraph job
-- Verify: uses `job()` function (not `@job` decorator)
-
-### Scenario 4: Skill boundary — redirects
-
-- Ask: "Create a dataset for testing my agents"
-- Verify: redirects to `generate-synthetic-dataset` (does NOT handle dataset creation itself)
-- Ask: "Run an experiment with my orq.ai deployment"
-- Verify: redirects to `run-experiment` (no external agents involved)
-
-### Scenario 5: Dataset bias prevention
-
-- Provide: two agents — one with a mock weather tool returning "Sunny, 22C", one with a real API
-- Ask: "Compare these agents on weather queries"
-- Verify: does NOT write expected outputs matching the mock data
-- Verify: expected outputs describe correctness criteria (e.g., "should include current temperature from a real source")
-
 ---
 
 ## Critical Files
@@ -150,7 +103,3 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 - `skills/optimize-prompt/SKILL.md`
 - `skills/analyze-trace-failures/SKILL.md`
 - `skills/run-experiment/SKILL.md`
-- `skills/compare-agents/SKILL.md`
-- `skills/compare-agents/resources/job-patterns.md`
-- `skills/compare-agents/resources/evaluatorq-api.md`
-- `skills/compare-agents/resources/gotchas.md`

From 9c615d8b12bbd9f91972e431e0590c4b030ceee0 Mon Sep 17 00:00:00 2001
From: Arian Pasquali <arianpasquali@gmail.com>
Date: Tue, 7 Apr 2026 11:18:23 +0200
Subject: [PATCH 9/9] fix: remove compare-agents tests re-introduced by merge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 tests/skills.md | 51 -------------------------------------------------
 1 file changed, 51 deletions(-)

diff --git a/tests/skills.md b/tests/skills.md
index 33a15ba..269e1b5 100644
--- a/tests/skills.md
+++ b/tests/skills.md
@@ -89,53 +89,6 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 - Ask: "Run an experiment using orq-skills-test-dataset with orq-skills-test-eval-length"
 - Verify: calls `create_experiment` with correct references
 
-## `compare-agents`
-
-### Scenario 1: orq.ai vs external agent (Python)
-
-- Ask: "Compare my orq.ai agent orq-skills-test-echo against a simple Python function that reverses the input"
-- Verify Phase 1: identifies two agents — orq.ai (uses `search_entities` to find `orq-skills-test-echo`) and generic Python
-- Verify Phase 1: asks or confirms language preference (Python)
-- Verify Phase 2: delegates to `generate-synthetic-dataset` or creates dataset via `create_dataset` + `create_datapoints` with `orq-skills-test-` prefix
-- Verify Phase 3: delegates to `build-evaluator` or creates evaluator via `create_llm_eval`
-- Verify Phase 4: generates a Python script with:
-  - `from evaluatorq import evaluatorq, job, DataPoint, EvaluationResult`
-  - One `@job("OrqAgent")` using `orq.agents.responses.create()` (NOT `agents.invoke()`)
-  - One `@job("ReverseAgent")` wrapping the Python function
-  - An evaluator scorer invoking the orq.ai judge by ID
-  - A `evaluatorq()` call wiring jobs + data + evaluators
-- Verify: script uses A2A message format `{"role": "user", "parts": [{"kind": "text", "text": ...}]}` (NOT OpenAI-style)
-- Verify: does NOT hardcode datapoints inline if a dataset was created on the platform
-
-### Scenario 2: orq.ai vs orq.ai
-
-- Ask: "Compare two versions of my agent — orq-skills-test-echo with model gpt-4o-mini vs the same agent"
-- Verify: generates two orq.ai job patterns with different job names (e.g., `OrqAgent-A`, `OrqAgent-B`)
-- Verify: uses the same `agent_key` for both (since it's the same agent)
-- Verify: warns about same-model comparison if both use the same model
-
-### Scenario 3: TypeScript preference
-
-- Ask: "I want to benchmark a LangGraph agent against my orq.ai agent, using TypeScript"
-- Verify Phase 4: generates TypeScript, not Python
-- Verify: imports from `@orq-ai/evaluatorq`
-- Verify: uses `wrapLangGraphAgent` from `@orq-ai/evaluatorq/langchain` for the LangGraph job
-- Verify: uses `job()` function (not `@job` decorator)
-
-### Scenario 4: Skill boundary — redirects
-
-- Ask: "Create a dataset for testing my agents"
-- Verify: redirects to `generate-synthetic-dataset` (does NOT handle dataset creation itself)
-- Ask: "Run an experiment with my orq.ai deployment"
-- Verify: redirects to `run-experiment` (no external agents involved)
-
-### Scenario 5: Dataset bias prevention
-
-- Provide: two agents — one with a mock weather tool returning "Sunny, 22C", one with a real API
-- Ask: "Compare these agents on weather queries"
-- Verify: does NOT write expected outputs matching the mock data
-- Verify: expected outputs describe correctness criteria (e.g., "should include current temperature from a real source")
-
 ---
 
 ## Critical Files
@@ -150,7 +103,3 @@ Requires `setup.md` to have run first (seed data for `run-experiment` test).
 - `skills/optimize-prompt/SKILL.md`
 - `skills/analyze-trace-failures/SKILL.md`
 - `skills/run-experiment/SKILL.md`
-- `skills/compare-agents/SKILL.md`
-- `skills/compare-agents/resources/job-patterns.md`
-- `skills/compare-agents/resources/evaluatorq-api.md`
-- `skills/compare-agents/resources/gotchas.md`