orq-ai · Baukebrenninkmeijer · Apr 7, 2026 · Mar 24, 2026 · Mar 24, 2026 · Mar 24, 2026
diff --git a/README.md b/README.md
@@ -10,6 +10,7 @@ Each skill encodes best practices from prompt engineering, agent design, evaluat
 
 Built on the [Agent Skills](https://agentskills.io/home#adoption) standard format, so it works with any compatible agent (Claude Code, Cursor, Gemini CLI, and others).
 
+
 ## Setup
 
 ### Prerequisites
@@ -52,7 +53,6 @@ claude --plugin-dir .
 
 > **Note:** Commands (`/orq:quickstart`, `/orq:workspace`, etc.) and agents are only available when installed as a Claude Code plugin.
 
-
 ### Verify
 
 Run the interactive onboarding to confirm everything works:
@@ -93,6 +93,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 <!-- BEGIN_SKILLS_TABLE -->
 | Skill | What It Does | Documentation |
 |-------|-------------|---------------|
+| **setup-observability** | Set up orq.ai observability for existing LLM applications — AI Router proxy, OpenTelemetry, `@traced` decorator, and trace enrichment | [SKILL.md](skills/setup-observability/SKILL.md) |
 | **build-agent** | Design, create, and configure an orq.ai Agent with tools, instructions, knowledge bases, and memory | [SKILL.md](skills/build-agent/SKILL.md) |
 | **build-evaluator** | Create validated LLM-as-a-Judge evaluators following evaluation best practices | [SKILL.md](skills/build-evaluator/SKILL.md) |
 | **analyze-trace-failures** | Read production traces, identify what's failing, build failure taxonomies, and categorize issues | [SKILL.md](skills/analyze-trace-failures/SKILL.md) |
@@ -105,7 +106,15 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 
 ## Workflows
 
-### 1. Build a New Agent
+### 1. Instrument an Existing App
+
+```
+"Add orq.ai tracing to my app"               → setup-observability
+/orq:traces --last 1h                          # Verify traces are flowing
+"Analyze these traces for failures"            → analyze-trace-failures
+```
+
+### 2. Build a New Agent
 
 ```
 "I need a customer support agent"             → build-agent
@@ -114,7 +123,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 "Run an experiment to get a baseline"          → run-experiment
 ```
 
-### 2. Debug Production Issues
+### 3. Debug Production Issues
 
 ```
 /orq:traces --status error --last 24h          # Find errors
@@ -123,7 +132,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 "Re-run the experiment to verify the fix"      → run-experiment
 ```
 
-### 3. Improve an Existing Agent
+### 4. Improve an Existing Agent
 
 ```
 /orq:analytics --group-by deployment           # Spot high error rates
@@ -134,7 +143,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
 "Optimize the prompt based on results"         → optimize-prompt
 ```
 
-### 4. Improve an existing Prompt
+### 5. Improve an Existing Prompt
 
 ```
 "My prompt isn't performing well, help me improve it" → optimize-prompt

diff --git a/commands/workspace.md b/commands/workspace.md
@@ -20,6 +20,7 @@ Show a quick overview of the user's orq.ai workspace — agents, deployments, pr
 - `experiments` — show only experiments
 - `projects` — show only projects
 - `knowledge` — show only knowledge bases
+- `evaluator` — show only evaluators
 
 If empty, show all sections.
 
@@ -35,6 +36,7 @@ Use the `search_entities` MCP tool and `get_analytics_overview` MCP tool to fetc
 - **Experiments:** `search_entities` with `type: "experiment"`
 - **Projects:** `search_entities` with `type: "project"`
 - **Knowledge:** `search_entities` with `type: "knowledge"`
+- **Evaluators:** `search_entities` with `type: "evaluator"`
 
 Fetch only the sections needed based on arguments. Always fetch analytics overview regardless of section filter.
 
@@ -91,6 +93,11 @@ Manage your workspace at **[Workspace → my.orq.ai](https://my.orq.ai/)**.
 
 - **product-docs** — 120 documents
 - **faq-database** — 45 documents
+
+### Evaluators (2)
+
+- **coherence** — active
+- **toxicity** — active
 ```
 
 #### Formatting rules
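
The section-filter behavior that `commands/workspace.md` describes (one argument selects one `search_entities` type, an empty argument selects all sections) can be sketched as a small lookup. This is an illustrative sketch only: the helper name `entity_types_for` and the dictionary are hypothetical, and only the argument/type pairs visible in this diff are listed; the file defines further sections that are not shown here.

```python
# Hypothetical sketch of the argument -> entity type mapping described in
# commands/workspace.md. Only pairs visible in this diff are included.
SECTION_TO_ENTITY_TYPE = {
    "experiments": "experiment",
    "projects": "project",
    "knowledge": "knowledge",
    "evaluator": "evaluator",  # added by this change
}


def entity_types_for(argument: str) -> list[str]:
    """Return the search_entities types to fetch for a section argument.

    An empty argument means "show all sections" (here: all listed ones).
    """
    arg = argument.strip().lower()
    if not arg:
        return list(SECTION_TO_ENTITY_TYPE.values())
    if arg not in SECTION_TO_ENTITY_TYPE:
        raise ValueError(f"unknown section: {argument!r}")
    return [SECTION_TO_ENTITY_TYPE[arg]]


if __name__ == "__main__":
    print(entity_types_for("evaluator"))
    print(entity_types_for(""))
```

Per the instructions in the file, the analytics overview is fetched regardless of this filter, so a real implementation would call `get_analytics_overview` in addition to whatever this lookup returns.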

diff --git a/skills/build-agent/resources/api-reference.md b/skills/build-agent/resources/api-reference.md
@@ -17,7 +17,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | `create_agent` | Create a new agent with configuration |
 | `get_agent` | Get agent details — verify configuration after creation or updates |
 | `update_agent` | Update agent configuration (instructions, model, tools) — iterate without recreating |
-| `search_entities` | Find agents, knowledge bases (`type: "knowledge"`), memory stores (`type: "memory_store"`) |
+| `search_entities` | Find agents, knowledge bases (`type: "knowledge"`), memory stores (`type: "memory_store"`), evaluators (`type: "evaluator"`) |
 | `search_directories` | Discover workspace project structure and paths — useful for KB `path` selection |
 | `list_models` | List available models for agent configuration |
 | `create_llm_eval` | Create evaluators for quality comparison |

diff --git a/skills/build-evaluator/SKILL.md b/skills/build-evaluator/SKILL.md
@@ -94,6 +94,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 |------|---------|
 | `create_llm_eval` | Create an LLM evaluator with your judge prompt |
 | `create_python_eval` | Create a Python evaluator for code-based checks |
+| `evaluator_get` | Retrieve any evaluator by ID |
 | `list_models` | List available judge models |
 
 **HTTP API fallback** (for operations not yet in MCP):

diff --git a/skills/generate-synthetic-dataset/resources/api-reference.md b/skills/generate-synthetic-dataset/resources/api-reference.md
@@ -21,6 +21,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | `search_entities` | Find existing datasets (`type: "dataset"`) |
 | `update_datapoint` | Modify existing datapoints (curation) |
 | `delete_datapoints` | Remove datapoints from a dataset (curation) |
+| `evaluator_get` | Retrieve any evaluator by ID to understand dataset requirements |
 
 ## HTTP API
 

diff --git a/skills/run-experiment/resources/api-reference.md b/skills/run-experiment/resources/api-reference.md
@@ -15,6 +15,8 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
 | Tool | Purpose |
 |------|---------|
 | `create_llm_eval` | Create an LLM evaluator |
+| `create_python_eval` | Create a Python evaluator for code-based checks |
+| `evaluator_get` | Retrieve any evaluator by ID |
 | `list_traces` | List and filter traces for error analysis |
 | `list_spans` | List spans within a trace |
 | `get_span` | Get detailed span information |