19 changes: 14 additions & 5 deletions README.md
@@ -10,6 +10,7 @@ Each skill encodes best practices from prompt engineering, agent design, evaluat

Built on the [Agent Skills](https://agentskills.io/home#adoption) standard format, so it works with any compatible agent (Claude Code, Cursor, Gemini CLI, and others).


## Setup

### Prerequisites
@@ -52,7 +53,6 @@ claude --plugin-dir .

> **Note:** Commands (`/orq:quickstart`, `/orq:workspace`, etc.) and agents are only available when installed as a Claude Code plugin.


### Verify

Run the interactive onboarding to confirm everything works:
@@ -93,6 +93,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
<!-- BEGIN_SKILLS_TABLE -->
| Skill | What It Does | Documentation |
|-------|-------------|---------------|
| **setup-observability** | Set up orq.ai observability for existing LLM applications — AI Router proxy, OpenTelemetry, `@traced` decorator, and trace enrichment | [SKILL.md](skills/setup-observability/SKILL.md) |
| **build-agent** | Design, create, and configure an orq.ai Agent with tools, instructions, knowledge bases, and memory | [SKILL.md](skills/build-agent/SKILL.md) |
| **build-evaluator** | Create validated LLM-as-a-Judge evaluators following evaluation best practices | [SKILL.md](skills/build-evaluator/SKILL.md) |
| **analyze-trace-failures** | Read production traces, identify what's failing, build failure taxonomies, and categorize issues | [SKILL.md](skills/analyze-trace-failures/SKILL.md) |
@@ -105,7 +106,15 @@ Skills are triggered by describing what you need. Claude picks the right skill a

## Workflows

-### 1. Build a New Agent
+### 1. Instrument an Existing App

```
"Add orq.ai tracing to my app" → setup-observability
/orq:traces --last 1h # Verify traces are flowing
"Analyze these traces for failures" → analyze-trace-failures
```

### 2. Build a New Agent

```
"I need a customer support agent" → build-agent
@@ -114,7 +123,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
"Run an experiment to get a baseline" → run-experiment
```

-### 2. Debug Production Issues
+### 3. Debug Production Issues

```
/orq:traces --status error --last 24h # Find errors
@@ -123,7 +132,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
"Re-run the experiment to verify the fix" → run-experiment
```

-### 3. Improve an Existing Agent
+### 4. Improve an Existing Agent

```
/orq:analytics --group-by deployment # Spot high error rates
@@ -134,7 +143,7 @@ Skills are triggered by describing what you need. Claude picks the right skill a
"Optimize the prompt based on results" → optimize-prompt
```

-### 4. Improve an existing Prompt
+### 5. Improve an Existing Prompt

```
"My prompt isn't performing well, help me improve it" → optimize-prompt
8 changes: 8 additions & 0 deletions commands/workspace.md
@@ -20,6 +20,7 @@ Show a quick overview of the user's orq.ai workspace — agents, deployments, pr
- `experiments` — show only experiments
- `projects` — show only projects
- `knowledge` — show only knowledge bases
- `evaluator` — show only evaluators

If empty, show all sections.

@@ -35,6 +36,7 @@ Use the `search_entities` MCP tool and `get_analytics_overview` MCP tool to fetc
- **Experiments:** `search_entities` with `type: "experiment"`
- **Projects:** `search_entities` with `type: "project"`
- **Knowledge:** `search_entities` with `type: "knowledge"`
- **Evaluator:** `search_entities` with `type: "evaluator"`

Fetch only the sections needed based on arguments. Always fetch analytics overview regardless of section filter.

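The fetch rule above (one `search_entities` call per requested section, plus an analytics call that is never skipped) can be sketched in Python. This is only an illustration under stated assumptions: the tool names and `type` values are taken from the lines above, but `plan_fetches`, `SECTION_TYPES`, and the call-dictionary layout are hypothetical, and only the four sections visible in this hunk are mapped.

```python
# Hypothetical helper: tool names and `type` values come from the command file;
# the function name, mapping name, and returned dict layout are illustrative only.
SECTION_TYPES = {
    "experiments": "experiment",
    "projects": "project",
    "knowledge": "knowledge",
    "evaluator": "evaluator",
}


def plan_fetches(section: str = "") -> list[dict]:
    """Return the MCP tool calls to make for a given section argument."""
    section = section.strip().lower()
    if section and section not in SECTION_TYPES:
        raise ValueError(f"unknown section: {section!r}")

    # An empty argument means "show all sections".
    wanted = [section] if section else list(SECTION_TYPES)
    calls = [
        {"tool": "search_entities", "arguments": {"type": SECTION_TYPES[name]}}
        for name in wanted
    ]

    # The analytics overview is always fetched, regardless of the section filter.
    calls.append({"tool": "get_analytics_overview", "arguments": {}})
    return calls
```

Calling `plan_fetches("evaluator")` would plan one `search_entities` call with `type: "evaluator"` followed by the analytics call, while `plan_fetches()` plans a call for every mapped section.
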
@@ -91,6 +93,12 @@ Manage your workspace at **[Workspace → my.orq.ai](https://my.orq.ai/)**.

- **product-docs** — 120 documents
- **faq-database** — 45 documents


### Evaluators (2)

- **coherence** — active
- **toxicity** — active
```

#### Formatting rules
2 changes: 1 addition & 1 deletion skills/build-agent/resources/api-reference.md
@@ -17,7 +17,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
| `create_agent` | Create a new agent with configuration |
| `get_agent` | Get agent details — verify configuration after creation or updates |
| `update_agent` | Update agent configuration (instructions, model, tools) — iterate without recreating |
-| `search_entities` | Find agents, knowledge bases (`type: "knowledge"`), memory stores (`type: "memory_store"`) |
+| `search_entities` | Find agents, knowledge bases (`type: "knowledge"`), memory stores (`type: "memory_store"`), evaluators (`type: "evaluator"`) |
| `search_directories` | Discover workspace project structure and paths — useful for KB `path` selection |
| `list_models` | List available models for agent configuration |
| `create_llm_eval` | Create evaluators for quality comparison |
1 change: 1 addition & 0 deletions skills/build-evaluator/SKILL.md
@@ -94,6 +94,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
|------|---------|
| `create_llm_eval` | Create an LLM evaluator with your judge prompt |
| `create_python_eval` | Create a Python evaluator for code-based checks |
| `evaluator_get` | Retrieve any evaluator by ID |
| `list_models` | List available judge models |

**HTTP API fallback** (for operations not yet in MCP):
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
| `search_entities` | Find existing datasets (`type: "dataset"`) |
| `update_datapoint` | Modify existing datapoints (curation) |
| `delete_datapoints` | Remove datapoints from a dataset (curation) |
| `evaluator_get` | Retrieve any evaluator by ID to understand dataset requirements |

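As a rough sketch of what retrieving an evaluator by ID could look like on the wire, the snippet below builds an MCP `tools/call` request as a JSON-RPC 2.0 message. The `tools/call` method and message envelope follow the Model Context Protocol; the argument name `evaluator_id`, the placeholder value, and the helper `build_tool_call` are assumptions, since the tool's input schema is not shown here. Transport and authentication are left out.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# "evaluator_id" is an assumed argument name; "<evaluator-id>" is a placeholder.
print(build_tool_call("evaluator_get", {"evaluator_id": "<evaluator-id>"}))
```

In practice an MCP client library would normally build and send this message; the sketch only shows the shape of the request.
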
## HTTP API

2 changes: 2 additions & 0 deletions skills/run-experiment/resources/api-reference.md
@@ -15,6 +15,8 @@ Use the orq MCP server (`https://my.orq.ai/v2/mcp`) as the primary interface. Fo
| Tool | Purpose |
|------|---------|
| `create_llm_eval` | Create an LLM evaluator |
| `create_python_eval` | Create a Python evaluator for code-based checks |
| `evaluator_get` | Retrieve any evaluator by ID |
| `list_traces` | List and filter traces for error analysis |
| `list_spans` | List spans within a trace |
| `get_span` | Get detailed span information |