Commit 4865b7c

Apply review edits to monitor conversations page (WBDOCS-1924)
Capitalize Calls to match Weave terminology, remove cross-reference from monitors.mdx, and soften wording in timeout section.

Made-with: Cursor
1 parent ad0b7ea commit 4865b7c

2 files changed

Lines changed: 16 additions & 18 deletions

Lines changed: 16 additions & 16 deletions
@@ -1,11 +1,11 @@
 ---
 title: "Monitor conversations"
-description: "Score grouped calls after a conversation goes idle using debounced scoring"
+description: "Score grouped Calls after a conversation goes idle using debounced scoring"
 ---
 
-When your application handles multi-turn conversations, such as audio calls or chat threads, you might need to score the entire conversation rather than individual calls. Debounced scoring lets you group related calls and score them after the conversation goes idle, so your scorer has access to the full context.
+When your application handles multi-turn conversations, such as audio calls or chat threads, you might need to score the entire conversation rather than individual Calls. Debounced scoring in W&B Weave lets you group related Calls and score them after the conversation goes idle, so your scorer has access to the full context.
 
-For example, if your application uses OpenAI's Realtime APIs, each trace contains multiple `realtime.response` calls. Debounced scoring waits for the conversation to go idle, then scores the relevant calls as a group.
+For example, if your application uses OpenAI's Realtime APIs, each trace contains multiple `realtime.response` Calls. Debounced scoring waits for the conversation to go idle, then scores the relevant Calls as a group.
 
 For general monitor setup, see [Set up monitors](/weave/guides/evaluation/monitors).
 
@@ -15,35 +15,35 @@ To enable debounced scoring on a monitor:
 
 1. [Create a new monitor](/weave/guides/evaluation/monitors#how-to-create-a-monitor-in-weave) or edit an existing one.
 2. Toggle **Debounced Scoring** on. This reveals the following fields:
-- **Aggregation field**: The field used to group calls. Select **Trace Id** to group calls within a single trace, or **Thread Id** to group calls across a broader conversation thread.
-- **Aggregation method**: How calls in the group are scored. Select **Last message** to score only the most recent call in the group, or **All messages** to include all calls in the group.
-- **Timeout (minutes)**: How long to wait after the last call completes before scoring. After the timeout elapses, Weave checks whether a newer call has arrived in the group. If not, Weave scores the group.
+- **Aggregation field**: The field used to group Calls. Select **Trace Id** to group Calls within a single trace, or **Thread Id** to group Calls across a broader conversation thread.
+- **Aggregation method**: How Calls in the group are scored. Select **Last message** to score only the most recent Call in the group, or **All messages** to include all Calls in the group.
+- **Timeout (minutes)**: How long to wait after the last Call completes before scoring. After the timeout elapses, Weave checks whether a newer Call has arrived in the group. If not, Weave scores the group.
 3. Configure the **LLM-as-a-judge configuration** section as you would for any monitor. See [Set up monitors](/weave/guides/evaluation/monitors#how-to-create-a-monitor-in-weave) for details on these fields.
 4. Select **Create monitor** or **Update monitor**.
 
 ## Choose an aggregation method
 
-### Last message (recommended)
+### Last message (Recommended)
 
-Use the **Last message** method when each call in the conversation contains the full conversation history. This is the case when you use OpenAI's Realtime APIs, where every `realtime.response` call contains the complete audio conversation up to that point.
+Use the **Last message** method when each Call in the conversation contains the full conversation history. This is the case when you use OpenAI's Realtime APIs, where every `realtime.response` Call contains the complete audio conversation up to that point.
 
-Set the **Aggregation field** to **Trace Id** and the **Aggregation method** to **Last message**. After the timeout elapses, Weave scores only the most recent call in the trace, which already contains the full conversation.
+Set the **Aggregation field** to **Trace Id** and the **Aggregation method** to **Last message**. After the timeout elapses, Weave scores only the most recent Call in the trace, which already contains the full conversation.
 
-This method uses fewer resources because only one call per group is scored.
+This method uses fewer resources because only one Call per group is scored.
 
 ### All messages
 
-Use the **All messages** method when individual calls do not contain the full conversation history. In this case, Weave extracts content from every call in the aggregation group and passes it all to the scorer.
+Use the **All messages** method when individual Calls do not contain the full conversation history. In this case, Weave extracts content from every Call in the aggregation group and passes it all to the scorer.
 
-Set the **Aggregation field** to **Thread Id** for broader grouping flexibility, and the **Aggregation method** to **All messages**.
+You can set the **Aggregation field** to **Thread Id** for broader grouping flexibility, and the **Aggregation method** to **All messages**.
 
-This method uses more resources because the scorer processes every call in the group.
+This method uses more resources because the scorer processes every Call in the group.
 
 ## Timeout considerations
 
 The timeout value controls the trade-off between scoring latency and accuracy:
 
-- **Shorter timeouts** score conversations faster but risk scoring before the conversation is complete. Use shorter timeouts for debugging or when conversations have predictable end points.
-- **Longer timeouts** wait longer to confirm the conversation is idle, reducing the chance of premature scoring. Use longer timeouts in production, especially for conversations with variable pauses between calls. Longer timeouts increase server load.
+- Shorter timeouts score conversations faster but risk scoring before the conversation is complete. Use shorter timeouts for debugging or when conversations have predictable end points.
+- Longer timeouts wait longer to confirm the conversation is idle, reducing the chance of premature scoring. Use longer timeouts in production, especially for conversations with variable pauses between Calls. Longer timeouts increase server load.
 
-For example, a timeout of `0.25` minutes (15 seconds) is useful during development, while a timeout of several minutes is more appropriate for production workloads.
+For example, a timeout of `0.25` minutes (15 seconds) is useful during development, while a timeout of several minutes might be appropriate for production workloads.
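
Reviewer note: the debounce behavior the page describes (group Calls by an aggregation field, wait for the timeout, then score either the last Call or all Calls) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Weave's implementation. The `Call` dataclass, `idle_groups`, `scorer_input`, and the method strings `"last_message"` and `"all_messages"` are hypothetical names chosen for illustration. Monitors themselves are configured in the Weave UI and need no code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    # Hypothetical record for illustration; not Weave's actual Call schema.
    trace_id: str
    thread_id: str
    ended_at: float  # completion time, in seconds
    content: str


def idle_groups(calls, field, timeout_minutes, now):
    """Return groups whose newest Call finished at least `timeout_minutes` ago."""
    groups = defaultdict(list)
    for call in calls:
        groups[getattr(call, field)].append(call)
    ready = {}
    for key, group in groups.items():
        group.sort(key=lambda c: c.ended_at)
        # A newer Call inside the timeout window keeps the group waiting.
        if now - group[-1].ended_at >= timeout_minutes * 60:
            ready[key] = group
    return ready


def scorer_input(group, method):
    """Pick what the scorer sees for one idle group (sorted oldest first)."""
    if method == "last_message":
        return [group[-1].content]
    if method == "all_messages":
        return [c.content for c in group]
    raise ValueError(f"unknown aggregation method: {method}")


calls = [
    Call("t1", "th1", 0.0, "turn 1"),
    Call("t1", "th1", 10.0, "turn 1 + turn 2"),
    Call("t2", "th1", 50.0, "turn 3"),
]
# 0.25 minutes is 15 seconds; at now=60 only trace t1 has been idle that long.
ready = idle_groups(calls, "trace_id", timeout_minutes=0.25, now=60.0)
```

Grouping the same Calls by `thread_id` instead yields no idle group at `now=60.0`, because the newest Call in the thread finished only 10 seconds earlier. This mirrors the trade-off in the timeout section: broader grouping or a longer timeout delays scoring but lowers the risk of scoring an unfinished conversation.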

weave/guides/evaluation/monitors.mdx

Lines changed: 0 additions & 2 deletions
@@ -9,8 +9,6 @@ You can monitor text, images, and audio in your application's input and output.
 
 Monitors require no code changes to your application. Set them up using the W&B Weave UI.
 
-To score grouped calls in multi-turn conversations or audio threads after they go idle, see [Monitor conversations](/weave/guides/evaluation/monitor-conversations).
-
 If you need to actively intervene in your application's behavior based on scores, use [guardrails](/weave/guides/evaluation/guardrails) instead.
 
 ## How to create a monitor in Weave
