---
title: Advanced Experiment Runs
description: Run experiments on a subset of a dataset, run them multiple times to account for model variability, and automate experiment execution in CI/CD pipelines.
---

This page discusses advanced topics in running experiments, including [multiple experiment runs](#multiple-runs) and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).

## Run an experiment on a subset of the dataset


First, add tags to your dataset records. A tag can be a unique identifier (for example, `name:test_use_case_1`) or represent a property of the scenario (for example, `difficulty:hard`).

Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset down to the records you want to run an Experiment on. Assign the return value to a variable, and pass that variable to the Experiment; otherwise the Experiment runs on the full, unfiltered dataset.

Example:
```python
dataset = LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])
```
Finally, run the Experiment as usual.
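Conceptually, the tag filter selects the records that carry the requested tags. The sketch below illustrates that selection logic locally, assuming a record must carry *every* requested tag to match; the `matches_tags` helper and the record shape are illustrative only, not part of the SDK, and the actual filtering happens server-side when you call `LLMObs.pull_dataset()`:

```python
def matches_tags(record_tags, requested_tags):
    # Illustrative AND semantics: a record matches only if it
    # carries every requested tag.
    return set(requested_tags).issubset(record_tags)

records = [
    {"input": "q1", "tags": ["env:prod", "version:1.0"]},
    {"input": "q2", "tags": ["env:staging", "version:1.0"]},
    {"input": "q3", "tags": ["env:prod", "version:1.0", "difficulty:hard"]},
]

# Keep only the records matching both tags, mirroring the
# tags=["env:prod", "version:1.0"] filter in the example above.
subset = [r for r in records if matches_tags(r["tags"], ["env:prod", "version:1.0"])]
print([r["input"] for r in subset])  # → ['q1', 'q3']
```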

## Multiple runs

You can run the same experiment multiple times to account for model non-determinism. Use the [LLM Observability Python SDK][1] or the [Experiments API][2] to specify how many iterations to run; each dataset record is then executed that many times using the same task and evaluators.
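The effect of multiple runs can be sketched locally: each record is executed N times with the same task, and the per-record scores can then be aggregated to estimate variability. Everything below is illustrative (the `task` function and the record shape are hypothetical stand-ins, not SDK APIs); in practice the SDK and API handle the iteration for you:

```python
import random
import statistics

def task(record):
    # Hypothetical non-deterministic task standing in for an LLM call:
    # a deterministic base score plus random noise.
    return len(record["input"]) + random.random()

def run_repeated(records, n_runs, seed=0):
    # Execute each record n_runs times and collect its scores,
    # mirroring how multiple runs repeat the same task per record.
    random.seed(seed)
    return {r["input"]: [task(r) for _ in range(n_runs)] for r in records}

results = run_repeated([{"input": "hello"}, {"input": "hi"}], n_runs=3)
for name, scores in results.items():
    # Mean and spread across runs expose the model's variability.
    print(name, round(statistics.mean(scores), 2), round(statistics.stdev(scores), 2))
```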

[1]: /llm_observability/instrumentation/sdk?tab=python
[2]: /llm_observability/experiments/api
[3]: https://app.datadoghq.com/llm/experiments