10 changes: 8 additions & 2 deletions examples/Realtime/realtime-rag/utils/docs-sample/eval-ui.mdx
@@ -18,7 +18,9 @@ The following steps require access to a Braintrust organization, which represent
Navigate to the [AI providers](/app/settings?subroute=secrets) page in your settings and configure at least one API key. For this quickstart, be sure to add your OpenAI API key. After completing this initial setup, you can access models from many providers through a single, unified API.

<Callout>
For more advanced use cases where you want to use custom models or avoid
plugging your API key into Braintrust, you may want to check out the
[SDK](/docs/start/eval-sdk) quickstart.
</Callout>

</Step>
@@ -27,12 +29,13 @@ For more advanced use cases where you want to use custom models or avoid plugging
### Create a new project

For every AI feature your organization is building, the first thing you'll do is create a project.

</Step>

<Step>
### Create a new prompt

Navigate to **Library** in the top menu bar, then select **Prompts**. Create a new prompt in your project called "movie matcher". A prompt is the input you provide to the model to generate a response. Choose `GPT 4o` for your model, and type this for your system prompt:
Navigate to **Prompts**. Create a new prompt in your project called "movie matcher". A prompt is the input you provide to the model to generate a response. Choose `GPT 4o` for your model, and type this for your system prompt:

```
Based on the following description, identify the movie title. In your response, simply provide the name of the movie.
```

@@ -49,6 +52,7 @@ Prompts can use [mustache](https://mustache.github.io/mustache.5.html) templating
![First prompt](./movie-matcher-prompt.png)

Select **Save as custom prompt** to save your prompt.
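Mustache templating, which these prompts support, substitutes `{{variable}}` placeholders at render time. A minimal hand-rolled sketch of that substitution, for illustration only (not the actual mustache library, which also supports sections and escaping):

```typescript
// Minimal mustache-style substitution: replaces {{name}} with values
// from a view object. Missing keys render as empty strings.
function renderTemplate(
  template: string,
  view: Record<string, string>,
): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (_match: string, key: string) => (key in view ? view[key] : ""),
  );
}

const systemPrompt =
  "Based on the following description, identify the movie title: {{input}}";
// renderTemplate(systemPrompt, { input: "..." }) fills the {{input}}
// slot before the prompt is sent to the model.
```

When you later attach a dataset, each row's fields are what get substituted into these placeholders.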

</Step>

<Step>
@@ -57,6 +61,7 @@ Select **Save as custom prompt** to save your prompt.
Scroll to the bottom of the prompt viewer, and select **Create playground with prompt**. This will open the prompt you just created in the [prompt playground](https://www.braintrust.dev/docs/guides/playground), a tool for exploring, comparing, and evaluating prompts. In the prompt playground, you can evaluate prompts with data from your [datasets](https://www.braintrust.dev/docs/guides/datasets).

![Prompt playground](./prompt-playground.png)

</Step>

<Step>
@@ -89,6 +94,7 @@ In this example, the Data is the dataset you uploaded, the Task is the prompt you
![Create experiment](./create-experiment.png)

Creating an experiment from the playground will automatically log your results to Braintrust.

</Step>

<Step>
120 changes: 63 additions & 57 deletions examples/Realtime/realtime.mdx
@@ -2,8 +2,9 @@

The OpenAI [Realtime API](https://platform.openai.com/docs/guides/realtime), designed for building advanced multimodal conversational experiences, unlocks even more use cases in AI applications. However, evaluating this and other audio models' outputs in practice is an unsolved problem. In this cookbook, we'll build a robust application with the Realtime API, incorporating tool-calling and user input. Then, we'll evaluate the results. Let's get started!

## Getting started

In this cookbook, we're going to build a speech-to-speech RAG agent that answers questions about the Braintrust documentation.

To get started, you'll need a few accounts:

@@ -37,7 +38,7 @@ of your account, and set the `PINECONE_API_KEY` environment variable in the [Env

<Callout type="info">
We'll use the local environment variables to embed and upload the vectors, and
the Braintrust variables to run the RAG tool and LLM calls remotely.
</Callout>

## Upload the vectors
@@ -50,7 +51,7 @@ npx tsx upload-vectors.ts

This script reads all the files from the `docs-sample` directory, breaks them into sections based on headings, and creates vector embeddings for each section using OpenAI's API. It then stores those embeddings along with the section's title and content in Pinecone.
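The heading-based chunking can be sketched roughly as follows. This is an illustrative simplification rather than the actual contents of `upload-vectors.ts`; the `splitIntoSections` name and the embedding/upsert calls in the trailing comment are assumptions:

```typescript
// Split a markdown document into sections, one per heading.
// Each section keeps its heading text as the title and the
// lines beneath it as the content.
interface DocSection {
  title: string;
  content: string;
}

function splitIntoSections(markdown: string): DocSection[] {
  const sections: DocSection[] = [];
  let current: DocSection | null = null;
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^#+\s+(.*)/);
    if (heading) {
      if (current) sections.push(current);
      current = { title: heading[1].trim(), content: "" };
    } else if (current) {
      current.content += line + "\n";
    }
  }
  if (current) sections.push(current);
  return sections;
}

// Each section would then be embedded and upserted, e.g.:
//   const embedding = await openai.embeddings.create({ model: "...", input: section.content });
//   await index.upsert([{ id, values: embedding.data[0].embedding, metadata: section }]);
```

Chunking on headings keeps each vector focused on a single topic, which tends to improve retrieval precision compared to embedding whole files.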

That's it for setup! Now let's dig into the code.

## Accessing the Realtime API

@@ -114,7 +115,8 @@ export default async function Home() {
```

<Callout>
You can also use our proxy with an AI provider’s API key, but you will not
have access to other Braintrust features, like logging.
</Callout>

## Creating a RAG tool
@@ -123,37 +125,44 @@ The retrieval logic also happens on the server side. We set up the helper function

```typescript
client.addTool(
  {
    name: "pinecone_retrieval",
    description:
      "Retrieves relevant information from Braintrust documentation.",
    parameters: {
      type: "object",
      properties: {
        query: {
          type: "string",
          description: "The search query to find relevant documentation.",
        },
      },
      required: ["query"],
    },
  },
  async ({ query }: { query: string }) => {
    try {
      setLastQuery(query);
      const results = await fetchFromPinecone(query);
      setRetrievalResults(results);
      return results
        .map(
          (result) =>
            `[Score: ${result.score.toFixed(2)}] ${result.metadata.title}\n${
              result.metadata.content
            }`,
        )
        .join("\n\n");
    } catch (error) {
      throw error;
    }
  },
);
```

<Callout type="info">
Currently, because of the way the Realtime API works, we have to use OpenAI
tool calling here instead of Braintrust tool functions.
</Callout>
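The `fetchFromPinecone` helper called by the tool isn't shown in this diff. Below is one plausible server-side shape, with the index query injected so the retrieval logic stays testable; the `Match` type, the top-k default, and the score cutoff are assumptions rather than the app's actual code:

```typescript
interface Match {
  score: number;
  metadata: { title: string; content: string };
}

// Anything that turns a text query into scored matches; in the app
// this would wrap an embedding call plus a Pinecone index.query().
type QueryFn = (query: string, topK: number) => Promise<Match[]>;

async function fetchFromPinecone(
  query: string,
  runQuery: QueryFn,
  topK = 3,
): Promise<Match[]> {
  const matches = await runQuery(query, topK);
  // Drop low-confidence matches so the model isn't fed
  // irrelevant context.
  return matches.filter((m) => m.score >= 0.5);
}
```

Injecting the query function also makes it easy to stub out the network call when iterating on the formatting and filtering logic.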

## Setting up the system prompt
@@ -183,13 +192,13 @@ Personality:
`;
```

Feel free to play around with the system prompt at any point, and see how it impacts the LLM's responses in the app.

## Running the app

To run the app, navigate to `/web` and run `npm run dev`. The app should load on `localhost:3000`.

Start a new conversation, and ask a few questions about Braintrust. Feel free to interrupt the bot, or ask unrelated questions, and see what happens. When you're finished, end the conversation. Have a couple of conversations to get a feel for some of the limitations and nuances of the bot - each conversation will come in handy in the next step.

## Logging in Braintrust

@@ -199,24 +208,24 @@ In addition to client-side authentication, you’ll also get the other benefits

## Online evaluations

In Braintrust, you can run server-side online evaluations that are automatically run asynchronously as you upload logs. This makes it easier to evaluate your app in situations like this, where the prompt and tool might not be synced to Braintrust.

Audio evals are complex because there are multiple aspects of your application you can focus on. In this cookbook, we'll use the vector search query as a proxy for the quality of the Realtime API's interpretation of the user's input.

### Setting up your scorer

We'll need to create a scorer that captures the criteria we want to evaluate. Since we're dealing with complex RAG outputs, we'll use a custom LLM-as-a-judge scorer.
For an LLM-as-a-judge scorer, you define a prompt that evaluates the output and maps its choices to specific scores.

Navigate to **Library** > **Scorers** and create a new scorer. Call your scorer **BraintrustRAG** and add the following prompt:
Navigate to **Scorers** and create a new scorer. Call your scorer **BraintrustRAG** and add the following prompt:

```javascript
Consider the following question:

{{input.arguments.query}}

and answer:

{{output}}

How well does the answer answer the question?
@@ -225,49 +234,50 @@ b) Reasonably well
c) Not well
```

The prompt uses mustache syntax to map the input to the query that gets sent to Pinecone and to capture the output. We'll also assign a choice score to each option we included in the prompt.

![RAG scorer](./assets/rag-scorer.png)
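Conceptually, the choice scores amount to a mapping like the sketch below. The weights (a = 1, b = 0.5, c = 0) are an assumption; you set the real values on the scorer in the Braintrust UI rather than in code:

```typescript
// Map the judge's multiple-choice answer to a numeric score.
// These weights are illustrative; configure the actual values
// on the scorer in the UI.
const choiceScores: Record<string, number> = {
  a: 1, // answers the question very well
  b: 0.5, // reasonably well
  c: 0, // not well
};

function scoreFromChoice(choice: string): number {
  const normalized = choice.trim().toLowerCase();
  return choiceScores[normalized] ?? 0;
}
```

Keeping the "best" choice at 1 and the "worst" at 0 makes the scorer's output directly comparable with Braintrust's other 0-to-1 scores.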

### Configuring your online eval

Navigate to **Configuration** and scroll down to **Online scoring**. Select **Add rule** to configure your online scoring rule. Select the scorer we just created from the menu, and deselect **Apply to root span**. We'll filter to the **function** span since that's where our tool is called.

![Configure score](./assets/configure-score.png)

The score will now automatically run at the specified sampling rate for all logs in the project.

### Viewing your evaluations

Now that you've set up your online evaluations, you can view the scores from within your logs. Underneath each function span that was included in the sampling rate, you'll have an additional span with the score.

![Scoring span](./assets/scoring-span.png)

This particular function call was scored a 0. But if we take a closer look at the logs, we can see that the question was actually answered pretty well.
You may notice this pattern for other logs as well - so is our function actually not performing well?

## Improving your evals

There are three main ways to improve your evals:

- Refine the scoring function to ensure it accurately reflects the success criteria.
- Add new scoring functions to capture different performance aspects (for example, correctness or efficiency).
- Expand your dataset with more diverse or challenging test cases.

In this case, we need to be more precise about what we're testing for in our scoring function. In our application, we're asking for answers within the specific context of Braintrust, but our current scoring function is attempting to judge the responses to our questions objectively.

Let's edit our scoring function to test for that as precisely as possible.

### Improving our existing scorer

Let's change the prompt for our scoring function to:

```javascript
Consider the following question from an existing Braintrust user:

{{input.arguments.query}}

and answer:

{{output}}

How helpful is the answer, assuming the question is always in the context of Braintrust?
@@ -276,7 +286,7 @@ b) Reasonably helpful
c) Not helpful
```

As you continue to iterate on your scoring function and generate more logs, you should aim to see your scores go up.

![Logs over time](./assets/logs-over-time.png)

@@ -286,7 +296,3 @@ As you continue to build more AI applications with complex function calls and ne

- [I ran an eval. Now what?](/blog/after-evals)
- [What to do when a new AI model comes out](/blog/new-model)




22 changes: 12 additions & 10 deletions examples/ToolOCR/ToolOCR.mdx
@@ -1,8 +1,8 @@
# Using Python functions to extract text from images

From digitizing and archiving images of your handwritten notes, to automating invoice processing, there are a multitude of reasons you’d want to extract text from an image. You could use an LLM for image processing, but doing so can sometimes be inaccurate, expensive, and slow. Optical character recognition, or OCR, is a great pre-processing step that allows you to convert raw image data into text that can then be processed or summarized by an LLM.

Maybe you find the perfect recipe on the internet, but it’s surrounded by ads and people’s life stories, or you want to digitize an old recipe written by your grandmother.

![100 good cookies](assets/recipe.png)

@@ -43,10 +43,12 @@ of your account.
Optical character recognition, or OCR, is any type of technology that converts images of typed, handwritten or printed text into machine-encoded text. There are many well known libraries for OCR — in this cookbook, we’ll use [OCR.Space](https://ocr.space/), a free API you can use for testing without creating an account.

<Callout type="info">
For this cookbook, we're using the free version of OCR.Space that limits the
number of requests. You may exceed rate limits and need to upgrade your
account to experiment further with this application.
</Callout>

In Braintrust, you can create tools and then run them in the UI, API, and, of course, via prompts. This will make it easier to iterate on your prompt without having to worry about the OCR logic.

The OCR tool is defined in `ocr.py`:

@@ -81,7 +83,7 @@ def ocr_image(**kwargs) -> str:
raise ValueError(f"Failed to perform OCR: {e}")
```

In just a few lines of code, it takes an image URL, parses and extracts the text, and returns the text contained in the image.

To push the tool to Braintrust along with all its dependencies, run:

@@ -91,15 +93,15 @@ braintrust push ocr.py --requirements requirements.txt

### Try out the tool

To try out the tool, visit the **toolOCR** project in Braintrust, and navigate to the **Tools** section of your **Library**. Here, you can test different images and see what kinds of outputs you're getting from the tool.
To try out the tool, visit the **toolOCR** project in Braintrust, and navigate **Tools**. Here, you can test different images and see what kinds of outputs you're getting from the tool.

![Try gif](assets/try-tool.gif)

This is helpful information for deciding if you'd like to do any additional post processing to the text output. For example, you may notice that your output contains `/n` to indicate new lines in the parsed text. You could include additional processing in your tool to handle these. If you change your code, just run `braintrust push ocr.py --requirements requirements.txt` again to sync the tool with Braintrust.
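That post-processing could look like the sketch below, written in TypeScript for illustration (in `ocr.py` itself it would be the equivalent couple of lines of Python). It assumes the OCR output marks line breaks with literal `/n`, as noted above:

```typescript
// Normalize OCR output: turn literal "/n" markers into real
// newlines, collapse runs of whitespace, and drop empty lines.
function cleanOcrText(raw: string): string {
  return raw
    .split("/n")
    .map((line) => line.trim().replace(/\s+/g, " "))
    .filter((line) => line.length > 0)
    .join("\n");
}
```

Doing this inside the tool keeps the prompt simpler, since the model sees clean text instead of raw OCR markers.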

## Try out the prompt

When we pushed the tool to Braintrust, we also included an initial definition of the prompt:

```python #skip-compile
prompt = project.prompts.create(
```

@@ -144,7 +146,7 @@ Your playground is now set up with a prompt, model choice, dataset, and the tool

## Iterating on the prompt

Now that we have an interactive environment to test out our prompt and tool call, we can tweak the prompt and model until we get the desired results.

Hit the copy icon to duplicate your prompt and start tweaking. You can also tweak the original prompt and save your changes there if you'd like. For example, you can try instructing the model to always list the quantity of each ingredient you need to purchase.

2 changes: 1 addition & 1 deletion examples/ToolRAG/ToolRAG.mdx
@@ -115,7 +115,7 @@ The output should be:

### Try out the tool

To try out the tool, visit the project in Braintrust, and navigate to the **Tools** section of your **Library**.
To try out the tool, visit the project in Braintrust, and navigate to **Tools**.

![Test tool](./assets/Test-tool.gif)
