LLM Obs: Move hallucination detection evaluation doc#35309

Merged
gsvigruha merged 11 commits into master from gergely.svigruha/templetized-hallu-detection
Mar 17, 2026

@gsvigruha (Contributor) commented Mar 16, 2026

What does this PR do? What is the motivation?

  • LLM Obs: Move hallucination detection evaluation doc
  • Remove hallucination limitations, which no longer apply now that this evaluation is a template
  • Remove large sections of the managed evaluations page that no longer make sense

Merge instructions

Merge readiness:

  • Ready for merge

@gsvigruha changed the title from "move hallucination doc" to "LLM Obs: Move hallucination detection evaluation doc" Mar 16, 2026
@gsvigruha marked this pull request as ready for review March 16, 2026 18:50
@gsvigruha requested a review from a team as a code owner March 16, 2026 18:50
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8484e4cfa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@joepeeples self-assigned this Mar 16, 2026
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 0afc7c05ce

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: d43571c900

{{< /tabs >}}

If your LLM provider restricts IP addresses, you can obtain the required IP ranges by visiting [Datadog's IP ranges documentation][2], selecting your `Datadog Site`, pasting the `GET` URL into your browser, and copying the `webhooks` section.
Learn more about the [compatibility requirements][2].

P2: Preserve removed BYOK anchor target

Removing the Connect your LLM provider account section also removed the #connect-your-llm-provider-account anchor, but content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/_index.md still links to /llm_observability/evaluations/managed_evaluations#connect-your-llm-provider-account ([2]). After this change, users following that custom-evaluation setup link are dropped at the top of the managed page with no matching section, so the provider-connection step is no longer reachable from the documented flow.
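A check along these lines could catch this kind of dangling anchor before merge. It is only a minimal sketch, not the repository's actual link checker: the slug rule in `slugify` (lowercase, drop punctuation, hyphenate spaces) is an assumption that approximates how heading anchors are generated, and the helper names and the `BEFORE`/`AFTER` page snippets are hypothetical.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, hyphenate spaces."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # remove punctuation
    return re.sub(r"\s+", "-", text)      # turn whitespace runs into single hyphens


def heading_anchors(markdown: str) -> set[str]:
    """Collect the assumed anchors for every ATX heading (1 to 6 leading '#')."""
    anchors = set()
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            anchors.add(slugify(match.group(1)))
    return anchors


def is_dangling(link: str, target_markdown: str) -> bool:
    """True when the link carries a #fragment that no heading in the target provides."""
    _, _, fragment = link.partition("#")
    return bool(fragment) and fragment not in heading_anchors(target_markdown)


# Link quoted in the review comment; the page bodies below are made-up stand-ins.
LINK = "/llm_observability/evaluations/managed_evaluations#connect-your-llm-provider-account"
BEFORE = "## Connect your LLM provider account\n\n## Estimated token usage\n"
AFTER = "## Estimated token usage\n"

print(is_dangling(LINK, BEFORE))  # section still present
print(is_dangling(LINK, AFTER))   # section removed
```

The sketch ignores headings with explicit custom IDs and headings inside fenced code blocks, so a real check would need to handle both.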

Useful? React with 👍 / 👎.

@joepeeples (Contributor) left a comment

Approved with a couple small edit suggestions, thanks!


## Estimated token usage

You can monitor the token usage of your LLM evaluations using [this dashboard][8].

Suggested change
- You can monitor the token usage of your LLM evaluations using [this dashboard][8].
+ You can monitor the token usage of your LLM evaluations using the [LLM Evaluations Token Usage dashboard][8].

…_evaluations/_index.md

Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 6db9d84b86

….com:DataDog/documentation into gergely.svigruha/templetized-hallu-detection

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 6713bbbe86

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 282d759948

Comment on lines +41 to +42
- [Language Mismatch][3] - Flags responses that are written in a different language than the user’s input
- [Sensitive Data Scanning][4] - Flags the presence of sensitive or regulated information in model inputs or outputs

P2: Reconcile managed evaluation scope in this page

This new “Supported managed evaluations” list now limits managed evaluations to Language Mismatch and Sensitive Data Scanning, but the overview text in the same page still says managed evaluations include sentiment, topic relevancy, toxicity, failure to answer, and hallucination. That contradiction leaves readers with incompatible setup expectations (for example, looking for evaluations that are no longer listed as supported), so the page should be made internally consistent.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: c2a424c636

@gsvigruha merged commit 2202029 into master Mar 17, 2026
19 checks passed
@gsvigruha deleted the gergely.svigruha/templetized-hallu-detection branch March 17, 2026 00:21
ddjessicay added a commit that referenced this pull request Mar 17, 2026
* Add secret ID notes (#35272)

* add notes

* small edit

* Update MCP docs: recommend custom connectors for Claude Desktop & claude.ai (#35285)

* Update MCP docs: recommend custom connectors for Claude Desktop & claude.ai

The local binary is no longer needed for Claude Desktop or claude.ai — both
now support custom connectors with the remote MCP URL natively. Replaces the
stdio/binary setup instructions with a link to the Claude help center guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Simplify tab title to just "Claude" to cover all Claude products

Addresses PR feedback — custom connectors work across Claude (web),
Claude Desktop, and Claude Cowork, so "Claude" covers them all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Say "including Claude Cowork" instead of "including Claude Desktop"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove preview feature notice from prompt optimization (#35288)

Removed preview feature notice for Prompt Optimization.

* [DDSQL-1503] Follow-up on dd.logs() description (#35295)

* Update dd.logs description

* Fix spacing

* [MLObs] adding clarification notes about the metrics (#35248)

* adding clarification notes about the metrics

* remove typo newline

* explain the metrics are only generated for certain keys

* [DOCS-13590] Add Fusion setup guide (#35059)

* [DOCS-13590] Add Fusion setup guide

* [DOCS-13590] Update preview callout

* [DOCS-13590] Update preview callout text

* [DOCS-13590] Add validation section

* [DOCS-13590] Add US1-FED site support banner to Oracle Fusion integration setup guide

* [DOCS-13590] Incorporate cswatt's feedback

* [DOCS-13590] Remove ORA_FND_READ_ONLY_ACCESS_ABSTRACT permission

* Remove MCP Server Preview form alert from VS Code & Cursor extension docs (#35303)

Remove 'The Datadog MCP Server is in Preview. Complete this form to request access.'
from both VS Code and Cursor tabs on the IDE plugins page.

Made-with: Cursor

Co-authored-by: Sumedha Mehta <sumedha.mehta@datadoghq.com>

* [DOCS-13433] Fix valid tag characters to include commas (#35249)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Docs13590/fusion integration ga (#35315)

* [DOCS-13590] Remove preview banner and make doc public

* [DOCS-13590] Add Oracle Fusion integration setup guide

* [DOCS-13642] Add US1-FED port restriction note to log forwarding docs (#35313)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update Go Live Debugger page with eBPF limitations (#35310)

* [DOCS-12531] Update integration developers getting started guide (#34741)

* Rewrite requirements and getting-started

* Update links

* Make Vale corrections

* Apply suggestions from code review

Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>
Co-authored-by: Dominic Medina <115744456+dd-dominic@users.noreply.github.com>
Co-authored-by: Eva Parish <eva.parish@datadoghq.com>

---------

Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>
Co-authored-by: Dominic Medina <115744456+dd-dominic@users.noreply.github.com>

* [DOCS-13670] Standardize buffer section in destination docs (#35267)

* [DOCS-13670] Standardize buffer section in destination docs

Replace destination_buffer_numbered with destination_buffer shortcode.

* updates

* small edit

* small edit

* add for splunk hec

* Translation Pipeline PR (#35291)

* Translated file updates

* Translated file updates

* Translated file updates

* fix erroneously translated `tab` shortcodes

* fix malformed link syntax

---------

Co-authored-by: webops-guacbot[bot] <214537265+webops-guacbot[bot]@users.noreply.github.com>
Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>

* Add assets to support the Cdocs stepper (not in use yet) (#35312)

* Sketch in stepper styles

* Tweak styles

* Check off completed steps

* Flesh out example steps

* Make steps searchable

* Nudge elements

* Update example step

* Tweak stepper behavior

* Use a green checkmark circle to mark completed tasks

* Tweak button wording

* Tweak wording

* Tweak stepper line width

* Tweak appearance

* Improve focus visibility

* Improve accessibility

* Improve accessibility

* Tweak checkmark

* Tweak button text size

* Tweak loading behavior

* Button tweaks

* Tweaks

* Update demo markup

* [wip] Incorporate feedback

* Make the clicked step the active step

* Prevent step titles from being hidden under the sticky menu

* Tweak reset behavior

* Style expand/collapse buttons as links

* Improve responsiveness

* Tweak styles

* Tweak icons

* Tweak spacing

* Fix stepper icon URLs

* Tone down expand/collapse toggle styling (#35284)

Reduce visual weight of the expand all / collapse toggle so it reads
as a quiet utility control rather than competing with step titles.

- font-size: 16px → 14px
- font-weight: 600 → 500
- text-transform: uppercase → none (sentence case)
- Add subtle letter-spacing

* Tweaks

* Delete stepper demo file

* Revert changes in package.json

* Implement Codex feedback

* Fix bug

* Update assets/styles/components/_collapsible-section.scss

Co-authored-by: StefonSimmons <57869435+StefonSimmons@users.noreply.github.com>

---------

Co-authored-by: Brett Blue <84536271+brett0000FF@users.noreply.github.com>
Co-authored-by: StefonSimmons <57869435+StefonSimmons@users.noreply.github.com>

* LLM Obs: Move hallucination detection evaluation doc (#35309)

* move hallucination doc

* tweaks

* add back screenshot

* remove unused code

* fix links

* Update content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/_index.md

Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>

* add back account

* links

* fix title

* more fixes

---------

Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>

---------

Co-authored-by: May Lee <may.lee@datadoghq.com>
Co-authored-by: Reilly Wood <163153147+rgwood-dd@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Charles Jacquet <charles.jacquet@datadoghq.com>
Co-authored-by: Mariana Dutra <88353514+mariddc@users.noreply.github.com>
Co-authored-by: Xinyuan Guo <xinyuan.guo@datadoghq.com>
Co-authored-by: Bryce Eadie <bryce.eadie@datadoghq.com>
Co-authored-by: sumedham <87997309+sumedham@users.noreply.github.com>
Co-authored-by: Sumedha Mehta <sumedha.mehta@datadoghq.com>
Co-authored-by: Rosa Trieu <107086888+rtrieu@users.noreply.github.com>
Co-authored-by: Esther Kim <esther.kim@datadoghq.com>
Co-authored-by: ajwerner <awerner32@gmail.com>
Co-authored-by: Eva Parish <eva.parish@datadoghq.com>
Co-authored-by: Joe Peeples <joe.peeples@datadoghq.com>
Co-authored-by: Dominic Medina <115744456+dd-dominic@users.noreply.github.com>
Co-authored-by: webops-guacbot[bot] <214537265+webops-guacbot[bot]@users.noreply.github.com>
Co-authored-by: Jen Gilbert <jen.gilbert@datadoghq.com>
Co-authored-by: Brett Blue <84536271+brett0000FF@users.noreply.github.com>
Co-authored-by: StefonSimmons <57869435+StefonSimmons@users.noreply.github.com>
Co-authored-by: Gergely Svigruha <gsvigruha@users.noreply.github.com>