docs(adr): add ADR 0049 semantic observability and improvement loop #2423

Open
HofniGartner wants to merge 1 commit into fullsend-ai:main from HofniGartner:adr/0049-semantic-observability-and-improvement-loop

@HofniGartner

Summary

Proposes ADR 0049: Semantic observability and improvement loop — a layered model on top of ADR 0021 JSONL traces:

  1. Derived signals — host-side enrichment from stream-json (e.g. repeated tools, validation retries, stuck-run heuristics), exported to an OTel-compatible trace backend (hybrid with optional artifact mirror)
  2. Observer (read-first) — post-run / optional between-stage analysis; analyst, not fixer
  3. Shadow mode — dry-run for future observer forge actions (delivered: false by default)
  4. Lesson extraction — structured lessons from retro/observe → config repo → eval golden sets
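To make layer 1 concrete, here is a minimal sketch of host-side signal derivation. It is illustrative only: the event schema (`type`, `tool`, `input`, `validation_error`), the thresholds, and the function name `derive_signals` are assumptions made for this example, not the ADR's schema or the actual stream-json format.

```python
import json
from collections import Counter


def derive_signals(jsonl_text, repeat_threshold=3, stuck_window=4):
    """Compute a few derived signals from a JSONL trace (hypothetical schema)."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

    # Fingerprint each tool call by tool name plus canonicalized input.
    calls = [
        (e["tool"], json.dumps(e.get("input", {}), sort_keys=True))
        for e in events
        if e.get("type") == "tool_call"
    ]

    # Repeated tools: the same call fingerprint seen at least `repeat_threshold` times.
    counts = Counter(calls)
    repeated = sorted({tool for (tool, _), n in counts.items() if n >= repeat_threshold})

    # Validation retries: count of validation error events in the run.
    validation_retries = sum(1 for e in events if e.get("type") == "validation_error")

    # Stuck-run heuristic: the last `stuck_window` tool calls are all identical.
    tail = calls[-stuck_window:]
    stuck = len(tail) == stuck_window and len(set(tail)) == 1

    return {
        "repeated_tools": repeated,
        "validation_retries": validation_retries,
        "stuck_run": stuck,
    }


# Hypothetical trace: one read, one validation error, then four identical test runs.
SAMPLE_TRACE = "\n".join(
    json.dumps(e)
    for e in [
        {"type": "tool_call", "tool": "read_file", "input": {"path": "a.py"}},
        {"type": "validation_error", "detail": "schema mismatch"},
        {"type": "tool_call", "tool": "run_tests", "input": {}},
        {"type": "tool_call", "tool": "run_tests", "input": {}},
        {"type": "tool_call", "tool": "run_tests", "input": {}},
        {"type": "tool_call", "tool": "run_tests", "input": {}},
    ]
)
signals = derive_signals(SAMPLE_TRACE)
print(signals)
```

The resulting dictionary is small enough to attach to a trace as attributes or to mirror as an artifact, which is the point of the layer: retro and the observer can read a handful of signals instead of rescanning the full JSONL.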

Builds on operational-observability.md and testing-agents.md. Takes inspiration from The Darwin Project for separating raw traces from semantic signals, without adopting that runtime.

Relationship to retro (#131): Retro remains the workflow-level improvement agent (issues + PR comments). This ADR adds the plumbing — cheaper inputs for retro (signals vs full JSONL) and a path from findings → regression tests. Implementation is deferred.
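As a rough illustration of the findings → regression tests path, the sketch below turns one structured lesson into an eval case. The `Lesson` fields, the `golden-` prefix, and `lesson_to_eval_case` are hypothetical names chosen for this example; the ADR defers the real schema to implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    """A structured lesson as retro or the observer might record it (hypothetical fields)."""
    lesson_id: str
    source_run: str
    pattern: str
    expected_behavior: str


def lesson_to_eval_case(lesson):
    """Map a lesson onto a golden-set eval case that keeps a link back to its source run."""
    return {
        "case_id": f"golden-{lesson.lesson_id}",
        "derived_from": lesson.source_run,
        "scenario": lesson.pattern,
        "assertion": lesson.expected_behavior,
    }


example = Lesson(
    lesson_id="L-001",
    source_run="run-123",
    pattern="agent reruns the same failing test without changing code",
    expected_behavior="agent edits code or escalates after two identical failures",
)
case = lesson_to_eval_case(example)
print(case)
```

Keeping `derived_from` on the eval case preserves the trail from a regression test back to the run that motivated it.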

Why now

JSONL gives per-run forensics but is expensive to scan at factory scale. Generic trace backends capture tool spans but not fullsend-specific patterns. This ADR records the what/why before implementation ADRs (trace export, harness layout, observer wiring).

What this PR does not include

  • No code changes
  • No observability vendor selection
  • No changes to problem docs
  • No replacement of JSONL or retro

Options considered

Documented in the ADR: artifact-only (A), backend-only (B), hybrid (C, preferred).

Defines derived signals, read-first observer, shadow interventions, and
lesson extraction on top of JSONL traces, with hybrid OTel-backend export.

Signed-off-by: Hofni Gartner <hgartner@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

E2E tests did not run

E2E tests run automatically for org/repo members and collaborators on pull requests.

For other contributors, a maintainer must add the ok-to-test label after the latest push.

See E2E testing guide for details.

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 18, 2026

🤖 Finished Review · ✅ Success · Started 2:15 PM UTC · Completed 2:27 PM UTC
Commit: a9a765b · View workflow run →

@ralphbean ralphbean left a comment

Member

A couple of things to sort out before this can move forward — see inline.

@@ -0,0 +1,178 @@
---

[critical] This collides with #1489, which also claims 0049. That PR is closer to merging (has reviews, ready-for-merge label), so I think this one should take the next available number.

There's a renumber-adr skill in the repo that can help — it checks for collisions against the target branch and renumbers automatically.
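For illustration only, the collision check described above can be sketched as follows. This is not the renumber-adr skill's implementation; the filename pattern, the helper names, and the `0049-example-other-adr.md` entry are assumptions made for this example.

```python
import re

# Assumed ADR filename convention: four-digit number, hyphen, slug, .md
ADR_FILENAME = re.compile(r"^(\d{4})-.+\.md$")


def taken_numbers(filenames):
    """Collect the ADR numbers already used by files on the target branch."""
    return {int(m.group(1)) for name in filenames if (m := ADR_FILENAME.match(name))}


def next_available(filenames, wanted):
    """Return `wanted` if it is free, otherwise the next unused number above it."""
    taken = taken_numbers(filenames)
    n = wanted
    while n in taken:
        n += 1
    return n


# Hypothetical target-branch listing; the 0049 entry is a placeholder, not a real filename.
TARGET_BRANCH_FILES = [
    "0021-jsonl-reasoning-trace-exposure.md",
    "0024-harness-definitions.md",
    "0049-example-other-adr.md",
    "README.md",
]
print(f"{next_available(TARGET_BRANCH_FILES, 49):04d}")
```

A real check would list the ADR directory on the target branch, and on any open PRs that are close to merging, rather than use a hard-coded list.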


### Option C: Hybrid (recommended)

Export traces to an OTel-compatible backend as the primary query surface.

[important] This ADR assumes OTel-compatible traces exist but doesn't reference the companion ADR that decides how they're produced (#1489 — distributed tracing instrumentation). I think we should be explicit about that dependency here, since derived signals and the observer both consume what the tracing layer produces.

Might make sense to wait for #1489 to land and then reference it by number.

- Harness layout for signal artifacts, shadow logs, and lesson files
([ADR 0024](0024-harness-definitions.md))
- Per-SIG dashboards and aggregation on trace scores and tags
- Observer write tools and action allowlist policy

[non-blocking] Per AGENTS.md, accepted ADRs should update docs/architecture.md and related problem docs in the same PR. Not blocking on it, but worth adding before merge.

@fullsend-ai-review

Review

Findings

High

  • [missing-architecture-doc-update] docs/architecture.md:185 — ADR 0049 has status Accepted but docs/architecture.md has not been updated to reflect the new decisions. Per AGENTS.md: "When status is Accepted, update docs/architecture.md and related problem docs in the same PR." The Observability section (line 185) has a Decided: block that should include a bullet referencing ADR 0049 and summarizing the semantic observability and improvement loop decisions.
    Remediation: Add a new bullet under the Decided: section in the Observability component referencing ADR 0049.

Medium

  • [missing-problem-doc-update] docs/problems/operational-observability.md — ADR 0049 lists operational-observability in its relates_to frontmatter. Per AGENTS.md and the writing-adrs skill, when an Accepted ADR resolves or partially answers open questions in a problem doc, those questions should be annotated with a link to the ADR.
    Remediation: Add cross-references to ADR 0049 where it addresses open questions.

  • [missing-problem-doc-update] docs/problems/testing-agents.md — ADR 0049 lists testing-agents in its relates_to frontmatter. Same requirement as above.
    Remediation: Add cross-references to ADR 0049 where it addresses open questions.

Low

  • [design-document-alignment] docs/ADRs/0049-semantic-observability-and-improvement-loop.md:98 — Section 2 (Observer) says implementation details are "deferred to ADR 0024 or a follow-on ADR." ADR 0024 is already accepted and does not mention observer stages, signal artifacts, shadow logs, or lesson files. The phrasing implies ADR 0024 might already cover these when it does not.

  • [capitalization-consistency] docs/ADRs/0049-semantic-observability-and-improvement-loop.md:78: `### Rollout order` (sentence case) vs `### Non-goals` (title case) — minor inconsistency in subsection header capitalization within the Decision section.

Info

  • [adr-cross-reference] docs/ADRs/0021-jsonl-reasoning-trace-exposure.md — ADR 0049 directly extends ADR 0021. Consider adding a minor annotation to ADR 0021 noting this relationship. Per AGENTS.md: "Minor annotations are welcome: cross-references to related ADRs."

@fullsend-ai-review fullsend-ai-review Bot left a comment

See the review comment for full details.

the same schema for offline retro when no backend is configured.

JSONL remains the forensic source ([ADR 0021](0021-jsonl-reasoning-trace-exposure.md)).
Derived signals are enrichments, not a replacement.

[low] design-document-alignment

Section 2 says implementation details are 'deferred to ADR 0024 or a follow-on ADR.' ADR 0024 is already accepted and does not mention observer stages, signal artifacts, shadow logs, or lesson files. The phrasing implies ADR 0024 might already cover these, when in reality only a follow-on ADR would.

deferred to a follow-on decision. See
[operational-observability.md](../problems/operational-observability.md).

## Decision

[low] capitalization-consistency

'Rollout order' (sentence case) vs 'Non-goals' (title case) shows a minor inconsistency in subsection header capitalization within the Decision section.
