Add Spans() & Thread() to TaskResult #40

Open
Alex Z (CLowbrow) wants to merge 8 commits into main from
alex/trace-in-eval
Alex Z (CLowbrow) commented on Feb 11, 2026

Adds Spans() and Thread() methods to TaskResult, allowing scorers to access trace data from
within evals.

Changes

  • TaskResult.Spans(ctx, ...SpanQueryOpt) returns typed []eval.Span from the experiment
    trace
  • TaskResult.Thread(ctx) returns preprocessed thread entries via the project_default
    preprocessor
  • eval.Span struct replaces raw map[string]any with typed fields
  • api/objects package adds generic object fetch API used internally for span retrieval
  • InvokeGlobal cleanup removes dead /v1/function/invoke fallback
  • All new tests use VCR recorded cassettes
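To illustrate the move from raw map[string]any to typed fields, here is a minimal sketch of decoding span rows into a struct with the same JSON tags as eval.Span. The Span type is copied locally so the snippet compiles on its own; decodeSpans and sampleRows are hypothetical names with made-up data, and the PR's actual retrieval path (via api/objects) is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Span is a local copy of the typed struct described in this PR,
// reproduced here only so the sketch is self-contained.
type Span struct {
	ID             string         `json:"id"`
	SpanID         string         `json:"span_id"`
	RootSpanID     string         `json:"root_span_id"`
	SpanParents    []string       `json:"span_parents"`
	SpanAttributes map[string]any `json:"span_attributes"`
	Input          any            `json:"input"`
	Output         any            `json:"output"`
	Metadata       map[string]any `json:"metadata"`
}

// sampleRows is invented data for illustration, not a recorded response.
const sampleRows = `[
  {"id": "row1", "span_id": "s1", "root_span_id": "s1",
   "span_attributes": {"type": "task"}, "input": "hi", "output": "hello"},
  {"id": "row2", "span_id": "s2", "root_span_id": "s1",
   "span_parents": ["s1"], "span_attributes": {"type": "custom"}}
]`

// decodeSpans is a hypothetical helper (not part of the PR) that turns
// a JSON array of span rows into typed Span values.
func decodeSpans(raw []byte) ([]Span, error) {
	var spans []Span
	if err := json.Unmarshal(raw, &spans); err != nil {
		return nil, fmt.Errorf("decode spans: %w", err)
	}
	return spans, nil
}

func main() {
	spans, err := decodeSpans([]byte(sampleRows))
	if err != nil {
		panic(err)
	}
	for _, s := range spans {
		// Typed fields replace string-keyed lookups and type assertions.
		fmt.Println(s.SpanID, s.SpanParents, s.SpanAttributes["type"])
	}
}
```

With typed fields, a scorer reads s.SpanID directly instead of asserting row["span_id"].(string) on every access.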

New API

  // Typed span with structured fields
  type Span struct {
        ID             string         `json:"id"`
        SpanID         string         `json:"span_id"`
        RootSpanID     string         `json:"root_span_id"`
        SpanParents    []string       `json:"span_parents"`
        SpanAttributes map[string]any `json:"span_attributes"`
        Input          any            `json:"input"`
        Output         any            `json:"output"`
        Metadata       map[string]any `json:"metadata"`
  }

  // Functional option for filtering spans
  type SpanQueryOpt func(*spansQuery)
  func WithSpanTypes(types ...string) SpanQueryOpt

  // Methods on TaskResult (available in scorers)
  func (r TaskResult[I, R]) Spans(ctx context.Context, opts ...SpanQueryOpt) ([]Span, error)
  func (r TaskResult[I, R]) Thread(ctx context.Context) ([]map[string]any, error)
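SpanQueryOpt follows Go's functional options pattern. The sketch below shows how such an option could be applied; the spanTypes field and the buildQuery helper are assumptions for illustration, since the PR description does not show the internals of spansQuery.

```go
package main

import "fmt"

// spansQuery is a hypothetical stand-in for the unexported query type.
// The spanTypes field is an assumption; the real struct may differ.
type spansQuery struct {
	spanTypes []string
}

// SpanQueryOpt has the same shape as the option type in the PR.
type SpanQueryOpt func(*spansQuery)

// WithSpanTypes records the span types to filter by (assumed behavior).
func WithSpanTypes(types ...string) SpanQueryOpt {
	return func(q *spansQuery) {
		q.spanTypes = append(q.spanTypes, types...)
	}
}

// buildQuery is a hypothetical helper that applies options in order,
// the way a method like Spans(ctx, opts...) might do internally.
func buildQuery(opts ...SpanQueryOpt) spansQuery {
	var q spansQuery
	for _, opt := range opts {
		opt(&q)
	}
	return q
}

func main() {
	// No options: no type filter is set, so all spans would be returned.
	fmt.Println(len(buildQuery().spanTypes))

	// One option carrying two types.
	fmt.Println(buildQuery(WithSpanTypes("custom", "llm")).spanTypes)
}
```

The variadic opts parameter keeps the zero-argument call tr.Spans(ctx) valid while leaving room for further filters without changing the method signature.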

Example

  scorer := eval.NewScorer("trace_aware", func(ctx context.Context, tr eval.TaskResult[string, string]) (eval.Scores, error) {
        // Get all spans from the trace
        allSpans, err := tr.Spans(ctx)
        if err != nil {
                return nil, err
        }

        // Filter to only custom spans
        customSpans, err := tr.Spans(ctx, eval.WithSpanTypes("custom"))
        if err != nil {
                return nil, err
        }

        // Get preprocessed conversation thread
        thread, err := tr.Thread(ctx)
        if err != nil {
                return nil, err
        }

        score := 0.0
        if len(allSpans) > 0 {
                score = 1.0
        }

        return eval.Scores{{
                Name:  "trace_aware",
                Score: score,
                Metadata: map[string]any{
                        "span_count":        len(allSpans),
                        "custom_span_count": len(customSpans),
                        "thread_count":      len(thread),
                },
        }}, nil
  })

Full working example: examples/internal/trace-scorer/main.go
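Beyond counting spans, a scorer could use SpanParents to inspect the shape of a trace. The following is a rough sketch, assuming SpanParents holds the SpanID values of a span's parents and that the links form a tree; childrenByParent and maxDepth are hypothetical helpers, and the Span type is a trimmed local copy.

```go
package main

import "fmt"

// Span is a trimmed local copy with only the fields this sketch needs.
type Span struct {
	SpanID      string
	RootSpanID  string
	SpanParents []string
}

// childrenByParent indexes child span IDs by parent span ID.
// Assumption: SpanParents contains parent SpanID values.
func childrenByParent(spans []Span) map[string][]string {
	idx := make(map[string][]string)
	for _, s := range spans {
		for _, p := range s.SpanParents {
			idx[p] = append(idx[p], s.SpanID)
		}
	}
	return idx
}

// maxDepth counts the levels at and below id, counting id itself as 1.
// It assumes the parent links form a tree (no cycles).
func maxDepth(idx map[string][]string, id string) int {
	deepest := 0
	for _, child := range idx[id] {
		if d := maxDepth(idx, child); d > deepest {
			deepest = d
		}
	}
	return deepest + 1
}

func main() {
	// Made-up trace: s1 is the root, s2 and s4 are its children,
	// and s3 is a child of s2.
	spans := []Span{
		{SpanID: "s1", RootSpanID: "s1"},
		{SpanID: "s2", RootSpanID: "s1", SpanParents: []string{"s1"}},
		{SpanID: "s3", RootSpanID: "s1", SpanParents: []string{"s2"}},
		{SpanID: "s4", RootSpanID: "s1", SpanParents: []string{"s1"}},
	}
	idx := childrenByParent(spans)
	fmt.Println(len(idx["s1"]), maxDepth(idx, "s1"))
}
```

A value like the trace depth could be attached to a score's Metadata alongside span_count in the example above.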

Alex Z (CLowbrow) changed the title from "[WIP] get spans" to "Add trace objects with getSpans/getThread functionality to evals scorers" on Feb 20, 2026
Alex Z (CLowbrow) marked this pull request as ready for review on February 20, 2026 23:53
Matt Perpick (clutchski) changed the title from "Add trace objects with getSpans/getThread functionality to evals scorers" to "Add Spans() & Thread() to TaskResult" on Mar 4, 2026
3 participants