
feat(evaluation): adding examples and repetitions to create tracer sessions calls #2954

Open
Shamik Karkhanis (shamikkarkhanis) wants to merge 3 commits into main from shamikkarkhanis/lse-2266-sdk-exp-loading-state

Conversation


@shamikkarkhanis Shamik Karkhanis (shamikkarkhanis) commented May 29, 2026

Description

LSE-2266. Adds support for experiment loading-state tracking by including num_examples and num_repetitions in the session creation call to LangSmith.
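For reviewers, a minimal sketch of the idea, not the code in this PR: the example count is resolved only when it is known up front, and both fields are attached to the session creation payload. The names resolve_num_examples and build_session_payload and the "name" field are hypothetical; only num_examples and num_repetitions come from the description above.

```python
from collections.abc import Sized
from typing import Any, Optional


def resolve_num_examples(data: Any) -> Optional[int]:
    """Return the example count if it is known without consuming the data.

    Hypothetical helper: a dataset name (str) or a lazy iterator has no
    usable length here, so the count is reported as unknown (None).
    """
    if isinstance(data, (str, bytes)):
        return None  # a dataset name, not a collection of examples
    if isinstance(data, Sized):
        return len(data)
    return None


def build_session_payload(name: str, data: Any, num_repetitions: int = 1) -> dict:
    """Build an illustrative session creation payload (assumed shape)."""
    payload = {"name": name, "num_repetitions": num_repetitions}
    num_examples = resolve_num_examples(data)
    if num_examples is not None:
        # Only send the field when the count is actually known.
        payload["num_examples"] = num_examples
    return payload


if __name__ == "__main__":
    print(build_session_payload("my-experiment", [{"q": 1}, {"q": 2}], 3))
```

Omitting num_examples when the count is unknown, rather than sending a guess, is a design assumption of this sketch; the PR may handle that case differently.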

Release Notes

none

Test Plan

  • cd python && uv run pytest tests/unit_tests/evaluation/test_runner.py -k "test_resolve_num_examples or test_evaluate_sends_num_examples" -v (passes)
  • cd js && pnpm test src/tests/evaluate_runner.test.ts -t "evaluate forwards num_examples" (passes)


github-actions Bot commented May 29, 2026

JS perf benchmark

Lower is better. Noisy on shared runners — treat as a signal, not a gate.

Base64-heavy payload

Single large base64 string per message — the shape the worker-offload path is optimized for.
Payload: 2511.2 KB in / 5.2 KB out, 100 runs.

| metric | main | this PR | delta |
| --- | --- | --- | --- |
| Wall time (ms) | 2003.39 | 1797.26 | -10.3% |
| createRun total (ms) | 74.48 | 98.97 | +32.9% |
| createRun p50 (ms) | 0.59 | 0.61 | +2.2% |
| createRun p95 (ms) | 0.78 | 3.66 | +366.3% |
| createRun p99 (ms) | 27.80 | 33.62 | +21.0% |
| createRun max (ms) | 27.80 | 33.62 | +21.0% |
| updateRun total (ms) | 42.21 | 47.80 | +13.2% |
| updateRun p95 (ms) | 0.85 | 1.17 | +36.8% |
| loop lag total (ms) | 1040.95 | 949.42 | -8.8% |
| loop lag p50 (ms) | 0.09 | 0.10 | +2.2% |
| loop lag p95 (ms) | 4.21 | 4.85 | +15.2% |
| loop lag p99 (ms) | 12.71 | 18.93 | +48.9% |
| loop lag max (ms) | 217.85 | 113.44 | -47.9% |

Structural payload

Many small strings across a wide/nested object graph. Should bypass worker offload and use sync flush.
Payload: 1239.5 KB in / 13.3 KB out, 100 runs.

| metric | main | this PR | delta |
| --- | --- | --- | --- |
| Wall time (ms) | 1404.77 | 1396.33 | -0.6% |
| createRun total (ms) | 394.54 | 384.33 | -2.6% |
| createRun p50 (ms) | 3.41 | 3.35 | -1.6% |
| createRun p95 (ms) | 6.42 | 5.38 | -16.3% |
| createRun p99 (ms) | 7.20 | 14.56 | +102.2% |
| createRun max (ms) | 7.20 | 14.56 | +102.2% |
| updateRun total (ms) | 29.58 | 29.29 | -1.0% |
| updateRun p95 (ms) | 0.45 | 0.50 | +11.5% |
| loop lag total (ms) | 972.98 | 969.16 | -0.4% |
| loop lag p50 (ms) | 0.07 | 0.07 | -0.2% |
| loop lag p95 (ms) | 4.65 | 4.88 | +4.9% |
| loop lag p99 (ms) | 118.04 | 117.91 | -0.1% |
| loop lag max (ms) | 157.39 | 154.84 | -1.6% |
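
For reading the delta column: it appears to be the relative change of "this PR" against "main", so a negative value means the PR was faster. A minimal sketch under that assumption (percent_delta is a hypothetical helper, not part of the benchmark script):

```python
def percent_delta(main: float, pr: float) -> float:
    """Relative change of `pr` against `main`, in percent."""
    if main == 0:
        raise ValueError("main must be non-zero to compute a relative change")
    return (pr - main) / main * 100.0


if __name__ == "__main__":
    # Wall time from the base64-heavy table: 2003.39 ms on main, 1797.26 ms on this PR.
    print(f"{percent_delta(2003.39, 1797.26):+.1f}%")  # prints -10.3%
```

Rows with very small values (for example the sub-millisecond p50 figures) may not reproduce exactly from the table, presumably because the deltas were computed before the displayed values were rounded.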

@shamikkarkhanis Shamik Karkhanis (shamikkarkhanis) marked this pull request as ready for review May 29, 2026 18:51
@shamikkarkhanis Shamik Karkhanis (shamikkarkhanis) force-pushed the shamikkarkhanis/lse-2266-sdk-exp-loading-state branch from 6fbc920 to 962f779 Compare May 30, 2026 00:11