feat(sandbox): trace Python dataplane operations #2905

Open: Mukil Loganathan (langchain-infra) wants to merge 6 commits into main from infra/add-langsmith-sandbox-id-trace-metadata

@langchain-infra Mukil Loganathan (langchain-infra) commented May 16, 2026

Summary

  • Trace Python sandbox dataplane operations as LangSmith tool runs, covering the sync and async variants of run, reconnect, write, read, and tunnel
  • Wrap the public sandbox dataplane methods with a small @traceable-based helper that uses custom input/output processors to sanitize trace payloads
  • Attach sandbox_id and sandbox_name metadata through the active tracing context while keeping command/file payload behavior unchanged: no sandbox ID env injection, no env-key logging, and file contents stay out of trace inputs

JS changes are split into #2906.
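
As a rough illustration of the wrapping pattern described in the summary, here is a minimal sketch. It does not use the LangSmith SDK: `trace_dataplane`, `RECORDED_RUNS`, `sanitize_inputs`, `sanitize_output`, and `FakeSandbox` are all hypothetical names, and an in-memory list stands in for `@traceable` so the example runs on its own. The sanitizing rules (record content size instead of content, drop `env` entirely) are assumptions based on the bullets above, not the PR's actual code.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory stand-in for a tracing backend. The PR itself
# relies on LangSmith's @traceable rather than anything like this list.
RECORDED_RUNS: list[dict[str, Any]] = []


def sanitize_inputs(inputs: dict[str, Any]) -> dict[str, Any]:
    """Assumed input processor: keep file contents and env out of the trace."""
    clean = {k: v for k, v in inputs.items() if k not in ("env", "content")}
    if "content" in inputs:
        # Record only the size of the payload, never the bytes themselves.
        clean["content_bytes"] = len(inputs["content"])
    return clean


def sanitize_output(output: Any) -> dict[str, Any]:
    """Assumed output processor: summarize raw bytes instead of logging them."""
    if isinstance(output, (bytes, bytearray)):
        return {"bytes": len(output)}
    return {"output": output}


def trace_dataplane(name: str) -> Callable:
    """Hypothetical helper: record a sanitized 'tool' run per method call."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(self, **kwargs: Any) -> Any:
            run: dict[str, Any] = {
                "name": name,
                "run_type": "tool",
                "inputs": sanitize_inputs(kwargs),
                # Metadata comes from the sandbox, not from the call payload.
                "metadata": {"sandbox_id": self.id, "sandbox_name": self.name},
            }
            try:
                result = fn(self, **kwargs)
                run["outputs"] = sanitize_output(result)
                return result
            except Exception as exc:
                run["error"] = repr(exc)
                raise
            finally:
                RECORDED_RUNS.append(run)

        return wrapper

    return decorator


class FakeSandbox:
    """Hypothetical sandbox that stores files in a dict."""

    def __init__(self, id: str, name: str) -> None:
        self.id = id
        self.name = name
        self.files: dict[str, bytes] = {}

    @trace_dataplane("sandbox.write")
    def write(self, *, path: str, content: bytes) -> None:
        self.files[path] = content

    @trace_dataplane("sandbox.read")
    def read(self, *, path: str) -> bytes:
        return self.files[path]
```

Calling `FakeSandbox("sb-123", "demo").write(path="/tmp/a.txt", content=b"secret")` appends a run whose inputs hold the path and a byte count only. The method arguments and return values pass through the wrapper untouched, which mirrors the "payload behavior unchanged" goal stated above.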

Test Plan

  • uv run ruff check python/langsmith/sandbox/_sandbox.py python/langsmith/sandbox/_async_sandbox.py python/tests/unit_tests/sandbox/test_sandbox.py python/tests/unit_tests/sandbox/test_async_sandbox.py
  • uv run --extra pytest pytest tests/unit_tests/sandbox/test_sandbox.py tests/unit_tests/sandbox/test_async_sandbox.py

github-actions Bot commented May 16, 2026

JS perf benchmark

Lower is better. Noisy on shared runners — treat as a signal, not a gate.

Base64-heavy payload

Single large base64 string per message — the shape the worker-offload path is optimized for.
Payload: 2511.2 KB in / 5.2 KB out, 100 runs.

| metric | main | this PR | delta |
| --- | ---: | ---: | ---: |
| Wall time (ms) | 2103.71 | 1913.43 | -9.0% |
| createRun total (ms) | 94.04 | 83.62 | -11.1% |
| createRun p50 (ms) | 0.64 | 0.62 | -1.8% |
| createRun p95 (ms) | 2.16 | 1.72 | -20.3% |
| createRun p99 (ms) | 33.05 | 27.43 | -17.0% |
| createRun max (ms) | 33.05 | 27.43 | -17.0% |
| updateRun total (ms) | 46.28 | 50.64 | +9.4% |
| updateRun p95 (ms) | 1.19 | 1.19 | +0.3% |
| loop lag total (ms) | 1100.38 | 779.19 | -29.2% |
| loop lag p50 (ms) | 0.09 | 0.10 | +2.2% |
| loop lag p95 (ms) | 5.12 | 3.83 | -25.2% |
| loop lag p99 (ms) | 21.57 | 12.05 | -44.1% |
| loop lag max (ms) | 107.09 | 125.00 | +16.7% |

Structural payload

Many small strings across a wide/nested object graph. Should bypass worker offload and use sync flush.
Payload: 1239.5 KB in / 13.3 KB out, 100 runs.

| metric | main | this PR | delta |
| --- | ---: | ---: | ---: |
| Wall time (ms) | 1493.95 | 1498.22 | +0.3% |
| createRun total (ms) | 434.07 | 385.75 | -11.1% |
| createRun p50 (ms) | 3.43 | 3.30 | -3.6% |
| createRun p95 (ms) | 6.99 | 4.03 | -42.3% |
| createRun p99 (ms) | 13.74 | 21.62 | +57.3% |
| createRun max (ms) | 13.74 | 21.62 | +57.3% |
| updateRun total (ms) | 35.43 | 54.96 | +55.1% |
| updateRun p95 (ms) | 0.54 | 0.43 | -19.7% |
| loop lag total (ms) | 1054.66 | 1052.13 | -0.2% |
| loop lag p50 (ms) | 0.07 | 0.07 | +0.4% |
| loop lag p95 (ms) | 4.91 | 2.82 | -42.6% |
| loop lag p99 (ms) | 121.51 | 121.42 | -0.1% |
| loop lag max (ms) | 176.07 | 182.86 | +3.9% |
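
The delta column appears to be the relative change of "this PR" against "main". A minimal sketch of that calculation follows, assuming the formula (pr - main) / main; the function name `percent_delta` is hypothetical. Because the benchmark presumably computes deltas from unrounded timings, recomputing from the rounded values shown here can differ in the last digit for very small metrics.

```python
def percent_delta(main_ms: float, pr_ms: float) -> str:
    """Relative change of pr_ms versus main_ms as a signed percentage string."""
    change = (pr_ms - main_ms) / main_ms * 100.0
    return f"{change:+.1f}%"


# Values taken from the Base64-heavy payload results.
wall_time = percent_delta(2103.71, 1913.43)    # negative: this PR is faster
loop_lag_max = percent_delta(107.09, 125.00)   # positive: this PR is slower
print(wall_time, loop_lag_max)
```

A negative delta means the PR branch spent less time on that metric, which matches the "lower is better" note at the top of the comment.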

@langchain-infra Mukil Loganathan (langchain-infra) changed the title from "feat(sandbox): trace sandbox command execution" to "feat(sandbox): trace sandbox dataplane operations" May 16, 2026
@langchain-infra Mukil Loganathan (langchain-infra) changed the title from "feat(sandbox): trace sandbox dataplane operations" to "feat(sandbox): trace Python dataplane operations" May 16, 2026