Skip to content

fix(debugger): use bytecode/sys.monitoring instead of setattr wrapper#772

Draft
liustve wants to merge 25 commits into
aws-observability:mainfrom
liustve:di/probe-py-start-engine-rebuild
Draft

fix(debugger): use bytecode/sys.monitoring instead of setattr wrapper#772
liustve wants to merge 25 commits into
aws-observability:mainfrom
liustve:di/probe-py-start-engine-rebuild

Conversation

@liustve

@liustve liustve commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Problem

The current Dynamic Instrumentation path replaces a target function with a Python-level wrapper installed via setattr(module, name, wrapper). This has two issues, one architectural and one downstream of it.

1. setattr only updates the module's name binding. Frameworks register handler functions at import time and store direct references in their own internal data structures (route registries, dispatcher tables, ORM hooks, etc.). When a request comes in, the framework calls those stored references, not module.foo. So setattr-replacing module.foo leaves DI silently bypassed for every framework-registered entry point — the wrapper never runs, no snapshot, no instrumentation. The previous workaround was a per-framework patcher that scanned for known frameworks and rewrote their internal dicts. This is fundamentally a losing approach: every new framework needs its own patcher, the matching heuristics break under decorator wrappers, and any framework we don't know about is invisible.

2. The wrapper's exception path drops the throwable. When a wrapped function raises, the resulting snapshot's Captures.return_context.throwable field is empty — no captured exception type, message, or stacktrace. The user gets a snapshot that says the function failed but provides none of the data they need to diagnose it. The cause is that the wrapper's try/except sits on top of the user's frame, and exception-state lifecycle differences between Python versions (especially 3.11+ with PEP 657) made the capture logic unreliable.

Both problems share a root cause: instrumenting at the Python layer instead of at the interpreter layer.

Fix

Instrument at the interpreter layer. The function's __code__ is mutated in place — any reference any framework or library is holding sees the rewritten code on the next call, with no per-framework patching. The interpreter's own exception flow runs the unwind hook, so sys.exc_info() is reliably the live exception and the throwable is always captured.

Each Python version uses the engine that fits it best, and all three engines share the same handler contract: entry, return, unwind.

Python Engine Mechanism
3.12+ SysMonitoringEngine PEP 669 (sys.monitoring): PY_START / PY_RETURN / PY_UNWIND
3.9 / 3.10 BytecodeInjectionEngine SETUP_FINALLY-based body wrap, RERAISE in handler
3.11 BytecodeInjectionEngine PEP 657 co_exceptiontable wrap via splice walk (see below)

The unwind handler reads sys.exc_info() directly, builds a CapturedThrowable with type/message/stacktrace (in-function frames merged with caller frames), and emits the snapshot.

Architectural change

3.12+: sys.monitoring

Native PEP 669 events. Per-thread LIFO stack of (code_id, start_ns) tuples preserves correct durations across recursion. PY_UNWIND is set globally (it isn't local-event-capable on 3.12-3.14); a cheap _function_keys membership check filters to instrumented codes before any allocation. Reentrancy guard via threading.local so a snapshot built from the serializer cannot itself trigger another snapshot on the same thread.

3.9 / 3.10: bytecode rewrite with SETUP_FINALLY

Standard SETUP_FINALLY body wrap: prologue calls the entry hook, every RETURN_VALUE is replaced with a sequence that calls the exit hook then returns, and the SETUP_FINALLY block calls the unwind hook then RERAISEs. Local slots for start_ns / entry_ctx / retval are pre-initialized to None so the unwind hook's LOAD_FAST is safe even if the entry hook crashed before storing them. POP_BLOCK is emitted before each RETURN_VALUE so the block stack is balanced on the normal path.

3.11: bytecode rewrite with PEP 657 exception_table — splice walk

3.11 removed SETUP_FINALLY. Protected regions live in out-of-band metadata (co_exceptiontable). The bytecode library exposes this via TryBegin / TryEnd pseudo-instructions, but it forbids nested TryBegin — and any function that contains user try/except already has one. Wrapping the whole body with another TryBegin raises RuntimeError: TryBegin pseudo instructions cannot be nested.

The fix is a splice walk that produces N adjacent (never nested) outer regions, all targeting one shared exception handler. Walking the user's instructions:

            (preamble: RESUME, COPY_FREE_VARS, MAKE_CELL)
            <prologue: call entry hook>
[outer]     TryBegin(handler_label, push_lasti=True)
              ... body before user try ...
[outer]     TryEnd                          # close BEFORE user try
              user_try_begin
                ... user-protected body ...
              user_try_end
[outer]     TryBegin(handler_label, ...)    # reopen AFTER user try
              ... body after user try ...
              <each RETURN_VALUE replaced with: exit hook + return>
[outer]     TryEnd
            handler_label:
              <unwind hook + RERAISE>

The outer protected regions are flat (one closes before each user try, the next opens after) so the library accepts them, and they all funnel uncaught exceptions to the same handler. Other 3.11-specific details:

  • Prologue is inserted after RESUME 0 / COPY_FREE_VARS / MAKE_CELL so closures and free-vars are armed when the entry hook fires (otherwise frame.f_locals is incomplete).
  • 3.11 calling convention: PUSH_NULL + LOAD_GLOBAL((push_null: bool, name: str)) + PRECALL n + CALL n (the tuple-form LOAD_GLOBAL arg is mandatory; passing a bare string silently miswrites namei).
  • Handler entry stack with push_lasti=True is [..., lasti, exc]. 3.11 does NOT auto-set sys.exc_info() for co_exceptiontable handlers, so the foot uses PUSH_EXC_INFO to expose the live exception to the unwind hook, then RERAISE 0 to propagate.

Decorator unwrapping

@functools.wraps-style decorators rebind module.name to a wrapper whose __code__ is the decorator's check, not the user's body. Without unwrapping, the engine would rewrite the decorator's code and snapshots would report the wrapper's location and fire on every decorator invocation instead of the actual handler. A new _undecorate.undecorated() BFSes through __wrapped__ / functools.partial / closure cells, with co_filename disambiguation, to find the real user function. The engine arms that one and keys state by its id.

Async / generator fallback

Engines decline to rewrite generators, coroutines, async generators, and iterable coroutines (rewriting their bodies would corrupt .send() / .throw() / await semantics). On decline, the manager routes through FunctionWrapper.instrument_function, which has dedicated coroutine support via _create_async_wrapper. So async def functions remain instrumented end-to-end — just not by the bytecode engine.

Framework-specific patcher removed

The __code__ mutation is in-place: every reference any framework holds sees the rewritten code on the next call. No framework-specific scanning, no per-framework patcher, no fragile name+module identity matching. The previously-needed patcher was deleted and its tests rewritten in place to exercise the engine path and prove the framework's stored reference flows through without explicit patching.

State leak fixes

  • _call_start_ns is popped unconditionally in PY_RETURN/PY_UNWIND before the reentrancy/membership guards (otherwise a disable-mid-flight or serializer-triggered short-circuit orphans the stamp).
  • disable_function_level_instrumentation clears all per-code state including _instrumentation_types and the per-thread stack.
  • disable_breakpoints_for_function (the existing line-BP path) also clears function-level state on the same code, so coexisting instrumentation tears down cleanly.

Other changes in this PR

  • SnapshotOtlpEmitter.initialize() is called eagerly during manager init. Lazy init from a sys.monitoring callback at shutdown could hit RuntimeError(\"cannot schedule new futures after interpreter shutdown\") because BatchLogRecordProcessor spawns a daemon worker on construction.
  • _build_resource writes both deployment.environment (older convention) and deployment.environment.name (OTel-stable) so consumers reading either key see the value. Engines' _get_environment helpers prefer the .name variant and fall back to the bare key.
  • All hardcoded \"service.name\" / \"deployment.environment(.name)\" strings replaced with constants from opentelemetry.semconv.resource.ResourceAttributes and opentelemetry.semconv._incubating.attributes.deployment_attributes.DEPLOYMENT_ENVIRONMENT_NAME. All inline OTel imports hoisted to module top.
  • Conftest pins OTel semconv schema URL to v1.41.0 raw GitHub path so schema-validation tests are stable when the docs site drifts.

Validation

Python Tests passing Function-level scenarios
3.9 528 basic, throwable, generator/coroutine/async-gen decline, disable-restores, kwonly+varargs, recursion, throwable-inside-user-try, decorator unwrap, disable-after-decorated-enable
3.10 528 same as 3.9
3.11 528 same as 3.9 (10 tests previously skipped on 3.11 are now active and passing)
3.12 523 + 2 SME recursion-duration regression tests
  • `black` / `isort` / `flake8`: clean across 47 files.
  • Regression harness against `origin/main` (12 scenarios): PASS, 0 unexpected deltas.
  • E2E test framework: line-level + function-level + exception-path snapshots validated in CW Logs (sys.monitoring path on 3.12 and bytecode path on 3.9/3.10/3.11 all green).

Test plan

  • Unit tests pass on 3.9 / 3.10 / 3.11 / 3.12
  • Linters clean
  • Regression harness shows zero deltas vs origin/main
  • E2E test framework validates throwable capture end-to-end

…capture

Surgical rebuild of PR aws-observability#770 that fixes the named throwable-capture regression
without inheriting the archive's 13-commit churn or its 12 documented
correctness regressions.

Logical changes (single commit; granularity preserved in tests / structure):

1. SysMonitoringEngine (Python 3.12+): add PROBE / function-level BREAKPOINT
   support via PY_START + PY_RETURN + PY_UNWIND. PY_UNWIND is set globally
   (not local-event-capable on 3.12-3.14). New methods:
   enable_function_level_instrumentation, disable_function_level_instrumentation,
   _function_start_event_handler, _function_return_event_handler,
   _function_unwind_event_handler, _handle_function_level_instrumentation,
   _build_throwable. Reentrancy guard via threading.local.

   Three critical leak fixes vs first draft:
   - PY_RETURN/PY_UNWIND pop _call_start_ns BEFORE reentrancy/membership checks
   - disable_function_level_instrumentation sweeps in-flight stamps across
     all threads
   - disable_breakpoints_for_function explicitly clears _instrumentation_types
     and sweeps _call_start_ns

2. BytecodeInjectionEngine (Python 3.9/3.10): SETUP_FINALLY-based bytecode
   rewrite. New module-level _FUNCTION_HANDLER_NAME / _REFUSAL_MASK
   (inspect.CO_GENERATOR | CO_COROUTINE | CO_ASYNC_GENERATOR |
   CO_ITERABLE_COROUTINE). InjectionState gains optional function_metadata
   bag (zero new dicts). Generator/coroutine/async-generator refused.
   3.11 deferred to a follow-up commit (high-level bytecode API forbids
   nested TryBegin; needs ConcreteBytecode + exception_table).

3. _undecorate.py: BFS through __wrapped__ / functools.partial / closure
   cells with co_filename disambiguation. Fixes the silent "wrong location"
   bug for @login_required / @cache_page / any @functools.wraps decorator.
   Without this, instrumenting `module.my_view` rewrites the decorator's
   wrapper, not my_view.

4. Manager wiring (instrumentation_manager.py):
   - _instrument_function_level helper calls engine first
   - _apply_function: engine path for line=0 BPs, setattr wrapper fallback
     only when engine refuses (preserves async support via wrapper)
   - _remove_function: also calls disable_function_level_instrumentation
   - _build_resource: deployment.environment -> deployment.environment.name
     (the engines' _get_environment helpers already read the corrected key)
   - _snapshot_emitter.initialize() called eagerly at __init__ to avoid
     RuntimeError("cannot schedule new futures after interpreter shutdown")
     when BatchLogRecordProcessor's daemon worker tries to spawn from a
     sys.monitoring callback thread

5. SnapshotOtlpEmitter.initialize(): public wrapper around _ensure_initialized
   so the manager can eagerly initialize on a normal user thread.

Tests:
- 10 new function-level tests in test_bytecode_injection_engine.py
  (basic return, throwable capture, generator/coroutine/async-gen refusals,
  disable-restores-original, kwonly+varargs, recursion, throwable-inside-
  user-try, decorator unwrap)
- test_sys_monitoring_engine.py: assert all 4 callbacks register
  (LINE + PY_START + PY_RETURN + PY_UNWIND) and PY_UNWIND set_events
- test_instrumentation_manager.py: deployment.environment.name key

Validation:
- Python 3.9: 526 passed, 35 skipped
- Python 3.10: 526 passed, 35 skipped
- Python 3.11: 516 passed, 45 skipped (10 function-level deferred)
- Python 3.12: 520 passed, 41 skipped (uses SysMonitoringEngine)
- black, isort, flake8: clean across 47 files
- Regression harness vs origin/main: 0 unexpected deltas (12 scenarios)

Things deliberately NOT carried over from the archive:
- _bytecode_templates.py (430 LOC) — inlined into BIE
- _snapshot_factory.py — Snapshot built inline like SME does
- _function_entries dict — duplicates _injection_states
- _tls = threading.local() — dead state in archive
- super().__init__() — ABC has no __init__
- _LocalSlots / _InjectionInstrs dataclasses — ceremony
- disable_function_entry fallback scan-by-code — would match closure siblings
- start_ns == 0 skip sentinel — collides with serializer-error fallback
- 12 other regressions documented in the deep-analysis workflow output
@liustve liustve requested a review from a team as a code owner June 15, 2026 01:54
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Claude PR Review

Working on review...

  • Read prior review comments to dedupe
  • Read PR diff
  • Read changed files for context
  • Identify findings (code quality, security, perf, tests, docs)
  • Post review

View job run

liustve added 2 commits June 14, 2026 19:05
…into initialize

The original split (public initialize() wrapping private _ensure_initialized())
was redundant — both did the exact same thing and once initialize() was a
public API, _ensure_initialized stopped being internal-only. One method now
serves both eager startup-time init and lazy on-first-emit init via the
existing double-checked locking.
Ports the conftest fix from PR aws-observability#770. The schema base URL now points at a
versioned raw.githubusercontent.com path instead of opentelemetry.io/docs/specs/semconv,
making schema-validation tests deterministic when the docs site drifts.
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Claude finished @liustve's task in 18m 3s —— View job


Claude PR Review

Reviewed the engine rebuild end-to-end against pr-head/. Most of the rebuild looks careful — the leak fixes called out in the description (unconditional _call_start_ns.pop, sweep on disarm, _instrumentation_types cleanup) are good, the reentrancy guards are placed correctly, and the BIE bytecode shape is sound. Below are findings I'd ask you to address before merge.

Critical

1. disable_function_level_instrumentation is a no-op after undecorate redirection (BIE).
_bytecode_injection_engine.py:326-329 reassigns func = resolved and stores the InjectionState keyed by id(resolved). But instrumentation_manager._apply_function at instrumentation_manager.py:461-462 records bp_set.original_function = target_func where target_func is the wrapper returned by FunctionWrapper._discover_function. On removal, instrumentation_manager.py:609 calls disable_function_level_instrumentation with the wrapper, and the disable lookup at _bytecode_injection_engine.py:411-414 computes id(wrapper), which never matches the resolved-func id key — so state is None and disable returns early. The inner user function's __code__ stays patched until interpreter exit. This regression is exactly the kind of @decorator path the new _undecorate.py was added for, so it'll be common.

test_function_level_undecorate_resolves_wrapped_function doesn't catch it because it never disables — only cleanup is called in tearDown, and cleanup iterates _injection_states directly. Suggested fix: either return the resolved func from enable_function_level_instrumentation so the manager can record it as original_function, or have disable_function_level_instrumentation re-run undecorated(func, ...) (using saved metadata) before computing func_id. Add a regression test that calls enable then disable on a @functools.wraps-decorated function and asserts __code__ is the original wrapper code.

2. SME _call_start_ns clobbers on recursion — outer frames lose duration.
_sys_monitoring_engine.py:273 stores start_ns keyed by (code_id, threading.get_ident()). For a recursive function, every nested PY_START on the same thread overwrites the outer frame's stamp, and the first PY_RETURN pops it; subsequent (outer) PY_RETURNs find an empty slot and fall through to start_ns=0, producing duration_ms=0 for all but the innermost frame.

The BIE engine sidesteps this by stashing start_ns in per-frame bytecode locals (_di_start_ns), and test_function_level_recursion validates BIE only — it lives in TestBytecodeInjectionEngineFunctionLevel and is skipped on 3.12+. Net effect: PROBE on recursive functions on Python 3.12+ silently reports wrong durations. Two viable fixes:

  • Use a per-thread threading.local() stack of (code_id, start_ns) tuples, push on PY_START / pop on PY_RETURN/PY_UNWIND.
  • Or read the start_ns off the running frame (e.g., stash in frame.f_locals with a unique name) so it's frame-scoped.

A SME-side recursion test would have caught this — please add one alongside the fix.

Medium

3. _handle_function_event does a linear scan to look up instrumentation_type (BIE).
_bytecode_injection_engine.py:454-458 iterates every entry in _injection_states.values() on every snapshot to find a matching function_key. With N instrumented functions, this is O(N) per call. Easy to fix by also keying _capture_configs / a parallel dict on (function_key, 0) -> instrumentation_type, mirroring what SME already does in _instrumentation_types.

4. Global PY_UNWIND fires on every Python exception in the process.
_sys_monitoring_engine.py:130 sets PY_UNWIND globally (correct — it isn't local-event-capable on 3.12-3.14). The handler does an unconditional _call_start_ns.pop(...) before the membership check at _sys_monitoring_engine.py:311. For exception-heavy apps with no instrumented functions, this is one dict miss per unwind frame — small but not free. Worth ordering the cheap membership check first, then pop only for our codes.

Minor / nits

  • _bytecode_injection_engine.py:439-442 pops _HANDLER_NAME from globals on function-level disarm even if line-BPs on a different function in the same module are still using it. The still_used scan only checks function_ref.__globals__ is globals_obj, so if a sibling function in the same module still has line-BPs armed but happens to be the only function_ref matching that globals dict, the pop is correctly skipped. But the symmetric case — line-BP entry exists for a sibling without a function_ref recorded — isn't possible in current code, so this is fine; just flagging that the invariant relies on function_ref always being set.
  • _undecorate.py:95 priming best_match and then re-checking inside the loop is fine but slightly redundant — the BFS visits func first iteration and sets best_match there.
  • _function_event_handler string-dispatch on event per call is hot-path; consider using three distinct callbacks installed in globals (or branchless dispatch) if profiling shows this matters. Not blocking.

Tests

  • Two recommended additions to lock down the critical findings: (a) test_function_level_disable_after_decorated_enable for BIE; (b) test_function_level_recursion for the SME class (currently only present on BIE).

Things that look right

  • Reentrancy guard placement is consistent across both engines.
  • disable_function_level_instrumentation in SME correctly drains in-flight stamps (for stale_key in [k for k in self._call_start_ns if k[0] == code_id]).
  • disable_breakpoints_for_function in SME also clears _instrumentation_types and _call_start_ns — good.
  • BIE refusal mask covers the four async/generator flags, and the prologue pre-inits all three slots to None so the unwind handler's LOAD_FAST is safe even if entry hook crashes.
  • Eager SnapshotOtlpEmitter.initialize() from __init__ correctly avoids the BatchLogRecordProcessor daemon-spawn-from-monitoring-callback hazard.
  • deployment.environment.name rename matches the engines' _get_environment lookups — internally consistent.

  • Read prior review comments to dedupe (no prior claude-code-review:v1 comments)
  • Read PR diff
  • Read changed files for context
  • Identify findings (code quality, security, perf, tests, docs)
  • Post review
    | Branch: di/probe-py-start-engine-rebuild

…ice-walk

Adds 3.11 BytecodeInjectionEngine support so PROBE / function-level
BREAKPOINT no longer falls through to the setattr wrapper on 3.11
(restoring throwable capture parity with 3.9/3.10/3.12+).

Why this needs a different path than 3.9/3.10:
- 3.11 removed SETUP_FINALLY (the opcode the existing path uses)
- 3.11 introduced PEP 657 exception_table — protected regions live
  out-of-band as metadata, not in the instruction stream
- bytecode==0.18.1 forbids nested TryBegin pseudo-instructions

Splice-walk pattern (no nesting):
- Single shared exception handler label at end of function
- Outer protected region opens after RESUME 0 / closure preamble
- On every user TryBegin: emit TryEnd before it (closing outer)
- On matching user TryEnd: open fresh outer TryBegin after it
- Result: N adjacent (never nested) outer regions, all targeting
  the same handler — bytecode==0.18.1 accepts this and writes the
  correct co_exceptiontable

Critical 3.11 details:
- Prologue must come AFTER RESUME 0 / COPY_FREE_VARS / MAKE_CELL
  so closures and free-vars are armed when the entry hook fires
- LOAD_GLOBAL takes a tuple oparg `(push_null: bool, name: str)`
- PRECALL/CALL must agree on argcount
- Handler is entered with push_lasti=True → stack is [..., lasti, exc]
- co_exceptiontable handlers do NOT auto-set sys.exc_info(); we use
  PUSH_EXC_INFO at handler entry to make sys.exc_info() see the live
  exception so _build_throwable can populate the snapshot
- RERAISE 0 (not 1) at the foot — we discard lasti via PUSH_EXC_INFO's
  side effect, then reraise the top-of-stack exception

New methods (alongside existing 3.9/3.10 helpers):
- _build_function_prologue_311
- _build_function_entry_call_311
- _build_function_exit_call_311
- _build_function_unwind_foot_311
- _create_code_with_function_wrap_311 (the splice-walk driver)

Test gate widened: TestBytecodeInjectionEngineFunctionLevel now runs
on 3.9-3.11 (all 10 tests pass), where it previously skipped 3.11.

Cross-version validation:
  3.9:  526 passed
  3.10: 526 passed
  3.11: 526 passed (was 516 — +10 function-level tests no longer skipped)
  3.12: 520 passed
- black / isort / flake8: clean
- Regression harness vs origin/main on 3.11: PASS, 0 unexpected deltas
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Claude PR Review

Working on review...

  • Read prior review comments to dedupe
  • Read PR diff
  • Read changed files for context
  • Identify findings (code quality, security, perf, tests, docs)
  • Post review

View job run

liustve added 2 commits June 14, 2026 20:49
…e leaks, recursion, perf)

Four targeted fixes from the PR review:

CRITICAL: BIE disable_function_level_instrumentation no-op after undecorate redirect.
  Enable redirects through @functools.wraps to instrument the inner function and
  keys state by id(real_func), but the manager passes the wrapper back on disable
  → id(wrapper) != id(real_func) → state lookup misses, inner __code__ stays
  patched until interpreter exit. Disable now scans by (file_path, qualified_name,
  __globals__) when the direct id lookup misses, finding the same record the
  enable path created. Regression test added: test_function_level_disable_after_
  decorated_enable_restores_inner.

CRITICAL: SME _call_start_ns clobbered on recursion.
  Replaced Dict[(code_id, thread_id), int] with threading.local() LIFO stack of
  (code_id, start_ns) tuples. PY_START pushes; PY_RETURN/PY_UNWIND walks back to
  find the matching code_id and pops. Recursive calls on the same thread now
  preserve outer-frame stamps, so every nested call reports a correct duration.
  No cross-thread sweep needed on disable: per-thread state dies with the thread
  or gets popped on the next event. Two regression tests added.

MEDIUM: BIE _handle_function_event linear scan for instrumentation_type.
  Replaced an O(N) iteration over _injection_states.values() with a parallel
  Dict[(function_key, 0), str] keyed lookup, mirroring SME's existing
  _instrumentation_types dict. Populated on enable, cleared on
  disable/disable_breakpoints_for_function/cleanup.

MEDIUM: SME PY_UNWIND check ordering.
  PY_UNWIND fires globally on every Python exception in the process. Reordered
  the handler to do the cheap _function_keys membership check FIRST, then pop
  the start_ns stack only for our codes. Exception-heavy apps with no
  instrumented functions now pay one dict miss per unwind frame instead of one
  dict miss + a stack walk.

Cross-version validation:
  3.9:  527 passed, 37 skipped (was 526)
  3.10: 527 passed, 37 skipped (was 526)
  3.11: 527 passed, 37 skipped (was 526)
  3.12: 522 passed, 42 skipped (was 520) — +2 SME recursion tests
  black/isort/flake8: clean
  Regression harness vs origin/main: PASS, 0 unexpected deltas
The bytecode-injection engine declines to rewrite generators, coroutines,
async generators, and iterable coroutines because in-place rewriting would
corrupt .send()/.throw()/await semantics. But that's not the end of the
story — the manager observes the engine's False return and routes through
FunctionWrapper.instrument_function, which has a working coroutine path
via _create_async_wrapper. So async functions ARE instrumented end-to-end;
just not by this engine.

The previous naming and comments suggested async was unsupported full-stop.
Renames and clarifications:
- _REFUSAL_MASK -> _DECLINE_BYTECODE_REWRITE_MASK
- _is_unsupported_code -> _should_decline_bytecode_rewrite
- Test names updated: test_function_level_refuses_* -> test_engine_declines_*_so_manager_can_fall_back_to_wrapper
- Log message updated to clarify the manager will route through wrapper

New test: TestApplyFunction.test_engine_decline_falls_back_to_setattr_wrapper.
Locks down the wrapper-fallback half of the contract that the engine tests
only verify on the engine half (asserting the engine returns False).
Confirms that when enable_function_level_instrumentation returns False,
_apply_function calls FunctionWrapper.instrument_function so the
async-wrapper path can wire up async PROBEs.

Cross-version validation:
  3.9:  528 passed (was 527)
  3.10: 528 passed (was 527)
  3.11: 528 passed (was 527)
  3.12: 523 passed (was 522)
black/isort/flake8: clean
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Claude PR Review

Working on review...

  • Read prior review comments to dedupe
  • Read PR diff (commits since last review)
  • Verify whether prior critical findings were addressed
  • Read changed files for context
  • Identify NEW findings (only)
  • Post review

View job run

@liustve liustve added the skip changelog doesn't need a CHANGELOG entry label Jun 15, 2026
liustve added 3 commits June 14, 2026 21:05
The setattr-wrapper path needed _patch_flask_view_functions because
replacing module.foo via setattr does NOT update Flask's app.view_functions
dict — Flask continues calling the old reference, bypassing instrumentation.

The bytecode-injection / sys.monitoring engines mutate the function's
__code__ in place. Any reference Flask is holding (or any other framework)
sees the rewritten code on the next call. No view_functions patch needed.

Removed from _function_wrapper.py (~139 LOC):
- _patch_framework_references (dispatcher, only had one call)
- _patch_flask_view_functions (Flask app scan + identity/name match)
- _patch_single_flask_app (per-app patcher)
- Call sites in _replace_function_in_module and restore_function

test_function_wrapper_flask.py rewritten in place — same 7 test names,
same intent ("Flask view_functions stays consistent under DI"), just
exercising the engine path instead of the deleted patch helpers. Each
test now:
  1. Sets up a Flask app with a route handler
  2. Arms the handler via engine.enable_function_level_instrumentation
  3. Invokes through Flask's view_functions dict
  4. Asserts the snapshot fired (proving __code__ mutation flowed through
     Flask's stored reference without any explicit patching)

test_function_wrapper_branches.py: dropped TestPatchFlaskViewFunctionsErrors
class (66 LOC) — it tested error branches in the deleted Flask methods.

Cross-version validation:
  3.9:  529 passed
  3.10: 529 passed
  3.11: 529 passed
  3.12: 524 passed
  black/isort/flake8: clean
….environment too

Replace hardcoded "service.name" / "deployment.environment" / "deployment.environment.name"
strings with their OTel constants:
- ResourceAttributes.SERVICE_NAME
- ResourceAttributes.DEPLOYMENT_ENVIRONMENT
- DEPLOYMENT_ENVIRONMENT_NAME (from semconv._incubating, the only path the
  stable name is exposed at today)

_build_resource now writes BOTH deployment.environment and
deployment.environment.name when an environment is set, so consumers reading
either key see the value. Engine _get_environment helpers also read both
(prefer .name, fall back to bare).

All inline OTel imports hoisted to module top:
- instrumentation_manager.py: Resource, ResourceAttributes, DEPLOYMENT_ENVIRONMENT_NAME
- _function_wrapper.py: otel_trace, ResourceAttributes, DEPLOYMENT_ENVIRONMENT_NAME
- _sys_monitoring_engine.py: TracerProvider, ResourceAttributes, DEPLOYMENT_ENVIRONMENT_NAME
- _bytecode_injection_engine.py: same as SME
- _debugger_client.py: + DEPLOYMENT_ENVIRONMENT_NAME

Tests:
- test_returns_resource_when_attrs_present: now asserts both keys
- test_resource_import_failure_returns_none: dropped (premise was an inline
  import that no longer exists; replaced by Resource.create failure test)

Cross-version validation:
  3.9:  528 passed
  3.10: 528 passed
  3.11: 528 passed
  3.12: 523 passed
black/isort/flake8: clean
@liustve liustve changed the title fix(debugger): function-level instrumentation via engine + throwable … fix(debugger): capture exceptions natively via instrumentation engine Jun 15, 2026
liustve added 3 commits June 14, 2026 21:34
…ffix

The 3.9/3.10 builders previously had no version suffix while their 3.11
counterparts did, making the version split harder to scan. Add _39_310
suffixes to the four 3.9/3.10 builders so each family is explicit, and
group the family pairs together (prologue 39_310 + 311, then entry_call,
then exit_call, then unwind_call) instead of listing all 39_310 first
followed by all 311.

Also rename _build_function_unwind_foot_311 -> _build_function_unwind_call_311
for symmetry with the rest.
- pylint E1101 on inspect.CO_GENERATOR / CO_COROUTINE / CO_ASYNC_GENERATOR
  / CO_ITERABLE_COROUTINE: pylint's static analyzer doesn't see these CPython
  runtime constants. Suppress with a per-block no-member disable.

- pylint W0404 / W0621 redefined-outer-name on `from bytecode import Label`:
  Label is already imported at module top inside the IS_BYTECODE_INSTALLED
  guard. Removed the redundant inline import in
  _create_code_with_function_wrap.

- pylint W0404 / W0621 on inline `from amazon...debugger._function_wrapper
  import get_snapshot_emitter`: removed two stale inline re-imports in BIE
  / SME emit-snapshot paths; the engines use the top-level import.

- Engine unit tests retargeted from
  `amazon...debugger._function_wrapper.get_snapshot_emitter` to
  `amazon...debugger.instrumentation_engine._{bytecode_injection,sys_monitoring}_engine.get_snapshot_emitter`
  so mock.patch hits the consumer module's name binding now that the import
  is hoisted. (15 patch-site renames across the two engine test files.)

- codespell 'asend' false positive: added to ignore-words-list in
  .codespellrc. asend is a real Python AsyncGenerator method; codespell
  thinks it's a typo for "ascend".

Cross-version validation:
  3.9:  528 passed
  3.10: 528 passed
  3.11: 528 passed
  3.12: 523 passed
  black / isort / flake8 / pylint (10.00/10) / codespell: clean
CI surfaced more pylint refactor warnings (C0302 too-many-lines,
R0912/14/15 too-many-branches/locals/statements, C0103 invalid-name on
short-loop-counter `i`, R1710 inconsistent-return-statements). Address
each minimally:

- BIE / manager: file-level `# pylint: disable=too-many-lines` (both
  files exceed the 1000-line default and the engine is one cohesive
  unit).
- BIE.enable_function_level_instrumentation,
  BIE.disable_function_level_instrumentation: per-method
  too-many-branches.
- BIE._unique_local_name: rename loop counter `i` -> `suffix`.
- SME._pop_matching_start_ns: rename loop counter `i` -> `idx`.
- SME._function_start_event_handler / _function_return_event_handler:
  add explicit `return None` on guard paths so all returns are
  consistent (R1710).
- SME._handle_function_level_instrumentation: per-method
  too-many-locals,too-many-statements (snapshot construction is
  unavoidably wide).
- _undecorate.undecorated: per-function
  too-many-locals,too-many-branches (BFS through wrappings is one
  cohesive routine).
- manager._apply_function: too-many-locals (added).

Pylint now 10.00/10. All other linters and 528/528/528/523 tests
unchanged.
@liustve liustve changed the title fix(debugger): capture exceptions natively via instrumentation engine fix(debugger): use bytecode/sys.monitoring instead of setattr wrapper Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Claude finished @liustve's task in 1m 7s —— View job


Claude PR Review

Re-reviewed the single merge commit since the prior review (f2ade10 Merge branch 'main' into di/probe-py-start-engine-rebuild).

The merge is mechanical and brings in only one upstream change:

git diff c301d70..HEAD against debugger paths is empty — no debugger source/test files were touched by the merge.

New findings

None — bias toward silence. The merge is upstream sync only with no behavioral impact on this PR's scope.

  • Read prior review comments to dedupe (no claude-code-review:v1 inline comments)
  • Identify commits since last review (1 merge commit: f2ade10 pulling in ac0c2ee / PR fix: Tag platform images via ECR API to fix wrapper bug in private ECR #759)
  • Verify scope of merged change (release-build.yml only, unrelated to debugger)
  • Confirm no debugger files touched by merge
  • Identify NEW findings only (none)
  • Post review

| | Branch: di/probe-py-start-engine-rebuild | |

@liustve liustve marked this pull request as draft June 15, 2026 19:54
@liustve

liustve commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Keeping it as draft to give more time to evaluate.

liustve added 2 commits June 22, 2026 23:40
Resolve conflicts by keeping the interpreter-layer instrumentation engine
(bytecode / sys.monitoring) and dropping the framework-specific patching
code that landed on main after this branch forked. The engine mutates a
function's __code__ in place, so any reference a framework holds
(Flask view_functions, Django URLPattern.callback, FastAPI route table /
Depends sub-dependencies, `from x import f` aliases) sees the rewritten
code on the next call — no per-framework patcher needed.

Production code removed (reverted from main):
- _patch_framework_references / _patch_flask_view_functions /
  _patch_single_flask_app
- _patch_django_url_patterns / _patch_single_resolver / _maybe_patch_pattern
- _patch_fastapi_routes / _patch_single_fastapi_app (incl. Depends rebind)
- _patch_module_aliases / _is_application_module / _stdlib_lib_dirs
  (aws-observability#776, aws-observability#780, aws-observability#785, aws-observability#786)

Tests:
- Behavioral contract tests (di/django_test.py, di/fastapi_test.py,
  di/flask_test.py) kept as-is — they validate end-to-end that everything
  still works through the engine with no patching.
- White-box unit tests that called the deleted _patch_* methods directly
  (test_function_wrapper_django.py, test_function_wrapper_fastapi.py,
  TestModuleAliasRedirect) rewritten 1:1 to drive the engine — same
  scenarios, only the operative call changed (mirrors how this branch
  already rewrote test_function_wrapper_flask.py).
- Dropped only the tests of deleted internals with no engine equivalent:
  the _patch_framework_references dispatcher tests, the resolver-walk
  error-branch tests, and TestIsApplicationModule.

Also adopt main's "drop Python 3.9" change (aws-observability#762): engine and manager
gate on >= 3.10; function-level builders renamed _39_310 -> _310;
3.9 references removed across src and tests.

Kept main's non-patching debugger improvements: capture-limit default
constants (aws-observability#771), instrumentation_name field removal (aws-observability#775), OTel 1.42.1
bump (aws-observability#762), and the capture-limits test.
…vability#787 (engine-driven)

PR aws-observability#787 adds DI framework-dispatch fixes on top of main's per-framework
patchers (Starlette route.app rebuild, aiohttp handler rebind). This branch
removed those patchers in favor of in-place __code__ mutation, so port the
NEW framework test coverage from aws-observability#787 in the same engine-driving style this
branch already uses for Flask/Django/FastAPI: arm the handler via the engine,
then invoke through the framework's stored reference and assert it fires.

- test_function_wrapper_starlette.py: route.app closure (pure-Starlette) fires
  via the engine; name+module fallback; FastAPI-subclass handled by its own
  route table; no-Starlette-app no-op. The two ASGI-driving tests assert on the
  handler's own invocation counter rather than the shared snapshot list, since
  on the global sys.monitoring (3.12) path that list can also collect snapshots
  for unrelated code armed by other in-process tests.
- test_function_wrapper_aiohttp.py: route.handler identity, cross-route
  isolation, no-app no-op. (aiohttp is not in dev-requirements, so these SKIP
  in CI — same as on aws-observability#787's branch.)

Dropped only aws-observability#787's pure white-box tests of deleted patcher internals
(_patch_starlette_routes / _patch_aiohttp_routes / _starlette_app_wraps_endpoint,
middleware-skip), consistent with how this branch handled the Flask/Django/
FastAPI patcher tests. aws-observability#787's production-only fixes (functools.partial keying,
LINE_NOT_EXECUTABLE status) are not ported — those are code changes, not tests.

Verified under the exact CI env (tox 4.30.3, 3.12-test-aws-opentelemetry-distro,
distro from built wheel): 2276 passed, 38 skipped, 0 failed.
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
@aws-observability aws-observability deleted a comment from github-actions Bot Jun 23, 2026
liustve added 6 commits June 23, 2026 00:42
…apture_arguments

Root cause of the DI contract-test failures on Python 3.10/3.11 (the
bytecode-injection engine path; 3.12 sys.monitoring was unaffected).

BytecodeInjectionEngine._build_entry_context read capture_config.capture_locals
to decide what to capture at function entry, but a function-level PROBE/
BREAKPOINT carries its argument list in capture_arguments (capture_locals stays
None — that field is for line-level breakpoints). So for any PROBE configured
with capture_arguments=["x"], the entry handler short-circuited on
`capture_locals is None` and returned None, dropping ALL argument capture. The
snapshot still fired with a return value, but captures.entry was absent — which
is exactly what the contract tests caught:
  - 'items' not found in {}            (test_probe_captures_arguments)
  - Captured string value should not be None / 0 != 20   (capture-limits tests)
  - 'price_cents' not found in {}      (from-import alias test)

The 3.12 sys.monitoring engine reads capture_arguments correctly
(_sys_monitoring_engine.py), as does the legacy wrapper path
(_function_wrapper._capture_entry_context via inspect.signature().bind) — only
the bytecode entry path conflated the two fields. Verified locally on real
CPython 3.10.15 and 3.11.10: the injected locals() dict already contains the
arguments at entry (items=[1,2,3]); the defect was purely field selection, not
bytecode/locals timing.

Fix: _build_entry_context reads capture_config.capture_arguments (None = none,
[] = all, [names] = named), mirroring the sys.monitoring engine; also use the
DEFAULT_MAX_* constants for clamp defaults (were stray literals 3/10).

Tests:
- New regression tests test_function_level_captures_arguments and
  test_function_level_captures_all_arguments_when_empty_list assert
  captures.entry.arguments is populated on the bytecode path. They fail without
  this fix and pass with it. (Each target is defined in its own module
  namespace so the injected handler global doesn't alias across tests.)
- test_function_wrapper_starlette.py: the two ASGI-driving tests now use SYNC
  handlers — the bytecode engine on 3.10/3.11 declines coroutine bodies by
  design (async routes go through the FunctionWrapper fallback), so an async
  handler armed directly via the engine returns False there. Starlette wraps a
  sync endpoint via functools.partial(run_in_threadpool, endpoint), still
  closing over the same function object, so the test premise holds uniformly
  across 3.10–3.12.

Verified: full debugger suite green on 3.10 (536), 3.11 (563), 3.12 (558);
broader distro suite 2166 passed on 3.11. black/isort/flake8 clean.
… in the same module

Root cause of the DI contract-test "Found 0 snapshots" failures on Python
3.10/3.11 (the bytecode-injection engine; 3.12 sys.monitoring unaffected).

disable_function_level_instrumentation has a fallback used when id(func) isn't a
direct key (the decorator-wrapper case). The old fallback scanned ALL injection
states and matched on:
    md.file_path == code.co_filename
    AND candidate.original_code.co_name == candidate.qualified_name.rsplit(".")[-1]
The second clause compares the candidate against ITSELF (always true for a
module-level function), so the scan degenerated to "the first function-level
state whose function shares this module's globals/file." Every function in a
module shares co_filename and __globals__, so disabling one function (e.g. a
line-only breakpoint on `shared_function` during a config update) mis-resolved
to and DISARMED an unrelated function-level PROBE (`compute_total`) — which was
then never re-armed (it reported "unchanged, skipping" on later polls). The
PROBE silently stopped firing => contract tests timed out waiting for snapshots.

This reproduced end-to-end locally (di-flask contract test on 3.11:
DIFlaskProbeTest 6 failed / 1 passed, matching CI) and as a focused unit repro.

Fix: anchor the fallback to the TARGET's identity, mirroring how enable keys the
state. Resolve the real inner function via undecorated(func, name=func.__name__
or code.co_name, path=code.co_filename) and look it up by id(); only if that
misses, fall back to an EXACT original-code identity match. No module-wide
co_name/globals scan, so a sibling can never be mis-disarmed. Preserves the
decorator-unwrap disable contract (test_function_level_disable_after_decorated_
enable_restores_inner still passes).

Tests: new regression test_disable_one_function_does_not_disarm_sibling_in_same_
module — arms one function-level PROBE + one line BP in the same module, disables
the line-BP function, asserts the PROBE stays armed and still captures args. Fails
without this fix, passes with it.

Verified:
- di-flask DIFlaskProbeTest contract class: 7 passed (was 6 failed/1 passed),
  run locally on the rebuilt 3.11 image via Docker.
- Full debugger unit suite: 3.10 (537), 3.11 (564), 3.12 (558), 0 failures.
- black/isort/flake8 clean.
…y; disable async route-handler contract test

Two changes:

1) disable_function_level_instrumentation now resolves the keyed state by the
   TARGET function's identity instead of a co_name/globals heuristic. enable
   stamps the outer symbol the manager holds with a direct reference to the
   resolved (undecorated) function it keyed state under; disable reads that
   reference back and looks the state up by id(). This replaces the refactor in
   the previous commit with a pure identity lookup — no scan over unrelated
   functions that merely share a module (co_filename/__globals__), which was
   mis-disarming a sibling PROBE when another function in the same module was
   updated. The decorator-unwrap disable path is preserved.

2) Temporarily disable DIFastAPIRouteHandlerTest (__test__ = False). Its target
   route_handler_target is an `async def` handler: the bytecode engine declines
   coroutine bodies, so instrumentation falls back to the setattr wrapper, which
   replaces only the module-level name and does not reach the reference FastAPI
   captured in its route table at decoration time — so the async route handler
   produces no snapshot. The sync variant (DIFastAPISyncRouteHandlerTest) is
   instrumented in place by the engine and still passes. Tracked as a known gap;
   re-enable when async route-handler instrumentation reaches framework-held refs.

Verified locally on the rebuilt 3.11 image: di-flask DIFlaskProbeTest 7 passed;
full debugger unit suite green on 3.10/3.11/3.12 (537/564/558). black/flake8 clean.
…ability#787; wire into CI

Adds the pure-Starlette DI contract app (di_starlette_server.py, Dockerfile,
mock_di_api.py, requirements.txt) and starlette_test.py, and adds "starlette" to
the di-contract-tests CI matrix (the setup script auto-discovers the di-starlette
image dir, so no script change is needed).

The app has two DI targets, mirroring aws-observability#787:
- process_data (sync control) — instrumented in place by the engine; proves DI is
  active in a pure-Starlette process.
- starlette_handler (async route handler) — held in Starlette's request_response
  closure on route.app.

DIStarletteFunctionLevelTest (process_data) is active and passes. The
DIStarletteRouteHandlerTest class is disabled (__test__ = False) for the same
reason as the FastAPI async route handler: the bytecode engine declines coroutine
bodies, so an async handler falls back to the setattr wrapper which cannot reach
Starlette's route.app closure -> no snapshot. Re-enable once async route-handler
instrumentation reaches framework-held references.

Verified locally on the rebuilt 3.11 image (Docker): starlette_test.py -> 1 passed
(the sync control; async class deselected). black/isort/flake8 clean.
@liustve liustve force-pushed the di/probe-py-start-engine-rebuild branch 2 times, most recently from 123da06 to 2941c31 Compare June 28, 2026 19:46
liustve added 2 commits June 28, 2026 13:32
…wrap

The bytecode engine declined generators, coroutines, async generators and
iterable coroutines, returning False so the manager fell back to the setattr
FunctionWrapper. For a framework-held async handler (a Starlette route in a
request_response closure, or a FastAPI APIRoute.endpoint reference) setattr only
rebinds the module name and never reaches the stored reference, so async route
handlers produced no snapshot.

Mirror dd-trace-py's debugger: wrap the function body (after GEN_START / RESUME)
rather than declining. Remove the decline mask and route every function kind
through the entry/return/unwind body-wrap.

_bytecode_injection_engine.py:
- Remove _DECLINE_BYTECODE_REWRITE_MASK and _should_decline_bytecode_rewrite.
- 3.10: keep GEN_START first, splice the prologue after it.
- 3.11: detect the preamble as a contiguous run from the top (was scanning to
  the last RESUME, which in a generator/coroutine is the post-yield RESUME);
  include RETURN_GENERATOR / POP_TOP.
- Suppress GeneratorExit / asyncio.CancelledError on the unwind path so closing a
  suspended generator or cancelling a task is not reported as an exception.
- cleanup() also pops the injected _function_event_handler global.

_sys_monitoring_engine.py: same GeneratorExit / CancelledError suppression on
the PY_UNWIND path.

Replace the engine decline unit tests with acceptance/correctness coverage for
generators, coroutines, async generators, send/throw fidelity, exception
capture, and GeneratorExit suppression.

Unit tests: 569 pass on 3.10/3.11, 557 on 3.12. black/isort/flake8/codespell
clean.
…PI + Starlette)

Extend the di-fastapi and di-starlette contract apps with instrumented targets
modeled on canonical FastAPI/Starlette async constructs, and add test classes
asserting DI captures each end-to-end:

- number_stream: a StreamingResponse async-generator body.
- sse_event_stream: a Server-Sent Events async-generator body that breaks on
  client disconnect (await request.is_disconnected()).
- db_session_dependency (FastAPI only): a yield-based async dependency
  (DB-session setup/teardown); Starlette has no Depends.
- fetch_remote_value: an async function awaiting real I/O before returning.
- lookup_or_404: an async function raising HTTPException on the not-found path;
  the test asserts the snapshot body carries the captured throwable and the
  request still returns 404.

Each target is wired to a route (/stream, /sse, /with-db, /await-io, /lookup)
and configured as a DI BREAKPOINT; the test classes hit the route and assert a
method-level snapshot for that target. Re-enable the previously-disabled async
route-handler tests (DIStarletteRouteHandlerTest, DIFastAPIRouteHandlerTest).
The di-contract-tests CI matrix already runs fastapi + starlette on
3.10/3.11/3.12.

Verified locally: both app servers import and expose the new routes/configs, and
the real bytecode (3.10/3.11) and sys.monitoring (3.12) engines each emit one
snapshot per target (throwable captured on the raise path).
black/isort/flake8/codespell clean.
@liustve liustve force-pushed the di/probe-py-start-engine-rebuild branch from add5bf9 to a19002a Compare June 28, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip changelog doesn't need a CHANGELOG entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant