Skip to content

Conversation

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Dec 20, 2025

Non-blocking sampling reads process memory while the target continues
running, which can produce torn stacks when generators or coroutines
rapidly switch between yield points. Blocking mode uses atomic process
suspension (task_suspend on macOS, NtSuspendProcess on Windows,
PTRACE_SEIZE on Linux) to stop the target during each sample, ensuring
consistent snapshots.

Use blocking mode with longer intervals (1ms+) to avoid impacting the
target too much. The default non-blocking mode remains best for most
cases since it has zero overhead.

📚 Documentation preview 📚: https://cpython-previews--142998.org.readthedocs.build/

@pablogsal pablogsal changed the title gh-142654: Add blocking mode for accurate stack traces in Tachyon gh-138122: Add blocking mode for accurate stack traces in Tachyon Dec 20, 2025
@pablogsal pablogsal force-pushed the fixes branch 4 times, most recently from bb510c0 to 8c21dc9 Compare December 20, 2025 02:57
@pablogsal
Copy link
Member Author

@ivonastojanovic can you take a look?

Non-blocking sampling reads process memory while the target continues
running, which can produce torn stacks when generators or coroutines
rapidly switch between yield points. Blocking mode uses atomic process
suspension (task_suspend on macOS, NtSuspendProcess on Windows,
PTRACE_SEIZE on Linux) to stop the target during each sample, ensuring
consistent snapshots.

Use blocking mode with longer intervals (1ms+) to avoid impacting the
target too much. The default non-blocking mode remains best for most
cases since it has zero overhead.

Also fix a frame cache bug: the cache was including the last_profiled_frame
itself when extending with cached data, but this frame was executing in
the previous sample and its line number may have changed. For example,
if function A was sampled at line 6, then execution continued to line 10
and called B→C, the next sample would incorrectly report A at line 6
(from cache) instead of line 10. The fix uses start_idx + 1 to only trust
frames ABOVE last_profiled_frame — these caller frames are frozen at their
call sites and cannot change until their callees return.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

// Extend frame_info with frames from start_idx onwards
PyObject *slice = PyList_GetSlice(entry->frame_list, start_idx, num_frames);
// Extend frame_info with frames ABOVE start_idx (not including it).
Copy link
Member Author

@pablogsal pablogsal Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a frame cache bug: the cache was including the last_profiled_frame
itself when extending with cached data, but this frame was executing in
the previous sample and its line number may have changed. For example,
if function A was sampled at line 6, then execution continued to line 10
and called B→C, the next sample would incorrectly report A at line 6
(from cache) instead of line 10. The fix uses start_idx + 1 to only trust
frames ABOVE last_profiled_frame: these caller frames are frozen at their
call sites and cannot change until their callees return.

Copy link
Contributor

@ivonastojanovic ivonastojanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a couple of nits

collector.collect_failed_sample()
errors += 1
except Exception as e:
print(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(e)

duration_sec = current_time - start_time
break
except (RuntimeError, UnicodeDecodeError, MemoryError, OSError):
except (RuntimeError, UnicodeDecodeError, MemoryError, OSError) as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
except (RuntimeError, UnicodeDecodeError, MemoryError, OSError) as e:
except (RuntimeError, UnicodeDecodeError, MemoryError, OSError):

# When consume_generator is on the arithmetic lines (temp1, temp2, etc.),
# fibonacci_generator should NOT be in the stack at all.
# Line numbers are important here - see ARITHMETIC_LINES below.
cls.generator_script = '''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe we could use textwrap.dedent here to keep the string nicely formatted. What do you think?


.. option:: --blocking

Stop the target process during each sample. This ensures consistent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if pause might sound better than stop here

Suggested change
Stop the target process during each sample. This ensures consistent
Pause the target process during each sample. This ensures consistent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants