Always dump stack trace to binary term #47

Closed
jackalcooper wants to merge 14 commits into main from fix-dump-stack-trace-env-var

Contributor

@jackalcooper jackalcooper commented Oct 4, 2025

getenv() is not thread-safe in glibc when the environment is modified concurrently: if another thread calls setenv() (e.g., from Python), the environment block may be reallocated while getenv() is reading it, which can corrupt memory or crash the process.

Summary by CodeRabbit

  • New Features

    • Exceptions may include an optional returned stack trace, now formatted for clearer display.
  • Bug Fixes

    • Error messages append ANSI reset and include relevant source path information when a trace is present.
  • Tests

    • Tests adjusted to verify presence/format of returned traces and updated expected messages.
  • Chores

    • CI workflow updated: newer checkout action, consolidated caches, added dedicated gdb install step, and removed redundant test flags/steps.


coderabbitai Bot commented Oct 4, 2025

Walkthrough

Add propagation and formatting of native Zig stack traces into Elixir exceptions via a new error_return_trace field; introduce Zig stack-trace formatting utilities; change Zig opaque storage types; update tests to expect trace and ANSI-reset behavior; and update CI to install gdb in a dedicated step and remove an obsolete test step.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI workflow: .github/workflows/elixir.yml | Upgrade actions/checkout to @v5; remove KINDA_PRINT_LINKAGES env and obsolete stacktrace test step; add a dedicated Install gdb apt step; remove inline apt-get calls from test steps and reorder caching. |
| Elixir error type: lib/kinda/error/nif_call.ex | Extend Kinda.CallError defexception to include :error_return_trace; implement two message/1 clauses: one returning the plain message when trace is nil, another appending \n + ANSI reset + trace when present. |
| Tests updated: kinda_example/test/kinda_example_test.exs | Update assertions and pattern matches to accept Kinda.CallError with optional error_return_trace; adjust expected messages to include ANSI reset sequences; assert captured messages contain Zig source paths and that error_return_trace exists. |
| Zig: exception creation & stack formatting: src/beam.zig | Add format_stack_trace helper and internal writeStackTraceToBuffer(environment, stack_trace) to format a Zig StackTrace into a BEAM-friendly binary; populate "error_return_trace" in make_exception when trace formatting succeeds (or nil on failure). |
| Zig: debug utilities: src/debug.zig | Add stack-trace formatting utilities: formatStackTrace and helpers (formatTraceItem, formatEmptyTraceItem, etc.) to resolve addresses, print file:line:column and symbol info, handle stripped debug-info, and iterate frames safely. |
| Zig: storage type changes: src/kinda.zig | Change OpaqueStructType.storage and OpaqueField.storage from std.array_list.AlignedManaged(u8, null) to std.array_list.Managed(u8) (public signature changes). |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant BEAM as BEAM / Elixir
    participant NIF as Zig NIF
    participant Formatter as Kinda.CallError.message/1

    rect rgba(120,200,180,0.06)
      Note left of NIF: native error occurs in NIF
      NIF->>NIF: build optional StackTrace (debug)
      NIF->>NIF: writeStackTraceToBuffer(env, stack_trace) -> binary | nil
      NIF->>BEAM: return exception map {"message": msg, "error_return_trace": trace | nil}
    end

    BEAM->>Formatter: construct Kinda.CallError(map)
    alt error_return_trace == nil
        Formatter->>BEAM: message() -> plain message
    else error_return_trace present
        Formatter->>BEAM: message() -> message + "\n" + ANSI_RESET + error_return_trace
    end
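
The two message/1 branches in the diagram above can be sketched in Elixir roughly as follows. This is a minimal illustration, not the code from lib/kinda/error/nif_call.ex: the module name Sketch.CallError and the sample trace text are hypothetical, and only the field names and the "\n" + ANSI reset + trace layout are taken from the walkthrough.

```elixir
defmodule Sketch.CallError do
  # Hypothetical stand-in for Kinda.CallError; field names follow the walkthrough.
  defexception [:message, :error_return_trace]

  # No trace was returned by the NIF: show the plain message only.
  @impl true
  def message(%__MODULE__{message: message, error_return_trace: nil}), do: message

  # A trace is present: reset any ANSI colors left open by the message,
  # then append the trace on its own line.
  def message(%__MODULE__{message: message, error_return_trace: trace}) do
    message <> "\n" <> IO.ANSI.reset() <> trace
  end
end

# Sample values are made up for illustration.
plain = Sketch.CallError.exception(message: "boom")
traced = Sketch.CallError.exception(message: "boom", error_return_trace: "src/beam.zig:1:1")

IO.inspect(Exception.message(plain))
IO.inspect(Exception.message(traced))
```

Matching on nil in the first clause keeps the no-trace path identical to the old behavior, so callers that never receive a trace should see no change in their error messages.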

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble bytes beside the trace,
I hop from Zig into Elixir's space,
I tuck the stack into the map so neat,
Reset the colors, make the output sweet,
A rabbit's tiny dev-time feat 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly describes the main change of the pull request (always dumping stack traces as binary terms), matching the modifications to remove environment variable controls and embed the error_return_trace field. It is clear and specific enough for team members to understand the primary intent. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
jackalcooper changed the title from "Fix typo in GHA" to "Always dump stack trace to binary term" on Oct 5, 2025
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 193d3e0 and 37718e4.

📒 Files selected for processing (5)
  • .github/workflows/elixir.yml (1 hunks)
  • kinda_example/test/kinda_example_test.exs (2 hunks)
  • lib/kinda/error/nif_call.ex (1 hunks)
  • src/beam.zig (1 hunks)
  • src/kinda.zig (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kinda_example/test/kinda_example_test.exs (1)
lib/kinda/error/nif_call.ex (1)
  • message (7-9)
🔇 Additional comments (6)
src/kinda.zig (1)

13-13: LGTM! Type simplification is appropriate.

The change from std.array_list.AlignedManaged(u8, null) to std.array_list.Managed(u8) is a good simplification. Since the alignment parameter was null (default alignment), using Managed directly is cleaner and equivalent.

Also applies to: 19-19

src/beam.zig (2)

1406-1415: Stack trace formatting implementation looks correct.

The function properly:

  • Creates a managed buffer with appropriate cleanup via defer
  • Formats the stack trace using the adapted writer interface
  • Converts to a BEAM slice term

The use of adaptToNewApi suggests this is handling a Zig standard library API transition, which is a reasonable approach.


1417-1432: LGTM! Exception enhancement integrates cleanly.

The changes properly:

  • Initialize stack_trace to nil as a sensible default
  • Conditionally format the trace with error handling that falls back to nil on failure
  • Extend the exception map with the new error_return_trace field

This provides graceful degradation if trace formatting fails while still propagating stack trace information when available.

kinda_example/test/kinda_example_test.exs (2)

11-11: LGTM! Test assertions properly updated.

The pattern matches correctly account for the new error_return_trace field in Kinda.CallError. Using _ to match any trace value is appropriate since the exact trace content may vary.

Also applies to: 30-30


22-22: LGTM! Message format and trace content assertions are thorough.

The tests verify:

  • ANSI reset sequence is properly included (matching the message/1 implementation)
  • Stack trace contains expected source file paths (src/beam.zig and kinda_example/native/zig-src/main.zig)

These assertions ensure the trace formatting works correctly end-to-end.

Also applies to: 28-29
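
The assertion style described in these two comments can be sketched without ExUnit as below. Everything here is an assumption made for illustration: Sketch.TraceError stands in for Kinda.CallError, and the trace text is invented rather than produced by a NIF. The point is only that the match pins the stable fields and leaves the trace content open.

```elixir
defmodule Sketch.TraceError do
  # Hypothetical stand-in for Kinda.CallError with the same two fields.
  defexception [:message, :error_return_trace]
end

# Hypothetical trace text; a real trace would come from the NIF.
fake_trace = "src/beam.zig:10:5: in make_exception"

rescued =
  try do
    raise Sketch.TraceError, message: "fail to make", error_return_trace: fake_trace
  rescue
    error in Sketch.TraceError -> error
  end

# Pin the stable field and bind the trace without constraining its content.
%{message: "fail to make", error_return_trace: trace} = rescued

# Check only properties that should hold for any trace.
true = is_binary(trace)
true = String.contains?(trace, "src/beam.zig")
```

Checking for a source path substring instead of the full trace keeps the test stable when line numbers or frame counts change between builds.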

lib/kinda/error/nif_call.ex (1)

2-2: LGTM! Exception enhancement is well-implemented.

The changes cleanly:

  • Add the error_return_trace field to the exception
  • Handle both nil and present trace cases with pattern matching
  • Include ANSI reset sequence for proper terminal formatting when displaying traces

This implementation aligns with the Zig-side changes and test expectations.

Also applies to: 5-9

Comment thread .github/workflows/elixir.yml
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/debug.zig (2)

6-6: Clarify the unused self_debug_info variable.

The global variable self_debug_info is declared but never used in this file. Consider:

  • Removing it if it's dead code
  • Adding a TODO comment if it's intended for future use
  • Moving it to where it's actually used if it belongs elsewhere
-var self_debug_info: ?SelfInfo = null;
-

19-19: Consider documenting the allocator choice.

The code uses std.heap.c_allocator for debug info operations instead of beam.allocator. While this might be intentional for debug information handling (which may need to persist beyond BEAM-managed memory), documenting this choice would help future maintainers understand the reasoning.

Consider adding a comment:

// Use c_allocator for debug info as it may need to persist beyond BEAM memory lifecycle
const symbol_info = module.getSymbolAtAddress(std.heap.c_allocator, address) catch {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2bfb94 and b3ad128.

📒 Files selected for processing (3)
  • .github/workflows/elixir.yml (2 hunks)
  • src/beam.zig (1 hunks)
  • src/debug.zig (1 hunks)
🔇 Additional comments (6)
.github/workflows/elixir.yml (1)

67-70: LGTM! Dedicated gdb installation step is correctly implemented.

The extraction of gdb installation into a dedicated step improves workflow organization and readability. The implementation is standard and the placement before gdb-based tests is correct.

Note: There's a past review comment about PR title mismatch, but it references "Fix typo in GHA" while the current PR title is "Always dump stack trace to binary term". The current changes align with the PR's stack trace handling goals, so that past comment may no longer be relevant.

src/beam.zig (3)

1406-1412: Verify the writer type parameter.

The function signature uses *std.io.Writer as the parameter type, which is unusual in Zig code. Typically, writer parameters are either:

  • anytype for generic writers
  • A specific concrete writer type (e.g., std.ArrayList(u8).Writer)

The usage of adaptToNewApi in writeStackTraceToBuffer suggests you're managing a writer API transition in Zig 0.15.1. While this might be correct for your version, ensure this approach is stable and won't break with future Zig updates.

Consider documenting why this specific writer type and API adaptation is needed, as it's non-standard and future maintainers might question it.


1413-1422: LGTM! Good error handling with fallback.

The implementation correctly:

  • Buffers the stack trace output using ArrayList
  • Adapts the writer API to handle compatibility
  • Catches errors and falls back to nil rather than propagating failures

This defensive approach ensures that stack trace formatting failures don't crash the entire error reporting mechanism.


1424-1439: LGTM! Clean integration of stack trace into exceptions.

The modification correctly:

  • Initializes stack_trace as nil by default
  • Conditionally formats the trace when error_trace is provided
  • Adds the new error_return_trace field to the exception map alongside existing fields

This approach maintains backward compatibility while adding the new trace functionality.

src/debug.zig (2)

12-36: Address the commented-out cleanup code.

Line 17 contains a commented-out defer module.deinit(debug_info_allocator.?) which suggests either:

  • A potential memory leak if cleanup is needed but not performed
  • The API changed and cleanup is no longer required
  • The cleanup is handled elsewhere

Please clarify why this cleanup is commented out and whether memory management is correct here. If cleanup is not needed, remove the comment to avoid confusion.


38-64: LGTM! Stack trace formatting is well-implemented.

The function correctly:

  • Handles the case where debug info is stripped
  • Uses a circular buffer approach to iterate through frames
  • Formats each frame with proper indentation and numbering
  • Provides clear fallbacks for missing information

The frame iteration logic properly handles the circular buffer by counting down from the most recent frame, which is the standard way to present stack traces.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3ad128 and 70053b3.

📒 Files selected for processing (1)
  • src/debug.zig (1 hunks)
🔇 Additional comments (1)
src/debug.zig (1)

48-56: Verify the frame numbering convention.

The current implementation numbers frames in reverse order compared to typical stack trace conventions:

  • Oldest (outermost) frame is printed as #N
  • Newest (innermost) frame is printed as #1

Most tools that number frames (for example gdb and Rust backtraces) use the opposite convention, where #0 is the innermost/newest frame and higher numbers represent outer frames.

This might be intentional for Elixir's conventions, but please verify this is the desired behavior.
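
To make the difference concrete, here is a small Elixir sketch of both numbering schemes. It is not the logic in src/debug.zig: the module, the function names, and the frame labels are hypothetical, and the input is assumed to be ordered oldest (outermost) first.

```elixir
defmodule Sketch.FrameNumbering do
  # `frames` is a hypothetical list of labels ordered oldest (outermost) first.

  # Debugger-style numbering: newest frame first, labeled #0.
  def innermost_zero(frames) do
    frames
    |> Enum.reverse()
    |> Enum.with_index()
    |> Enum.map(fn {frame, index} -> "##{index} #{frame}" end)
  end

  # Numbering as described in this comment: oldest frame is #N, newest is #1.
  def outermost_highest(frames) do
    count = length(frames)

    frames
    |> Enum.with_index()
    |> Enum.map(fn {frame, index} -> "##{count - index} #{frame}" end)
  end
end

frames = ["main", "run", "fail"]
IO.inspect(Sketch.FrameNumbering.innermost_zero(frames))
IO.inspect(Sketch.FrameNumbering.outermost_highest(frames))
```

For the sample input, the first function labels "fail" as #0 and "main" as #2, while the second labels "main" as #3 and "fail" as #1.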

Comment thread src/debug.zig
const beam = @import("beam.zig");
const SelfInfo = std.debug.SelfInfo;

var self_debug_info: ?SelfInfo = null;
⚠️ Potential issue | 🟡 Minor

Remove unused global variable.

The global variable self_debug_info is never initialized or referenced anywhere in this file. The formatStackTrace function receives debug_info as a parameter instead.

Apply this diff to remove the dead code:

-var self_debug_info: ?SelfInfo = null;
-
🤖 Prompt for AI Agents
In src/debug.zig around line 6, remove the unused global variable declaration
`var self_debug_info: ?SelfInfo = null;` because it is never initialized or
referenced; update the file by deleting that line so only the active
declarations and the parameter-passed `debug_info` remain.
