fix: catch panics from external light-* crates in V2 parser #340
Closed
i-am-bert wants to merge 4 commits into
Wraps the parse_public_transaction_event_v2 call in catch_unwind to prevent panics in external Light Protocol dependencies (e.g. light-event) from killing the ingestor task and halting indexing.

On caught panic:
- Logs tx signature, slot, and panic message at error level
- Emits parser_panic_caught StatsD metric for monitoring
- Returns None, skipping the problematic transaction
- Indexing continues with the next transaction

This fixes the active incident where light-event v0.21.0 panics with 'index out of bounds: the len is 3 but the index is 3' at slot 407265370, causing both pitt and fra ingestors to be permanently stuck.
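The wrapper described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `parse_event` is a hypothetical stand-in for the external parser, `parse_event_guarded` and its signature are invented for the example, and the metric emission is left as a comment.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for the external parser (an assumption, not the
/// light-event API). It panics when the input has fewer than four bytes.
fn parse_event(data: &[u8]) -> Option<u8> {
    Some(data[3])
}

/// Runs the parser and converts a panic into `None` plus an error log,
/// so the caller can move on to the next transaction.
fn parse_event_guarded(signature: &str, slot: u64, data: &[u8]) -> Option<u8> {
    match catch_unwind(AssertUnwindSafe(|| parse_event(data))) {
        Ok(parsed) => parsed,
        Err(payload) => {
            // Panic payloads are usually a `&str` or a `String`.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            eprintln!("parser panic: signature={signature} slot={slot} message={message}");
            // A real ingestor would also emit the parser_panic_caught metric here.
            None
        }
    }
}
```

With a four-byte input the guarded call returns the parsed value; with a three-byte input the inner index panics and the guarded call returns `None` instead of unwinding into the caller. `catch_unwind` only intercepts panics when the binary is built with the default `panic = "unwind"` strategy; under `panic = "abort"` the process still terminates.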
…indices

Root cause: create_nullifier_queue_indices() sizes nullifier_queue_indices to batch_input_accounts.len() (filtered nullifiers), but iterates over ALL input_compressed_accounts by index. When some nullifiers are filtered out, input_compressed_accounts.len() > batch_input_accounts.len(), causing an index-out-of-bounds panic.

Fix: size the vec to input_merkle_tree_pubkeys.len() (the actual array being iterated). The caller zips with batch_input_accounts, so extra entries are harmlessly ignored.

Uses [patch.crates-io] to override light-event with a patched local copy until upstream fixes this in a new release.
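The mismatch and its fix can be modelled in a few lines of Rust. This is a simplified sketch, not the light-event source: the function names, the `u8` tree identifiers, and the `NO_INDEX` sentinel type are hypothetical, and the real function computes queue positions rather than storing `i`.

```rust
/// Sentinel for "no queue index assigned"; the u16 type is an assumption.
const NO_INDEX: u16 = u16::MAX;

/// Buggy shape: the vec is sized to the filtered count, but the loop
/// walks every input account by index.
fn queue_indices_buggy(filtered_len: usize, input_tree_ids: &[u8]) -> Vec<u16> {
    let mut indices = vec![NO_INDEX; filtered_len];
    for (i, _tree) in input_tree_ids.iter().enumerate() {
        // Panics as soon as i reaches filtered_len.
        indices[i] = i as u16;
    }
    indices
}

/// Fixed shape: the vec is sized to the slice that is actually iterated.
fn queue_indices_fixed(input_tree_ids: &[u8]) -> Vec<u16> {
    let mut indices = vec![NO_INDEX; input_tree_ids.len()];
    for (i, _tree) in input_tree_ids.iter().enumerate() {
        indices[i] = i as u16;
    }
    indices
}
```

Calling `queue_indices_buggy(3, &[0, 0, 1, 1])` panics on the fourth iteration, while `queue_indices_fixed(&[0, 0, 1, 1])` returns four entries. Zipping that result with a three-element batch yields three pairs, so the surplus entry is never read.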
Per Het: Solana is source of truth. Skipping a transaction risks serving stale/incorrect data. Halting is the lesser evil. The vendored light-event fix resolves the root cause so no transactions are skipped.
Superseded by #341 (clean version bump to light-event 0.24). The vendored patch approach is no longer needed.
Problem
Both pitt and fra Photon ingestors are stuck at slot 407265370. light-event v0.21.0 panics in create_batched_transaction_event → create_nullifier_queue_indices with "index out of bounds: the len is 3 but the index is 3".

Root Cause
create_nullifier_queue_indices() in light-event has a size mismatch bug:
- batch_input_accounts is built by filtering nullifiers to only those matching input_sequence_numbers
- create_nullifier_queue_indices(len) receives len = batch_input_accounts.len() (the filtered count)
- nullifier_queue_indices is allocated as a vec of length len
- The loop iterates over all input_compressed_accounts by index i, accessing nullifier_queue_indices[i]
- When input_compressed_accounts.len() > batch_input_accounts.len() (i.e. some nullifiers were filtered out), i reaches len → panic

This bug exists in both v0.21.0 and v0.23.0 of light-event.

graph TD
    A[nullifiers array - 4 items] -->|filter by input_sequence_numbers| B[batch_input_accounts - 3 items]
    B -->|.len| C["len = 3"]
    C --> D["nullifier_queue_indices = vec![MAX; 3]"]
    E[input_compressed_accounts - 4 items] -->|iterate by index i| F["i goes 0,1,2,3"]
    F -->|"access [3]"| D
    D -->|"len is 3, index is 3"| G["💥 PANIC"]

Fix (two layers)
1. Root cause fix: Patch light-event (light-event-patched/)
Vendored light-event v0.21.0 with a [patch.crates-io] override. Fixed create_nullifier_queue_indices to size the vec to input_merkle_tree_pubkeys.len() (the array actually being indexed), not the filtered batch_input_accounts count. The caller zips with batch_input_accounts, so extra entries are harmlessly ignored.

2. Safety net: catch_unwind in V2 parser (src/ingester/parser/mod.rs)
Wraps parse_public_transaction_event_v2() in catch_unwind(AssertUnwindSafe(...)) to prevent any future panic from external light-* crates from halting the ingestor. On caught panic: logs tx sig + slot + panic message, emits parser_panic_caught StatsD metric, returns None.

Testing
cargo check passes cleanly

Upstream
Bug exists in light-event v0.21.0 and v0.23.0. Will file upstream issue on lightprotocol/light-protocol.