feat: implement BullMQ Dead-Letter Queue (DLQ) and Admin Replay API #132 #188

Merged
elizabetheonoja-art merged 2 commits into SubStream-Protocol:main from Husten150:feat/observability-clean on Apr 22, 2026

Conversation

@Husten150
Contributor

Description

Context
The SubStream indexer currently has no fault tolerance for malformed XDR data. A single corrupted ledger block can hang a worker, leaving data gaps for merchants. This PR introduces a Dead-Letter Queue (DLQ) that isolates "poison pill" payloads so the ingestion pipeline keeps running instead of stalling on bad data.

Changes

Resilient Ingestion Flow: Wrapped the XDR parsing logic in a try/catch block with a 3-tier retry strategy.

DLQ Routing: Implemented automatic routing to a secondary failed-events-dlq after 3 failed attempts. Each entry includes the raw XDR payload, ledger sequence, and the error stack trace.

Non-Blocking Progression: Ensured the last_ingested_ledger pointer is updated even when a job is sent to the DLQ, preventing infinite loops on bad data.

Admin Replay API: Created a POST /admin/dlq/retry endpoint to allow manual re-injection of failed jobs into the primary queue.

Slack Integration: Added a Slack webhook notification that triggers specifically when a job is moved to the DLQ.

Data Retention: Configured BullMQ to retain DLQ messages for 14 days before automatic eviction.
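
The ingestion flow described above (retry, route to the DLQ, then advance the pointer) can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: `LedgerJob`, `DlqEntry`, `IngestState`, and `ingestLedger` are hypothetical names, the DLQ is a plain array standing in for `failed-events-dlq`, and retries run back-to-back with no backoff. In BullMQ itself the retry count would more likely come from the `attempts` job option than from a hand-written loop.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR's code.
interface LedgerJob {
  ledgerSequence: number;
  rawXdr: string;
}

interface DlqEntry extends LedgerJob {
  errorStack: string;
  failedAt: number; // epoch milliseconds
}

interface IngestState {
  lastIngestedLedger: number;
  dlq: DlqEntry[]; // in-memory stand-in for the failed-events-dlq queue
}

const MAX_ATTEMPTS = 3; // the description routes to the DLQ after 3 failed attempts

// Try the parser up to MAX_ATTEMPTS times. If every attempt throws, record the
// raw payload, ledger sequence, and stack trace in the DLQ. In both cases the
// pointer moves forward, so one bad ledger cannot block the ledgers after it.
function ingestLedger(
  job: LedgerJob,
  parse: (rawXdr: string) => void,
  state: IngestState,
  now: () => number = Date.now,
): "ingested" | "dead-lettered" {
  let lastError: Error | undefined;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      parse(job.rawXdr);
      lastError = undefined; // success: clear any error from earlier attempts
      break;
    } catch (err) {
      lastError = err instanceof Error ? err : new Error(String(err));
    }
  }

  if (lastError) {
    state.dlq.push({
      ledgerSequence: job.ledgerSequence,
      rawXdr: job.rawXdr,
      errorStack: lastError.stack ?? lastError.message,
      failedAt: now(),
    });
  }

  // Advance even on failure; the DLQ entry is the record of what was skipped.
  state.lastIngestedLedger = Math.max(state.lastIngestedLedger, job.ledgerSequence);
  return lastError ? "dead-lettered" : "ingested";
}
```

Advancing `lastIngestedLedger` on the failure path is what keeps the worker from re-reading the same poison payload; the trade-off is that the skipped ledger exists only in the DLQ until someone replays it.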

- Add comprehensive Dead Letter Queue system for failed RPC syncs
- Implement Slack Alert Service for real-time notifications
- Create Soroban Event Indexer with DLQ integration
- Add admin API endpoints for DLQ management
- Include comprehensive test suite for all components
- Update configuration with type-safe settings

This provides robust error handling and observability for the Soroban event processing pipeline.
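
For reviewers, the replay and retention behaviour listed under Changes might look something like the sketch below. It is an in-memory model under stated assumptions, not the handler in this PR: the request shape for POST /admin/dlq/retry is not documented here, so the `jobIds` parameter, `StoredDlqEntry`, `evictExpired`, and `replayFromDlq` are all hypothetical, and the `enqueue` callback stands in for adding a job to the primary queue.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface StoredDlqEntry {
  id: string;
  ledgerSequence: number;
  rawXdr: string;
  errorStack: string;
  failedAt: number; // epoch milliseconds
}

interface ReplayedJob {
  ledgerSequence: number;
  rawXdr: string;
}

interface ReplayOutcome {
  requeued: string[];
  notFound: string[];
  remaining: StoredDlqEntry[];
}

const DLQ_RETENTION_MS = 14 * 24 * 60 * 60 * 1000; // 14 days, per the description

// Keep only entries still inside the retention window.
function evictExpired(dlq: StoredDlqEntry[], nowMs: number): StoredDlqEntry[] {
  return dlq.filter((entry) => nowMs - entry.failedAt < DLQ_RETENTION_MS);
}

// Re-inject the requested entries into the primary queue and drop them from
// the DLQ. Unknown ids are reported back instead of failing the whole request.
function replayFromDlq(
  jobIds: string[], // assumed request field; not confirmed by the PR
  dlq: StoredDlqEntry[],
  enqueue: (job: ReplayedJob) => void,
): ReplayOutcome {
  const requeued: string[] = [];
  const notFound: string[] = [];
  let remaining = dlq.slice();

  for (const id of jobIds) {
    const entry = remaining.filter((candidate) => candidate.id === id)[0];
    if (entry === undefined) {
      notFound.push(id);
      continue;
    }
    enqueue({ ledgerSequence: entry.ledgerSequence, rawXdr: entry.rawXdr });
    remaining = remaining.filter((candidate) => candidate.id !== id);
    requeued.push(id);
  }

  return { requeued, notFound, remaining };
}
```

Returning `notFound` separately lets an operator see which ids had already been evicted or replayed, which matters once entries start ageing out after 14 days.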
@drips-wave

drips-wave Bot commented Apr 22, 2026

Hey @Husten150! 👋 It looks like this PR isn't linked to any issue.

If this PR is for one of the issues assigned to you as part of a Wave, please link it to ensure your contribution is tracked properly. You can do this by adding a keyword to the PR description (e.g., Closes #123), or by clicking a button below:

Issue  Title
#132   Build Dead-Letter Queue (DLQ) for Failed RPC Syncs
#131   Implement Idempotent Soroban Event Indexer Worker
#134   Sync External Fiat-Price Oracle Cache (SEP-40)
#133   Design PostgreSQL Schema for Subscription Cache

ℹ️ Learn more about linking PRs to issues

@elizabetheonoja-art merged commit 75dfd90 into SubStream-Protocol:main on Apr 22, 2026
1 check failed