Problem
request_invoice_cache currently lives only inside AsyncOrderState, so it disappears on restart. That breaks idempotent retries for async_order.request_invoice across process restarts.
The current in-memory design has no eviction path for expired replay entries, so the cache can grow without bound over time.
Impact
- a retry with the same (lsp_node_id, claim_session_id) is only replay-safe within one process lifetime
- after restart, the original bolt11 response cannot be recovered
- the cache is only a hot cache, not a durable source of truth
- INBOUND_PAYMENTS_KEY is the payment ledger, not the request/response replay table
Required design
- KV for persistence
- single-flight guard for in-process concurrency
- for cross-restart atomicity, move the replay state to a real table with a transaction-backed write path
Proposed fix
Remove the hot cache entirely and persist request_outbound_invoice replay state in KV instead.
Recommended shape:
- key by (lsp_node_id, claim_session_id)
- persist the original request params, the response payload, and expires_at
- on lookup:
  - if params match and the entry is not expired, replay the stored result
  - if params differ, return stale_flow
  - if expired, return stale_flow and lazily prune the record
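The lookup rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `ReplayRecord`, `Lookup`, and `lookup` are hypothetical names, a `HashMap` stands in for the KV store, request params are compared as an already-serialized string, and expiry is checked before the params comparison (the order is not specified above).

```rust
use std::collections::HashMap;

/// Hypothetical replay key: (lsp_node_id, claim_session_id).
type ReplayKey = (String, String);

/// Hypothetical persisted record; field encodings are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct ReplayRecord {
    request_params: String, // original request params, serialized
    response: String,       // stored response payload (e.g. the bolt11 string)
    expires_at: u64,        // unix seconds
}

#[derive(Debug, PartialEq)]
enum Lookup {
    /// Params match and the entry is live: replay the stored response.
    Replay(String),
    /// Params differ, or the entry has expired.
    StaleFlow,
    /// No record for this key: caller proceeds to create the invoice.
    Miss,
}

/// In-memory stand-in for the persisted lookup path.
fn lookup(
    store: &mut HashMap<ReplayKey, ReplayRecord>,
    key: &ReplayKey,
    params: &str,
    now: u64,
) -> Lookup {
    let record = match store.get(key) {
        Some(r) => r.clone(),
        None => return Lookup::Miss,
    };
    if now >= record.expires_at {
        // Lazy prune: drop the expired record on read.
        store.remove(key);
        return Lookup::StaleFlow;
    }
    if record.request_params != params {
        return Lookup::StaleFlow;
    }
    Lookup::Replay(record.response)
}
```

In this sketch a mismatched-params lookup leaves the stored record in place, so a later retry with the original params can still be replayed until expires_at.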
Concurrency requirement
The persisted replay lookup must be protected by a single-flight guard per (lsp_node_id, claim_session_id) so that concurrent duplicate requests do not race through the miss path and create two invoices.
This guard is only for in-process serialization. It does not replace durable persistence.
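One possible shape for the guard is a per-key mutex, sketched below using only the standard library. `SingleFlight` and `run` are hypothetical names, and the code assumes a synchronous (blocking) call path; an async handler would need an async-aware lock instead. It serializes callers that share a key within one process and does nothing across processes or restarts.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical replay key: (lsp_node_id, claim_session_id).
type ReplayKey = (String, String);

/// Per-key in-process guard. Callers with the same key run one at a time;
/// callers with different keys do not block each other.
#[derive(Default)]
struct SingleFlight {
    locks: Mutex<HashMap<ReplayKey, Arc<Mutex<()>>>>,
}

impl SingleFlight {
    /// Runs `f` while holding the lock for `key`. The lookup, invoice
    /// creation, and persisted write should all happen inside `f`.
    /// Note: a panic inside `f` poisons the key lock in this sketch.
    fn run<T>(&self, key: &ReplayKey, f: impl FnOnce() -> T) -> T {
        // Fetch or create the lock for this key; the map lock is held briefly.
        let key_lock = {
            let mut locks = self.locks.lock().unwrap();
            let lock = locks.entry(key.clone()).or_default().clone();
            lock
        };
        let result = {
            let _guard = key_lock.lock().unwrap();
            f()
        };
        // Drop the map entry when no other caller holds a clone, so the
        // guard map itself does not grow without bound.
        let mut locks = self.locks.lock().unwrap();
        if Arc::strong_count(&key_lock) == 2 {
            locks.remove(key);
        }
        result
    }
}
```

The replay lookup and the invoice creation on a miss would both run inside the closure passed to `run`, so a second duplicate request sees the record written by the first.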
Cleanup strategy
Use KV as source of truth and clean stale replay records by:
- lazy removal on read when an entry is expired
- optional periodic sweep over the replay namespace
- keeping replay TTL separate from the payment ledger TTL
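The optional periodic sweep could look roughly like the sketch below. `ReplayKv` is a deliberately tiny, hypothetical interface invented for this example; it is not the KVStoreSync signature, and `read` returns only the stored expires_at for brevity. `MemKv` is an in-memory stand-in so the example is self-contained.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal view of the replay namespace (not the real KV trait).
trait ReplayKv {
    /// All keys currently stored in the replay namespace.
    fn list(&self) -> Vec<String>;
    /// The stored expires_at (unix seconds) for a key, if present.
    fn read(&self, key: &str) -> Option<u64>;
    /// Remove a record; a missing key is a no-op.
    fn remove(&mut self, key: &str);
}

/// In-memory stand-in used only for this sketch.
struct MemKv(BTreeMap<String, u64>);

impl ReplayKv for MemKv {
    fn list(&self) -> Vec<String> {
        self.0.keys().cloned().collect()
    }
    fn read(&self, key: &str) -> Option<u64> {
        self.0.get(key).copied()
    }
    fn remove(&mut self, key: &str) {
        self.0.remove(key);
    }
}

/// Removes every replay record whose expires_at is at or before `now`
/// and returns how many records were pruned.
fn sweep_expired(kv: &mut impl ReplayKv, now: u64) -> usize {
    let mut pruned = 0;
    for key in kv.list() {
        if let Some(expires_at) = kv.read(&key) {
            if now >= expires_at {
                kv.remove(&key);
                pruned += 1;
            }
        }
    }
    pruned
}
```

Because the store offers no transactions, a sweep like this can interleave with a concurrent request for the same key; taking the same per-key guard around each removal would be one way to avoid that within a single process.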
Notes
- INBOUND_PAYMENTS_KEY should remain the payment ledger, not the replay store
- KVStoreSync does not provide transactions or compare-and-swap
- adding a hot cache in front of KV does not solve correctness or atomicity
- if stronger atomicity is needed, this should move to a dedicated persisted schema or transaction-backed write path
Goal
Make request_invoice idempotency restart-safe and remove the unbounded in-memory replay cache.