
Make async_order.request_invoice replay state restart-safe #31

@txalkan


Problem

request_invoice_cache currently lives only inside AsyncOrderState, so it disappears on restart. That breaks idempotent retries for async_order.request_invoice across process restarts.

The current in-memory design has no eviction path for expired replay entries, so the cache can grow without bound over time.

Impact

  • a retry with the same (lsp_node_id, claim_session_id) is only replay-safe within one process lifetime
  • after restart, the original bolt11 response cannot be recovered
  • the cache is only a hot cache, not a durable source of truth
  • INBOUND_PAYMENTS_KEY is the payment ledger, not the request/response replay table

Required design

  • KV for persistence
  • single-flight guard for in-process concurrency
  • for cross-restart atomicity, move the replay state to a real table with a transaction-backed write path

Proposed fix

Remove the hot cache entirely and persist request_outbound_invoice replay state in KV instead.

Recommended shape:

  • key by (lsp_node_id, claim_session_id)
  • persist the original request params, the response payload, and expires_at
  • on lookup:
    • if params match and the entry is not expired, replay the stored result
    • if params differ, return stale_flow
    • if expired, return stale_flow and lazily prune the record
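A minimal sketch of the lookup decision, assuming the record has already been read from KV and deserialized. `RequestParams`, `ReplayRecord`, `ReplayDecision`, `classify`, and the unix-seconds `expires_at` are hypothetical names, not existing types. Serialization and the KV read itself are left out. Expiry is checked before the param comparison, so an expired record is always pruned even when the params also differ.

```rust
/// Hypothetical request params; the real request_invoice params are not spelled out here.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RequestParams {
    pub amount_msat: Option<u64>,
    pub description: String,
}

/// Hypothetical persisted replay record keyed by (lsp_node_id, claim_session_id).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ReplayRecord {
    pub params: RequestParams,
    pub bolt11: String,  // original response payload
    pub expires_at: u64, // unix seconds (assumption)
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayDecision {
    /// No record: take the miss path (create invoice, persist record).
    Miss,
    /// Same params, not expired: return the stored bolt11.
    Replay(String),
    /// Return stale_flow; `prune` is true when the record has expired.
    StaleFlow { prune: bool },
}

pub fn classify(record: Option<&ReplayRecord>, params: &RequestParams, now: u64) -> ReplayDecision {
    match record {
        None => ReplayDecision::Miss,
        // Expired entries are pruned regardless of whether params match.
        Some(r) if r.expires_at <= now => ReplayDecision::StaleFlow { prune: true },
        Some(r) if r.params != *params => ReplayDecision::StaleFlow { prune: false },
        Some(r) => ReplayDecision::Replay(r.bolt11.clone()),
    }
}
```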

Concurrency requirement

The persisted replay lookup must be protected by a single-flight guard per (lsp_node_id, claim_session_id) so that concurrent duplicate requests do not race through the miss path and create two invoices.

This guard is only for in-process serialization. It does not replace durable persistence.
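One way to build the per-key guard with only `std`, sketched under the assumption that the request path is synchronous (an async handler would likely want an async-aware mutex instead). `SingleFlight` and `run` are hypothetical names. Callers with the same key run one at a time, so a duplicate request sees the record the first caller persisted. Each per-key lock is removed from the map once no caller holds it, so the guard itself does not grow without bound.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Hypothetical per-key single-flight guard (in-process serialization only).
pub struct SingleFlight<K> {
    locks: Mutex<HashMap<K, Arc<Mutex<()>>>>,
}

impl<K: Eq + Hash + Clone> SingleFlight<K> {
    pub fn new() -> Self {
        Self { locks: Mutex::new(HashMap::new()) }
    }

    /// Runs `f` while holding the lock for `key`. Callers with the same key
    /// are serialized; callers with different keys are not.
    pub fn run<T>(&self, key: &K, f: impl FnOnce() -> T) -> T {
        // Get or create this key's lock while holding the outer map lock.
        let slot = {
            let mut map = self.locks.lock().unwrap();
            map.entry(key.clone())
                .or_insert_with(|| Arc::new(Mutex::new(())))
                .clone()
        };
        // Hold the per-key lock for the whole lookup / miss / persist sequence.
        let result = {
            let _guard = slot.lock().unwrap();
            f()
        };
        // Under the map lock, drop the entry if only the map and this caller
        // still reference it, then release this caller's handle.
        let mut map = self.locks.lock().unwrap();
        if Arc::strong_count(&slot) == 2 {
            map.remove(key);
        }
        drop(slot);
        result
    }
}
```

The read, invoice creation, and KV write for a miss would all go inside the closure passed to `run`, so a second duplicate cannot observe the miss until the first caller has persisted its record.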

Cleanup strategy

Use KV as the source of truth and clean stale replay records by:

  • lazy removal on read when an entry is expired
  • optional periodic sweep over the replay namespace
  • keeping replay TTL separate from the payment ledger TTL
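A sketch of the optional periodic sweep. `ReplayStore` is a hypothetical trait standing in for the replay namespace and `MemStore` is an in-memory stand-in; neither is the LDK `KVStoreSync` API. Because the store has no compare-and-swap, a sweep running alongside live requests would presumably need to take the same per-key single-flight guard before removing a record.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal view of the replay namespace (not the LDK KVStoreSync trait).
pub trait ReplayStore {
    fn list_keys(&self) -> Vec<String>;
    fn read_expires_at(&self, key: &str) -> Option<u64>;
    fn remove(&mut self, key: &str);
}

/// In-memory stand-in: key -> expires_at (unix seconds).
pub struct MemStore {
    pub entries: BTreeMap<String, u64>,
}

impl ReplayStore for MemStore {
    fn list_keys(&self) -> Vec<String> {
        self.entries.keys().cloned().collect()
    }
    fn read_expires_at(&self, key: &str) -> Option<u64> {
        self.entries.get(key).copied()
    }
    fn remove(&mut self, key: &str) {
        self.entries.remove(key);
    }
}

/// Removes every replay record whose TTL has passed; returns how many were removed.
pub fn sweep_expired<S: ReplayStore>(store: &mut S, now: u64) -> usize {
    let mut removed = 0;
    for key in store.list_keys() {
        // Read the stored expiry right before deciding; missing or live records are kept.
        if matches!(store.read_expires_at(&key), Some(exp) if exp <= now) {
            store.remove(&key);
            removed += 1;
        }
    }
    removed
}
```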

Notes

  • INBOUND_PAYMENTS_KEY should remain the payment ledger, not the replay store
  • KVStoreSync does not provide transactions or compare-and-swap
  • adding a hot cache in front of KV does not solve correctness or atomicity
  • if stronger atomicity is needed, this should move to a dedicated persisted schema or a transaction-backed write path
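A possible key layout for the replay namespace, sketched under assumptions: the namespace name, `replay_key`, the 33-byte `lsp_node_id`, and the 120-character limit are hypothetical here and should be checked against the store's documented key rules. Hex-encoding both parts keeps keys within `[0-9a-f_]`, and since hex never contains `_`, the separator is unambiguous.

```rust
/// Hypothetical namespace for replay records, kept separate from the
/// payment ledger stored under INBOUND_PAYMENTS_KEY.
pub const REQUEST_INVOICE_REPLAY_NAMESPACE: &str = "async_order_request_invoice_replay";

/// Assumed key-length limit; verify against the store's documentation.
pub const MAX_KEY_LEN: usize = 120;

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Builds a KV key from (lsp_node_id, claim_session_id).
/// Assumes lsp_node_id is a 33-byte compressed public key. claim_session_id is
/// hex-encoded so arbitrary characters cannot break the key charset.
/// Returns None if the result would exceed the assumed length limit.
pub fn replay_key(lsp_node_id: &[u8; 33], claim_session_id: &str) -> Option<String> {
    let key = format!("{}_{}", hex(lsp_node_id), hex(claim_session_id.as_bytes()));
    (key.len() <= MAX_KEY_LEN).then_some(key)
}
```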

Goal

Make request_invoice idempotency restart-safe and remove the unbounded in-memory replay cache.
