Fix item scans around dead letters and lease extension #7
Avoid using keyspace range decoding for queue item scans because item keys are tuple-encoded as {priority, vesting_time, id}.
The keyspace decoder expects one complete tuple value and raises when a range scan returns nested tuple key suffixes. Scan raw key ranges by prefix for peek, minimum vesting time, and queue-empty checks instead.
Correction after Jason pointed out the tuple encoding behavior: the tuple encoding itself is still suitable for ordered range scans. A tuple key like

So the first bug should not be framed as "tuple item keys cannot be range scanned." The more precise issue we reproduced in Beacon is that non-item rows can live under the `items/` prefix. That still needs an upstream fix, but the fix should be targeted at keeping dead-letter storage outside the `items/` prefix.
Store dead-lettered jobs in a sibling queue keyspace and filter item scans by valid item key shape so legacy non-item rows cannot crash or block queue scans. Make lease-extender log capture deterministic in CI.
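The key-shape filtering described in this commit could look roughly like the sketch below. It is not the code from this PR: `ItemScanSketch`, `item_key?/1`, and the row representation (already-decoded key suffixes paired with values) are assumptions made for illustration.

```elixir
defmodule ItemScanSketch do
  # Hypothetical check: a suffix counts as an item key only when it is a
  # single {priority, vesting_time, id} tuple with the expected field types.
  def item_key?({priority, vesting_time, id})
      when is_integer(priority) and is_integer(vesting_time) and is_binary(id),
      do: true

  def item_key?(_other), do: false

  # Walk rows in scan order, skip anything that is not item-shaped, and
  # return at most `limit` items instead of raising on the first odd row.
  def peek(rows, limit) do
    rows
    |> Stream.filter(fn {suffix, _value} -> item_key?(suffix) end)
    |> Enum.take(limit)
  end
end

# Assumed sample data: one legacy non-item row sits between two real items.
rows = [
  {{0, 100, "job-1"}, :first},
  {[:dead_letter, {0, 90, "job-0"}], :legacy},
  {{1, 50, "job-2"}, :second}
]

ItemScanSketch.peek(rows, 2)
```

Filtering lazily with `Stream.filter/2` before `Enum.take/2` means a legacy row neither raises nor counts against the limit.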
b6fd0ed to afa8e3d
Keep the log-capture tests from waiting through a second extender interval while still flushing Logger inside the capture window.
Summary
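- Store dead-letter rows outside the `items/` scan prefix
- Skip legacy non-item rows already under `items/`
- Use the stored lease `item_key` when completing or requeueing leased jobs

Root Cause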
This PR fixes two queue-consumer failure modes found while running the downstream Beacon app:
https://github.com/ccarvalho-eng/the-beacon
Non-item rows under `items/`

Tuple-encoded keys can be used for ordered bounded scans. The problem here was not tuple ordering itself.
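As a rough illustration of the ordering claim, the sketch below uses Elixir's native term ordering as a stand-in for the encoded byte order. That substitution is an assumption (the real tuple encoding is byte-based), and `TupleOrderSketch` and its sample keys are hypothetical.

```elixir
defmodule TupleOrderSketch do
  # Hypothetical item keys shaped like {priority, vesting_time, id}.
  def sample_keys do
    [
      {2, 100, "c"},
      {1, 300, "b"},
      {1, 200, "a"},
      {2, 50, "d"}
    ]
  end

  # Same-size tuples compare element by element, so sorting orders keys
  # by priority first, then vesting_time, then id.
  def ordered(keys), do: Enum.sort(keys)

  # A bounded "scan": keep keys inside the inclusive range [from, to].
  def range(keys, from, to) do
    keys
    |> ordered()
    |> Enum.filter(fn key -> key >= from and key <= to end)
  end
end

# All priority-1 keys, in vesting order.
TupleOrderSketch.range(TupleOrderSketch.sample_keys(), {1, 0, ""}, {1, 999_999, "zzz"})
```

The bounded range only behaves this way while every key under the prefix has the same shape, which is the property the dead-letter rows broke.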
The queue stored dead-letter rows by deriving the dead-letter keyspace from `keyspaces.items`. That placed non-item rows beneath the live item scan prefix. Those keys are encoded as multiple tuple values under the `items/` prefix, so item-scan code that decodes each suffix as one `{priority, vesting_time, id}` key can raise an `Extra data after key` error.

The fix makes `dead_letter/` a sibling of `items/`, and item scans now use raw storage-key ranges with a key-shape filter. That prevents future dead-letter writes from polluting live item scans and lets queues skip legacy malformed rows already present under the old prefix.

Stale lease item keys
Lease extension can move an item to a new `{priority, vesting_time, id}` key while the manager still holds the original lease struct. Terminal actions previously trusted the caller lease's `item_key` when completing or requeueing. If the stored lease had already been updated by the extender, dead-letter or retry paths could clear the stale key and leave an orphaned item behind.

Those orphaned rows remain visible to `peek/4`, but `obtain_lease/5` cannot find them by the decoded value's key, so they can block later jobs in the same priority range.

The fix makes the stored lease record authoritative for terminal item keys.
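A minimal sketch of that idea, using a plain in-memory map rather than the real store: `LeaseSketch`, its `extend/3` and `complete/2` functions, and the store layout are all hypothetical and only meant to show why the stored lease, not the caller's copy, should pick the key to clear.

```elixir
defmodule LeaseSketch do
  # Hypothetical store layout:
  #   %{leases: %{lease_id => item_key}, items: %{item_key => value}}

  # Lease extension moves the item to a new key and updates the stored lease.
  def extend(store, lease_id, new_key) do
    old_key = Map.fetch!(store.leases, lease_id)
    {value, items} = Map.pop!(store.items, old_key)

    %{
      store
      | leases: Map.put(store.leases, lease_id, new_key),
        items: Map.put(items, new_key, value)
    }
  end

  # Complete using the stored lease's item key. The caller's lease may still
  # carry the pre-extension key, so only its id is trusted here.
  def complete(store, %{id: lease_id}) do
    case Map.pop(store.leases, lease_id) do
      {nil, _leases} ->
        {:error, :lease_not_found}

      {item_key, leases} ->
        {:ok, %{store | leases: leases, items: Map.delete(store.items, item_key)}}
    end
  end
end

store = %{leases: %{"l1" => {0, 100, "job"}}, items: %{{0, 100, "job"} => :payload}}
stale_lease = %{id: "l1", item_key: {0, 100, "job"}}

# The extender moves the item, but the caller still holds `stale_lease`.
store = LeaseSketch.extend(store, "l1", {0, 200, "job"})
{:ok, store} = LeaseSketch.complete(store, stale_lease)
```

Had `complete/2` deleted `stale_lease.item_key` instead, the row at `{0, 200, "job"}` would have been left behind as an orphan.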
Flow
Reproduction
Beacon exposed both bugs while using the Bedrock-backed queue:
- `Extra data after key` errors when item scans encountered non-item dead-letter rows under the old `items/` prefix
- An orphaned item row with `lease_id` set, no corresponding lease record, and an item value whose computed `Item.key/1` did not match the raw storage key

This PR adds direct regressions for both cases:
- `Store.peek/4` ignores legacy non-item rows under the item scan prefix
- `complete/3` and `requeue/4` use the stored lease item key when the caller lease is stale

Validation
- `git diff --check`
- `Code.string_to_quoted!/1` for touched files

Local `mix test` and `mix format` fail before any project tests run, because the checkout's dev/test dependency toolchain does not load under the local Elixir/OTP (`styler`/`earmark_parser` compile/load error). GitHub Actions runs the supported matrix and is the authoritative verification for this branch.