Skip to content

Fix PDBackendAsync shared-key races in allocation, transfer, and removal paths#323

Merged
hlin99 merged 5 commits into
ww22_PD_race_conditionfrom
copilot/fix-race-conditions-in-pdbackendasync
May 29, 2026
Merged

Fix PDBackendAsync shared-key races in allocation, transfer, and removal paths#323
hlin99 merged 5 commits into
ww22_PD_race_conditionfrom
copilot/fix-race-conditions-in-pdbackendasync

Conversation

Copy link
Copy Markdown

Copilot AI commented May 29, 2026

PDBackendAsync mishandled shared-prefix keys under concurrency: duplicate allocations could overwrite/free in-flight buffers, and remove() could drop entries still needed by other requests. This ports the sync backend’s shared-key semantics to async so dedup, ownership, and lifecycle are consistent.

  • Receiver dedup in allocation path

    • _async_allocate_and_put now checks contains(key, pin=False) per requested key.
    • Duplicate keys are not reallocated; their positions are returned via already_sent_indexes.
    • AllocResponse now carries both:
      • remote_indexes for newly allocated chunks
      • already_sent_indexes for deduped chunks
  • Sender-side skip for deduped chunks

    • _async_transfer_task consumes already_sent_indexes and filters outgoing RDMA writes accordingly.
    • Staging buffers for skipped chunks are released immediately.
    • Only non-deduped keys are tracked/sent; response-size mismatch is guarded.
  • put() no longer overwrites existing key

    • Duplicate put for an existing key now drops the new object (refcount down) instead of replacing/freeing the existing in-flight target.
  • remove() made refcount-aware

    • remove() now inspects current object refcount and deletes/frees only when ref_count == 1.
    • _alloc_freed_condition notification is emitted only on actual removal/free.
  • Race regression test alignment

    • Updated FakePDBackendDataPath in tests/v1/test_pd_backend_async_race.py to mirror new put/remove semantics.
    • Assertions now validate dedup-preserving behavior for shared keys.
# Receiver dedup
if self.contains(key, pin=False):
    already_sent_indexes.append(idx)
    continue

# put(): do not overwrite existing shared key
if key in self.data:
    mem_obj.ref_count_down()
    return
self.data[key] = mem_obj

# remove(): only free when last reference
mem_obj = self.data.get(key, None)
if mem_obj is not None and mem_obj.get_ref_count() == 1:
    del self.data[key]
    mem_obj.ref_count_down()
Original prompt

Problem

PDBackendAsync has two critical race conditions when multiple requests share the same CacheEngineKey (shared prefix scenario). The sync version (PDBackend in pd_backend.py) already handles this correctly, but the async version does not.

Bug 1: put() overwrites and frees in-flight buffer (use-after-free)

In pd_backend_async.py line 1430-1440:

def put(self, key, mem_obj):
    with self.data_lock:
        old = self.data.pop(key, None)
        if old is not None:
            old.ref_count_down()  # ← frees buffer while RDMA may still be writing!
        self.data[key] = mem_obj

When Req A and Req B share the same prefix → same key K:

  1. Receiver allocates obj_A for Req A → put(K, obj_A) → returns obj_A.address to Sender A
  2. Sender A starts RDMA write to obj_A.address
  3. Receiver allocates obj_B for Req B → put(K, obj_B) → frees obj_A!
  4. Sender A's RDMA is now writing to freed memory → use-after-free

Bug 2: remove() unconditionally pops (breaks other request's retrieve)

In pd_backend_async.py line 1470-1482:

def remove(self, key, force=True):
    with self.data_lock:
        mem_obj = self.data.pop(key, None)  # ← unconditionally removes!
        if mem_obj is not None:
            mem_obj.ref_count_down()
            return True

When Req A does remove_after_retrieve on key K, Req B's subsequent get_blocking(K) hits AssertionError: Key ... not found in local data.

Fix (port sync version's approach)

The sync PDBackend (pd_backend.py) already handles this correctly:

1. Dedup in _allocate_and_put (sync line 917-920):

if self.contains(key, pin=False):
    already_send_indexes.append(idx)
    continue  # don't allocate, tell sender to skip RDMA

2. put() does not overwrite (sync line 982-983):

def put(self, key, mem_obj):
    with self.data_lock:
        self.data[key] = mem_obj  # safe because dedup prevents duplicates arriving here

3. remove() checks refcount (sync line 1058-1063):

def remove(self, key, force=True):
    with self.data_lock:
        if mem_obj := self.data.get(key, None):
            if mem_obj.get_ref_count() == 1:
                del self.data[key]
            return True
        return False

Required Changes in lmcache/v1/storage_backend/pd_backend_async.py

Change 1: Add dedup check in _async_allocate_and_put (around line 1320)

Before allocating each key, check if it already exists:

for idx, key_str in enumerate(alloc_request.keys):
    key = CacheEngineKey.from_string(key_str)

    # DEDUP: if key already exists, skip allocation and tell sender to skip RDMA
    if self.contains(key, pin=False):
        already_sent_indexes.append(idx)
        continue

    # ... existing allocation logic ...

And include already_sent_indexes in the response:

return AllocResponse(remote_indexes=alloc_indexes, already_sent_indexes=already_sent_indexes)

Change 2: Fix put() — do not overwrite existing keys (line 1417-1440)

Change to match sync behavior. Since dedup prevents duplicates from reaching put(), we can simplify:

def put(self, key, mem_obj):
    with self.data_lock:
        if key in self.data:
            # Key already exists (race between dedup check and put).
            # Release the new obj instead of overwriting.
            mem_obj.ref_count_down()
            return
        self.data[key] = mem_obj

Change 3: Fix remove() — use refcount check (line 1460-1502)

Change to match sync version's refcount-based removal:

def remove(self, key, force=True):
    with self.data_lock:
        mem_obj = self.data.get(key, None)
        if mem_obj is not None:
            if mem_obj.get_ref_count() == 1:
                del self.data[key]
                mem_obj.ref_count_down()
                # ... notify freed condition ...
            return True
        return False

Note: The _alloc_freed_condition notification should only happen when the entry is actually removed (refcount == 1).

Change 4: Sender side — handle already_sent_indexes in _async_transfer_task

After receiving AllocResponse, filter out already-sent chunks (skip their RDMA):

alloc_response = await self._async_remote_allocate(receiver_id, alloc_request)
already_sent_indexes = set(alloc_response.already_sent_indexes)
remote_indexes = alloc_response.remote_indexes

# Filter out already-sent memory objects
mem_objs_to_send = []
indexes_to_send = []
for idx, (mem_obj, remote_addr) in enumerate(zip(memory_objs, remote_indexes)):
    if idx in already_sent_indexes:
        mem_obj.ref_count_down()  # release sender staging buffer
    else:
        mem_objs_to_send.append(mem_obj)
        indexes_to_send.append(remote_addr)

Tests

Unit tests are already in tests/v1/test_pd_backend_async_race.py. They currently FAIL (demonstrating the bugs). After the fix, they should PA...

This pull request was created from Copilot chat.

Copilot AI changed the title [WIP] Fix race conditions in PDBackendAsync to prevent use-after-free Fix PDBackendAsync shared-key races in allocation, transfer, and removal paths May 29, 2026
Copilot AI requested a review from hlin99 May 29, 2026 01:34
Copilot finished work on behalf of hlin99 May 29, 2026 01:34
…ount release (#324)

* Initial plan

* fix(pd): make dedup pin atomic and decrement remove refcount

* test(pd): align async race tests with refcount decrement semantics

* test(pd): restore requested UT typing and mock specs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…n, dedup test coverage (#325)

* Initial plan

* Fix remove() log, add bounds check, change put() log to INFO, add sender dedup tests

* Merge TestSenderSideDedup into test_pd_backend_async_race.py, remove separate file

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…nsfer (#326)

* Initial plan

* fix(pd_backend_async): include deduped keys in abort cleanup tracking

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@hlin99 hlin99 marked this pull request as ready for review May 29, 2026 05:39
@hlin99 hlin99 merged commit 6478afb into ww22_PD_race_condition May 29, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants