
Bug: Pagination cursor uses non-unique key (claim-timestamp with second precision), causing items to be skipped during list traversal #581

Description

@AiRanthem

Background

Batch-create 10,000 sandboxes, wait for all of them to become ready, then paginate through them via GET /v2/sandboxes. Only about 9,600 are returned; about 400 sandboxes are missing. kubectl confirms that all 10,000 sandboxes exist and are healthy, and an individual GET request finds each of the "missing" sandboxes.

The pagination implementation in pkg/utils/pagination/pagination.go uses the claim-timestamp annotation as the sort and cursor key. This timestamp is generated via time.Now().Format(time.RFC3339) (see pkg/sandbox-manager/infra/sandboxcr/claim.go:655), which has second-level precision.

The pagination logic uses strict greater-than comparison to locate the start of the next page:

startIdx = sort.Search(len(items), func(i int) bool {
    return p.GetKey(items[i]) > p.NextToken
})

When sandboxes are created concurrently in bulk, multiple sandboxes claimed within the same second share an identical timestamp. If the last record on a page has timestamp T, the next page skips all remaining records with timestamp == T and jumps directly to the first record with timestamp > T.

Issue Type

bug

Relevant Code

  • pkg/utils/pagination/pagination.go:68-70 — strict greater-than comparison
  • pkg/servers/e2b/list.go:143-145 — GetKey uses claim-timestamp
  • pkg/sandbox-manager/infra/sandboxcr/claim.go:655 — timestamp written in RFC3339 (second precision)
  • GET /v2/sandboxes ListSandboxes (pkg/servers/e2b/list.go:136-146)
  • GET /snapshots ListSnapshots (pkg/servers/e2b/list.go:278-290)
  • Any scenario using pagination.Paginator where GetKey returns non-unique values

Notes

Suggested fix direction: the pagination cursor key must be unique. Recommended approach: change GetKey to return claimTime + "\x00" + sandboxID, using the unique sandboxID as a tie-breaker so that records sharing a timestamp are no longer skipped. Increasing the time precision alone (e.g., RFC3339Nano) only makes collisions less likely; it does not eliminate them.
