Skip to content

Conversation

@majisourav99
Copy link
Contributor

Problem Statement

The current cleanup logic deletes backup versions without verifying whether the configured minimum retention period has elapsed. As a result, in scenarios where a newer ingestion succeeds after a failed attempt, relatively recent backup versions may be removed prematurely.

Solution

This change moves the explicit check to ensure that a backup version is eligible for deletion only after the minimum retention time has been satisfied. By enforcing retention guarantees, we avoid unintended data loss and ensure backup versions remain available for recovery during the expected retention window.

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

return false;
}

long minRetentionThreshold = store.getLatestVersionPromoteToCurrentTimestamp() + minBackupVersionCleanupDelay;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add more details in the PR description on exactly what scenario the patch is trying to fix and add the corresponding unit test for it? e.g. is the scenario that we are trying to solve:
t1: [(v1, current), (v2, failed), (v3, future)]
t2: [(v1, backup), (v2, failed), (v3, current)]

At t1 will be allowed to delete both v1 and v2 regardless of the minimum retention since getLatestVersionPromoteToCurrentTimestamp?

Regardless, I think this approach of only relying on getLatestVersionPromoteToCurrentTimestamp is error prone given the various scenarios a store's version list can get into with deferred version swap. I'd recommend implementing a cleaner solution like to what we discussed where with current version counter/generationId as a version config to help the cleanup task cherry pick the right version to delete (or can ask ClaudeCode to see if it has better ideas).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants