Add fallback for log truncation issue in Delta source #782

Closed
vinishjail97 wants to merge 1 commit into main from fallbackForLogTruncation

@vinishjail97
Contributor

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

(For example: This pull request implements the sync for delta format.)

Brief change log

(for example:)

  • Fixed JSON parsing error when persisting state
  • Added unit tests for schema evolution

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added TestConversionController to verify the change.
  • Manually verified the change by running a job locally.

return deltaCommitInstant.equals(instant) || deltaCommitInstant.isBefore(instant);
try {
  DeltaHistoryManager.Commit deltaCommitAtOrBeforeInstant =
      deltaLog.history().getActiveCommitAtTime(Timestamp.from(instant), true, false, true);
Contributor

The 3rd argument seems to be "mustBeRecreatable" and is false. If we just set it to true, won't that be enough?

@vinishjail97
Contributor Author

Closing this draft PR.

The approach here — calling deltaLog.getChanges() to probe for vacuumed commit files — is reasonable in intent but has a meaningful downside: it eagerly opens and partially iterates the commit log on every incremental sync safety check, adding unnecessary I/O overhead on the hot path. As @brishi19791 noted, there may be a simpler fix via the mustBeRecreatable flag on getActiveCommitAtTime.

Additionally, the PR body was never filled out and the implementation was left as a draft without tests. Given it has been idle for ~3 months with the approach still in question, closing for now.

If the VACUUM-induced log truncation issue resurfaces, the fix should come with a clear description of the failure scenario, a unit/integration test that reproduces it, and a more targeted approach (e.g., verifying the mustBeRecreatable flag behavior or checking log file existence directly without iterating changes).
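
For reference, the "check log file existence directly" idea could look roughly like the sketch below. It is illustrative only and not code from this PR: the class and method names (DeltaLogProbe, commitFileName, commitFileExists) are hypothetical, and it uses java.nio against a local path, whereas the real source would presumably go through the Hadoop FileSystem API. It relies only on the Delta convention that commit files live under _delta_log and are named by version, zero-padded to 20 digits.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper, not part of this PR or of the Delta API.
final class DeltaLogProbe {
  private DeltaLogProbe() {}

  // Delta names commit files by version, zero-padded to 20 digits,
  // e.g. version 7 -> 00000000000000000007.json
  static String commitFileName(long version) {
    return String.format(Locale.ROOT, "%020d.json", version);
  }

  // Single existence check for one commit file; nothing is opened or iterated.
  // Assumption: tableBasePath is a local directory containing _delta_log.
  static boolean commitFileExists(Path tableBasePath, long version) {
    Path commitFile = tableBasePath.resolve("_delta_log").resolve(commitFileName(version));
    return Files.exists(commitFile);
  }
}
```

A missing JSON commit file only shows that the per-commit changes for that version are gone. The table state at that version may still be recoverable from a checkpoint, so this check is narrower than what mustBeRecreatable covers and would need a test against the actual failure scenario.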
