Skip to content

Conversation

@mustansir14
Copy link
Contributor

Description:

Recently a bug was reported in our slack community workspace related to S3 scanning. The user faced a panic: runtime error: index out of range error, originating here.

After some investigation, and the fact that this bug surfaced after unit scans were introduced in the source in #4560, I found that this was because the source uses a single Checkpointer instance throughout it's lifetime, and since ChunkUnit can be called concurrently, this instance was being shared between concurrent runs. Since the checkpointer does not support concurrent page processing, unexpected behaviors can occur.

The solution to this is to remove the Checkpointer as a class-level attribute and use separate instances for each unit/bucket. This works for legacy scans as well because the underlying sources.Progress in the checkpointer is a pointer, which means all separate Checkpointer instances share the same sources.Progress.

Theoretically this should be enough to fix the bug, but just for safety I added a length check as well.

I didn't add a new test for this because the bug is a result of concurrency and race conditions, so is not very easy to reproduce. The existing test TestSource_ChunkUnit_Resumption_MultipleBucketsConcurrent is the test which could reproduce this, but again it's not guaranteed.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@mustansir14 mustansir14 requested a review from a team December 18, 2025 13:24
@mustansir14 mustansir14 requested a review from a team as a code owner December 18, 2025 13:24
Copy link
Contributor

@mcastorina mcastorina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find and fix! Just had one question about a timeout that was changed.

Comment on lines 535 to 536
const getObjectTimeout = 30 * time.Second
const getObjectTimeout = 30 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a significant difference! What's the reason it was bumped up?

Copy link
Contributor Author

@mustansir14 mustansir14 Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, It's a mistake on my part. Some of the integration tests were timing out for me because of this so I temporarily bumped it up. I'll revert this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants