[INS-232] Fix S3 Source "panic: runtime error: index out of range" bug #4610
Description:
Recently, a bug related to S3 scanning was reported in our Slack community workspace. The user hit a `panic: runtime error: index out of range` error, originating here. After some investigation, and given that the bug surfaced after unit scans were introduced in the source in #4560, I found that it was caused by the source using a single `Checkpointer` instance throughout its lifetime. Since `ChunkUnit` can be called concurrently, this instance was being shared between concurrent runs. The checkpointer does not support concurrent page processing, so unexpected behavior can occur.

The fix is to remove the `Checkpointer` as a struct-level field and use a separate instance for each unit/bucket. This also works for legacy scans, because the underlying `sources.Progress` in the checkpointer is a pointer, so all the separate `Checkpointer` instances share the same `sources.Progress`. In theory this alone should fix the bug, but for safety I also added a length check.
I didn't add a new test for this because the bug stems from concurrency and race conditions, so it is not easy to reproduce. The existing test `TestSource_ChunkUnit_Resumption_MultipleBucketsConcurrent` is the one that could reproduce it, but not reliably.

Checklist:
- Tests passing (`make test-community`)?
- Lint passing (`make lint`; this requires golangci-lint)?