-
Notifications
You must be signed in to change notification settings - Fork 412
[WIP] disagg: better cache downloading #10637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[WIP] disagg: better cache downloading #10637
Conversation
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@coderabbitai review |
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughModified S3 file cache download implementation to use buffered, rate-limited writes via a new Changes
Sequence DiagramsequenceDiagram
participant S3 as S3 Object Storage
participant ReadBuf as ReadBufferFromIStream
participant WriteLim as PosixWritableFile<br/>(with RateLimiter)
participant TempFS as Temporary File
participant FinalFS as Final Cache Location
rect rgb(220, 240, 255)
Note over S3,ReadBuf: Initialize S3 stream
S3->>ReadBuf: GetObject (istream)
end
rect rgb(240, 255, 240)
Note over ReadBuf,WriteLim: Buffered, Rate-Limited Download
loop Chunked reads with rate limiting
ReadBuf->>ReadBuf: Read chunk from buffer
ReadBuf->>WriteLim: Write chunk (rate limiter gates throughput)
WriteLim->>TempFS: Write to temp file
WriteLim->>WriteLim: Check rate limit quota
end
end
rect rgb(255, 240, 240)
Note over WriteLim,TempFS: Finalization
WriteLim->>TempFS: fsync (ensure durability)
WriteLim->>WriteLim: Report file size
end
rect rgb(255, 250, 220)
Note over TempFS,FinalFS: Atomic Finalization
TempFS->>FinalFS: rename(temp_file, final_file)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dbms/src/Storages/S3/FileCache.cpp
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/S3/FileCache.cpp
[error] 989-989: parameter 'istr' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 990-990: 3 adjacent parameters of 'downloadToLocal' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 990-990: parameter 'fname' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 991-991: parameter 'content_length' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 992-992: parameter 'write_limiter' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 996-996: variable 'ofile' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 998-998: statement should be inside braces
(readability-braces-around-statements,-warnings-as-errors)
[error] 1002-1002: variable 'total_written' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1004-1004: variable 'rbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1007-1007: variable 'count' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1008-1008: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 1008-1008: variable 'write_res' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (4)
dbms/src/Storages/S3/FileCache.cpp (4)
24-24: LGTM! New includes support the buffered download implementation.The added headers provide the necessary functionality for implementing rate-limited, buffered writes to local files during S3 downloads.
Also applies to: 26-26, 35-35, 38-38, 42-42
1003-1004: Verify: Is the 16KB buffer size intentional for rate limiting granularity?The buffer size of 16KB seems smaller than typical S3 download buffers (which are often 64KB-1MB). However, given the PR's objective to implement fine-grained rate limiting to prevent I/O bursts, this smaller buffer size may be intentional to provide better rate-limiting granularity.
Could you confirm whether:
- The 16KB size is a deliberate choice for finer-grained rate limiting control?
- Performance testing shows this size provides adequate throughput while preventing bursts?
1052-1055: LGTM! Proper temp-file-then-rename pattern for atomic downloads.Downloading to a temporary file first and then renaming to the final destination is a good practice that ensures:
- Atomic file replacement (on most filesystems)
- Incomplete files are never visible to readers
- Proper cleanup on failure (temp files are cleaned up during restore)
1065-1071: LGTM! Appropriate log level for download completion.Changing the log level from DEBUG to INFO for successful download completion is appropriate, as this represents a significant operation that should be visible in standard operational logs for monitoring and troubleshooting.
| while (!rbuf.eof()) | ||
| { | ||
| size_t count = rbuf.buffer().end() - rbuf.position(); | ||
| if (ssize_t write_res = ofile.write(rbuf.position(), count); write_res < 0) | ||
| { | ||
| throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno); | ||
| } | ||
| else | ||
| { | ||
| total_written += write_res; | ||
| } | ||
| rbuf.position() += count; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find and examine PosixWritableFile::write() implementation
find . -type f -name "*.cpp" -o -name "*.h" | head -20Repository: pingcap/tiflash
Length of output: 1009
🏁 Script executed:
#!/bin/bash
# Search for PosixWritableFile class definition
rg "class PosixWritableFile" -A 20Repository: pingcap/tiflash
Length of output: 2242
🏁 Script executed:
#!/bin/bash
# Find the file containing PosixWritableFile
find . -type f \( -name "*.cpp" -o -name "*.h" \) -exec rg -l "PosixWritableFile" {} \;Repository: pingcap/tiflash
Length of output: 694
🏁 Script executed:
#!/bin/bash
# Look for the write method in PosixWritableFile
rg "PosixWritableFile.*write" -B 2 -A 15Repository: pingcap/tiflash
Length of output: 3460
🏁 Script executed:
#!/bin/bash
# Check the actual code at the specified file and lines
wc -l dbms/src/Storages/S3/FileCache.cppRepository: pingcap/tiflash
Length of output: 98
🏁 Script executed:
#!/bin/bash
# Read the actual code at FileCache.cpp around lines 1005-1017
sed -n '988,1025p' dbms/src/Storages/S3/FileCache.cppRepository: pingcap/tiflash
Length of output: 1501
🏁 Script executed:
#!/bin/bash
# Check the PosixWritableFile implementation more closely
cat dbms/src/IO/BaseFile/PosixWritableFile.cpp | grep -A 10 "ssize_t PosixWritableFile::write"Repository: pingcap/tiflash
Length of output: 356
Critical: Partial writes not handled correctly—data corruption risk.
The implementation has a critical bug in the write loop. PosixWritableFile::write() directly returns the result of ::write(fd, buf, size), which may write fewer bytes than requested (a valid POSIX behavior for partial writes). However, line 1016 advances the buffer position by the full count (requested bytes) regardless of write_res (actual bytes written).
Impact: If a partial write occurs—e.g., ::write() returns 50 out of 100 requested bytes—the code advances the buffer position by 100, causing the next buffer refill to skip 50 unwritten bytes. This results in data loss and file corruption.
The fix requires looping until all requested bytes are written, adjusting the offset after each partial write:
Proposed fix
while (!rbuf.eof())
{
size_t count = rbuf.buffer().end() - rbuf.position();
- if (ssize_t write_res = ofile.write(rbuf.position(), count); write_res < 0)
- {
- throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno);
- }
- else
- {
- total_written += write_res;
- }
- rbuf.position() += count;
+ size_t bytes_to_write = count;
+ size_t bytes_written_total = 0;
+ while (bytes_written_total < bytes_to_write)
+ {
+ ssize_t write_res = ofile.write(rbuf.position() + bytes_written_total, bytes_to_write - bytes_written_total);
+ if (write_res < 0)
+ {
+ throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno);
+ }
+ bytes_written_total += write_res;
+ total_written += write_res;
+ }
+ rbuf.position() += bytes_written_total;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while (!rbuf.eof()) | |
| { | |
| size_t count = rbuf.buffer().end() - rbuf.position(); | |
| if (ssize_t write_res = ofile.write(rbuf.position(), count); write_res < 0) | |
| { | |
| throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno); | |
| } | |
| else | |
| { | |
| total_written += write_res; | |
| } | |
| rbuf.position() += count; | |
| } | |
| while (!rbuf.eof()) | |
| { | |
| size_t count = rbuf.buffer().end() - rbuf.position(); | |
| size_t bytes_to_write = count; | |
| size_t bytes_written_total = 0; | |
| while (bytes_written_total < bytes_to_write) | |
| { | |
| ssize_t write_res = ofile.write(rbuf.position() + bytes_written_total, bytes_to_write - bytes_written_total); | |
| if (write_res < 0) | |
| { | |
| throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno); | |
| } | |
| bytes_written_total += write_res; | |
| total_written += write_res; | |
| } | |
| rbuf.position() += bytes_written_total; | |
| } |
🧰 Tools
🪛 Clang (14.0.6)
[error] 1007-1007: variable 'count' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1008-1008: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 1008-1008: variable 'write_res' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🤖 Prompt for AI Agents
In dbms/src/Storages/S3/FileCache.cpp around lines 1005 to 1017, the write loop
advances the buffer position by the full requested count even when
PosixWritableFile::write() returns a smaller number (partial write), causing
data loss; change the logic to loop until the entire requested count is written:
on each iteration call write() with the current pointer and remaining bytes,
check write_res for negative (throw on error other than transient EINTR/EAGAIN
where you should retry) and for zero (treat as error), add only write_res to
total_written, advance rbuf.position() by write_res, subtract write_res from
remaining count and continue until remaining is zero before refilling the
buffer.
What problem does this PR solve?
Issue Number: close #10633
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.