fix(csv): handle trailing empty field at EOF without newline in COPY FROM #644
Closed
chiangchenghsin-hash wants to merge 1 commit into
When a CSV file ends with a row like "a,b," (no trailing newline, last field empty), the parser dropped the empty field, causing "expected N values per row, but got N-1". Add an else-if branch in final_state to emit the empty value when column > 0.
This was referenced Jul 3, 2026
Contributor
The Windows-style newlines in the code make the diff hard to review. I ran
Contributor
Thank you for your contribution!
Summary
Fixes a CSV `COPY FROM` parser bug where a trailing empty field at end-of-file (without a newline) is silently dropped, causing `expected N values per row, but got N-1`.

Bug
File: `src/processor/operator/persistent/reader/csv/base_csv_reader.cpp`, `parseCSV()`, `final_state`

When a CSV file ends with a row like "a,b," (no trailing newline, last field empty), the parser reaches `final_state` with `position == start` (both at EOF). The existing code only adds a remaining value when `position > start`, so the empty trailing field is skipped, `column` undercounts by 1, and `addRow()` throws "expected 3 values per row, but got 2".

Workaround users currently need
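On the producing side, the workaround means writing an empty field as a quoted empty string instead of leaving it bare. A minimal sketch, assuming fields contain no commas, quotes, or newlines; `formatRowQuotingEmpty` is a hypothetical helper and not part of the project:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical writer-side helper (an assumption for illustration only).
// Renders one CSV row and writes every empty field as "" so that the reader
// always sees a non-empty token, even when the field is last in the file.
// Assumes fields contain no commas, quotes, or newlines.
std::string formatRowQuotingEmpty(const std::vector<std::string>& fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) {
            line += ',';
        }
        if (fields[i].empty()) {
            line += "\"\"";  // quoted empty field
        } else {
            line += fields[i];
        }
    }
    return line;
}
```

With this helper, the row `{"a", "b", ""}` is rendered as `a,b,""` rather than `a,b,`.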
Quote the empty field as `""`. This makes `position > start` true (the parser moves past the closing quote), so the value is added. But this is fragile and not obvious.

Fix
Add an `else if (column > 0)` branch in `final_state` to emit an empty value when the file ends right after a delimiter:

     final_state:
         lineContext.setEndOfLine(getFileOffset());
         if (position > start) {
             // Add remaining value to chunk.
             if (!addValue(driver, curRowIdx, column, std::string_view(buffer.get() + start, position - start - hasQuotes), escapePositions)) {
                 return {curRowIdx, numErrors};
             }
             column++;
    +    } else if (column > 0) {
    +        // File ends right after a delimiter with an empty trailing field.
    +        // Without this, a row like "a,b," (no trailing newline) loses its last
    +        // empty field and undercounts columns, causing "expected N, got N-1".
    +        if (!addValue(driver, curRowIdx, column, std::string_view{}, escapePositions)) {
    +            return {curRowIdx, numErrors};
    +        }
    +        column++;
         }
         if (column > 0) {
             curRowIdx += driver.addRow(curRowIdx, column, getOptionalWarningData<Driver>(columnInfo, option, getWarningSourceData()));
         }

Why this is safe
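One way to sanity-check which inputs reach the new branch is a toy model of the end-of-file decision. The sketch below is not the engine's parser: it assumes unquoted fields, a `,` delimiter, and `\n` row terminators, and the names `EofBranch` and `classifyEof` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Which end-of-file branch the toy model takes (hypothetical names).
enum class EofBranch { None, RemainingValue, TrailingEmpty };

// Toy model only: tracks `start` (beginning of the field being scanned) and
// `column` (fields already emitted in the current row), then applies the same
// two tests as the patched final_state once the input is exhausted.
EofBranch classifyEof(std::string_view text) {
    std::size_t start = 0;
    std::size_t column = 0;
    std::size_t position = 0;
    for (; position < text.size(); ++position) {
        if (text[position] == ',') {
            ++column;              // field [start, position) was emitted
            start = position + 1;
        } else if (text[position] == '\n') {
            column = 0;            // row flushed; column resets, as in add_row
            start = position + 1;
        }
    }
    // position == text.size() here, i.e. end of input.
    if (position > start) {
        return EofBranch::RemainingValue;  // existing branch: partial value left
    }
    if (column > 0) {
        return EofBranch::TrailingEmpty;   // new branch: input ended after a delimiter
    }
    return EofBranch::None;                // clean row boundary or empty input
}
```

Under these assumptions, `a,b,` reaches the trailing-empty branch, `a,b,c` takes the existing remaining-value branch, and `a,b,c\n`, `a,b,\n`, or an empty input takes neither.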
The `else if (column > 0)` branch only fires when all three are true:
- `position == start` (file ended with no partial value to add)
- `column > 0` (we're mid-row, i.e. just past a delimiter, not at a clean row boundary)
- `final_state` (EOF)

This precisely matches "file ends right after a delimiter with an empty trailing field" and nothing else. Rows ending with `\n` are handled by `add_row` (which resets `column` to 0), so they don't hit this branch.

Verification
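The before and after behavior can also be exercised outside the engine with a self-contained toy parser. This is a sketch under simplifying assumptions (unquoted fields, `,` delimiter, `\n` terminators). `parseToyCsv` and its `emitTrailingEmpty` flag are hypothetical names standing in for the behavior without and with the patch, and the inputs mentioned afterwards are illustrative, not necessarily the variants tested in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Toy model only (not the engine's reader). When emitTrailingEmpty is false,
// the end-of-input step mimics the pre-patch rule; when true, it also emits
// an empty field if the input ended right after a delimiter.
std::vector<Row> parseToyCsv(std::string_view text, bool emitTrailingEmpty) {
    std::vector<Row> rows;
    Row current;
    std::size_t start = 0;
    for (std::size_t position = 0; position < text.size(); ++position) {
        const char c = text[position];
        if (c != ',' && c != '\n') {
            continue;
        }
        current.emplace_back(text.substr(start, position - start));
        start = position + 1;
        if (c == '\n') {
            rows.push_back(std::move(current));
            current.clear();
        }
    }
    // End of input, analogous to final_state.
    if (text.size() > start) {
        current.emplace_back(text.substr(start));  // remaining partial value
    } else if (emitTrailingEmpty && !current.empty()) {
        current.emplace_back();                    // empty trailing field
    }
    if (!current.empty()) {
        rows.push_back(std::move(current));
    }
    return rows;
}
```

In this model, `parseToyCsv("a,b,", false)` yields a single row with two fields, while `parseToyCsv("a,b,", true)` yields three fields with an empty last one. Inputs such as `a,b,c`, `a,b,c\n`, and `a,b,\n` produce three fields regardless of the flag.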
Tested with 4 variants. Only the no-trailing-newline case was broken; now all pass:
""expected 3, got 2

Platform impact
Platform-agnostic. Affects CSV `COPY FROM` users whose source files lack a trailing newline on the last row and end with an empty field. Missing final newlines are common when CSVs are generated programmatically (many emitters don't add a final `\n`).
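As an illustration of how such files arise, an emitter that treats `\n` as a row separator rather than a row terminator leaves the last row unterminated. A minimal sketch; `joinRows` is a hypothetical helper and is not taken from any particular library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical emitter (an assumption for illustration only).
// Joins already-formatted rows with '\n' between them, so nothing is written
// after the last row and the output has no trailing newline.
std::string joinRows(const std::vector<std::string>& rows) {
    std::string out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i > 0) {
            out += '\n';
        }
        out += rows[i];
    }
    return out;
}
```

Joining the rows `x,y,z` and `a,b,` this way produces a file whose last byte is the delimiter, which is exactly the input shape described in this PR.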