fix(csv): handle trailing empty field at EOF without newline in COPY FROM #644

Closed
chiangchenghsin-hash wants to merge 1 commit into LadybugDB:main from chiangchenghsin-hash:fix/csv-trailing-empty-field
@chiangchenghsin-hash

Summary

Fixes a CSV COPY FROM parser bug where a trailing empty field at end-of-file (without a newline) is silently dropped, causing the error "expected N values per row, but got N-1".

Bug

File: src/processor/operator/persistent/reader/csv/base_csv_reader.cpp, in parseCSV() at final_state

When a CSV file ends with a row like:

id,name,ver
1,a,
2,b,      ← no trailing newline, last field empty

The parser reaches final_state with position == start (both at EOF). The existing code only adds a remaining value when position > start, so the empty trailing field is skipped → column undercounts by 1 → addRow() throws "expected 3 values per row, but got 2".
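
To make the failure mode concrete, here is a minimal sketch of the same end-of-file condition in a toy splitter. It is not the LadybugDB parser: splitRowsBuggy is a hypothetical name, and quoting, CRLF handling, and escapes are deliberately left out. Only the `position > start` check at the end mirrors the pre-fix final_state logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, heavily simplified stand-in for the end-of-file handling
// described above. It splits unquoted, LF-terminated CSV text into rows.
std::vector<std::vector<std::string_view>> splitRowsBuggy(std::string_view text) {
    std::vector<std::vector<std::string_view>> rows;
    std::vector<std::string_view> row;
    std::size_t start = 0;    // first character of the current field
    std::size_t position = 0; // current read position
    for (; position < text.size(); ++position) {
        const char c = text[position];
        if (c == ',') {
            // Delimiter: emit the field (possibly empty) and start the next one.
            row.push_back(text.substr(start, position - start));
            start = position + 1;
        } else if (c == '\n') {
            // Newline: emit the last field (possibly empty) and finish the row.
            row.push_back(text.substr(start, position - start));
            rows.push_back(row);
            row.clear();
            start = position + 1;
        }
    }
    // Mirrors the pre-fix final_state: only a non-empty remainder is emitted,
    // so an empty field right after the last delimiter is lost.
    if (position > start) {
        row.push_back(text.substr(start, position - start));
    }
    if (!row.empty()) {
        rows.push_back(row);
    }
    return rows;
}
```

Calling splitRowsBuggy("id,name,ver\n1,a,\n2,b,") yields three fields for the row "1,a," (terminated by a newline) but only two for the final row "2,b,", which is the undercount that addRow() rejects.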

Workaround users currently need

Quote the empty field as "" — this makes position > start true (parser moves past the closing quote), so the value is added. But this is fragile and not obvious.
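
For users stuck on an unpatched build, the workaround could be applied as a preprocessing step. The sketch below is an assumption, not part of this PR: quoteTrailingEmptyField is a hypothetical helper that appends "" only when the text ends directly on the delimiter, and it assumes the input is well-formed CSV (no unterminated quoted field at EOF).

```cpp
#include <cassert>
#include <string>

// Hypothetical preprocessing helper: if the CSV text ends right after a
// delimiter (no trailing newline), append a quoted empty field so the
// final row keeps its last column.
std::string quoteTrailingEmptyField(std::string text, char delimiter = ',') {
    if (!text.empty() && text.back() == delimiter) {
        text += "\"\"";
    }
    return text;
}
```

Text that already ends with a newline or with a non-empty field is returned unchanged.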

Fix

Add an else if (column > 0) branch in final_state to emit an empty value when the file ends right after a delimiter:

     final_state:
         lineContext.setEndOfLine(getFileOffset());
         if (position > start) {
             // Add remaining value to chunk.
             if (!addValue(driver, curRowIdx, column,
                     std::string_view(buffer.get() + start, position - start - hasQuotes),
                     escapePositions)) {
                 return {curRowIdx, numErrors};
             }
             column++;
+        } else if (column > 0) {
+            // File ends right after a delimiter with an empty trailing field.
+            // Without this, a row like "a,b," (no trailing newline) loses its last
+            // empty field and undercounts columns, causing "expected N, got N-1".
+            if (!addValue(driver, curRowIdx, column, std::string_view{}, escapePositions)) {
+                return {curRowIdx, numErrors};
+            }
+            column++;
         }
         if (column > 0) {
             curRowIdx += driver.addRow(curRowIdx, column,
                 getOptionalWarningData<Driver>(columnInfo, option, getWarningSourceData()));
         }

Why this is safe

The else if (column > 0) branch only fires when all three are true:

  1. position == start (file ended with no partial value to add)
  2. column > 0 (we're mid-row, i.e. just past a delimiter — not at a clean row boundary)
  3. We reached final_state (EOF)

This precisely matches "file ends right after a delimiter with an empty trailing field" and nothing else. Rows ending with \n are handled by add_row (which resets column to 0), so they don't hit this branch.
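
The three conditions can be checked against a toy model. The sketch below is a simplified illustration under stated assumptions, not the actual parseCSV() code: splitRowsFixed is a hypothetical name, quoting and CRLF are ignored, and `column` is tracked the way the explanation above describes (incremented per emitted field, reset to 0 when a row is completed).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical simplified splitter with the same end-of-file branches as the fix.
std::vector<std::vector<std::string_view>> splitRowsFixed(std::string_view text) {
    std::vector<std::vector<std::string_view>> rows;
    std::vector<std::string_view> row;
    std::size_t start = 0;
    std::size_t position = 0;
    std::size_t column = 0; // fields emitted so far in the current row
    for (; position < text.size(); ++position) {
        const char c = text[position];
        if (c == ',' || c == '\n') {
            row.push_back(text.substr(start, position - start));
            ++column;
            start = position + 1;
            if (c == '\n') {
                // Row boundary: flush the row and reset column to 0.
                rows.push_back(row);
                row.clear();
                column = 0;
            }
        }
    }
    if (position > start) {
        // A partial value remains: emit it.
        row.push_back(text.substr(start, position - start));
        ++column;
    } else if (column > 0) {
        // EOF right after a delimiter: emit the empty trailing field.
        row.emplace_back();
        ++column;
    }
    if (column > 0) {
        rows.push_back(row);
    }
    return rows;
}
```

In this model "a,b," and "a,b,\n" both produce one row of three fields, "a,b\n" produces one row of two fields, and an empty input produces no rows, because `column` is 0 at a clean row boundary and the new branch is skipped.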

Verification

CREATE NODE TABLE t (id INT64, name STRING, ver STRING, PRIMARY KEY(id));
-- CSV content: "id,name,ver\n1,a,\n2,b,"  (no trailing newline)
COPY t FROM 'test.csv' (HEADER=true);
MATCH (n:t) RETURN count(n);  -- Before: error "expected 3, got 2". After: 2 rows copied

Tested with 4 variants — only the no-trailing-newline case was broken, now all pass:

| CSV variant | Before | After |
| --- | --- | --- |
| LF + trailing newline | ✅ | ✅ |
| CRLF + trailing newline | ✅ | ✅ |
| Quoted empty "" | ✅ | ✅ |
| No trailing newline + empty last field | expected 3, got 2 | ✅ 2 rows |
Platform impact

Platform-agnostic. Affects all CSV COPY FROM users whose source files lack a trailing newline on the last row — a common situation when CSVs are generated programmatically (many emitters don't add a final \n).

When a CSV file ends with a row like "a,b," (no trailing newline, last
field empty), the parser dropped the empty field, causing "expected N
values per row, but got N-1". Add an else-if branch in final_state to
emit the empty value when column > 0.
@adsharma

adsharma commented Jul 3, 2026

The Windows-style newlines in the code make the diff hard to review. I ran dos2unix and added a test case in #648.

@adsharma adsharma closed this Jul 3, 2026
@adsharma

adsharma commented Jul 3, 2026

Thank you for your contribution!
