fix(#399): ARRAY column causes panic in Storage Read API Arrow serialization#496
fix(#399): ARRAY column causes panic in Storage Read API Arrow serialization#496lraigosov wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR fixes how list values are appended to Arrow ListBuilder to prevent row-count mismatch panics, and adds a regression test to ensure the behavior stays correct.
Changes:
- Fix list-slot creation by calling
listBuilder.Append(true)once per row (not once per element). - Add a regression test covering multiple rows with varying list lengths.
- Add explanatory comments documenting the Arrow
ListBuilderslot semantics and the root cause of issue #399.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/types/types.go | Fixes list slot creation logic for Arrow ListBuilder by moving Append(true) outside the element loop. |
| internal/types/types_test.go | Adds a regression test to catch row-count mismatch panics and validate per-row list lengths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Append(true) opens one list slot for this row. All element appends to | ||
| // ValueBuilder() that follow belong to this slot. Calling Append again | ||
| // (for the next row) implicitly closes the current slot. Placing this | ||
| // call inside the element loop is the root cause of issue #399: it opens | ||
| // N slots instead of 1, causing a row-count mismatch panic in NewRecord. | ||
| listBuilder.Append(true) | ||
| b := listBuilder.ValueBuilder() | ||
| for _, vv := range v { | ||
| listBuilder.Append(true) | ||
| if err := vv.AppendValueToARROWBuilder(b); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
In BigQuery, a column of mode REPEATED is always non-null by design: the BigQuery API guarantees a REPEATED field is either empty or populated, never null. Calling \Append(true)\ (non-null) is therefore always correct here; \AppendNull()\ would produce an Arrow null list that has no representation in BigQuery's type system.
A typed-nil []*TableCell\ stored in the \interface{}\ field \V\ means empty REPEATED field — not null. Ranging over a nil slice yields 0 iterations, so the result is a valid, non-null, zero-length list slot — exactly what BigQuery expects.
I've updated the comment above \Append(true)\ in commit \35660fc\ to document this BigQuery invariant explicitly, and added a nil-slice test case (see the reply on the test comment).
| rows := []struct { | ||
| cells []*TableCell | ||
| wantLen int | ||
| }{ | ||
| { | ||
| cells: []*TableCell{{V: "1"}, {V: "2"}, {V: "3"}}, | ||
| wantLen: 3, | ||
| }, | ||
| { | ||
| cells: []*TableCell{}, | ||
| wantLen: 0, | ||
| }, | ||
| { | ||
| cells: []*TableCell{{V: "4"}}, | ||
| wantLen: 1, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Both gaps are addressed in commit \35660fc:
-
Nil vs empty: Added a fourth row with \cells: nil\ (typed-nil []*TableCell\ stored in \interface{}). It asserts \wantLen: 0\ and is annotated to explain that in BigQuery REPEATED fields are always non-null — nil means empty, not null.
-
Element value assertions: After the per-row length loop, the test now extracts the flat \ListValues()\ array as *array.Int64\ and verifies row 0's elements are exactly [1, 2, 3]\ using \ValueOffsets\ to bound the slice.
listBuilder.Append(true) was called inside the per-element loop, opening one list slot per element instead of one per row. Move the call before the loop so exactly one slot is opened per row, and empty arrays emit a valid empty list entry instead of being silently skipped.
99afa55 to
0903f60
Compare
|
@goccy this PR fixes a panic in the Storage Read API when a table contains an |
Problem
Reading a table that contains an
ARRAYcolumn via the Storage Read API (ARROW format) causes a server panic:The row count mismatch is produced inside
array.RecordBuilder.NewRecordwhen the Arrow record is assembled.Closes #399
Root cause
AppendValueToARROWBuilderininternal/types/types.gohandles[]*TableCell(ARRAY columns) with this loop:Arrow's
ListBuilder.Append(true)opens one list slot (one logical row entry). Elements appended toValueBuilder()after it fill that slot. CallingAppendagain (for the next row) implicitly closes the previous slot and opens a new one.Placing the call inside the per-element loop creates N slots instead of 1 for an N-element array. A row with
[1,2,3]produced 3 list-builder slots instead of 1, making the list-field's row count (6) diverge from the other fields' row count (3).An empty array
[]was also broken: the loop body never executed, so no slot was opened at all, dropping the row silently.Fix
Move
listBuilder.Append(true)before the loop so exactly one slot is opened per row, regardless of the number of elements:Test
TestAppendValueToARROWBuilder_Listininternal/types/types_test.goexercises three rows — a 3-element array, an empty array, and a 1-element array — and verifies:NewRecorddoes not panic (the row-count mismatch is gone).