fix(#399): ARRAY column causes panic in Storage Read API Arrow serialization#496

Open
lraigosov wants to merge 3 commits into
goccy:main from
lraigosov:feat/fix-arrow-list-builder
Conversation

@lraigosov

Problem

Reading a table that contains an ARRAY column via the Storage Read API (ARROW format) causes a server panic:

panic: arrow/array: field 2 has 6 rows. want=3

The row count mismatch is produced inside array.RecordBuilder.NewRecord when the Arrow record is assembled.

Closes #399

Root cause

AppendValueToARROWBuilder in internal/types/types.go handles []*TableCell (ARRAY columns) with this loop:

for _, vv := range v {
    listBuilder.Append(true)   // ← inside the loop: one slot per element
    vv.AppendValueToARROWBuilder(b)
}

Arrow's ListBuilder.Append(true) opens one list slot (one logical row entry). Elements appended to ValueBuilder() after it fill that slot. Calling Append again (for the next row) implicitly closes the previous slot and opens a new one.

Placing the call inside the per-element loop creates N slots instead of 1 for an N-element array. A row with [1,2,3] produced 3 list-builder slots instead of 1, making the list-field's row count (6) diverge from the other fields' row count (3).

An empty array [] was also broken: the loop body never executed, so no slot was opened at all, dropping the row silently.

Fix

Move listBuilder.Append(true) before the loop so exactly one slot is opened per row, regardless of the number of elements:

listBuilder.Append(true)        // ← one slot per row
b := listBuilder.ValueBuilder()
for _, vv := range v {
    vv.AppendValueToARROWBuilder(b)
}

Test

TestAppendValueToARROWBuilder_List in internal/types/types_test.go exercises three rows — a 3-element array, an empty array, and a 1-element array — and verifies:

  1. NewRecord does not panic (the row-count mismatch is gone).
  2. Each row's list length matches the inserted element count.
  3. Element values are preserved correctly.

Copilot AI review requested due to automatic review settings June 17, 2026 12:41

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR fixes how list values are appended to Arrow ListBuilder to prevent row-count mismatch panics, and adds a regression test to ensure the behavior stays correct.

Changes:

  • Fix list-slot creation by calling listBuilder.Append(true) once per row (not once per element).
  • Add a regression test covering multiple rows with varying list lengths.
  • Add explanatory comments documenting the Arrow ListBuilder slot semantics and the root cause of issue #399.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
internal/types/types.go | Fixes list slot creation logic for Arrow ListBuilder by moving Append(true) outside the element loop.
internal/types/types_test.go | Adds a regression test to catch row-count mismatch panics and validate per-row list lengths.

Comment thread internal/types/types.go Outdated
Comment on lines 198 to 208
+ // Append(true) opens one list slot for this row. All element appends to
+ // ValueBuilder() that follow belong to this slot. Calling Append again
+ // (for the next row) implicitly closes the current slot. Placing this
+ // call inside the element loop is the root cause of issue #399: it opens
+ // N slots instead of 1, causing a row-count mismatch panic in NewRecord.
+ listBuilder.Append(true)
  b := listBuilder.ValueBuilder()
  for _, vv := range v {
-     listBuilder.Append(true)
      if err := vv.AppendValueToARROWBuilder(b); err != nil {
          return err
      }

Author

In BigQuery, a column of mode REPEATED is always non-null by design: the BigQuery API guarantees a REPEATED field is either empty or populated, never null. Calling Append(true) (non-null) is therefore always correct here; AppendNull() would produce an Arrow null list that has no representation in BigQuery's type system.

A typed-nil []*TableCell stored in the interface{} field V means an empty REPEATED field, not a null one. Ranging over a nil slice yields 0 iterations, so the result is a valid, non-null, zero-length list slot, which is exactly what BigQuery expects.

I've updated the comment above Append(true) in commit 35660fc to document this BigQuery invariant explicitly, and added a nil-slice test case (see the reply on the test comment).

Comment on lines +25 to +41
rows := []struct {
    cells   []*TableCell
    wantLen int
}{
    {
        cells:   []*TableCell{{V: "1"}, {V: "2"}, {V: "3"}},
        wantLen: 3,
    },
    {
        cells:   []*TableCell{},
        wantLen: 0,
    },
    {
        cells:   []*TableCell{{V: "4"}},
        wantLen: 1,
    },
}

Author

Both gaps are addressed in commit 35660fc:

  1. Nil vs empty: Added a fourth row with cells: nil (a typed-nil []*TableCell stored in interface{}). It asserts wantLen: 0 and is annotated to explain that in BigQuery REPEATED fields are always non-null: nil means empty, not null.

  2. Element value assertions: After the per-row length loop, the test now extracts the flat ListValues() array as *array.Int64 and verifies that row 0's elements are exactly [1, 2, 3], using ValueOffsets to bound the slice.

listBuilder.Append(true) was called inside the per-element loop,
opening one list slot per element instead of one per row. Move the
call before the loop so exactly one slot is opened per row, and empty
arrays emit a valid empty list entry instead of being silently skipped.
@lraigosov lraigosov force-pushed the feat/fix-arrow-list-builder branch from 99afa55 to 0903f60 Compare June 17, 2026 12:45
@lraigosov

Author

@goccy this PR fixes a panic in the Storage Read API when a table contains an ARRAY column. The change is a one-line move in internal/types/types.go with a focused regression test. Ready for review when you have a moment.

Development

Successfully merging this pull request may close these issues.

Read storage API error selecting array field

2 participants