fix: return bare Arrow IPC messages from the Storage Read API #494

Open
otegami wants to merge 1 commit into goccy:main from otegami:fix/storage-read-arrow-bare-ipc

otegami commented Jun 15, 2026

Related

Closes #493

Summary

Each ARROW field carried a full Arrow IPC stream, but the v1 proto expects a single bare IPC message per field, so the official cloud.google.com/go/bigquery client read 0 rows. getSerializedARROWSchema / sendARROWRows now emit one bare message each via ipc.GetSchemaPayload / ipc.GetRecordBatchPayload + Payload.WritePayload, instead of going through an ipc.Writer (which frames a whole stream plus EOS rather than a single message).

TestStorageReadARROW decoded the frames by hand and therefore tolerated the old framing; its decoder is now fixed, and TestStorageReadARROWHighLevel is added as the real guard (it reads via bigquery.Client + EnableStorageReadClient).

Verification

go test ./server/... ./test/...

The Storage Read API wrote a full Arrow IPC stream into the Arrow fields:
ReadSession.arrow_schema.serialized_schema carried [schema][empty batch][EOS]
and each ReadRowsResponse.arrow_record_batch.serialized_record_batch carried
[schema][record][EOS]. The v1 proto specifies a single IPC-encapsulated
message per field (serialized_schema is an Arrow schema,
serialized_record_batch is an Arrow RecordBatch), and the official
cloud.google.com/go/bigquery client relies on that: reassembling the fields
into a stream then yields two schemas and a mid-stream EOS, so RowIterator
reads zero rows.
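
To make the framing difference concrete, here is a minimal stdlib-only sketch (not code from this PR) that inspects the raw bytes of a field. It assumes the encapsulated message layout from the Arrow columnar format: a 0xFFFFFFFF continuation marker, a little-endian int32 metadata length that includes padding, the metadata, then the body, with end-of-stream encoded as the continuation marker followed by a zero length. The helper names `endsWithEOS`, `isSingleBodilessMessage`, and `frame` are hypothetical, and the metadata here is opaque filler rather than a real Flatbuffers message. The trailing-EOS check is only a heuristic for record batches, since body bytes could in principle end with the same pattern.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// continuation is the marker that prefixes every encapsulated IPC message.
const continuation = 0xFFFFFFFF

// eos is the end-of-stream marker: continuation followed by a zero length.
var eos = []byte{0xFF, 0xFF, 0xFF, 0xFF, 0, 0, 0, 0}

// endsWithEOS reports whether buf ends with the 8-byte end-of-stream marker.
func endsWithEOS(buf []byte) bool {
	if len(buf) < 8 {
		return false
	}
	tail := buf[len(buf)-8:]
	return binary.LittleEndian.Uint32(tail[:4]) == continuation &&
		binary.LittleEndian.Uint32(tail[4:]) == 0
}

// isSingleBodilessMessage reports whether buf is exactly one encapsulated
// message with no body and no trailing bytes, the shape a bare schema
// message has. It does not verify the message type.
func isSingleBodilessMessage(buf []byte) bool {
	if len(buf) < 8 || binary.LittleEndian.Uint32(buf[:4]) != continuation {
		return false
	}
	metaLen := binary.LittleEndian.Uint32(buf[4:8])
	return metaLen > 0 && uint64(len(buf)) == 8+uint64(metaLen)
}

// frame builds a hypothetical bodiless message with metaLen filler bytes
// standing in for the Flatbuffers metadata.
func frame(metaLen int) []byte {
	buf := make([]byte, 8+metaLen)
	binary.LittleEndian.PutUint32(buf[:4], continuation)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(metaLen))
	for i := 8; i < len(buf); i++ {
		buf[i] = 1
	}
	return buf
}

func main() {
	bare := frame(16)                                      // [schema]
	old := append(append(frame(16), frame(24)...), eos...) // [schema][empty batch][EOS]

	fmt.Println(isSingleBodilessMessage(bare), endsWithEOS(bare)) // true false
	fmt.Println(isSingleBodilessMessage(old), endsWithEOS(old))   // false true
}
```

A check like this could flag the old framing on serialized_schema without decoding any Arrow data, though the high-level client test remains the stronger guard.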

Serialize just the message instead: ipc.GetSchemaPayload for the schema and
ipc.GetRecordBatchPayload for each record batch, written via
Payload.WritePayload (no ipc.Writer, which frames a stream and appends an EOS).

Update the low-level test's decoder to reassemble [schema][record-batch] the
way the client's ArrowRecordBatch.Read does, and add
TestStorageReadARROWHighLevel, which reads a table through bigquery.Client +
EnableStorageReadClient (the production read path) and fails unless the frames
are bare messages.

Development

Successfully merging this pull request may close these issues.

Storage Read API returns full Arrow IPC streams instead of bare messages
