fix: return bare Arrow IPC messages from the Storage Read API #494
Open
otegami wants to merge 1 commit into
The Storage Read API wrote a full Arrow IPC stream into each Arrow field: ReadSession.arrow_schema.serialized_schema carried [schema][empty batch][EOS], and each ReadRowsResponse.arrow_record_batch.serialized_record_batch carried [schema][record][EOS]. The v1 proto specifies a single IPC-encapsulated message per field (serialized_schema is an Arrow schema, serialized_record_batch is an Arrow RecordBatch), and the official cloud.google.com/go/bigquery client relies on that. Reassembling the old fields into a stream yields two schemas and a mid-stream EOS, so RowIterator reads zero rows.

Serialize just the message instead: ipc.GetSchemaPayload for the schema and ipc.GetRecordBatchPayload for each record batch, written via Payload.WritePayload. This avoids ipc.Writer, which frames a whole stream and appends an EOS.

Update the low-level test's decoder to reassemble [schema][record-batch] the way the client's ArrowRecordBatch.Read does, and add TestStorageReadARROWHighLevel, which reads a table through bigquery.Client + EnableStorageReadClient (the production read path) and fails unless the frames are bare messages.
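The failure mode described above can be illustrated with a toy model. This is only a sketch: the `frame` type, `countRows`, and `reassemble` are hypothetical names invented for illustration, and the reader is a simplified stand-in for a stream decoder that stops at the first EOS. It is not the client's actual code.

```go
package main

import "fmt"

// kind is a simplified stand-in for an Arrow IPC message type.
type kind int

const (
	schema kind = iota
	batch
	eos
)

// frame models one IPC message; rows is only meaningful for batch frames.
type frame struct {
	k    kind
	rows int
}

// countRows mimics a stream reader: it expects a leading schema, then sums
// batch rows until the first end-of-stream marker and ignores the rest.
func countRows(stream []frame) int {
	if len(stream) == 0 || stream[0].k != schema {
		return 0
	}
	total := 0
	for _, f := range stream[1:] {
		switch f.k {
		case eos:
			return total
		case batch:
			total += f.rows
		}
	}
	return total
}

// reassemble concatenates the schema field and one record-batch field,
// as a client would before decoding (assumed behaviour, simplified).
func reassemble(schemaField, batchField []frame) []frame {
	out := append([]frame{}, schemaField...)
	return append(out, batchField...)
}

func main() {
	// Old framing: each field is a full stream.
	oldSchema := []frame{{k: schema}, {k: batch, rows: 0}, {k: eos}}
	oldBatch := []frame{{k: schema}, {k: batch, rows: 3}, {k: eos}}
	// New framing: each field is one bare message.
	newSchema := []frame{{k: schema}}
	newBatch := []frame{{k: batch, rows: 3}}

	fmt.Println(countRows(reassemble(oldSchema, oldBatch))) // 0
	fmt.Println(countRows(reassemble(newSchema, newBatch))) // 3
}
```

With the old framing the reader hits the EOS that follows the empty batch and never reaches the real record batch; with bare messages the reassembled sequence is a single schema followed by the data.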
Related
Closes #493
Summary
Each ARROW field was a full IPC stream; the v1 proto wants a single bare IPC message per field, so the official cloud.google.com/go/bigquery client read 0 rows.
getSerializedARROWSchema / sendARROWRows now emit one bare message via ipc.GetSchemaPayload / ipc.GetRecordBatchPayload + Payload.WritePayload, instead of an ipc.Writer (which frames a whole stream + EOS rather than a single message).
TestStorageReadARROW had decoded the frames by hand and tolerated the old framing; it's fixed, and TestStorageReadARROWHighLevel is added as the real guard (reads via bigquery.Client + EnableStorageReadClient).
Verification
go test ./server/... ./test/...
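As a complement to the test run, a cheap byte-level check can flag a field that still carries stream framing. The sketch below is not part of this PR; `endsWithEOS` is a hypothetical helper. It relies only on the Arrow IPC format's end-of-stream signal (the 0xFFFFFFFF continuation indicator followed by a zero 4-byte length). It is a heuristic, since a message body could in principle end with the same eight bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// eosMarker is the Arrow IPC end-of-stream signal: the 0xFFFFFFFF
// continuation indicator followed by a zero int32 metadata length.
var eosMarker = []byte{0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00}

// endsWithEOS reports whether b ends with the end-of-stream marker.
func endsWithEOS(b []byte) bool {
	return bytes.HasSuffix(b, eosMarker)
}

func main() {
	// Placeholder bytes, not real Arrow messages: only the framing matters here.
	bare := []byte{0xFF, 0xFF, 0xFF, 0xFF, 0x08, 0x00, 0x00, 0x00, 1, 2, 3, 4, 5, 6, 7, 8}
	stream := append(append([]byte{}, bare...), eosMarker...)

	fmt.Println(endsWithEOS(bare))   // false
	fmt.Println(endsWithEOS(stream)) // true
}
```

A check like this only catches the trailing EOS; the high-level test through bigquery.Client remains the authoritative guard.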