What happened?
Reading a table over the Storage Read API (Arrow) with the official cloud.google.com/go/bigquery client returns no rows. With the Storage Read client enabled, RowIterator reads zero rows, and on the current main it panics in RowIterator.Next (index out of range [0] with length 0).
The cause is the Arrow framing: each field should be a single bare IPC message, but the emulator writes a whole IPC stream into it.
serialized_schema: [schema][empty record batch][EOS] — should be just a schema message.
serialized_record_batch: [schema][record][EOS] — should be just a record-batch message.
The client reads the schema field first, consumes its empty batch (zero rows), and stops at the EOS before the real data — so RowIterator.Next panics.
Low-level access (CreateReadSession / ReadRows decoded by hand) tolerates this, which is why it goes unnoticed.
What did you expect to happen?
Per the Storage v1 proto (google/cloud/bigquery/storage/v1/arrow.proto), serialized_schema is an "IPC serialized Arrow schema" and serialized_record_batch is an "IPC-serialized Arrow RecordBatch" — i.e. each field is a single Arrow IPC-encapsulated message (no leading schema, no end-of-stream marker), as real BigQuery sends them. With that framing the official client's RowIterator returns the table rows.
Why a single message, and not an IPC stream
Arrow IPC has two layers: a single encapsulated message (one schema, or one record batch), and a stream — the sequence <schema><record batch>…<EOS optional> built from those messages.
These proto fields each carry one encapsulated message. The consumer builds the stream itself by concatenating serialized_schema + each serialized_record_batch — that concatenation is the stream.
The emulator instead writes a whole stream into every field (serialized_schema = [schema][empty batch][EOS], each serialized_record_batch = [schema][record][EOS]). Concatenating those gives two schemas and a mid-stream EOS — not a valid stream.
Note on dictionaries: a dictionary-encoded column requires DictionaryBatch message(s) before the record batch. The emulator emits plain (non-dictionary) batches, so a single record-batch message is complete here; dictionary support would be a separate change.
How can we reproduce it (as minimally and precisely as possible)?
A minimal failing test reproduces it in two steps.
- Save this as
server/storage_read_repro_test.go:
storage_read_repro_test.go
package server_test
import (
"context"
"testing"
"cloud.google.com/go/bigquery"
"github.com/goccy/bigquery-emulator/server"
"github.com/goccy/bigquery-emulator/types"
"google.golang.org/api/iterator"
"google.golang.org/api/option"
)
func TestStorageReadRepro(t *testing.T) {
ctx := context.Background()
// Emulator holding one table: dataset1.users = {id: 42, name: "alice"}.
emulator, err := server.New(server.TempStorage)
if err != nil {
t.Fatal(err)
}
source := server.StructSource(
types.NewProject("test",
types.NewDataset("dataset1",
types.NewTable("users",
[]*types.Column{
types.NewColumn("id", types.INT64),
types.NewColumn("name", types.STRING),
},
types.Data{
{"id": 42, "name": "alice"},
},
),
),
),
)
if err := emulator.Load(source); err != nil {
t.Fatal(err)
}
testServer := emulator.TestServer() // in-process REST + gRPC, no TLS
defer testServer.Close()
// Official client with the Storage Read API enabled.
client, err := bigquery.NewClient(ctx, "test",
option.WithEndpoint(testServer.URL), option.WithoutAuthentication())
if err != nil {
t.Fatal(err)
}
defer client.Close()
grpcOptions, err := testServer.GRPCClientOptions(ctx)
if err != nil {
t.Fatal(err)
}
if err := client.EnableStorageReadClient(ctx, grpcOptions...); err != nil {
t.Fatal(err)
}
// Read the table over the Storage Read API.
rowIterator := client.Dataset("dataset1").Table("users").Read(ctx)
var rows []map[string]bigquery.Value
for {
row := map[string]bigquery.Value{}
err := rowIterator.Next(&row)
if err == iterator.Done {
break
}
if err != nil {
t.Fatalf("read failed: %v", err)
}
rows = append(rows, row)
}
// Expect exactly the row we stored to come back.
if len(rows) != 1 {
t.Fatalf("want 1 row, got %d: %v", len(rows), rows)
}
gotID := rows[0]["id"]
gotName := rows[0]["name"]
if gotID != int64(42) || gotName != "alice" {
t.Fatalf("want {id: 42, name: alice}, got {id: %v, name: %v}", gotID, gotName)
}
}
- Run it:
go test ./server/ -run TestStorageReadRepro -count=1 -v
Expected
PASS — the test reads {id: 42, name: "alice"} back.
Actual (on current main)
FAIL — the official client decodes the non-conformant framing into zero rows, then RowIterator.Next panics:
=== RUN TestStorageReadRepro
--- FAIL: TestStorageReadRepro (0.12s)
panic: runtime error: index out of range [0] with length 0 [recovered, repanicked]
...
cloud.google.com/go/bigquery.(*RowIterator).Next(...)
cloud.google.com/go/bigquery@v1.77.0/iterator.go:177
github.com/goccy/bigquery-emulator/server_test.TestStorageReadRepro(...)
server/storage_read_repro_test.go
FAIL github.com/goccy/bigquery-emulator/server
Anything else we need to know?
References:
What happened?
Reading a table over the Storage Read API (Arrow) with the official
cloud.google.com/go/bigqueryclient returns no rows. With the Storage Read client enabled,RowIteratorreads zero rows, and on the currentmainit panics inRowIterator.Next(index out of range [0] with length 0).The cause is the Arrow framing: each field should be a single bare IPC message, but the emulator writes a whole IPC stream into it.
serialized_schema:[schema][empty record batch][EOS]— should be just a schema message.serialized_record_batch:[schema][record][EOS]— should be just a record-batch message.The client reads the schema field first, consumes its empty batch (zero rows), and stops at the EOS before the real data — so
RowIterator.Nextpanics.Low-level access (
CreateReadSession/ReadRowsdecoded by hand) tolerates this, which is why it goes unnoticed.What did you expect to happen?
Per the Storage v1 proto (
google/cloud/bigquery/storage/v1/arrow.proto),serialized_schemais an "IPC serialized Arrow schema" andserialized_record_batchis an "IPC-serialized Arrow RecordBatch" — i.e. each field is a single Arrow IPC-encapsulated message (no leading schema, no end-of-stream marker), as real BigQuery sends them. With that framing the official client'sRowIteratorreturns the table rows.Why a single message, and not an IPC stream
Arrow IPC has two layers: a single encapsulated message (one schema, or one record batch), and a stream — the sequence
<schema><record batch>…<EOS optional>built from those messages.These proto fields each carry one encapsulated message. The consumer builds the stream itself by concatenating
serialized_schema+ eachserialized_record_batch— that concatenation is the stream.The emulator instead writes a whole stream into every field (
serialized_schema=[schema][empty batch][EOS], eachserialized_record_batch=[schema][record][EOS]). Concatenating those gives two schemas and a mid-stream EOS — not a valid stream.Note on dictionaries: a dictionary-encoded column requires
DictionaryBatchmessage(s) before the record batch. The emulator emits plain (non-dictionary) batches, so a single record-batch message is complete here; dictionary support would be a separate change.How can we reproduce it (as minimally and precisely as possible)?
A minimal failing test reproduces it in two steps.
server/storage_read_repro_test.go:storage_read_repro_test.go
Expected
PASS — the test reads
{id: 42, name: "alice"}back.Actual (on current
main)FAIL — the official client decodes the non-conformant framing into zero rows, then
RowIterator.Nextpanics:Anything else we need to know?
References: