fix: resolve issues #470, #482, #483 and #493 by lraigosov · Pull Request #492 · goccy/bigquery-emulator

lraigosov · 2026-06-15T06:16:07Z

This PR fixes two regressions (#470, #482), a Storage Read API framing bug (#493), and a REPEATED RECORD field-ordering issue (#483).

Fix streaming insert visibility (#470)

Streaming inserts (tabledata.insertAll) were resolving table names inconsistently, leading to empty results in unqualified queries when same-named tables existed across multiple projects. Updated the metadata handler to use fully qualified names (project.dataset.table) for storage and resolution.
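The keying change can be sketched in a few lines. This is an illustration under assumptions, not the emulator's code: tableKey and tableStore are hypothetical names, and the store is a plain in-memory map rather than the real metadata repository. The point is that the map key carries project and dataset, so same-named tables in different projects cannot collide.

```go
package main

import "fmt"

// tableKey builds a fully qualified project.dataset.table key.
// Hypothetical helper: it only illustrates the keying scheme described for #470.
func tableKey(projectID, datasetID, tableID string) string {
	return fmt.Sprintf("%s.%s.%s", projectID, datasetID, tableID)
}

// tableStore is a hypothetical in-memory registry of inserted rows,
// keyed by qualified name so projects never share an entry.
type tableStore map[string][]map[string]any

// insertAll appends rows under the qualified key for the target table.
func (s tableStore) insertAll(projectID, datasetID, tableID string, rows ...map[string]any) {
	key := tableKey(projectID, datasetID, tableID)
	s[key] = append(s[key], rows...)
}

// rows resolves a table through the same qualified key used on insert.
func (s tableStore) rows(projectID, datasetID, tableID string) []map[string]any {
	return s[tableKey(projectID, datasetID, tableID)]
}
```

Because insert and lookup both go through tableKey, a table named "events" in proj-a and one in proj-b resolve to different entries; keying by the bare table name would merge or shadow them.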

Closes #470

Fix nested TIMESTAMP formatting (#482)

Timestamp fields inside STRUCT (RECORD) or ARRAY were returned as RFC3339 strings instead of integer microseconds. Updated recursive formatting logic in internal/types/types.go to ensure consistency.

Closes #482

Fix Storage Read API — bare Arrow IPC messages (#493)

The Storage Read API contract requires serialized_schema and serialized_record_batch to be bare Arrow IPC encapsulated messages, not complete IPC streams. The previous implementation called ipc.Writer.Close(), which appends an end-of-stream (EOS) marker, so the payload was [schema_msg][record_batch_msg][EOS] instead of the required bare messages.

Added splitIPCStream to extract individual IPC messages from a two-message stream.
getSerializedARROWSchema now returns only the schema message bytes.
sendARROWRows now returns only the record-batch message bytes.

Closes #493

Fix REPEATED RECORD field access (intermittent wrong-field values) (#483)

convertValueToCell previously iterated MapKeys() on the STRUCT value returned by googlesqlite. Because reflect.Value.MapKeys returns keys in an unspecified order, field assignment for multi-field structs was non-deterministic. The updated implementation reads fields positionally from the []any slice that googlesqlite returns.
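Positional assignment is easy to show in isolation. A minimal sketch, assuming the STRUCT arrives as a []any in schema order; Cell and structToCells are hypothetical names, not the emulator's convertValueToCell.

```go
package main

import "fmt"

// Cell pairs a schema field name with its value (hypothetical type).
type Cell struct {
	Name string
	V    any
}

// structToCells assigns STRUCT values to schema fields by index, so value i
// always lands in field i regardless of any map iteration order.
func structToCells(fieldNames []string, values []any) ([]Cell, error) {
	if len(values) != len(fieldNames) {
		return nil, fmt.Errorf("struct has %d values but schema has %d fields", len(values), len(fieldNames))
	}
	cells := make([]Cell, len(fieldNames))
	for i, name := range fieldNames {
		cells[i] = Cell{Name: name, V: values[i]}
	}
	return cells, nil
}
```

The length check turns a schema/value mismatch into an explicit error instead of a silently shifted row.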

Closes #483

Copilot review — all feedback addressed

The following issues raised in automated review were resolved in commit 0160fc5:

formatCell nil guard — formatCell now guards against a nil field pointer before dereferencing, preventing a panic when a column has no schema entry.
*TableRow handling — *TableRow is handled alongside TableRow in the type switch so pointer values are not silently dropped during cell formatting.
terminalTableID for qualified IDs — terminalTableID correctly parses fully qualified IDs in project:dataset.table form, not just bare table names.
SetNamePath([]string{}) intent — SetNamePath([]string{}) in MetadataRepoMode is intentional: an empty name path targets the metadata catalog layer directly. Documented in code.

Test plan

New tests added by this PR:

Test	File	Covers
`TestTerminalTableID`	`server/handler_tableid_test.go`	Copilot finding — qualified ID parsing
`TestQueryProjectAndDataset`	`server/handler_tableid_test.go`	Copilot finding — project/dataset resolution
`TestNewTableWithSchemaCaseInsensitiveColumns`	`internal/types/types_test.go`	Copilot finding — case-insensitive column lookup
`TestStorageReadARROW`	`server/storage_test.go`	Bare Arrow IPC framing (#493)
`TestRepeatedRecordFieldOrder`	`server/server_test.go`	REPEATED RECORD field ordering (#483)

Fixes for #470 and #482 are covered by the existing integration test suite (TestFetchData, TestQuery, TestTable), which pass without regressions.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves BigQuery table handling and row formatting by normalizing table references, making TIMESTAMP formatting recursive across nested/repeated schemas, and adjusting transaction naming/repo mode setup.

Changes:

Populate missing TableReference fields during table creation/insert handling.
Refactor row formatting to recursively format TIMESTAMP cells in repeated/record fields.
Adjust repository transaction mode usage and make SetNamePath construction conditional.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
server/handler.go	Fills missing `TableReference` components before computing table IDs / handling inserts.
internal/types/types.go	Introduces recursive `formatCell` to format nested/repeated TIMESTAMP values.
internal/contentdata/repository.go	Switches transaction mode calls from `ContentRepoMode` to `MetadataRepoMode`.
internal/connection/manager.go	Changes `SetNamePath` to build the path from non-empty project/dataset IDs.


lraigosov · 2026-06-17T02:40:19Z

+func formatCell(field *bigqueryv2.TableFieldSchema, cell *TableCell, useInt64Timestamp bool) *TableCell {
+	if cell.V == nil {
+		return cell
+	}
+
+	if field.Mode == "REPEATED" {


Fixed in commit 0160fc5. Added an early if field == nil { return cell } guard at the top of formatCell in internal/types/types.go before any dereference of field.Mode or field.Type.

lraigosov · 2026-06-17T02:40:28Z

+	if field.Type == "RECORD" {
+		if row, ok := cell.V.(TableRow); ok {
+			formattedF := make([]*TableCell, 0, len(row.F))
+			for i, c := range row.F {


Fixed in commit 0160fc5. The type switch in formatCell now handles both TableRow and *TableRow, dereferencing the pointer in the latter case so nested TIMESTAMP recursion applies consistently regardless of whether the value arrives as a value type or a pointer.

lraigosov · 2026-06-17T02:40:34Z

+	if r.table.TableReference == nil {
+		r.table.TableReference = &bigqueryv2.TableReference{}
+	}
+	if r.table.TableReference.ProjectId == "" {
+		r.table.TableReference.ProjectId = r.project.ID
+	}
+	if r.table.TableReference.DatasetId == "" {
+		r.table.TableReference.DatasetId = r.dataset.ID
+	}
+	if r.table.TableReference.TableId == "" {
+		r.table.TableReference.TableId = r.table.Id
+	}


Fixed in commit 0160fc5. Introduced terminalTableID in server/handler.go that parses fully qualified identifiers (project:dataset.table) and extracts only the terminal table name. All callers that previously used r.table.Id directly now go through this helper. Covered by TestTerminalTableID in server/handler_tableid_test.go.
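Both pieces of this reply (filling empty TableReference components and keeping only the terminal segment of a qualified ID) fit in a short sketch. terminalTableName and fillTableReference are hypothetical stand-ins for terminalTableID and the handler code quoted above, and TableReference here is a local three-field struct rather than bigqueryv2.TableReference.

```go
package main

import "strings"

// TableReference is a local stand-in with the three fields used by the handler.
type TableReference struct{ ProjectId, DatasetId, TableId string }

// terminalTableName (hypothetical) returns the last component of
// "project:dataset.table", "project.dataset.table", "dataset.table" or "table".
func terminalTableName(id string) string {
	if i := strings.LastIndexAny(id, ":."); i >= 0 {
		return id[i+1:]
	}
	return id
}

// fillTableReference (hypothetical) fills only the empty components, taking the
// table ID from the terminal segment of a possibly qualified identifier.
func fillTableReference(ref *TableReference, projectID, datasetID, rawTableID string) *TableReference {
	if ref == nil {
		ref = &TableReference{}
	}
	if ref.ProjectId == "" {
		ref.ProjectId = projectID
	}
	if ref.DatasetId == "" {
		ref.DatasetId = datasetID
	}
	if ref.TableId == "" {
		ref.TableId = terminalTableName(rawTableID)
	}
	return ref
}
```

Components the client already supplied are never overwritten; only missing ones fall back to the request context.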

lraigosov · 2026-06-17T02:40:49Z

+		namePath := []string{}
+		if t.conn.ProjectID != "" {
+			namePath = append(namePath, t.conn.ProjectID)
 		}
+		if t.conn.DatasetID != "" {
+			namePath = append(namePath, t.conn.DatasetID)
+		}
+		_ = gsqlConn.SetNamePath(namePath)


Fixed in commit 0160fc5. ContentRepoMode now guards against an empty ProjectID before constructing the name path, so SetNamePath is never called with an empty slice from that path. MetadataRepoMode's SetNamePath([]string{}) is intentional — the metadata layer operates on a nameless root catalog. A comment was added to make this invariant explicit.
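The ContentRepoMode guard described here can be sketched as a small pure function. The name contentNamePath and the choice to return an error (rather than skip the SetNamePath call) are assumptions for illustration; the actual change in internal/connection/manager.go may be structured differently.

```go
package main

import "errors"

// contentNamePath (hypothetical) builds the name path for content-repo mode:
// a project ID is required, and the dataset ID is appended only when present.
// MetadataRepoMode, by contrast, intentionally uses an empty path.
func contentNamePath(projectID, datasetID string) ([]string, error) {
	if projectID == "" {
		return nil, errors.New("content repo mode requires a non-empty project ID")
	}
	path := []string{projectID}
	if datasetID != "" {
		path = append(path, datasetID)
	}
	return path, nil
}
```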

tc-lraigoso · 2026-06-15T06:46:40Z

Applied a follow-up fix commit to address review feedback and pushed updates to this branch.

Summary of changes:

Guard against nil field access in recursive cell formatting.
Handle nested RECORD values for both TableRow and *TableRow.
Normalize fallback table ID extraction when table identifiers are qualified.
Keep content repository name path initialization strict and consistent.
Added focused regression tests for the updated behaviors.

Validation:

Ran targeted tests and full server package tests in WSL Ubuntu-24.04.

lraigosov · 2026-06-17T00:13:59Z

Added fix for #481 (ALTER TABLE ADD COLUMN — new columns invisible to subsequent queries) in the latest commit. Full test suite passes. PR description has been updated. Ready for review.

lraigosov · 2026-06-17T02:01:56Z

Note on reflect+unsafe usage in the ALTER TABLE fix

The updateZetaSQLCatalogForAlter function in server/handler.go accesses private fields of googlesqlite.Conn via reflect + unsafe to reach the internal *googlesql.SimpleCatalog and call AddColumn on the target SimpleTable. This is the most invasive part of the PR and worth explaining.

Why it is necessary

googlesqlite registers ALTER TABLE as a NoopStmtAction — it executes the DDL on SQLite but does not update its internal ZetaSQL catalog. There is no public API to:

retrieve the live SimpleCatalog from a Conn, or
add a column to an already-registered SimpleTable.

The only alternative would be to drop and re-register the entire table in the catalog on every ALTER, which risks race conditions and discards any in-flight state.

Protections in place

Both field lookups (catalog, catalog.catalog) use reflect.Value.IsValid() checks; the function returns a descriptive error rather than panicking if the struct layout changes.
The function is called only within an explicit WithGSQLConn callback, scoped to the active transaction.
The unit test TestDDLAlterTable exercises the full round-trip (ALTER → INSERT with new column → SELECT), so a silent breakage after a googlesqlite version bump would surface immediately.
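The IsValid() protection described above can be illustrated with a generic helper. This is a sketch, not the code from #495: unexportedField is a hypothetical name, and it operates on arbitrary structs rather than googlesqlite.Conn, whose field layout is not shown here. It uses the standard reflect.NewAt plus UnsafeAddr pattern to read an unexported field and returns an error when the field is missing.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// unexportedField returns a usable reflect.Value for the unexported field `name`
// of the struct that ptr points to, or an error if the field does not exist
// (for example because a dependency changed its struct layout).
func unexportedField(ptr any, name string) (reflect.Value, error) {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Pointer || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return reflect.Value{}, fmt.Errorf("expected a non-nil pointer to a struct, got %T", ptr)
	}
	f := v.Elem().FieldByName(name)
	if !f.IsValid() {
		return reflect.Value{}, fmt.Errorf("field %q not found on %T; struct layout may have changed", name, ptr)
	}
	// f is addressable (reached through a pointer); re-deriving it from its address
	// with reflect.NewAt drops the read-only flag reflect sets on unexported fields.
	return reflect.NewAt(f.Type(), unsafe.Pointer(f.UnsafeAddr())).Elem(), nil
}
```

Chaining two calls (first on the connection, then on the value it returns) mirrors the catalog, catalog.catalog lookups; a renamed or removed field then surfaces as an error rather than a panic.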

Preferred long-term path

The cleanest resolution would be to open a PR in the goccy/go-sql-driver / googlesqlite repository to expose Conn.Catalog() *googlesql.SimpleCatalog or a Conn.AddColumnToTable(table, column) helper. Until that API exists, the reflect approach is the only option that does not require rewriting the DDL execution path entirely.

Happy to discuss or adjust the approach based on maintainer preference.

Storage Read API was returning full Arrow IPC streams (schema + record batch + EOS marker) instead of the bare messages expected by BigQuery. Split the stream to extract the schema message and the record batch message separately. Updated the test helper to reconstruct a full IPC stream per batch for decoding. Also adds a regression test for repeated RECORD field ordering to confirm that struct field values are read from the correct positional column after the fix in convertValueToCell.
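Because the server now returns bare messages, a test client has to reassemble a stream before decoding it. A hypothetical helper along these lines (rebuildIPCStream is an assumed name; the PR's test helper is not shown) concatenates the schema message, one record-batch message, and the 8-byte EOS marker, producing input that an Arrow stream reader such as ipc.NewReader should accept.

```go
package main

// eosMarker is the Arrow IPC end-of-stream marker: 0xFFFFFFFF followed by a zero length.
var eosMarker = []byte{0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00}

// rebuildIPCStream (hypothetical) turns the bare serialized_schema and
// serialized_record_batch bytes back into [schema_msg][record_batch_msg][EOS].
func rebuildIPCStream(serializedSchema, serializedRecordBatch []byte) []byte {
	out := make([]byte, 0, len(serializedSchema)+len(serializedRecordBatch)+len(eosMarker))
	out = append(out, serializedSchema...)
	out = append(out, serializedRecordBatch...)
	return append(out, eosMarker...)
}
```

Rebuilding one stream per batch keeps each decode independent, which matches the per-batch reconstruction described above.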

lraigosov · 2026-06-17T02:40:59Z

Note: The two comments above about ALTER TABLE / reflect+unsafe (#481) were written before this work was split into separate PRs. That fix has since moved to PR #495. This PR (#492) covers only #470, #482, #483, and #493.

lraigosov added 2 commits June 15, 2026 00:56

fix: resolve unqualified query results for streaming inserts (27a407f)
fix: nested timestamp format in record fields (3c1be50)

Copilot AI review requested due to automatic review settings June 15, 2026 06:16

This was referenced Jun 15, 2026:

v0.7.x: Data inserted via tabledata.insertAll is not visible to subsequent queries #470 (Open)
Nested RECORD TIMESTAMP fields returned as RFC3339 strings instead of epoch microseconds #482 (Open)

Copilot AI reviewed Jun 15, 2026


lraigosov added 2 commits June 15, 2026 01:19

chore: ignore local repro directory (e3d09e7)
fix: address review feedback for table handling (0160fc5)

lraigosov changed the title from "fix: streaming insert visibility and nested timestamp formatting (#470, #482)" to "fix: streaming insert visibility, nested timestamp formatting, and ALTER TABLE ADD COLUMN" on Jun 17, 2026

lraigosov changed the title from "fix: streaming insert visibility, nested timestamp formatting, and ALTER TABLE ADD COLUMN" to "fix: resolve issues #470, #481, #482, #483 and #493" on Jun 17, 2026

This was referenced Jun 17, 2026:

Alter table does not seem to work #481 (Open)
Repeated RECORD field access intermittently reads values from the wrong field #483 (Open)
Storage Read API returns full Arrow IPC streams instead of bare messages #493 (Open)

lraigosov force-pushed the feat/maintenance-and-fixes branch from 817932f to 99f1a50 on June 17, 2026 02:18

lraigosov changed the title from "fix: resolve issues #470, #481, #482, #483 and #493" to "fix: resolve issues #470, #482, #483 and #493" on Jun 17, 2026

lraigosov mentioned this pull request Jun 17, 2026:

fix(#481): ALTER TABLE ADD COLUMN — new columns invisible to subsequent queries #495 (Open)

chore: exclude devlog.md from version control (dc507e4)

lraigosov mentioned this pull request Jun 17, 2026:

Storage API returns records byte array containing schema bytes #398 (Open)

lraigosov wants to merge 6 commits into goccy:main from lraigosov:feat/maintenance-and-fixes
