fix: resolve issues #470, #482, #483 and #493 #492

Open
lraigosov wants to merge 6 commits into goccy:main from lraigosov:feat/maintenance-and-fixes
Conversation

@lraigosov

@lraigosov lraigosov commented Jun 15, 2026

This PR addresses two regressions, a Storage Read API framing bug, and a REPEATED RECORD field ordering issue.

Fix streaming insert visibility (#470)

Streaming inserts (tabledata.insertAll) were resolving table names inconsistently, leading to empty results in unqualified queries when same-named tables existed across multiple projects. Updated the metadata handler to use fully qualified names (project.dataset.table) for storage and resolution.

Closes #470
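
A minimal sketch of why the qualified key matters. The map-based registry and the `tableKey` helper are hypothetical and only illustrate the collision; they are not the emulator's actual storage code.

```go
package main

import "fmt"

// tableKey builds a fully qualified key in project.dataset.table form.
// The helper name is an assumption made for this example.
func tableKey(project, dataset, table string) string {
	return project + "." + dataset + "." + table
}

func main() {
	// Keyed by bare table name, the second registration overwrites the first.
	bare := map[string]int{}
	bare["events"] = 3 // rows streamed into proj-a.ds.events
	bare["events"] = 0 // proj-b.ds.events is registered later and is empty

	// Keyed by qualified name, the two tables stay distinct.
	qualified := map[string]int{}
	qualified[tableKey("proj-a", "ds", "events")] = 3
	qualified[tableKey("proj-b", "ds", "events")] = 0

	fmt.Println(bare["events"])                                // 0
	fmt.Println(qualified[tableKey("proj-a", "ds", "events")]) // 3
}
```

With bare names, a query against `events` can land on the empty table from the other project, which matches the empty-result symptom described above.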

Fix nested TIMESTAMP formatting (#482)

Timestamp fields inside STRUCT (RECORD) or ARRAY were returned as RFC3339 strings instead of integer microseconds. Updated recursive formatting logic in internal/types/types.go to ensure consistency.

Closes #482
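
The recursion can be sketched as follows. `Field` and `formatValue` are hypothetical, trimmed-down stand-ins rather than the code in internal/types/types.go, and encoding the microsecond count as a decimal string is an assumption of this example.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Field is a hypothetical stand-in for a BigQuery field schema.
type Field struct {
	Type   string   // e.g. "TIMESTAMP", "RECORD", "STRING"
	Mode   string   // "REPEATED" or empty
	Fields []*Field // sub-fields when Type is "RECORD"
}

// formatValue walks a value and its schema together and rewrites every
// RFC 3339 timestamp string as microseconds since the Unix epoch.
func formatValue(f *Field, v any) any {
	if f == nil || v == nil {
		return v
	}
	if f.Mode == "REPEATED" {
		items, ok := v.([]any)
		if !ok {
			return v
		}
		elem := &Field{Type: f.Type, Fields: f.Fields} // same type, single value
		out := make([]any, len(items))
		for i, item := range items {
			out[i] = formatValue(elem, item)
		}
		return out
	}
	switch f.Type {
	case "RECORD":
		vals, ok := v.([]any)
		if !ok {
			return v
		}
		out := make([]any, len(vals))
		for i, val := range vals {
			if i < len(f.Fields) {
				out[i] = formatValue(f.Fields[i], val)
			} else {
				out[i] = val
			}
		}
		return out
	case "TIMESTAMP":
		s, ok := v.(string)
		if !ok {
			return v
		}
		t, err := time.Parse(time.RFC3339Nano, s)
		if err != nil {
			return v // leave unparseable values untouched
		}
		return strconv.FormatInt(t.UnixMicro(), 10)
	}
	return v
}

func main() {
	schema := &Field{Type: "RECORD", Mode: "REPEATED", Fields: []*Field{
		{Type: "STRING"},
		{Type: "TIMESTAMP"},
	}}
	rows := []any{[]any{"a", "1970-01-01T00:00:01Z"}}
	fmt.Println(formatValue(schema, rows)) // [[a 1000000]]
}
```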

Fix Storage Read API — bare Arrow IPC messages (#493)

The Storage Read API contract requires serialized_schema and serialized_record_batch to be bare Arrow IPC encapsulated messages, not complete IPC streams. The previous implementation called ipc.Writer.Close() which appends an EOS marker, producing [schema_msg][record_batch_msg][EOS] instead of the required bare messages.

  • Added splitIPCStream to extract individual IPC messages from a two-message stream.
  • getSerializedARROWSchema now returns only the schema message bytes.
  • sendARROWRows now returns only the record-batch message bytes.

Closes #493
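
For illustration, a standard-library sketch of the split. It is not the PR's `splitIPCStream`. It assumes the modern Arrow framing (continuation marker `0xFFFFFFFF` plus a little-endian int32 metadata length), that the stream holds exactly one schema message, one record batch message, and the EOS marker, and it relies on schema messages carrying no body. `splitSchemaAndBatch` and `fakeMessage` are hypothetical names, and the test payload is zero-filled rather than real Flatbuffers metadata.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

var (
	continuation = []byte{0xFF, 0xFF, 0xFF, 0xFF}
	// A continuation marker followed by a zero length marks end of stream.
	eos = []byte{0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00}
)

// splitSchemaAndBatch splits [schema_msg][record_batch_msg][EOS] into its
// two bare messages. The schema message has no body, so its total length
// is the 8 prefix bytes plus the declared metadata length.
func splitSchemaAndBatch(stream []byte) (schema, batch []byte, err error) {
	if len(stream) < 8 || !bytes.Equal(stream[:4], continuation) {
		return nil, nil, errors.New("missing continuation marker")
	}
	metaLen := int(int32(binary.LittleEndian.Uint32(stream[4:8])))
	end := 8 + metaLen
	if metaLen <= 0 || end > len(stream) {
		return nil, nil, errors.New("invalid schema message length")
	}
	// Everything after the schema message, minus the trailing EOS marker.
	rest := bytes.TrimSuffix(stream[end:], eos)
	if len(rest) < 8 || !bytes.Equal(rest[:4], continuation) {
		return nil, nil, errors.New("missing record batch message")
	}
	return stream[:end], rest, nil
}

// fakeMessage frames arbitrary bytes like an encapsulated IPC message.
func fakeMessage(meta, body []byte) []byte {
	var size [4]byte
	binary.LittleEndian.PutUint32(size[:], uint32(len(meta)))
	msg := append([]byte{}, continuation...)
	msg = append(msg, size[:]...)
	msg = append(msg, meta...)
	return append(msg, body...)
}

func main() {
	schemaMsg := fakeMessage(make([]byte, 16), nil)
	batchMsg := fakeMessage(make([]byte, 24), make([]byte, 32))
	stream := append(append(append([]byte{}, schemaMsg...), batchMsg...), eos...)

	schema, batch, err := splitSchemaAndBatch(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(schema), len(batch)) // 24 64
}
```

A client that wants to decode one batch with a regular IPC stream reader can do the reverse: concatenate the schema message, the batch message, and the EOS marker.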

Fix REPEATED RECORD field access (intermittent wrong-field values) (#483)

convertValueToCell previously iterated MapKeys() on the STRUCT value returned by googlesqlite, producing non-deterministic field assignment for multi-field structs. The updated implementation reads fields positionally from the []any slice that googlesqlite returns.

Closes #483
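
Go does not define an iteration order for maps, and `reflect.Value.MapKeys` returns keys in unspecified order, so pairing map keys with schema positions can differ from run to run. A small sketch of the positional approach follows; `cellsFromStruct` and its inputs are hypothetical and assume the values arrive in schema order.

```go
package main

import "fmt"

// cellsFromStruct pairs schema field names with struct values by position.
// It rejects mismatched lengths instead of guessing an alignment.
func cellsFromStruct(names []string, values []any) ([]string, error) {
	if len(names) != len(values) {
		return nil, fmt.Errorf("schema has %d fields, value has %d", len(names), len(values))
	}
	cells := make([]string, len(values))
	for i, v := range values {
		cells[i] = fmt.Sprintf("%s=%v", names[i], v)
	}
	return cells, nil
}

func main() {
	names := []string{"id", "label", "score"}
	values := []any{int64(7), "seven", 0.5}

	cells, err := cellsFromStruct(names, values)
	if err != nil {
		panic(err)
	}
	fmt.Println(cells) // [id=7 label=seven score=0.5]
}
```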

Copilot review — all feedback addressed

The following issues raised in automated review were resolved in commit 0160fc5:

  • formatCell nil guard: formatCell now guards against a nil field pointer before dereferencing, preventing a panic when a column has no schema entry.
  • *TableRow handling: *TableRow is handled alongside TableRow in the type switch so pointer values are not silently dropped during cell formatting.
  • terminalTableID for qualified IDs: terminalTableID correctly parses fully qualified IDs in project:dataset.table form, not just bare table names.
  • SetNamePath([]string{}) intent: SetNamePath([]string{}) in MetadataRepoMode is intentional: an empty name path targets the metadata catalog layer directly. Documented in code.

Test plan

New tests added by this PR:

| Test | File | Covers |
| --- | --- | --- |
| TestTerminalTableID | server/handler_tableid_test.go | Copilot finding: qualified ID parsing |
| TestQueryProjectAndDataset | server/handler_tableid_test.go | Copilot finding: project/dataset resolution |
| TestNewTableWithSchemaCaseInsensitiveColumns | internal/types/types_test.go | Copilot finding: case-insensitive column lookup |
| TestStorageReadARROW | server/storage_test.go | Bare Arrow IPC framing (#493) |
| TestRepeatedRecordFieldOrder | server/server_test.go | REPEATED RECORD field ordering (#483) |

Fixes for #470 and #482 are covered by the existing integration test suite (TestFetchData, TestQuery, TestTable), which pass without regressions.

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves BigQuery table handling and row formatting by normalizing table references, making TIMESTAMP formatting recursive across nested/repeated schemas, and adjusting transaction naming/repo mode setup.

Changes:

  • Populate missing TableReference fields during table creation/insert handling.
  • Refactor row formatting to recursively format TIMESTAMP cells in repeated/record fields.
  • Adjust repository transaction mode usage and make SetNamePath construction conditional.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| server/handler.go | Fills missing TableReference components before computing table IDs / handling inserts. |
| internal/types/types.go | Introduces recursive formatCell to format nested/repeated TIMESTAMP values. |
| internal/contentdata/repository.go | Switches transaction mode calls from ContentRepoMode to MetadataRepoMode. |
| internal/connection/manager.go | Changes SetNamePath to build the path from non-empty project/dataset IDs. |

Comment thread internal/types/types.go Outdated
Comment on lines +238 to +243
func formatCell(field *bigqueryv2.TableFieldSchema, cell *TableCell, useInt64Timestamp bool) *TableCell {
    if cell.V == nil {
        return cell
    }

    if field.Mode == "REPEATED" {

Author

Fixed in commit 0160fc5. Added an early if field == nil { return cell } guard at the top of formatCell in internal/types/types.go before any dereference of field.Mode or field.Type.

Comment thread internal/types/types.go Outdated
Comment on lines +259 to +262
if field.Type == "RECORD" {
    if row, ok := cell.V.(TableRow); ok {
        formattedF := make([]*TableCell, 0, len(row.F))
        for i, c := range row.F {

Author

Fixed in commit 0160fc5. The type switch in formatCell now handles both TableRow and *TableRow, dereferencing the pointer in the latter case so nested TIMESTAMP recursion applies consistently regardless of whether the value arrives as a value type or a pointer.
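
A minimal sketch of a type switch that accepts both forms. `TableCell`, `TableRow`, and `nestedCells` here are simplified stand-ins written for this example, not the definitions in internal/types/types.go.

```go
package main

import "fmt"

// Simplified stand-ins for the emulator's cell and row types.
type TableCell struct{ V any }
type TableRow struct{ F []*TableCell }

// nestedCells returns the cells of a RECORD value whether it arrives as a
// TableRow or a *TableRow. ok is false for nil pointers and other types.
func nestedCells(v any) (cells []*TableCell, ok bool) {
	switch row := v.(type) {
	case TableRow:
		return row.F, true
	case *TableRow:
		if row == nil {
			return nil, false
		}
		return row.F, true
	default:
		return nil, false
	}
}

func main() {
	cell := &TableCell{V: "x"}
	byValue, _ := nestedCells(TableRow{F: []*TableCell{cell}})
	byPointer, _ := nestedCells(&TableRow{F: []*TableCell{cell}})
	_, ok := nestedCells("not a row")
	fmt.Println(len(byValue), len(byPointer), ok) // 1 1 false
}
```

A plain `cell.V.(TableRow)` assertion fails for a `*TableRow`, which is how pointer values could skip the nested formatting before the fix.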

Comment thread server/handler.go Outdated
Comment on lines +3158 to +3169
if r.table.TableReference == nil {
    r.table.TableReference = &bigqueryv2.TableReference{}
}
if r.table.TableReference.ProjectId == "" {
    r.table.TableReference.ProjectId = r.project.ID
}
if r.table.TableReference.DatasetId == "" {
    r.table.TableReference.DatasetId = r.dataset.ID
}
if r.table.TableReference.TableId == "" {
    r.table.TableReference.TableId = r.table.Id
}

Author

Fixed in commit 0160fc5. Introduced terminalTableID in server/handler.go that parses fully qualified identifiers (project:dataset.table) and extracts only the terminal table name. All callers that previously used r.table.Id directly now go through this helper. Covered by TestTerminalTableID in server/handler_tableid_test.go.
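
One possible shape for such a helper, shown only as an illustration. `terminalTableIDSketch` is a hypothetical name and the real terminalTableID in server/handler.go may parse differently; this version simply takes everything after the last `.` or `:`.

```go
package main

import (
	"fmt"
	"strings"
)

// terminalTableIDSketch returns the last component of a table identifier.
// It accepts project:dataset.table, project.dataset.table, dataset.table,
// and a bare table name.
func terminalTableIDSketch(id string) string {
	if i := strings.LastIndexAny(id, ".:"); i >= 0 {
		return id[i+1:]
	}
	return id
}

func main() {
	fmt.Println(terminalTableIDSketch("proj:ds.events")) // events
	fmt.Println(terminalTableIDSketch("ds.events"))      // events
	fmt.Println(terminalTableIDSketch("events"))         // events
}
```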

Comment thread internal/connection/manager.go Outdated
Comment on lines +86 to +93
namePath := []string{}
if t.conn.ProjectID != "" {
    namePath = append(namePath, t.conn.ProjectID)
}
if t.conn.DatasetID != "" {
    namePath = append(namePath, t.conn.DatasetID)
}
_ = gsqlConn.SetNamePath(namePath)

Author

Fixed in commit 0160fc5. ContentRepoMode now guards against an empty ProjectID before constructing the name path, so SetNamePath is never called with an empty slice from that path. MetadataRepoMode's SetNamePath([]string{}) is intentional — the metadata layer operates on a nameless root catalog. A comment was added to make this invariant explicit.

@tc-lraigoso

Applied a follow-up fix commit to address review feedback and pushed updates to this branch.

Summary of changes:

  • Guard against nil field access in recursive cell formatting.
  • Handle nested RECORD values for both TableRow and *TableRow.
  • Normalize fallback table ID extraction when table identifiers are qualified.
  • Keep content repository name path initialization strict and consistent.
  • Added focused regression tests for the updated behaviors.

Validation:

  • Ran targeted tests and full server package tests in WSL Ubuntu-24.04.

@lraigosov lraigosov changed the title from "fix: streaming insert visibility and nested timestamp formatting (#470, #482)" to "fix: streaming insert visibility, nested timestamp formatting, and ALTER TABLE ADD COLUMN" on Jun 17, 2026
@lraigosov

Author

Added fix for #481 (ALTER TABLE ADD COLUMN — new columns invisible to subsequent queries) in the latest commit. Full test suite passes. PR description has been updated. Ready for review.

@lraigosov lraigosov changed the title from "fix: streaming insert visibility, nested timestamp formatting, and ALTER TABLE ADD COLUMN" to "fix: resolve issues #470, #481, #482, #483 and #493" on Jun 17, 2026
@lraigosov

Author

Note on reflect+unsafe usage in the ALTER TABLE fix

The updateZetaSQLCatalogForAlter function in server/handler.go accesses private fields of googlesqlite.Conn via reflect + unsafe to reach the internal *googlesql.SimpleCatalog and call AddColumn on the target SimpleTable. This is the most invasive part of the PR and worth explaining.

Why it is necessary

googlesqlite registers ALTER TABLE as a NoopStmtAction — it executes the DDL on SQLite but does not update its internal ZetaSQL catalog. There is no public API to:

  • retrieve the live SimpleCatalog from a Conn, or
  • add a column to an already-registered SimpleTable.

The only alternative would be to drop and re-register the entire table in the catalog on every ALTER, which risks race conditions and discards any in-flight state.

Protections in place

  • Both field lookups (catalog, catalog.catalog) use reflect.Value.IsValid() checks; the function returns a descriptive error rather than panicking if the struct layout changes.
  • The function is called only within an explicit WithGSQLConn callback, scoped to the active transaction.
  • The unit test TestDDLAlterTable exercises the full round-trip (ALTER → INSERT with new column → SELECT), so a silent breakage after a googlesqlite version bump would surface immediately.
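
The guarded lookup pattern can be sketched as below. `conn`, `catalog`, and `privateField` are hypothetical stand-ins defined for this example; the real googlesqlite struct layout is not reproduced, and this is not the code of updateZetaSQLCatalogForAlter.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// Stand-ins for a third-party type whose fields are unexported.
type catalog struct{ columns []string }
type conn struct{ catalog *catalog }

// privateField returns a usable view of the named unexported field of the
// struct that ptr points to, or an error if the field does not exist.
func privateField(ptr any, name string) (reflect.Value, error) {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return reflect.Value{}, fmt.Errorf("want non-nil pointer to struct, got %T", ptr)
	}
	f := v.Elem().FieldByName(name)
	if !f.IsValid() {
		// The layout changed: report it instead of panicking later.
		return reflect.Value{}, fmt.Errorf("field %q not found in %T", name, ptr)
	}
	// NewAt rebuilds the value from its address, which drops the read-only
	// flag that reflect attaches to unexported fields.
	return reflect.NewAt(f.Type(), unsafe.Pointer(f.UnsafeAddr())).Elem(), nil
}

func main() {
	c := &conn{catalog: &catalog{}}

	f, err := privateField(c, "catalog")
	if err != nil {
		panic(err)
	}
	cat := f.Interface().(*catalog)
	cat.columns = append(cat.columns, "new_col")
	fmt.Println(c.catalog.columns) // [new_col]

	_, err = privateField(c, "renamed")
	fmt.Println(err != nil) // true
}
```

The `IsValid` check turns a renamed or removed field into an ordinary error, which is the failure mode the protections above describe.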

Preferred long-term path

The cleanest resolution would be to open a PR in the goccy/go-sql-driver / googlesqlite repository to expose Conn.Catalog() *googlesql.SimpleCatalog or a Conn.AddColumnToTable(table, column) helper. Until that API exists, the reflect approach is the only option that does not require rewriting the DDL execution path entirely.

Happy to discuss or adjust the approach based on maintainer preference.

Storage Read API was returning full Arrow IPC streams (schema + record
batch + EOS marker) instead of the bare messages expected by BigQuery.
Split the stream to extract the schema message and the record batch
message separately. Updated the test helper to reconstruct a full IPC
stream per batch for decoding.

Also adds a regression test for repeated RECORD field ordering to
confirm that struct field values are read from the correct positional
column after the fix in convertValueToCell.
@lraigosov lraigosov force-pushed the feat/maintenance-and-fixes branch from 817932f to 99f1a50 Compare June 17, 2026 02:18
@lraigosov lraigosov changed the title from "fix: resolve issues #470, #481, #482, #483 and #493" to "fix: resolve issues #470, #482, #483 and #493" on Jun 17, 2026
@lraigosov

Author

Note: The two comments above about ALTER TABLE / reflect+unsafe (#481) were written before this work was split into separate PRs. That fix has since moved to PR #495. This PR (#492) covers only #470, #482, #483, and #493.
