fix: resolve issues #470, #482, #483 and #493#492
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves BigQuery table handling and row formatting by normalizing table references, making TIMESTAMP formatting recursive across nested/repeated schemas, and adjusting transaction naming/repo mode setup.
Changes:
- Populate missing
TableReferencefields during table creation/insert handling. - Refactor row formatting to recursively format TIMESTAMP cells in repeated/record fields.
- Adjust repository transaction mode usage and make
SetNamePathconstruction conditional.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| server/handler.go | Fills missing TableReference components before computing table IDs / handling inserts. |
| internal/types/types.go | Introduces recursive formatCell to format nested/repeated TIMESTAMP values. |
| internal/contentdata/repository.go | Switches transaction mode calls from ContentRepoMode to MetadataRepoMode. |
| internal/connection/manager.go | Changes SetNamePath to build the path from non-empty project/dataset IDs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func formatCell(field *bigqueryv2.TableFieldSchema, cell *TableCell, useInt64Timestamp bool) *TableCell { | ||
| if cell.V == nil { | ||
| return cell | ||
| } | ||
|
|
||
| if field.Mode == "REPEATED" { |
There was a problem hiding this comment.
Fixed in commit 0160fc5. Added an early if field == nil { return cell } guard at the top of formatCell in internal/types/types.go before any dereference of field.Mode or field.Type.
| if field.Type == "RECORD" { | ||
| if row, ok := cell.V.(TableRow); ok { | ||
| formattedF := make([]*TableCell, 0, len(row.F)) | ||
| for i, c := range row.F { |
There was a problem hiding this comment.
Fixed in commit 0160fc5. The type switch in formatCell now handles both TableRow and *TableRow, dereferencing the pointer in the latter case so nested TIMESTAMP recursion applies consistently regardless of whether the value arrives as a value type or a pointer.
| if r.table.TableReference == nil { | ||
| r.table.TableReference = &bigqueryv2.TableReference{} | ||
| } | ||
| if r.table.TableReference.ProjectId == "" { | ||
| r.table.TableReference.ProjectId = r.project.ID | ||
| } | ||
| if r.table.TableReference.DatasetId == "" { | ||
| r.table.TableReference.DatasetId = r.dataset.ID | ||
| } | ||
| if r.table.TableReference.TableId == "" { | ||
| r.table.TableReference.TableId = r.table.Id | ||
| } |
There was a problem hiding this comment.
Fixed in commit 0160fc5. Introduced terminalTableID in server/handler.go that parses fully qualified identifiers (project:dataset.table) and extracts only the terminal table name. All callers that previously used r.table.Id directly now go through this helper. Covered by TestTerminalTableID in server/handler_tableid_test.go.
| namePath := []string{} | ||
| if t.conn.ProjectID != "" { | ||
| namePath = append(namePath, t.conn.ProjectID) | ||
| } | ||
| if t.conn.DatasetID != "" { | ||
| namePath = append(namePath, t.conn.DatasetID) | ||
| } | ||
| _ = gsqlConn.SetNamePath(namePath) |
There was a problem hiding this comment.
Fixed in commit 0160fc5. ContentRepoMode now guards against an empty ProjectID before constructing the name path, so SetNamePath is never called with an empty slice from that path. MetadataRepoMode's SetNamePath([]string{}) is intentional — the metadata layer operates on a nameless root catalog. A comment was added to make this invariant explicit.
|
Applied a follow-up fix commit to address review feedback and pushed updates to this branch. Summary of changes:
Validation:
|
|
Added fix for #481 ( |
Note on reflect+unsafe usage in the ALTER TABLE fixThe Why it is necessary
The only alternative would be to drop and re-register the entire table in the catalog on every ALTER, which risks race conditions and discards any in-flight state. Protections in place
Preferred long-term path The cleanest resolution would be to open a PR in the Happy to discuss or adjust the approach based on maintainer preference. |
Storage Read API was returning full Arrow IPC streams (schema + record batch + EOS marker) instead of the bare messages expected by BigQuery. Split the stream to extract the schema message and the record batch message separately. Updated the test helper to reconstruct a full IPC stream per batch for decoding. Also adds a regression test for repeated RECORD field ordering to confirm that struct field values are read from the correct positional column after the fix in convertValueToCell.
817932f to
99f1a50
Compare
This PR addresses two regressions, a Storage Read API framing bug, and a REPEATED RECORD field ordering issue.
Fix streaming insert visibility (#470)
Streaming inserts (
tabledata.insertAll) were resolving table names inconsistently, leading to empty results in unqualified queries when same-named tables existed across multiple projects. Updated the metadata handler to use fully qualified names (project.dataset.table) for storage and resolution.Closes #470
Fix nested TIMESTAMP formatting (#482)
Timestamp fields inside
STRUCT(RECORD) orARRAYwere returned as RFC3339 strings instead of integer microseconds. Updated recursive formatting logic ininternal/types/types.goto ensure consistency.Closes #482
Fix Storage Read API — bare Arrow IPC messages (#493)
The Storage Read API contract requires
serialized_schemaandserialized_record_batchto be bare Arrow IPC encapsulated messages, not complete IPC streams. The previous implementation calledipc.Writer.Close()which appends an EOS marker, producing[schema_msg][record_batch_msg][EOS]instead of the required bare messages.splitIPCStreamto extract individual IPC messages from a two-message stream.getSerializedARROWSchemanow returns only the schema message bytes.sendARROWRowsnow returns only the record-batch message bytes.Closes #493
Fix REPEATED RECORD field access (intermittent wrong-field values) (#483)
convertValueToCellpreviously iteratedMapKeys()on the STRUCT value returned by googlesqlite, producing non-deterministic field assignment for multi-field structs. The updated implementation reads fields positionally from the[]anyslice that googlesqlite returns.Closes #483
Copilot review — all feedback addressed
The following issues raised in automated review were resolved in commit
0160fc5:formatCellnil guard —formatCellnow guards against a nilfieldpointer before dereferencing, preventing a panic when a column has no schema entry.*TableRowhandling —*TableRowis handled alongsideTableRowin the type switch so pointer values are not silently dropped during cell formatting.terminalTableIDfor qualified IDs —terminalTableIDcorrectly parses fully qualified IDs inproject:dataset.tableform, not just bare table names.SetNamePath([]string{})intent —SetNamePath([]string{})inMetadataRepoModeis intentional: an empty name path targets the metadata catalog layer directly. Documented in code.Test plan
New tests added by this PR:
TestTerminalTableIDserver/handler_tableid_test.goTestQueryProjectAndDatasetserver/handler_tableid_test.goTestNewTableWithSchemaCaseInsensitiveColumnsinternal/types/types_test.goTestStorageReadARROWserver/storage_test.goTestRepeatedRecordFieldOrderserver/server_test.goFixes for #470 and #482 are covered by the existing integration test suite (
TestFetchData,TestQuery,TestTable), which pass without regressions.