Fix: return HTTP 409 for duplicate datasets and jobs #440

Open
shollands-sc wants to merge 1 commit into goccy:main from shollands-sc:fix/duplicate-dataset-job-409

Conversation

@shollands-sc

Summary

When creating a dataset or job that already exists, the emulator returns HTTP 500 (InternalServerError) instead of the correct HTTP 409 (Conflict). This causes two problems for BigQuery client libraries:

  1. Retry storm: The Python client treats 500 as a transient error and retries with exponential backoff for up to 600 seconds (about 10 minutes per affected call)
  2. exists_ok=True bypass: The Python client's exists_ok parameter only suppresses 409 errors, so the 500 bypasses it entirely

Approach

This follows the existing ErrDuplicatedTable pattern already in the codebase (internal/metadata/dataset.go + server/handler.go:2609):

  • Add ErrDuplicatedDataset and ErrDuplicatedJob sentinel errors in internal/metadata/project.go
  • Wrap them with %w in AddDataset / AddJob so callers can match them with errors.Is
  • Check errors.Is in ServeHTTP and use errDuplicate() for the HTTP response

Addressing feedback from #184

Per @goccy's review comments on #184:

  • "Please use errDuplicate function in error.go" — Done. We use the existing errDuplicate() in the handlers, consistent with how ErrDuplicatedTable is already handled
  • "It does not seem necessary to change to *ServerError type" — Done. Handle() method signatures are unchanged; the errors.Is check happens in ServeHTTP

Changes

| File | Change |
| --- | --- |
| internal/metadata/project.go | Add ErrDuplicatedDataset, ErrDuplicatedJob sentinels; wrap with %w |
| server/handler.go | datasetsInsertHandler.ServeHTTP and jobsInsertHandler.ServeHTTP check errors.Is, then respond with errDuplicate() |
| server/server_test.go | Add TestDuplicateDataset (mirrors existing TestDuplicateTable) |

Fixes #256
Supersedes #184

Test plan

  • TestDuplicateDataset — verifies HTTP 409 on duplicate dataset creation
  • TestDuplicateTable — existing test still passes (no regression)
  • TestDataset — existing test still passes (no regression)
  • go build ./... — compiles cleanly

When creating a dataset or job that already exists, the emulator returns
HTTP 500 (InternalServerError) instead of HTTP 409 (Conflict). This
causes the BigQuery Python client to retry with exponential backoff for
up to 600 seconds, since it treats 500 as transient. The exists_ok=True
parameter also fails to suppress the error because it only checks for 409.

This fix follows the existing ErrDuplicatedTable pattern already in the
codebase: sentinel errors in the metadata package, checked with
errors.Is in ServeHTTP, mapped to errDuplicate() for the HTTP response.

Handle() method signatures are unchanged, addressing the feedback on goccy#184.

Fixes goccy#256
Supersedes goccy#184

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shollands-sc shollands-sc marked this pull request as draft March 10, 2026 21:49
@shollands-sc shollands-sc marked this pull request as ready for review March 12, 2026 21:42

Development

Successfully merging this pull request may close these issues.

return 409 on dataset duplicate
