
Conversation


@manitgupta (Member) commented Dec 22, 2025

Changes

  1. Adds a GCS sink to the bulk migration pipeline. When enabled, the template writes the records read from the source to GCS as Avro files. Files are organized as output directory -> shard-level folder -> table-level folder -> .avro files (see the illustrative layout below).
  2. Updates the Terraform templates for bulk migration with the new gcsOutputDirectory parameter, along with some previously missing ones.
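
For illustration, with gcsOutputDirectory set to a hypothetical path such as gs://my-bucket/bulk-output, a two-shard migration of a customers table would produce objects like:

gs://my-bucket/bulk-output/shard-01/customers/&lt;file&gt;.avro
gs://my-bucket/bulk-output/shard-02/customers/&lt;file&gt;.avro

(The bucket, shard, and table names above are placeholders; exact file names depend on the writer.)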

Testing

  1. Basic smoke testing complete.
  2. Manual scale testing complete.
  3. ITs will be added in follow-up PRs.

@gemini-code-assist

Summary of Changes

Hello @manitgupta, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the bulk migration pipeline by providing an option to output migrated data directly to Google Cloud Storage in Avro format. This functionality allows users to archive or further process the source records outside of the primary Spanner migration, offering greater flexibility in data handling. The changes also ensure that the infrastructure-as-code definitions are aligned with these new pipeline capabilities.

Highlights

  • GCS Sink Integration: A new GCS sink has been added to the bulk migration pipeline, enabling the pipeline to write records read from the source as Avro files to a specified Google Cloud Storage directory.
  • Avro File Organization: The Avro files written to GCS are organized hierarchically, following a structure of 'folder -> shard-level folder -> table-level folder -> .avro files' for better management and retrieval.
  • Terraform Template Updates: The Terraform templates for both sharded and single-job bulk migration have been updated to include the new gcsOutputDirectory parameter, along with fetchSize and additional_pipeline_options.



codecov bot commented Dec 22, 2025

Codecov Report

❌ Patch coverage is 17.02128% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.08%. Comparing base (b45e086) to head (e98e24a).
⚠️ Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
...d/teleport/v2/templates/MigrateTableTransform.java 0.00% 34 Missing ⚠️
...e/cloud/teleport/v2/templates/AvroDestination.java 61.53% 3 Missing and 2 partials ⚠️

❌ Your patch check has failed because the patch coverage (17.02%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3139      +/-   ##
============================================
+ Coverage     50.70%   56.08%   +5.38%     
+ Complexity     5070     1713    -3357     
============================================
  Files           974      474     -500     
  Lines         59872    26738   -33134     
  Branches       6536     2814    -3722     
============================================
- Hits          30360    14997   -15363     
+ Misses        27370    10847   -16523     
+ Partials       2142      894    -1248     
Components Coverage Δ
spanner-templates 71.99% <17.02%> (+1.12%) ⬆️
spanner-import-export ∅ <ø> (∅)
spanner-live-forward-migration 80.03% <ø> (-0.03%) ⬇️
spanner-live-reverse-replication 77.74% <ø> (-0.01%) ⬇️
spanner-bulk-migration 88.10% <17.02%> (-0.25%) ⬇️
Files with missing lines Coverage Δ
...e/cloud/teleport/v2/templates/AvroDestination.java 61.53% <61.53%> (ø)
...d/teleport/v2/templates/MigrateTableTransform.java 0.00% <0.00%> (ø)

... and 537 files with indirect coverage changes


@manitgupta marked this pull request as ready for review December 29, 2025 06:59
@manitgupta requested a review from a team as a code owner December 29, 2025 06:59
@manitgupta requested reviews from bharadwaj-aditya, shreyakhajanchi, and sm745052 and removed the request for sm745052 December 29, 2025 06:59
@manitgupta changed the title from "[Draft] Feat: Add GCS sink to bulk migration pipeline" to "Feat: Add GCS sink to bulk migration pipeline" Dec 29, 2025
@rohitwali left a comment

Thanks for the changes. Could you also share links to successful jobs from the integration and scale tests?

Comment on lines +104 to +105
FileSystems.matchNewResource(options.getGcsOutputDirectory(), true)
.resolve(shardId, StandardResolveOptions.RESOLVE_DIRECTORY)


What are the pros/cons of choosing this over gcs_dir/table/shardId? Were other folder structures considered?

@manitgupta (Member, Author) commented Dec 30, 2025


I don't think there is a large advantage of one over the other, and this is changeable if we see most customers asking for the other.
gcs_dir/shardId/table felt natural to me since today we shard jobs by shard (when we have to) rather than by table.
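
For reference, the nested directory resolution can be sketched with Beam's FileSystems API roughly as follows (tableName is a hypothetical variable used only for illustration, not necessarily the identifier in this PR):

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;

// Resolve gcs_dir -> shardId -> table, matching the layout described above.
ResourceId shardDir =
    FileSystems.matchNewResource(options.getGcsOutputDirectory(), true)
        .resolve(shardId, StandardResolveOptions.RESOLVE_DIRECTORY);
ResourceId tableDir =
    shardDir.resolve(tableName, StandardResolveOptions.RESOLVE_DIRECTORY);
// Individual .avro files are then written under tableDir.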

PCollectionTuple rowsAndTables = input.apply("Read_rows", readerTransform.readTransform());
PCollection<SourceRow> sourceRows = rowsAndTables.get(readerTransform.sourceRowTag());

if (options.getGcsOutputDirectory() != null && !options.getGcsOutputDirectory().isEmpty()) {


Thinking out loud: should this be behind an enable-data-validation flag instead of the GCS directory flag?

@manitgupta (Member, Author) replied:

I am open to it, although the GCS sink is more general functionality, with validation being one use of it. We could rename it to something like gcsValidationOutputDirectory, but I would advise against that.
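
For context, gating an Avro sink on the new option in a Beam pipeline could look roughly like the sketch below. SourceRowToGenericRecord and avroSchema are hypothetical stand-ins for illustration only (not the actual classes in this change), the shard-level folder is omitted for brevity, and import paths vary across Beam versions:

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.values.PCollection;

// Only attach the GCS sink when gcsOutputDirectory is set.
if (options.getGcsOutputDirectory() != null && !options.getGcsOutputDirectory().isEmpty()) {
  PCollection<GenericRecord> records =
      sourceRows
          .apply("To_avro_records", new SourceRowToGenericRecord(avroSchema)) // hypothetical transform
          .setCoder(AvroCoder.of(avroSchema));
  records.apply(
      "Write_avro_to_gcs",
      FileIO.<String, GenericRecord>writeDynamic()
          .by(r -> r.getSchema().getName()) // one destination per table, assuming schema name == table name
          .via(AvroIO.sink(avroSchema)) // single schema assumed for simplicity
          .to(options.getGcsOutputDirectory())
          .withNaming(table -> FileIO.Write.defaultNaming(table + "/part", ".avro"))
          .withDestinationCoder(StringUtf8Coder.of()));
}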

@manitgupta (Member, Author) commented:

Could you also share links to successful jobs from the integration and scale tests?

Links and observations during testing are documented here: b/470879633
