Migrate PlaceHolderDataIterator from tf to np by aireenmei · Pull Request #3149 · AI-Hypercomputer/maxtext

aireenmei · 2026-02-17T22:54:37Z

Description

This is part of the plan to remove the TF dependency from MaxText.
It also fixes a bug caused by expansion_factor_real_data being a float in places that expect an int.
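
To illustrate the kind of float/int mismatch described above, here is a minimal sketch. It assumes the config value arrives as a float such as `2.0` and is later used where Python requires an integer. The helper name `as_int_factor` is hypothetical and is not taken from MaxText; the actual fix in this PR may look different.

```python
def as_int_factor(value: float) -> int:
    """Hypothetical helper: convert a whole-number float config value to int."""
    factor = int(value)
    if factor != value:
        # Reject values like 1.5 instead of silently truncating them.
        raise ValueError(f"expected a whole number, got {value!r}")
    return factor


# Floor division by a float yields a float (8 // 2.0 == 4.0),
# and range() refuses float arguments.
try:
    range(8 // 2.0)
    float_accepted = True
except TypeError:
    float_accepted = False

# Casting once at the boundary keeps downstream integer-only code safe.
hosts_per_group = 8 // as_int_factor(2.0)
```

Converting at the point where the config is read, rather than at each use, keeps the cast in one place.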

Tests

PlaceHolderDataIterator is only used when expansion_factor_real_data > 1, on the hosts that are NOT loading real data. The batches it produces are filled with -1 and are discarded.
Tested with the following script, with max_checkify=true set so that the run errors out if a training batch contains a -1 batch produced by PlaceHolderDataIterator:

 python3 /mnt/disks/pd2/xpk/xpk.py workload create \
  --project ${PROJECT} \
  --zone ${ZONE} \
  --cluster ${CLUSTER_NAME} \
  --workload ${RUN_NAME} \
  --base-docker-image ${BASE_IMAGE} \
  --tpu-type ${TPU_TYPE} \
  --num-slices 1 \
  --command "pip install --no-deps -e . && python3 -m MaxText.train src/maxtext/configs/base.yml \
  run_name=${RUN_NAME} base_output_directory=${OUTPUT_DIR} \
  dataset_type=grain dataset_path=gs://maxtext-dataset \
  grain_train_files=gs://maxtext-dataset/array-record/c4/en/3.0.1/c4-train* \
  expansion_factor_real_data=2 max_checkify=true \
  steps=10 enable_checkpointing=false"
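
The idea behind the check used in this test (fail if a placeholder batch reaches training) can be sketched in NumPy as follows. This is an illustration only, not the MaxText implementation: the function name, the batch keys (`inputs`, `targets`) and the shapes are assumptions, and the real `max_checkify` path may work differently.

```python
import numpy as np

PLACEHOLDER_VALUE = -1  # per the PR description, placeholder batches are all -1


def assert_no_placeholder(batch: dict) -> None:
    """Hypothetical check: raise if any array in the batch is entirely placeholder data."""
    for name, array in batch.items():
        if np.all(np.asarray(array) == PLACEHOLDER_VALUE):
            raise ValueError(f"batch field {name!r} contains only placeholder data")


# A batch with real token ids passes; an all -1 batch is rejected.
real_batch = {"inputs": np.arange(8, dtype=np.int32).reshape(2, 4)}
placeholder_batch = {"inputs": np.full((2, 4), PLACEHOLDER_VALUE, dtype=np.int32)}

assert_no_placeholder(real_batch)
try:
    assert_no_placeholder(placeholder_batch)
    placeholder_rejected = False
except ValueError:
    placeholder_rejected = True
```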

cloud log

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-02-17T23:02:02Z

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...axtext/input_pipeline/synthetic_data_processing.py	0.00%	7 Missing ⚠️
...rc/maxtext/input_pipeline/grain_data_processing.py	0.00%	1 Missing ⚠️
...rc/maxtext/input_pipeline/multihost_dataloading.py	0.00%	1 Missing ⚠️

github-actions · 2026-02-17T23:02:11Z

🤖 Hi @aireenmei, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

📋 Review Summary

This pull request effectively migrates the synthetic data generator from TensorFlow to NumPy, successfully removing a key dependency. The implementation is clean, straightforward, and achieves the stated goal.

🔍 General Feedback

The change is well-executed and improves the project by decoupling it from TensorFlow for this component.
The new implementation using a Python generator is a simple and efficient solution for providing placeholder data.
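
A generator-based placeholder iterator of the kind mentioned above could look roughly like this. It is a sketch under assumptions rather than the code in this PR: the function name, batch keys, shape and dtype are hypothetical; only the all -1 contents come from the PR description.

```python
import itertools
from typing import Iterator

import numpy as np


def placeholder_batches(batch_size: int, seq_len: int, keys=("inputs", "targets")) -> Iterator[dict]:
    """Yield identical all -1 batches forever; consumers are expected to discard them."""
    template = {key: np.full((batch_size, seq_len), -1, dtype=np.int32) for key in keys}
    while True:
        # Copy so that a consumer mutating one batch cannot affect later ones.
        yield {key: value.copy() for key, value in template.items()}


# The generator is infinite, so take a bounded slice when inspecting it.
first_two = list(itertools.islice(placeholder_batches(2, 4), 2))
```

Because the generator never terminates, it can stand in for a real data iterator for any number of steps without needing to know the dataset size.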

aireenmei marked this pull request as ready for review February 17, 2026 22:57

aireenmei requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, bvandermoon, gagika, gobbleturk, hengtaoguo, jesselu-google, jiangjy1982, khatwanimohit, richjames0, shralex, suexu1025 and vipannalla as code owners February 17, 2026 22:57

aireenmei added the gemini-review label Feb 17, 2026

github-actions bot reviewed Feb 17, 2026

aireenmei changed the title ~~Migrate synthetic generator from tf to np~~ Migrate PlaceHolderDataIterator from tf to np Feb 17, 2026

Migrate synthetic generator from tf to np

c353668

aireenmei force-pushed the aireen/syn_np branch from 7e3e974 to c353668 Compare February 18, 2026 04:55

NuojCheng approved these changes Feb 18, 2026

NicoGrande approved these changes Feb 18, 2026

aireenmei added the pull ready label Feb 19, 2026

aireenmei wants to merge 1 commit into main from aireen/syn_np
3 participants