[BOUNTY] Add deterministic seed support to data_generator.py (#4) #29

Open
leo202000 wants to merge 2 commits into thanhle74:main from leo202000:feat/data-generator-seed

@leo202000

Summary

Makes DataGenerator fully deterministic by routing all randomness through the seeded random.Random instance, addressing bounty #4. The existing seed parameter now guarantees identical output across runs.

Changes

  • tools/data_generator.py:
    • Module-level helpers (random_phone, random_email, random_datetime, gaussian_random) now accept an optional rng argument so callers can inject a seeded generator.
    • DataGenerator passes self.random to every helper call site in generate_users, generate_orders, and generate_trades, closing the gap where email/phone/datetime/timestamp used the unseeded global random.
    • Added a DataGenerator class docstring documenting seed-based reproducibility.
  • tests/test_data_generator_seed.py: 11 unit tests verifying same-seed reproducibility (users/orders/trades), seed differentiation, cross-run stability, default seed, and helper-level rng injection.
  • diagnostic/build-23f043a7.logd + .json: required diagnostic bundle.
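
The rng-injection pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `random_phone` signature, the phone format, the default seed value, and the shape of the user records are all assumptions made for the example.

```python
import random
from typing import Optional


def random_phone(rng: Optional[random.Random] = None) -> str:
    """Return a 10-digit phone-like string (the format is illustrative only)."""
    # Fall back to the module-level generator when no rng is injected,
    # so callers that never passed an rng keep their unseeded behaviour.
    r = rng if rng is not None else random
    return "".join(str(r.randint(0, 9)) for _ in range(10))


class DataGenerator:
    """Hypothetical stand-in for the real class; only the rng wiring matters."""

    def __init__(self, seed: int = 0):  # default value is an assumption
        # A private Random instance isolates this generator from global state.
        self.random = random.Random(seed)

    def generate_users(self, count: int) -> list:
        # Every helper call receives self.random, so the seed fully
        # determines the output.
        return [
            {"id": i, "phone": random_phone(self.random)}
            for i in range(count)
        ]
```

Keeping `rng` optional means existing callers of the helpers are unaffected, while `DataGenerator` can opt in to determinism at each call site.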

Testing

  • python3 tests/test_data_generator_seed.py -v -> 11 tests pass.
  • python3 build.py -> diagnostic bundle generated and committed (diagnostic/build-23f043a7.logd, 15044 bytes, DIAG magic).
  • Smoke test: DataGenerator(seed=42) produces byte-identical users across two instances; different seeds produce different data.
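
For reviewers who want to reproduce the smoke test independently, a reproducibility check might look like the sketch below. The `DataGenerator` here is a minimal stand-in so the example runs on its own; the real class, its record fields, and the actual tests in tests/test_data_generator_seed.py may differ.

```python
import json
import random
import unittest


class DataGenerator:
    """Minimal stand-in so the sketch is self-contained (not the real class)."""

    def __init__(self, seed: int = 0):  # default value is an assumption
        self.random = random.Random(seed)

    def generate_users(self, count: int) -> list:
        return [
            {"id": i, "age": self.random.randint(18, 90)}
            for i in range(count)
        ]


class SeedReproducibilityTest(unittest.TestCase):
    def test_same_seed_is_byte_identical(self):
        first = DataGenerator(seed=42).generate_users(20)
        second = DataGenerator(seed=42).generate_users(20)
        # Compare serialized bytes to mirror the "byte-identical" claim.
        self.assertEqual(
            json.dumps(first, sort_keys=True).encode(),
            json.dumps(second, sort_keys=True).encode(),
        )

    def test_different_seeds_differ(self):
        a = DataGenerator(seed=1).generate_users(20)
        b = DataGenerator(seed=2).generate_users(20)
        self.assertNotEqual(a, b)

    def test_global_random_does_not_leak_in(self):
        first = DataGenerator(seed=42).generate_users(20)
        # Perturb the global generator; a fully seeded instance ignores it.
        random.seed(999)
        random.random()
        second = DataGenerator(seed=42).generate_users(20)
        self.assertEqual(first, second)
```

Saved as a module, this can be run with `python3 -m unittest`. The third test targets the specific gap this PR closes: output that silently depends on the unseeded global `random`.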

Checklist

  • Relevant modules affected by these changes build locally
  • Tests pass locally
  • Diagnostic build log is committed in this PR
  • Documentation has been updated, if applicable
  • Configuration or schema changes are documented, if applicable
  • No generated build artifacts are committed, except the required diagnostic build log
  • Changes are scoped to the PR purpose and avoid unrelated cleanup
  • Security, privacy, and error-handling implications have been considered

  • I would like to request that my diagnostic build log is removed before merging

Addresses bounty issue #4. Please let me know the process for claiming the $25 bounty once merged.

Make the existing seed parameter fully deterministic by routing all
randomness through the seeded random.Random instance. The module-level
helpers (random_phone, random_email, random_datetime, gaussian_random)
now accept an optional rng argument, and DataGenerator passes its
self.random to every call site so the same seed reproduces identical
users, orders, trades, ticks, and candles. Adds a DataGenerator
docstring and unit tests verifying reproducibility across runs, seed
differentiation, and helper-level rng injection.

Addresses bounty mannowell#4.
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Warning

Review limit reached

@leo202000, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 4 minutes and 13 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ae9cdc2-0d80-4ed7-a2ad-650a129334ca

📥 Commits

Reviewing files that changed from the base of the PR and between 94e0fb0 and 1016565.

📒 Files selected for processing (4)
  • diagnostic/build-23f043a7.json
  • diagnostic/build-23f043a7.logd
  • tests/test_data_generator_seed.py
  • tools/data_generator.py
