[BOUNTY] Add deterministic seed support to data_generator.py (#4) #29
leo202000 wants to merge 2 commits
Make the existing seed parameter fully deterministic by routing all randomness through the seeded random.Random instance. The module-level helpers (random_phone, random_email, random_datetime, gaussian_random) now accept an optional rng argument, and DataGenerator passes its self.random to every call site so the same seed reproduces identical users, orders, trades, ticks, and candles. Adds a DataGenerator docstring and unit tests verifying reproducibility across runs, seed differentiation, and helper-level rng injection. Addresses bounty mannowell#4.
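As a rough illustration of the injection pattern described above, here is a minimal sketch. It is not the code from this PR: the body of `random_email`, the domain list, and this cut-down `DataGenerator` are hypothetical stand-ins, and only the names `random_email`, `rng`, `DataGenerator`, `self.random`, and `generate_users` come from the description.

```python
import random
import string


def random_email(rng=None):
    """Return a fake email address (hypothetical body, for illustration only).

    Draws from ``rng`` when one is supplied; otherwise falls back to the
    global ``random`` module, which exposes the same methods.
    """
    rng = random if rng is None else rng
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    domain = rng.choice(["example.com", "example.org"])
    return f"{name}@{domain}"


class DataGenerator:
    """Cut-down stand-in: the same seed yields the same generated data."""

    def __init__(self, seed):
        # Every random draw in this class goes through this one instance.
        self.random = random.Random(seed)

    def generate_users(self, count):
        # Passing self.random keeps the helper off the unseeded global state.
        return [
            {"id": i, "email": random_email(self.random)}
            for i in range(count)
        ]


users_a = DataGenerator(seed=42).generate_users(3)
users_b = DataGenerator(seed=42).generate_users(3)
print(users_a == users_b)
```

Keeping `rng` optional means existing callers that pass no argument behave as before, while anything that needs reproducibility can hand in its own `random.Random`.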
Summary
Makes `DataGenerator` fully deterministic by routing all randomness through the seeded `random.Random` instance, addressing bounty #4. The existing `seed` parameter now guarantees identical output across runs.

Changes
- `tools/data_generator.py`:
  - The module-level helpers (`random_phone`, `random_email`, `random_datetime`, `gaussian_random`) now accept an optional `rng` argument so callers can inject a seeded generator.
  - `DataGenerator` passes `self.random` to every helper call site in `generate_users`, `generate_orders`, and `generate_trades`, closing the gap where email/phone/datetime/timestamp used the unseeded global `random`.
  - Adds a `DataGenerator` class docstring documenting seed-based reproducibility.
- `tests/test_data_generator_seed.py`: 11 unit tests verifying same-seed reproducibility (users/orders/trades), seed differentiation, cross-run stability, default seed, and helper-level `rng` injection.
- `diagnostic/build-23f043a7.logd` + `.json`: required diagnostic bundle.

Testing
- `python3 tests/test_data_generator_seed.py -v` -> 11 tests pass.
- `python3 build.py` -> diagnostic bundle generated and committed (`diagnostic/build-23f043a7.logd`, 15044 bytes, `DIAG` magic).
- `DataGenerator(seed=42)` produces byte-identical users across two instances; different seeds produce different data.

Checklist
Addresses bounty issue #4. Please let me know the process for claiming the $25 bounty once merged.
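
For reviewers, the kind of reproducibility check listed under Testing could look roughly like the sketch below. It does not reproduce `tests/test_data_generator_seed.py`. The `DataGenerator` here is a hypothetical stand-in so the example runs on its own, and the test names and the `score` field are invented for illustration.

```python
import random
import unittest


class DataGenerator:
    """Hypothetical stand-in for the real class in tools/data_generator.py."""

    def __init__(self, seed):
        self.random = random.Random(seed)

    def generate_users(self, count):
        return [{"id": i, "score": self.random.random()} for i in range(count)]


class SeedReproducibilityTest(unittest.TestCase):
    def test_same_seed_same_users(self):
        first = DataGenerator(seed=42).generate_users(10)
        second = DataGenerator(seed=42).generate_users(10)
        self.assertEqual(first, second)

    def test_different_seeds_differ(self):
        first = DataGenerator(seed=42).generate_users(10)
        second = DataGenerator(seed=43).generate_users(10)
        self.assertNotEqual(first, second)

    def test_global_random_state_does_not_leak(self):
        first = DataGenerator(seed=42).generate_users(10)
        # Perturb the global generator; a seeded instance should not notice.
        random.seed(999)
        random.random()
        second = DataGenerator(seed=42).generate_users(10)
        self.assertEqual(first, second)
```

Saved as a file, this can be run with `python3 -m unittest <file>`. The third test targets the specific gap this PR closes: output must not depend on the state of the global `random` module.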