
#119 Add synthetic Data generator for request_comments and volunteer_rating #133

Merged
Nishu2000-hub merged 2 commits into main from Nishu2000-hub_feature/synthetic-data-119
Apr 17, 2026


@Nishu2000-hub
Contributor

Synthetic data generator for two tables: request_comments and volunteer_rating.

What's included

  • database/mock-data-generation/generate_119.py — the generator script
  • database/mock-data-generation/README.md — documentation
  • database/mock_db/request_comments.csv — 100 rows
  • database/mock_db/volunteer_rating.csv — 100 rows

How to run

python generate_119.py              # 100 rows (default)
python generate_119.py --rows 500   # custom count
python generate_119.py --rows 40000 # scale test

# Once users.csv and request.csv exist from other teams:
python generate_119.py \
  --users-csv database/mock_db/users.csv \
  --requests-csv database/mock_db/request.csv
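
As a rough illustration of what the optional --users-csv and --requests-csv inputs could be used for, here is a minimal sketch that samples foreign keys from an existing CSV when one is supplied and falls back to placeholder IDs otherwise. It is not taken from generate_119.py: the helper names (load_ids, pick_foreign_keys), the user_id column, and the USR- placeholder format are all assumptions made for the example.

```python
import csv
import io
import random


def load_ids(csv_file, id_column):
    """Collect the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [row[id_column] for row in reader if row.get(id_column)]


def pick_foreign_keys(n_rows, real_ids=None, fallback_prefix="USR", seed=0):
    """Sample n_rows IDs from real_ids, or build placeholder IDs if none are given."""
    rng = random.Random(seed)
    if real_ids:
        # Real parent rows exist: every generated key points at one of them.
        return [rng.choice(real_ids) for _ in range(n_rows)]
    # No parent table yet: produce placeholder IDs (hypothetical format).
    return [f"{fallback_prefix}-{rng.randint(1, 9999):04d}" for _ in range(n_rows)]


# In-memory stand-in for users.csv; the column name user_id is an assumption.
users_csv = io.StringIO("user_id,name\nU1,Ana\nU2,Bo\n")
user_ids = load_ids(users_csv, "user_id")
keys = pick_foreign_keys(5, user_ids)
placeholders = pick_foreign_keys(3)
```

Sampling from the real ID list keeps the generated rows referentially consistent with the parent tables, while the fallback lets the script run before those tables exist.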

@Nishu2000-hub
Contributor Author

After submitting this PR, I found the official help_categories table schema at:
github.com/saayam-for-all/request/wiki/1.0-MVP-Help-Categories

Possible follow-ups:

  • Update generate_119.py to use the official cat_ids from the help_categories table
  • Extend the generator to produce req_add_info synthetic data using the
    req_add_info_metadata field definitions

- Added 3-tier text generation (compositional grammar, stochastic perturbation, and a diversity validator) so the generated text is varied enough for ML training
- Added standalone usage mode (--rows flag)
- Regenerated request_comments.csv and volunteer_rating.csv with higher-quality diverse data
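
The three tiers named in the commit message can be pictured with a small sketch like the one below. It is only an illustration under stated assumptions, not the code in generate_119.py: the phrase lists, the function names (compose, perturb, generate_diverse), and the choice of exact-duplicate rejection as the "diversity validator" are all hypothetical.

```python
import random

# Tier 1: a tiny compositional grammar (hypothetical phrases, not the script's).
OPENERS = ["Thanks for", "I appreciate", "Following up on", "Quick note about"]
TOPICS = ["the grocery delivery", "the ride to the clinic", "the tutoring session"]
CLOSERS = ["It went smoothly.", "Please confirm the time.", "Very helpful."]


def compose(rng):
    """Build one comment by picking one phrase for each slot."""
    return f"{rng.choice(OPENERS)} {rng.choice(TOPICS)}. {rng.choice(CLOSERS)}"


def perturb(text, rng, p=0.3):
    """Tier 2: with probability p each, lowercase the text or drop the final period."""
    if rng.random() < p:
        text = text.lower()
    if rng.random() < p:
        text = text.rstrip(".")
    return text


def generate_diverse(n, seed=0, max_attempts=10_000):
    """Tier 3: accept a candidate only if it has not been produced before."""
    rng = random.Random(seed)
    seen, out = set(), []
    for _ in range(max_attempts):
        if len(out) == n:
            break
        candidate = perturb(compose(rng), rng)
        if candidate not in seen:
            seen.add(candidate)
            out.append(candidate)
    return out


comments = generate_diverse(20)
```

A real validator would likely use a softer similarity measure than exact matching; rejecting exact duplicates is just the simplest check that shows where such a filter sits in the pipeline.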

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nishu2000-hub merged commit 93ac4c6 into main on Apr 17, 2026
Nishu2000-hub added a commit that referenced this pull request Apr 17, 2026