Generate Synthetic CSV Data for "fraud_requests" and "notifications" tables from Database Schema #121
Description
Table 1: fraud_requests
References: user_id → users
Table 2: notifications
References: user_id → users, type_id → notification_types, channel_id → notification_channels
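The foreign-key relationships above can be captured in a small mapping that generation scripts consult when filling child columns. The dict layout below is only an illustrative convention, not a format the repo prescribes:

```python
# Hypothetical mapping of child-table foreign keys to their parent tables.
# Table and column names come from the issue; the dict structure itself is
# an assumption chosen for illustration.
SCHEMA_REFS = {
    "fraud_requests": {"user_id": "users"},
    "notifications": {
        "user_id": "users",
        "type_id": "notification_types",
        "channel_id": "notification_channels",
    },
}

for table, refs in SCHEMA_REFS.items():
    for column, parent in refs.items():
        print(f"{table}.{column} -> {parent}")
```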
Objective:
Create CSV files containing synthetic (mock) data for the above tables, following the provided schema structure. This is useful for testing, development, and demonstrations without using real/sensitive data.
Key Details
Input:
Database schema structure listing all table names, their respective column names, and data types.
Input file: https://github.com/saayam-for-all/data/tree/main/database/Saayam_Table.column.names_data.xlsx
The same schema is available in a programmatically extractable form at https://github.com/saayam-for-all/data/tree/main/database/mock-data-generation/db_info.json
Lookup table/Reference table file path : https://github.com/saayam-for-all/data/tree/main/database/lookup_tables
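Reading the schema from db_info.json could look like the sketch below. The JSON layout shown (a table name mapped to a list of column/type pairs) is an assumption; check the actual structure of db_info.json in the repo and adapt the keys accordingly:

```python
import json

# Assumed layout: table name -> list of {"column": ..., "type": ...}.
# This is a stand-in sample; load the real db_info.json from the repo
# and adjust the key names if its structure differs.
sample = """
{
  "fraud_requests": [
    {"column": "id", "type": "integer"},
    {"column": "user_id", "type": "integer"},
    {"column": "created_at", "type": "timestamp"}
  ]
}
"""

schema = json.loads(sample)
columns = [c["column"] for c in schema["fraud_requests"]]
print(columns)
```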
Output:
One CSV file per table with realistic synthetic data
Adheres to data types and constraints (string lengths, date formats, relationships)
Typically ~100 records per table (configurable)
Output File path : https://github.com/saayam-for-all/data/tree/main/database/mock_db/file_name
Data Quality Requirements:
String/Text fields: Plausible names, emails, addresses, etc.
Numeric fields: Reasonable ranges and distributions
Date/Time fields: Valid and relevant dates
Foreign keys: Respect relationships between tables (valid ID references)
Relationships between columns are maintained: e.g., if there are state and city columns, the city values must be consistent with the state values
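The state/city dependency above can be kept consistent by picking the parent value first and deriving the dependent one from it. The state/city lists below are illustrative samples only, not project data:

```python
import random

# Illustrative lookup: each state maps to cities that actually belong to it.
# Picking the state first, then sampling from its city list, guarantees
# the two columns never contradict each other.
CITIES_BY_STATE = {
    "CA": ["Los Angeles", "San Francisco", "San Diego"],
    "TX": ["Houston", "Austin", "Dallas"],
    "NY": ["New York", "Buffalo", "Albany"],
}

def fake_location(rng):
    state = rng.choice(list(CITIES_BY_STATE))
    city = rng.choice(CITIES_BY_STATE[state])
    return state, city

rng = random.Random(42)  # seeded for reproducible mock data
rows = [fake_location(rng) for _ in range(5)]
print(rows)
```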
Implementation Steps:
Analyze Schema:
Extract all table names, field names, and data types provided in the xls sheet
Identify constraints (primary keys, foreign keys, unique constraints)
Select Data Generation Tool:
Explore different fake-data libraries (e.g., Faker), Hugging Face datasets, or LLM-based generation
Develop Generation Scripts:
Write code to generate CSVs matching your schema
Ensure correct field naming, ordering, and data types
Enforce referential integrity for foreign keys
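One way to enforce referential integrity is to generate parent tables first and sample each child foreign key only from the ids actually emitted. The file names and column sets below are simplified placeholders, not the full project schema:

```python
import csv
import random

rng = random.Random(0)  # seeded so reruns produce the same files

# Generate the parent table first and keep its ids in memory.
user_ids = list(range(1, 101))
with open("users.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user_id", "name"])
    for uid in user_ids:
        writer.writerow([uid, f"user_{uid}"])

# Child table: every user_id is drawn from the generated parent ids,
# so the foreign-key relationship holds by construction.
with open("fraud_requests.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["request_id", "user_id", "status"])
    for rid in range(1, 101):
        writer.writerow([rid, rng.choice(user_ids), rng.choice(["open", "closed"])])
```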
Output:
The scripts should go in data/tree/main/database/mock-data-generation
Update README.md to document how to run the scripts and what each file represents
Store the generated CSV files in the database/mock_db folder (e.g., users.csv, orders.csv)
Quality Review & Commit
Validate CSV structure and completeness
Commit all scripts and generated files to repository
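A minimal structural check before committing could look like the sketch below: confirm the header contains the required columns and that no required cell is empty. The `validate_csv` helper and its `required` parameter are hypothetical names introduced here for illustration:

```python
import csv
import io

# Hypothetical pre-commit check: verify the header and that every
# required cell is non-empty. `required` is whatever subset of columns
# must be populated for that table.
def validate_csv(text, required):
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in required if c not in header]
    if missing:
        return False
    return all(all(row[c] for c in required) for row in reader)

sample = "user_id,email\n1,a@example.com\n2,b@example.com\n"
ok = validate_csv(sample, ["user_id", "email"])
print(ok)
```

In a real run, the same function would be applied to each file in database/mock_db before the commit step.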
Acceptance Criteria:
✅ CSV files exist for all tables in the schema
✅ Each CSV contains up to 100 rows of realistic synthetic data, generated by scripts that can later scale to at least 40,000 rows
✅ Field types, formats, and relationships are respected
✅ Documentation (README) with reproduction instructions included
✅ Scripts are properly documented and reusable